CrackedRuby - Performance Testing

Overview

Performance testing evaluates how a software system performs under specific workload conditions. The practice measures response times, throughput rates, resource consumption, and system stability to identify bottlenecks, validate scalability requirements, and ensure applications meet performance criteria before production deployment.

Performance testing differs from functional testing by focusing on non-functional attributes. Where functional tests verify that code produces correct outputs, performance tests measure how quickly and efficiently those outputs arrive. A function might pass all unit tests yet fail performance requirements by executing too slowly or consuming excessive memory under realistic load conditions.

The discipline originated from mainframe capacity planning in the 1960s, evolving through client-server architectures to modern distributed systems. Contemporary performance testing addresses microservices, containerized deployments, serverless functions, and globally distributed applications where performance characteristics involve network latency, service orchestration overhead, and eventual consistency trade-offs.

Performance testing provides quantitative data for architectural decisions. Measuring actual performance characteristics reveals whether caching strategies reduce database load, whether horizontal scaling improves throughput linearly, or whether microservice communication overhead exceeds projected limits. These measurements inform infrastructure sizing, code optimization priorities, and service level agreement definitions.

# Basic response time measurement
require 'benchmark'

# Placeholder standing in for the code under test
def process_user_request
  sleep(0.001)
end

result = Benchmark.measure do
  1000.times { process_user_request }
end

puts "Total time: #{result.real}s"
puts "Average per request: #{result.real / 1000}s"

Performance testing occurs throughout development cycles. Early-stage testing validates architectural assumptions with prototypes. Pre-release testing confirms the application meets requirements under projected load. Production monitoring provides ongoing performance validation and regression detection.

Key Principles

Performance testing encompasses multiple testing types, each measuring different system characteristics under distinct conditions. Load testing applies expected user volumes to verify the system handles normal operations. Stress testing exceeds normal capacity limits to identify breaking points and failure modes. Soak testing maintains sustained load over extended periods to detect memory leaks, resource exhaustion, and degradation over time.

Spike testing introduces sudden load increases to measure elasticity and recovery. Volume testing focuses on data quantity rather than concurrent users, validating database performance with realistic data volumes. Scalability testing measures how performance characteristics change as resources increase, determining whether systems scale linearly or encounter diminishing returns.
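
One way to summarize scalability results is to compare measured throughput against ideal linear scaling from the smallest configuration. The sketch below is only an illustration: the `scaling_efficiency` helper and the sample numbers are hypothetical, not measurements from a real system.

```ruby
# Hypothetical helper: compares measured throughput at each instance count
# against ideal linear scaling extrapolated from the smallest configuration.
def scaling_efficiency(measurements)
  base_instances, base_throughput = measurements.min_by { |instances, _| instances }
  per_instance_baseline = base_throughput.to_f / base_instances

  measurements.sort.map do |instances, throughput|
    ideal = per_instance_baseline * instances
    [instances, (throughput / ideal).round(2)]
  end.to_h
end

# Made-up sample data: instance count => requests per second
samples = { 1 => 500.0, 2 => 950.0, 4 => 1700.0, 8 => 2600.0 }

scaling_efficiency(samples).each do |instances, efficiency|
  puts "#{instances} instances: #{(efficiency * 100).round}% of linear scaling"
end
```

An efficiency that falls steadily as instances are added is the diminishing-returns pattern described above. A value that stays near 1.0 suggests close to linear scaling.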

Performance metrics quantify system behavior across multiple dimensions. Response time measures the interval from request initiation to complete response delivery. Throughput indicates requests processed per time unit, typically measured in requests per second or transactions per minute. Concurrent users represent simultaneous active sessions the system supports while maintaining acceptable response times.

Resource utilization tracks CPU consumption, memory allocation, disk I/O rates, and network bandwidth usage. Error rates measure failed requests as load increases, revealing capacity limits and failure modes. Latency distribution shows response time variability, distinguishing median performance from worst-case scenarios that affect user experience.

# Measuring multiple performance metrics
class PerformanceMetrics
  attr_reader :response_times, :errors, :start_time

  def initialize
    @response_times = []
    @errors = 0
    @start_time = Time.now
    @lock = Mutex.new # record_request may be called from several threads
  end

  def record_request(duration, success)
    @lock.synchronize do
      @response_times << duration
      @errors += 1 unless success
    end
  end

  def throughput
    @response_times.size / elapsed_time
  end

  def elapsed_time
    Time.now - @start_time
  end

  # Nearest-rank percentile; returns nil when nothing has been recorded
  def percentile(p)
    sorted = @response_times.sort
    return nil if sorted.empty?

    rank = (p / 100.0 * sorted.size).ceil.clamp(1, sorted.size)
    sorted[rank - 1]
  end

  def report
    total = @response_times.size
    {
      total_requests: total,
      throughput: throughput,
      # Avoid dividing by zero when no requests were recorded
      error_rate: total.zero? ? 0.0 : @errors.to_f / total,
      median: percentile(50),
      p95: percentile(95),
      p99: percentile(99)
    }
  end
end

Baseline establishment creates reference points for performance comparison. Baseline measurements capture current performance characteristics before optimization work begins, enabling quantitative assessment of improvements. Baselines also establish acceptable performance ranges, making regressions detectable through continuous monitoring.
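
A baseline comparison can be as simple as checking each latency metric against its stored reference value. The following is a minimal sketch under stated assumptions: the `regressions` helper, the 10% tolerance, and the sample values are all hypothetical, and it only handles metrics where a higher value is worse.

```ruby
# Hypothetical regression check: flags latency metrics that are worse than
# the stored baseline by more than the allowed tolerance (10% is an assumption).
def regressions(baseline, current, tolerance: 0.10)
  baseline.each_with_object({}) do |(metric, base_value), found|
    change = (current.fetch(metric) - base_value) / base_value.to_f
    found[metric] = change.round(3) if change > tolerance
  end
end

# Made-up values in seconds
baseline = { median: 0.120, p95: 0.300, p99: 0.650 }
current  = { median: 0.125, p95: 0.390, p99: 0.660 }

regressions(baseline, current).each do |metric, change|
  puts "#{metric} regressed by #{(change * 100).round(1)}%"
end
```

In practice the baseline would be loaded from a stored report rather than written inline, and the tolerance would be chosen to sit above normal run-to-run noise.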

Test environment configuration critically affects result validity. Performance tests require production-like environments with equivalent hardware specifications, network topology, and data volumes. Testing against development databases with minimal data produces misleading results when production databases contain millions of records with different query execution plans.

Think time represents user behavior delays between requests. Users read screens, compose inputs, and pause between actions. Performance tests incorporating realistic think times produce accurate concurrent user simulations. Tests without think time make each simulated user issue requests far faster than a real user would, so the simulated user count no longer maps to real concurrency and the results understate how many actual users the system can support.
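
The relationship between think time, response time, and concurrency follows Little's Law for closed systems: users = throughput × (response time + think time). The sketch below applies that formula. The method names and input values are illustrative assumptions.

```ruby
# Little's Law for a closed system:
#   users = throughput * (response_time + think_time)
def required_virtual_users(target_throughput:, response_time:, think_time:)
  (target_throughput * (response_time + think_time)).ceil
end

# The same law rearranged to give the request rate a set of users generates.
def throughput_for(users:, response_time:, think_time:)
  users / (response_time + think_time).to_f
end

# Simulating 100 req/s with 0.5s responses and 4.5s of think time
puts required_virtual_users(target_throughput: 100, response_time: 0.5, think_time: 4.5)

# 50 simulated users with no think time and 0.25s responses
puts throughput_for(users: 50, response_time: 0.25, think_time: 0.0)
```

With these inputs, 50 simulated users without think time generate 200 requests per second. At a 5 second request cycle, that is the load of roughly 1,000 real users, which is why the two user counts cannot be compared directly.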

Ruby Implementation

Ruby provides multiple approaches for performance testing, from simple benchmarking to comprehensive load testing frameworks. The standard library includes Benchmark for basic timing measurements, while third-party gems offer sophisticated testing capabilities.

The Benchmark module measures code execution time with minimal overhead. The measure method returns timing information for a block, while bm compares multiple implementations. The bmbm method runs each block twice, a rehearsal pass followed by the measured pass, to reduce the influence of memory allocation and garbage collection on the results.
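
As a small illustration of bmbm, the sketch below compares two ways of sorting the same array. The workload is an arbitrary example chosen for this sketch, and the timings it prints depend on the machine.

```ruby
require 'benchmark'

# Arbitrary sample workload: 50,000 random integers
numbers = Array.new(50_000) { rand(1_000_000) }

# bmbm prints a rehearsal pass first, then the measured pass.
# It returns one Benchmark::Tms object per report.
results = Benchmark.bmbm(12) do |x|
  x.report('sort')    { numbers.dup.sort }
  x.report('sort_by') { numbers.dup.sort_by { |n| n } }
end

results.each { |tms| puts "#{tms.label.strip}: #{tms.real.round(4)}s wall time" }
```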

require 'benchmark'

# Comparing different implementations
Benchmark.bm(20) do |x|
  x.report("String concatenation:") do
    100_000.times { str = ""; 10.times { str += "x" } }
  end
  
  x.report("String interpolation:") do
    100_000.times { str = ""; 10.times { str = "#{str}x" } }
  end
  
  x.report("Array join:") do
    100_000.times { arr = []; 10.times { arr << "x" }; arr.join }
  end
end

The benchmark-ips gem measures iterations per second rather than total execution time, providing intuitive performance comparisons. The gem runs code repeatedly to establish statistical confidence in measurements, automatically adjusting iteration counts to achieve stable results.

require 'benchmark/ips'

Benchmark.ips do |x|
  x.config(time: 5, warmup: 2)
  
  x.report("select") do
    (1..1000).to_a.select { |n| n.even? }
  end
  
  x.report("reject") do
    (1..1000).to_a.reject { |n| n.odd? }
  end
  
  x.compare!
end

# compare! prints a comparison with the fastest entry first (illustrative numbers):
# Comparison:
#   reject:    12543.2 i/s
#   select:    12345.6 i/s - 1.02x  slower

The rack-mini-profiler gem integrates performance profiling into web applications. The middleware displays query counts, execution times, and memory allocations for each request. Developers identify N+1 queries, slow database operations, and memory-intensive code paths through inline profiling results.

# In a Rails application
class ApplicationController < ActionController::Base
  before_action :authorize_profiler
  
  private
  
  def authorize_profiler
    if current_user&.admin?
      Rack::MiniProfiler.authorize_request
    end
  end
end

# Profiler shows SQL query performance
def index
  # This generates N+1 queries - profiler highlights the issue
  @users = User.all
  @users.each { |user| user.posts.count }
end

The ruby-prof gem provides detailed profiling with multiple output formats. Call stack profiling reveals which methods consume the most time, while allocation profiling identifies memory-intensive operations. The gem supports flat profiles, graph profiles, and call tree visualizations.

require 'ruby-prof'

RubyProf.start

# Code to profile
result = perform_complex_calculation(1000)

profile = RubyProf.stop

# Print flat profile to console
printer = RubyProf::FlatPrinter.new(profile)
printer.print(STDOUT, min_percent: 1)

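# Write call graph to a file (the block form closes the file afterwards)
printer = RubyProf::GraphPrinter.new(profile)
File.open('profile.txt', 'w') { |file| printer.print(file) }

# Create call tree files for visualization (e.g. KCachegrind).
# CallTreePrinter takes an options hash and writes its own files into :path.
printer = RubyProf::CallTreePrinter.new(profile)
printer.print(path: '.')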

The memory_profiler gem tracks object allocations and memory retention. The gem reports which code locations allocate the most objects, which classes consume the most memory, and which strings occupy the most space. Memory profiling identifies memory leaks and excessive allocation patterns.

require 'memory_profiler'

report = MemoryProfiler.report do
  # Code to analyze
  1000.times do
    User.new(name: "Test User", email: "test@example.com")
  end
end

report.pretty_print(scale_bytes: true)

# Output shows:
# Total allocated: 156.25 KB (2000 objects)
# Total retained: 0 B (0 objects)
# allocated memory by gem
# allocated memory by file
# allocated memory by class

Practical Examples

Load testing a web application requires simulating concurrent users making realistic requests. The following example creates a simple load test that measures response times and throughput for a user registration endpoint.

require 'net/http'
require 'json'
# Uses the PerformanceMetrics class defined earlier

class LoadTester
  def initialize(url, concurrent_users: 10, duration: 60)
    @url = URI(url)
    @concurrent_users = concurrent_users
    @duration = duration
    @metrics = PerformanceMetrics.new
    @stop = false
  end

  def run
    threads = @concurrent_users.times.map do |i|
      Thread.new { simulate_user(i) }
    end

    sleep(@duration)
    @stop = true
    threads.each(&:join)

    @metrics.report
  end

  private

  def simulate_user(user_id)
    until @stop
      start_time = Time.now
      success = make_request(user_id)
      duration = Time.now - start_time
      @metrics.record_request(duration, success)
      
      sleep(rand(1..3)) # Think time
    end
  end

  def make_request(user_id)
    http = Net::HTTP.new(@url.host, @url.port)
    http.use_ssl = @url.scheme == 'https'
    
    request = Net::HTTP::Post.new(@url)
    request['Content-Type'] = 'application/json'
    request.body = {
      username: "user_#{user_id}_#{rand(10000)}",
      email: "user#{rand(10000)}@example.com",
      password: "password123"
    }.to_json

    response = http.request(request)
    response.code.to_i < 400
  rescue => e
    puts "Request failed: #{e.message}"
    false
  end
end

# Execute load test
tester = LoadTester.new(
  'https://api.example.com/users',
  concurrent_users: 50,
  duration: 120
)

results = tester.run
puts "Throughput: #{results[:throughput].round(2)} req/s"
puts "Error rate: #{(results[:error_rate] * 100).round(2)}%"
puts "Median response: #{(results[:median] * 1000).round(2)}ms"
puts "95th percentile: #{(results[:p95] * 1000).round(2)}ms"

Database query performance testing validates that queries execute efficiently under realistic data volumes. The following example measures query performance across different table sizes and index configurations.

require 'benchmark'
require 'active_record'

class QueryPerformanceTester
  def initialize(model_class)
    @model = model_class
  end

  def test_query_scaling
    record_counts = [100, 1_000, 10_000, 100_000]
    
    record_counts.each do |count|
      setup_test_data(count)
      
      results = Benchmark.measure do
        @model.where(status: 'active').limit(100).to_a
      end
      
      puts "Records: #{count}, Query time: #{results.real}s"
      cleanup_test_data
    end
  end

  def compare_index_impact
    setup_test_data(50_000)
    
    # Without index
    time_without_index = Benchmark.measure do
      @model.where(email: 'test@example.com').first
    end
    
    # Add index
    ActiveRecord::Migration.add_index @model.table_name, :email
    
    # With index
    time_with_index = Benchmark.measure do
      @model.where(email: 'test@example.com').first
    end
    
    puts "Without index: #{time_without_index.real}s"
    puts "With index: #{time_with_index.real}s"
    puts "Improvement: #{(time_without_index.real / time_with_index.real).round(2)}x"
    
    cleanup_test_data
    ActiveRecord::Migration.remove_index @model.table_name, :email
  end

  private

  def setup_test_data(count)
    @model.connection.execute("TRUNCATE #{@model.table_name}")
    
    count.times do |i|
      @model.create!(
        email: "user#{i}@example.com",
        status: i.even? ? 'active' : 'inactive',
        created_at: Time.now - rand(365).days
      )
    end
  end

  def cleanup_test_data
    @model.connection.execute("TRUNCATE #{@model.table_name}")
  end
end

Stress testing determines system breaking points by progressively increasing load until failures occur. This approach identifies maximum capacity and failure modes.

class StressTester
  def initialize(url)
    @url = url
    @results = []
  end

  def run
    user_counts = [10, 25, 50, 100, 200, 400, 800]
    
    user_counts.each do |count|
      puts "Testing with #{count} concurrent users..."
      
      tester = LoadTester.new(@url, concurrent_users: count, duration: 30)
      result = tester.run
      
      @results << {
        users: count,
        throughput: result[:throughput],
        error_rate: result[:error_rate],
        p95_latency: result[:p95]
      }
      
      # Stop if error rate exceeds threshold
      break if result[:error_rate] > 0.05
      
      sleep(10) # Recovery time between tests
    end
    
    analyze_results
  end

  private

  def analyze_results
    puts "\n=== Stress Test Results ==="
    @results.each do |r|
      puts "#{r[:users]} users: " \
           "#{r[:throughput].round(2)} req/s, " \
           "#{(r[:error_rate] * 100).round(2)}% errors, " \
           "#{(r[:p95_latency] * 1000).round(2)}ms p95"
    end
    
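    # Highest tested level that stayed under a 1% error rate, if any
    max_capacity = @results.select { |r| r[:error_rate] < 0.01 }.last
    if max_capacity
      puts "\nMax capacity: ~#{max_capacity[:users]} concurrent users"
    else
      puts "\nNo tested load level stayed under a 1% error rate"
    end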
  end
end

Tools & Ecosystem

Apache Bench (ab) provides command-line HTTP load testing with straightforward request/response metrics. The tool measures requests per second, connection times, and response distributions across concurrent connections. While not Ruby-specific, ab integrates into Ruby-based performance testing workflows.

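require 'open3'

# Wrapper for Apache Bench
class ApacheBenchRunner
  def initialize(url)
    @url = url
  end

  def run(requests: 1000, concurrency: 10)
    # Pass arguments as a list so the URL is never interpreted by a shell
    output, status = Open3.capture2(
      'ab', '-n', requests.to_s, '-c', concurrency.to_s, '-g', 'data.tsv', @url
    )
    raise "ab exited with status #{status.exitstatus}" unless status.success?

    parse_results(output)
  end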

  private

  def parse_results(output)
    {
      requests_per_second: output[/Requests per second:\s+([\d.]+)/, 1].to_f,
      time_per_request: output[/Time per request:\s+([\d.]+)/, 1].to_f,
      transfer_rate: output[/Transfer rate:\s+([\d.]+)/, 1].to_f
    }
  end
end

runner = ApacheBenchRunner.new('http://localhost:3000/api/users')
results = runner.run(requests: 5000, concurrency: 50)
puts "RPS: #{results[:requests_per_second]}"

The wrk tool generates significant load with minimal resource consumption. Written in C with LuaJIT scripting support, wrk can sustain hundreds of thousands of requests per second from a single multi-core machine, depending on hardware and the target server. Ruby applications integrate wrk through system calls or by parsing its output.
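
A Ruby script that shells out to wrk can pull the headline numbers from the summary lines wrk prints at the end of a run. The sketch below assumes that summary format. The `parse_wrk_summary` helper is hypothetical, the sample text is abbreviated from the example in wrk's README, and the exact layout may differ between wrk versions.

```ruby
# Hypothetical parser for the summary lines at the end of wrk's output.
# Returns nil for a field when its line is not found.
def parse_wrk_summary(output)
  {
    requests_per_second: output[/^Requests\/sec:\s+([\d.]+)/, 1]&.to_f,
    total_requests: output[/^\s*(\d+) requests in/, 1]&.to_i
  }
end

# Abbreviated sample output; a real run could be captured with, for example:
#   output, _status = Open3.capture2('wrk', '-t4', '-c100', '-d30s', url)
sample = <<~OUT
  Running 30s test @ http://127.0.0.1:8080/index.html
    12 threads and 400 connections
    22464657 requests in 30.00s, 17.76GB read
  Requests/sec: 748868.53
  Transfer/sec:    606.33MB
OUT

summary = parse_wrk_summary(sample)
puts "RPS: #{summary[:requests_per_second]}"
puts "Total requests: #{summary[:total_requests]}"
```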

The siege tool simulates realistic user behavior with configurable delays and URLs. The tool reads URL lists from files, supports HTTP authentication, and measures transaction rates under varied request patterns. Siege provides concurrent user simulation without requiring programming.

The httperf tool focuses on HTTP server performance measurement with detailed connection and session metrics. The tool generates requests at specified rates, measures connection establishment times, and tracks reply rates. Httperf validates server capacity planning and configuration tuning.

JMeter provides GUI-based test design with distributed load generation capabilities. While Java-based, JMeter tests Ruby web applications through HTTP samplers. The tool supports complex scenarios with assertions, controllers, and listeners for result analysis.

The derailed_benchmarks gem identifies memory bloat and allocation issues in Ruby applications. The gem measures memory usage across requests, tracks retained objects, and generates allocation flamegraphs. Rails applications integrate derailed_benchmarks to detect memory-related performance problems.

# Using derailed_benchmarks in a Rails app
# Add to Gemfile
gem 'derailed_benchmarks', group: :development

# Measure memory used by each gem at require time
# bundle exec derailed bundle:mem

# Count objects allocated by each gem at require time
# bundle exec derailed bundle:objects

# Identify memory leaks
# TEST_COUNT=10000 bundle exec derailed exec perf:mem_over_time

The stackprof gem provides sampling profilers for CPU and object allocation analysis. The gem operates with minimal overhead, suitable for production profiling. Stackprof generates flamegraphs showing where applications spend execution time.

require 'stackprof'

# raw: true keeps the raw samples that flamegraph output needs
StackProf.run(mode: :cpu, raw: true, out: 'tmp/stackprof-cpu.dump') do
  # Application code to profile
  process_large_dataset
end

# Generate report
# stackprof tmp/stackprof-cpu.dump --text --limit 20

# Create flamegraph (flamegraph.pl reads collapsed stacks)
# stackprof --stackcollapse tmp/stackprof-cpu.dump > tmp/stacks.txt
# flamegraph.pl tmp/stacks.txt > tmp/flamegraph.svg

The benchmark-memory gem measures memory allocations rather than execution time. The gem counts allocated objects and retained objects, revealing memory efficiency across implementations.

require 'benchmark/memory'

Benchmark.memory do |x|
  x.report("Array#map") do
    (1..1000).to_a.map { |n| n * 2 }
  end
  
  x.report("Array#each") do
    result = []
    (1..1000).to_a.each { |n| result << n * 2 }
    result
  end
  
  x.compare!
end

Implementation Approaches

Bottom-up performance testing begins with unit-level benchmarks, progressing through component integration tests to full system load tests. This approach establishes performance baselines at each architectural layer, simplifying bottleneck identification when system-level tests reveal issues.

Component-level benchmarks measure individual service or module performance in isolation. Database query performance, cache operations, API client latency, and background job processing times receive isolated measurement. Component benchmarks execute quickly in CI/CD pipelines, catching performance regressions before integration.

Integration performance testing validates performance characteristics when components interact. Service-to-service communication overhead, database connection pooling behavior, and distributed transaction coordination receive measurement at integration boundaries. Integration tests reveal performance issues invisible in isolated component tests.

System-level load testing evaluates the complete application under realistic user loads. Full request paths exercise all components, revealing cumulative latency, resource contention, and scaling limitations. System tests validate that architectural assumptions hold under actual operating conditions.

Top-down performance testing starts with production monitoring and works backward to identify bottlenecks. Application performance monitoring captures real user experience metrics, highlighting problematic endpoints or operations. Targeted performance tests then reproduce and investigate specific issues.

Synthetic monitoring generates artificial traffic against production or staging environments, measuring availability and performance continuously. Synthetic tests execute critical user journeys on fixed schedules, detecting performance degradation before users encounter problems. Alert thresholds trigger investigation when performance metrics exceed acceptable ranges.

Production profiling captures performance data from live systems with minimal overhead. Sampling profilers activate periodically, collecting stack traces and resource utilization metrics. Statistical analysis reveals hot code paths and resource-intensive operations from real production traffic patterns.

Chaos engineering introduces controlled failures to validate system resilience and performance degradation characteristics. Deliberately terminating services, introducing network latency, or exhausting resources reveals how systems behave under adverse conditions. Performance tests combined with chaos experiments validate graceful degradation and recovery mechanisms.

# Chaos engineering example
class ChaosExperiment
  def initialize(test_runner)
    @runner = test_runner
  end

  def run_with_database_latency
    # Baseline performance
    baseline = @runner.run

    # Introduce 100ms database latency
    inject_latency('database', 100)
    begin
      degraded = @runner.run
    ensure
      # Always remove the delay, even if the degraded run raises
      remove_latency('database')
    end

    analyze_impact(baseline, degraded)
  end

  private

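  def inject_latency(service, ms)
    # Configure network delay or proxy injection.
    # This tc rule needs root and delays all traffic on eth0, not only
    # traffic to the named service; adjust the interface for your host.
    `tc qdisc add dev eth0 root netem delay #{ms}ms`
  end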

  def remove_latency(service)
    `tc qdisc del dev eth0 root`
  end

  def analyze_impact(baseline, degraded)
    throughput_impact = (1 - degraded[:throughput] / baseline[:throughput]) * 100
    latency_impact = (degraded[:p95] / baseline[:p95] - 1) * 100
    
    puts "Throughput degradation: #{throughput_impact.round(2)}%"
    puts "Latency increase: #{latency_impact.round(2)}%"
  end
end

Common Patterns

Ramp-up testing gradually increases load to identify when performance degrades. Starting with minimal load, tests incrementally add concurrent users or request rates while monitoring response times and error rates. Ramp-up patterns prevent overwhelming systems at test start and reveal capacity transitions.

class RampUpTest
  def initialize(url, max_users: 100, step: 10, duration_per_step: 60)
    @url = url
    @max_users = max_users
    @step = step
    @duration = duration_per_step
  end

  def run
    current_users = @step
    results = []
    
    while current_users <= @max_users
      puts "Testing #{current_users} users..."
      tester = LoadTester.new(@url, 
                               concurrent_users: current_users,
                               duration: @duration)
      results << tester.run.merge(users: current_users)
      
      current_users += @step
      sleep(10)
    end
    
    plot_results(results)
  end

  private

  def plot_results(results)
    results.each do |r|
      puts "#{r[:users]} users: #{r[:throughput].round(2)} req/s, " \
           "#{(r[:p95] * 1000).round(2)}ms p95"
    end
  end
end

Think time injection simulates realistic user behavior by introducing delays between requests. Users read content, fill forms, and make decisions before subsequent actions. Performance tests without think time generate continuous request streams in which each simulated user produces the load of many real users, which distorts capacity estimates expressed in concurrent users.

Percentile-based analysis provides accurate performance characterization beyond simple averages. The 95th and 99th percentile response times reveal tail latency affecting real users, while averages hide performance outliers. Service level objectives typically define acceptable percentile values rather than means.
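
A small worked example shows how an average can hide tail latency. The sample data below is made up for illustration, and the `percentile` helper uses the nearest-rank method.

```ruby
# Nearest-rank percentile over a list of durations.
def percentile(values, pct)
  sorted = values.sort
  rank = (pct / 100.0 * sorted.size).ceil.clamp(1, sorted.size)
  sorted[rank - 1]
end

# Made-up sample: 90 fast requests and 10 slow ones (milliseconds)
durations = Array.new(90, 100) + Array.new(10, 2_000)

mean = durations.sum / durations.size.to_f
puts "mean:   #{mean}ms"
puts "median: #{percentile(durations, 50)}ms"
puts "p95:    #{percentile(durations, 95)}ms"
puts "p99:    #{percentile(durations, 99)}ms"
```

For this data the mean is 290ms, a value no request actually experienced. The median shows the typical 100ms response, and the 95th percentile exposes the 2,000ms tail.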

Warmup periods allow systems to reach steady state before measurement begins. JIT compilation, cache population, connection pool establishment, and resource allocation occur during warmup. Excluding warmup from measurements keeps these slow initial responses from skewing the results.

class WarmupTest
  def initialize(url, warmup_requests: 100, test_requests: 1000)
    @url = url
    @warmup_requests = warmup_requests
    @test_requests = test_requests
  end

  def run
    puts "Warming up..."
    warmup_phase
    
    puts "Starting measurement..."
    test_phase
  end

  private

  def warmup_phase
    @warmup_requests.times do
      make_request
    end
  end

  def test_phase
    metrics = PerformanceMetrics.new
    
    @test_requests.times do
      start_time = Time.now
      success = make_request
      duration = Time.now - start_time
      metrics.record_request(duration, success)
    end
    
    puts metrics.report
  end

  def make_request
    # HTTP request implementation
  end
end

Resource saturation testing identifies bottleneck resources by monitoring CPU, memory, disk I/O, and network utilization during load tests. Understanding which resource reaches capacity first guides optimization efforts and infrastructure scaling decisions.
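
A first step toward finding the saturated resource is to check whether a piece of work is limited by the CPU or is mostly waiting on something else. The sketch below compares process CPU time with wall-clock time. The `cpu_share` helper is a hypothetical name, and it only observes the current Ruby process, not the whole host.

```ruby
# Returns the share of wall-clock time this process spent on the CPU while
# running the block. For single-threaded work, a value near 1.0 suggests
# CPU-bound code and a value near 0 suggests waiting (I/O, sleep, locks).
def cpu_share
  wall_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  cpu_start = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)
  yield
  wall = Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall_start
  cpu = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID) - cpu_start
  cpu / wall
end

waiting = cpu_share { sleep(0.2) }
computing = cpu_share { 2_000_000.times { |i| Math.sqrt(i) } }

puts "sleep:       #{(waiting * 100).round(1)}% CPU"
puts "computation: #{(computing * 100).round(1)}% CPU"
```

System-wide tools such as those in the reference tables remain necessary for disk, network, and memory pressure; this check only separates CPU-bound from waiting-bound code inside one process.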

Endurance testing (also called soak testing) maintains sustained load over extended periods, revealing issues invisible in short tests. Memory leaks, connection pool exhaustion, log file growth, and degradation from resource fragmentation emerge during multi-hour or multi-day test runs.
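
One way to spot a leak in endurance test data is to fit a line through periodic memory samples and look at the slope. The sketch below uses an ordinary least-squares fit. The `growth_rate` helper and the sample values are hypothetical, and it assumes at least two samples taken at different times.

```ruby
# Least-squares slope of [time, memory] samples, in memory units per time unit.
def growth_rate(samples)
  n = samples.size.to_f
  mean_x = samples.sum { |x, _| x } / n
  mean_y = samples.sum { |_, y| y } / n
  covariance = samples.sum { |x, y| (x - mean_x) * (y - mean_y) }
  variance = samples.sum { |x, _| (x - mean_x)**2 }
  covariance / variance
end

# Made-up samples: [hours elapsed, resident memory in MB]
samples = [[0, 512], [1, 530], [2, 548], [3, 566], [4, 584]]

puts "Memory growth: #{growth_rate(samples).round(2)} MB/hour"
```

A slope that stays clearly positive after the warmup period is a signal to investigate. A slope near zero suggests memory has stabilized.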

Comparative benchmarking measures performance across implementations, configurations, or infrastructure changes. A/B testing different caching strategies, database configurations, or deployment architectures requires controlled comparison with identical load patterns. Statistical significance testing validates that observed differences exceed measurement noise.
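
As a rough sketch of a significance check, Welch's t statistic compares the means of two independent samples without assuming equal variances. The helper names and the sample timings below are illustrative assumptions. Converting the statistic into a p-value requires the t distribution and is left out here.

```ruby
def mean(values)
  values.sum / values.size.to_f
end

# Unbiased sample variance (divides by n - 1)
def sample_variance(values)
  m = mean(values)
  values.sum { |v| (v - m)**2 } / (values.size - 1)
end

# Welch's t statistic for two independent samples with unequal variances.
def welch_t(a, b)
  standard_error = Math.sqrt(sample_variance(a) / a.size + sample_variance(b) / b.size)
  (mean(a) - mean(b)) / standard_error
end

# Made-up response times in milliseconds for two configurations
config_a = [100, 102, 98, 101, 99]
config_b = [110, 112, 108, 111, 109]

puts "t = #{welch_t(config_a, config_b).round(2)}"
```

A t statistic far from zero, as with these samples, indicates a difference that is large relative to the run-to-run spread. Values close to zero mean the observed difference could plausibly be measurement noise.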

Reference

Performance Test Types

Test Type	Purpose	Duration	Load Pattern
Load Test	Verify expected performance	Minutes to hours	Constant expected load
Stress Test	Find breaking point	Varies (until failure)	Progressively increasing load
Soak Test	Detect memory leaks	Hours to days	Sustained constant load
Spike Test	Validate elasticity	Minutes	Sudden load increase
Scalability Test	Measure scaling efficiency	Varies	Incremental resource addition
Volume Test	Test with large datasets	Varies	Large data volumes

Key Performance Metrics

Metric	Description	Typical Target
Response Time	Request completion duration	Under 200ms for web pages
Throughput	Requests processed per second	Application-specific
Error Rate	Failed requests percentage	Under 0.1%
Concurrent Users	Simultaneous active sessions	Based on capacity planning
CPU Utilization	Processor usage percentage	Under 70% at peak
Memory Usage	RAM consumption	Stable over time
P95 Latency	95th percentile response time	2-3x median acceptable
P99 Latency	99th percentile response time	5-10x median acceptable

Ruby Profiling Tools

Tool	Focus	Output Format	Use Case
Benchmark	Execution time	Console text	Quick timing comparisons
benchmark-ips	Iterations per second	Console text	Implementation comparisons
ruby-prof	CPU and memory profiling	Multiple formats	Detailed profiling
memory_profiler	Object allocations	Console report	Memory optimization
stackprof	Sampling profiler	Flamegraphs	Production profiling
rack-mini-profiler	Request profiling	Inline web UI	Development profiling
derailed_benchmarks	Memory bloat	Console reports	Rails memory issues

HTTP Load Testing Tools

Tool	Language	Concurrency Model	Strengths
Apache Bench	C	Single-threaded, event-driven	Simple, widely available
wrk	C + Lua	Multi-threaded, event-driven	High throughput, scriptable
siege	C	Multi-threaded	Realistic user simulation
httperf	C	Event-driven	Detailed metrics
JMeter	Java	Multi-threaded	GUI, distributed testing

Statistical Measures

Measure	Calculation	Significance
Mean	Sum / count	Overall average performance
Median	Middle value when sorted	Typical user experience
Mode	Most frequent value	Common case performance
Standard Deviation	Spread from mean	Performance consistency
Percentiles	Value below which X% fall	Tail latency characterization
Variance	Square of standard deviation	Variability quantification

Resource Monitoring

Resource	Metric	Command
CPU	Usage percentage	top, htop, mpstat
Memory	Used/available RAM	free, vmstat
Disk I/O	Read/write rates	iostat, iotop
Network	Bandwidth utilization	iftop, nethogs
Database	Connection count, query time	SHOW PROCESSLIST (MySQL)
Cache	Hit rate	Redis INFO, Memcached stats

Test Environment Configuration

Component	Requirement	Rationale
Hardware	Match production specs	CPU/memory affect performance
Network	Equivalent latency/bandwidth	Network impacts distributed systems
Data Volume	Production-scale datasets	Query plans change with data size
Configuration	Production settings	Different configs affect performance
Dependencies	Same versions as production	Version differences affect behavior
Operating System	Match production OS	System calls vary by platform
