Overview
Performance metrics quantify software behavior through measurable data points that indicate system efficiency, resource consumption, and responsiveness. These measurements transform subjective assessments like "the application feels slow" into objective, actionable data such as "p95 response time exceeds 500ms."
Metrics serve multiple purposes across the development lifecycle. During development, metrics identify inefficient algorithms or excessive resource consumption. In testing environments, metrics establish performance baselines and detect regressions. Production metrics enable capacity planning, incident response, and continuous optimization.
The field distinguishes between several metric categories. Latency metrics measure time-based operations: response time, processing duration, queue wait time. Throughput metrics quantify work volume: requests per second, transactions completed, messages processed. Resource metrics track consumption: CPU utilization, memory allocation, disk I/O, network bandwidth. Error metrics monitor failures: error rates, timeout frequency, retry counts.
# Simple timing measurement
start_time = Time.now
result = expensive_operation
elapsed = Time.now - start_time
puts "Operation completed in #{elapsed} seconds"
# => Operation completed in 0.143 seconds
Performance measurement differs from profiling. Metrics collect aggregate statistics about system behavior, while profiling captures detailed execution traces. Metrics answer "how fast" and "how much"; profiling answers "where" and "why."
Key Principles
Effective performance metrics share several characteristics. Actionability means a metric directly informs optimization decisions. A metric showing "average response time: 200ms" provides less value than "p95 response time: 800ms; p99: 2.1s", which immediately highlights tail latency issues requiring attention.
Granularity determines measurement precision. Coarse-grained metrics like "application throughput" obscure localized problems. Fine-grained metrics isolating specific controllers, database queries, or external API calls pinpoint bottlenecks. The appropriate granularity balances observability needs against collection overhead.
# Coarse-grained measurement
def process_request
  start = Time.now
  # Multiple operations
  Time.now - start
end

# Fine-grained measurement
def process_request
  metrics = {}
  metrics[:auth]   = measure { authenticate_user }
  metrics[:query]  = measure { fetch_data }
  metrics[:render] = measure { render_response }
  metrics
end

# Returns the block's elapsed time (the block's own result is discarded)
def measure
  start = Time.now
  yield
  Time.now - start
end
Statistical rigor prevents misleading conclusions. Averages conceal distribution characteristics. A service averaging 100ms might deliver most requests in 50ms while 5% exceed 1000ms. Percentiles reveal this distribution: p50 (median), p95 (95th percentile), p99, p999. The p95 metric means 95% of measurements fall below this value, identifying the threshold for typical operations while exposing tail latency.
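Computing a percentile directly from raw samples is straightforward when they fit in memory. The sketch below is illustrative, using the linear-interpolation convention over N - 1 intervals; production systems typically rely on streaming estimators or histograms instead:
# Percentile over an in-memory sample set (illustrative)
def percentile(values, pct)
  sorted = values.sort
  rank = (pct / 100.0) * (sorted.length - 1)
  lower = sorted[rank.floor]
  upper = sorted[rank.ceil]
  lower + (upper - lower) * (rank - rank.floor) # linear interpolation
end

samples = [50, 52, 48, 51, 49, 950, 1100] # mostly fast, heavy tail
percentile(samples, 50) # => 51.0, the median hides the tail
percentile(samples, 95) # => ~1055.0, tail latency exposed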
Baseline establishment provides comparison context. Metrics without baselines cannot indicate whether 200ms represents good or poor performance. Baseline establishment involves measuring stable-state behavior across representative workloads, capturing not just central tendency but variation patterns and periodic fluctuations.
Measurement overhead affects accuracy. Every measurement consumes resources—time for clock reads, memory for data structures, CPU for calculations. High-frequency measurement in hot code paths can degrade the performance being measured, creating observer effects where measurement alters system behavior.
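A rough way to gauge measurement cost is to time the clock reads themselves. This loop is illustrative; absolute numbers vary by platform and Ruby version:
# Estimate the cost of one monotonic clock read
iterations = 1_000_000
t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
iterations.times { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
t1 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
puts "~#{(((t1 - t0) / iterations) * 1_000_000_000).round} ns per clock read"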
Aggregation strategies compress raw measurements into manageable summaries. Time-series databases store metric values with timestamps. Common aggregation approaches include:
Histogram bucketing groups measurements into ranges, preserving distribution shape while reducing storage. A latency histogram might track counts in buckets: 0-10ms, 10-50ms, 50-100ms, 100-500ms, 500ms+.
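A minimal sketch of recording into fixed buckets, using the bounds above:
# Fixed-bucket histogram recording (bounds are upper edges in ms)
BUCKET_BOUNDS = [10, 50, 100, 500, Float::INFINITY]
counts = Hash.new(0)

[3, 42, 97, 230, 1800].each do |latency_ms|
  bucket = BUCKET_BOUNDS.find { |bound| latency_ms <= bound }
  counts[bucket] += 1
end
counts # => {10=>1, 50=>1, 100=>1, 500=>1, Infinity=>1}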
Windowed aggregation calculates statistics over time periods. A one-minute window computes average, minimum, maximum, and percentiles for measurements within that window, then resets. Sliding windows overlap, providing smooth transitions.
Exponential decay weights recent measurements more heavily than historical ones. This approach balances responsiveness to changes against stability, preventing old data from obscuring current conditions.
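A minimal exponentially weighted moving average (EWMA) sketch; the smoothing factor alpha is an arbitrary assumption, with higher values weighting recent samples more heavily:
class Ewma
  attr_reader :value

  def initialize(alpha: 0.2)
    @alpha = alpha
    @value = nil
  end

  def record(sample)
    @value = @value.nil? ? sample.to_f : @alpha * sample + (1 - @alpha) * @value
  end
end

avg = Ewma.new
[100, 102, 98, 500].each { |ms| avg.record(ms) }
avg.value # => ~179.9, the recent spike pulls the average up quickly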
Ruby Implementation
Ruby provides multiple facilities for performance measurement, from standard library modules to specialized gems. The Benchmark module offers basic timing capabilities for comparing code alternatives.
require 'benchmark'

n = 100_000
Benchmark.bmbm do |x|
  x.report("map:")  { n.times { (1..100).map { |i| i * 2 } } }
  x.report("each:") { n.times { result = []; (1..100).each { |i| result << i * 2 } } }
end
# Rehearsal --------------------------------------------
# map:    0.234000   0.001000   0.235000 (  0.237123)
# each:   0.289000   0.002000   0.291000 (  0.293456)
# ----------------------------------- total: 0.526000sec
# (the measurement pass that follows the rehearsal is omitted here)
The benchmark-ips gem measures iterations per second rather than total execution time, providing more intuitive comparison metrics. It reduces noise from GC pauses and system variation by running code repeatedly over warmup and measurement windows and reporting the variance alongside results.
require 'benchmark/ips'

Benchmark.ips do |x|
  x.config(time: 5, warmup: 2)

  x.report("String#+") do
    str = "hello"
    1000.times { str = str + " world" }
  end

  x.report("String#<<") do
    str = "hello"
    1000.times { str << " world" }
  end

  x.compare!
end
# Warming up --------------------------------------
#             String#+     2.156k i/100ms
#            String#<<    24.567k i/100ms
# Calculating -------------------------------------
#             String#+     21.432k (± 2.1%) i/s -    108.k in 5.041284s
#            String#<<    245.789k (± 1.8%) i/s -     1.2M in 5.012345s
#
# Comparison:
#            String#<<:   245789.0 i/s
#             String#+:    21432.0 i/s - 11.47x slower
Memory measurement identifies allocation patterns and potential leaks. The memory_profiler gem tracks object allocations by class, gem, file, and location.
require 'memory_profiler'

report = MemoryProfiler.report do
  10_000.times do
    User.new(name: "John", email: "john@example.com")
  end
end

report.pretty_print
# Total allocated: 890 KB (20000 objects)
# Total retained: 0 KB (0 objects)
#
# allocated memory by class
# -----------------------------------
#     450 KB  String
#     200 KB  Hash
#     150 KB  User
Ruby's garbage collector provides statistics through GC.stat, exposing metrics about collection frequency, duration, and heap characteristics.
before = GC.stat
result = expensive_operation
after = GC.stat
gc_count = after[:count] - before[:count]
puts "Triggered #{gc_count} garbage collections"
puts "Heap size: #{after[:heap_live_slots]} live objects"
Process-level metrics capture system resource usage. Ruby's Process module exposes CPU time, memory consumption, and other kernel statistics.
def measure_resources
  start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  start_cpu = Process.times
  yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start_time
  # User CPU time only; add the stime delta to include system time
  cpu_time = Process.times.utime - start_cpu.utime
  {
    wall_time: elapsed,
    cpu_time: cpu_time,
    cpu_percent: (cpu_time / elapsed * 100).round(2)
  }
end

stats = measure_resources { heavy_computation }
puts "Wall time: #{stats[:wall_time]}s (#{stats[:cpu_percent]}% CPU)"
The stackprof gem provides sampling-based profiling with minimal overhead, suitable for production environments. It captures stack traces at regular intervals to identify hot code paths.
require 'stackprof'

StackProf.run(mode: :wall, out: 'tmp/stackprof.dump') do
  process_requests
end

# Generate report
system('stackprof tmp/stackprof.dump --text')
Practical Examples
Measuring HTTP endpoint performance requires capturing multiple metrics: response time, database query count, memory allocation, and cache hit rates. This example instruments Rails requests with a Rack middleware.
class MetricsMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    start_allocations = GC.stat[:total_allocated_objects]

    # Note: notification subscriptions are process-global; under concurrent
    # requests this counter sees queries from all threads.
    db_queries = 0
    subscription = ActiveSupport::Notifications.subscribe('sql.active_record') do
      db_queries += 1
    end

    begin
      status, headers, response = @app.call(env)
    ensure
      # Unsubscribe even if the app raises, or the subscription leaks
      ActiveSupport::Notifications.unsubscribe(subscription)
    end

    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start_time
    allocations = GC.stat[:total_allocated_objects] - start_allocations

    MetricsCollector.record(
      endpoint: env['PATH_INFO'],
      duration: elapsed,
      db_queries: db_queries,
      allocations: allocations,
      status: status
    )

    [status, headers, response]
  end
end
Background job monitoring tracks processing time, retry frequency, and failure patterns. This example wraps Sidekiq job execution with comprehensive metrics.
class JobMetricsMiddleware
  def call(worker, job, queue)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    success = false
    begin
      yield
      success = true
    rescue => error
      MetricsCollector.increment(
        "job.error.#{worker.class.name}",
        tags: { error_class: error.class.name }
      )
      raise
    ensure
      duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
      MetricsCollector.histogram(
        "job.duration.#{worker.class.name}",
        duration,
        tags: { queue: queue, success: success }
      )
      MetricsCollector.increment(
        "job.processed.#{worker.class.name}",
        tags: { queue: queue, success: success }
      )
    end
  end
end

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add JobMetricsMiddleware
  end
end
Database query performance analysis identifies slow queries and N+1 patterns. This collector integrates with ActiveRecord's instrumentation.
class QueryMetricsCollector
  def self.install
    ActiveSupport::Notifications.subscribe('sql.active_record') do |*args|
      event = ActiveSupport::Notifications::Event.new(*args)
      analyze_query(event)
    end
  end

  def self.analyze_query(event)
    return if event.payload[:name] == 'SCHEMA'

    duration_ms = event.duration
    sql = event.payload[:sql]

    # Extract table name
    table = sql[/FROM\s+`?(\w+)`?/i, 1] || 'unknown'

    MetricsCollector.histogram(
      'db.query.duration',
      duration_ms,
      tags: { table: table, slow: duration_ms > 100 }
    )

    if duration_ms > 1000
      Rails.logger.warn "Slow query (#{duration_ms}ms): #{sql}"
    end

    # Detect N+1 queries
    caller_location = caller.find { |line| line.include?('app/') }
    QueryTracker.record(table, caller_location)
  end
end
External API call tracking monitors third-party service latency and availability. This example wraps HTTP requests with timeout and retry metrics.
class APIMetrics
  MAX_ATTEMPTS = 3

  def self.track(service_name)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    attempt = 0
    begin
      attempt += 1
      result = yield
      duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
      MetricsCollector.histogram(
        'external_api.duration',
        duration,
        tags: { service: service_name, attempt: attempt, success: true }
      )
      result
    rescue Net::ReadTimeout, Net::OpenTimeout => error
      MetricsCollector.increment(
        'external_api.timeout',
        tags: { service: service_name, error_type: error.class.name }
      )
      retry if attempt < MAX_ATTEMPTS

      duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
      MetricsCollector.histogram(
        'external_api.duration',
        duration,
        tags: { service: service_name, attempt: attempt, success: false }
      )
      raise
    end
  end
end

# Usage
APIMetrics.track('payment_gateway') do
  PaymentGateway.charge(amount: 100, card: card_token)
end
Memory leak detection compares heap growth patterns over time. This monitor tracks object counts per internal object type (as reported by ObjectSpace.count_objects) to identify retention issues.
class MemoryLeakDetector
  def initialize(interval_seconds: 300)
    @interval = interval_seconds
    @baseline = nil
  end

  def start
    Thread.new do
      loop do
        sample_memory
        sleep @interval
      end
    end
  end

  def sample_memory
    GC.start(full_mark: true, immediate_sweep: true)
    # count_objects reports counts by internal type (:T_STRING, :T_ARRAY, ...)
    current = ObjectSpace.count_objects
    if @baseline
      current.each do |type, count|
        baseline_count = @baseline[type] || 0
        next if baseline_count.zero? # avoid division by zero for new types

        growth = count - baseline_count
        growth_percent = (growth.to_f / baseline_count * 100).round(2)
        if growth_percent > 50 && count > 1000
          MetricsCollector.gauge(
            'memory.object_growth',
            growth,
            tags: { type: type, growth_percent: growth_percent }
          )
          Rails.logger.warn "Possible leak: #{type} grew #{growth_percent}%"
        end
      end
    end
    @baseline = current
  end
end
Implementation Approaches
Push-based collection sends metrics from applications to collection endpoints immediately when events occur. Applications actively transmit data points to aggregation services via HTTP, UDP, or message queues. This approach provides real-time visibility but increases network traffic and requires handling transmission failures.
require 'net/http'

class PushMetricsCollector
  def initialize(endpoint)
    @endpoint = endpoint
    @buffer = []
    @mutex = Mutex.new
    start_flush_thread
  end

  def record(metric_name, value, tags = {})
    data_point = {
      metric: metric_name,
      value: value,
      tags: tags,
      timestamp: Time.now.to_i
    }
    @mutex.synchronize { @buffer << data_point }
  end

  private

  def start_flush_thread
    Thread.new do
      loop do
        sleep 10
        flush_buffer
      end
    end
  end

  def flush_buffer
    batch = @mutex.synchronize do
      data = @buffer.dup
      @buffer.clear
      data
    end
    return if batch.empty?

    Net::HTTP.post_form(URI(@endpoint), metrics: batch.to_json)
  rescue => error
    Rails.logger.error "Failed to send metrics: #{error}"
  end
end
Pull-based collection exposes metrics through endpoints that monitoring systems scrape at regular intervals. Applications maintain metric state internally, and collectors query this state periodically. This approach reduces application complexity and allows collectors to control sampling rates but introduces lag between metric updates and collection.
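A minimal pull-style sketch: the application keeps metric state in memory and exposes it over a bare Rack endpoint for a scraper to poll. The Counters registry and the /metrics route here are illustrative, not a real library API:
require 'json'

# In-process metric state, updated as events occur
class Counters
  @registry = Hash.new(0)
  @lock = Mutex.new

  class << self
    def increment(name)
      @lock.synchronize { @registry[name] += 1 }
    end

    def snapshot
      @lock.synchronize { @registry.dup }
    end
  end
end

# Rack endpoint the collector scrapes on its own schedule
class MetricsEndpoint
  def call(env)
    return [404, {}, []] unless env['PATH_INFO'] == '/metrics'

    [200, { 'content-type' => 'application/json' }, [Counters.snapshot.to_json]]
  end
end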
Sampling strategies reduce collection overhead by measuring subsets of operations. Deterministic sampling measures every Nth request, providing consistent coverage but potentially missing patterns. Probabilistic sampling measures operations randomly based on configured probability, distributing measurement load evenly. Adaptive sampling adjusts rates based on system load or metric variance, measuring more during high-variation periods.
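Illustrative predicates for the first two strategies; the rates are arbitrary assumptions:
SAMPLE_EVERY_N = 100       # deterministic: every Nth request
SAMPLE_PROBABILITY = 0.01  # probabilistic: ~1% of requests

def deterministic_sample?(request_counter)
  (request_counter % SAMPLE_EVERY_N).zero?
end

def probabilistic_sample?
  rand < SAMPLE_PROBABILITY
end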
Aggregation timing determines when raw measurements combine into summary statistics. Real-time aggregation computes statistics as measurements arrive, maintaining running totals and quantile estimators. This approach minimizes memory usage but requires complex data structures for percentile calculation. Batch aggregation collects raw measurements in memory, computing statistics periodically. This simplifies calculations and enables precise percentiles but consumes more memory.
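A batch-aggregation sketch that buffers raw samples and summarizes on flush; the nearest-rank p95 here is a simplification:
class BatchAggregator
  def initialize
    @samples = []
  end

  def record(value)
    @samples << value
  end

  # Compute summary statistics for the window, then reset
  def flush
    return nil if @samples.empty?

    sorted = @samples.sort
    summary = {
      count: sorted.size,
      min: sorted.first,
      max: sorted.last,
      p95: sorted[(0.95 * (sorted.size - 1)).round]
    }
    @samples.clear
    summary
  end
end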
Storage strategies balance retention requirements against space constraints. Time-series databases optimize for metric data characteristics, storing timestamps and values efficiently. Downsampling reduces resolution for historical data, retaining high-fidelity recent metrics while summarizing older data. A common pattern: retain one-second granularity for one day, one-minute granularity for one week, one-hour granularity for one month, one-day granularity for one year.
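A downsampling sketch that collapses per-second points into per-minute summaries; representing raw points as [timestamp, value] pairs is an assumption:
# Collapse raw [timestamp, value] points into fixed windows
def downsample(points, window_seconds: 60)
  points.group_by { |ts, _v| ts - (ts % window_seconds) }
        .map do |window_start, slice|
          values = slice.map { |_ts, v| v }
          [window_start, { min: values.min, max: values.max,
                           avg: values.sum.to_f / values.size }]
        end.to_h
end

downsample([[0, 10], [30, 20], [60, 30]])
# => {0=>{min: 10, max: 20, avg: 15.0}, 60=>{min: 30, max: 30, avg: 30.0}}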
Hierarchical aggregation combines metrics at multiple system levels. Individual process metrics aggregate into service-level metrics; service metrics aggregate into system-level metrics. This hierarchy enables both detailed troubleshooting and high-level monitoring. Tag-based aggregation groups metrics by arbitrary dimensions: endpoint, customer, data center, version. Tags enable flexible querying but increase cardinality and storage requirements.
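An illustrative roll-up over hypothetical data points, aggregating the same raw measurements along one tag dimension:
points = [
  { endpoint: '/users', dc: 'us-east', ms: 120 },
  { endpoint: '/users', dc: 'eu-west', ms: 180 },
  { endpoint: '/carts', dc: 'us-east', ms: 95 }
]

# Mean latency grouped by endpoint; grouping by :dc works the same way
by_endpoint = points.group_by { |p| p[:endpoint] }
                    .transform_values { |ps| ps.sum { |p| p[:ms] }.to_f / ps.size }
# => {"/users"=>150.0, "/carts"=>95.0}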
Tools & Ecosystem
The Benchmark module in Ruby's standard library provides comparison testing through the bmbm method, which runs code twice: first a rehearsal to minimize GC impact, then the actual measurement.
The benchmark-ips gem focuses on throughput measurement, reporting iterations per second with statistical confidence intervals. Configuration options control warmup duration, measurement duration, and comparison formatting.
The memory_profiler gem identifies allocation hot spots by tracking object creation sites. Reports break down allocations by class, gem, file, and method, distinguishing between allocated objects and retained objects that survive garbage collection.
The stackprof gem samples call stacks at configurable intervals, generating flamegraphs and text reports showing CPU-intensive code paths. Sampling modes include :wall (wall-clock time), :cpu (CPU time), and :object (object allocations).
The ruby-prof gem provides deterministic profiling, measuring exact call counts and cumulative time. Higher overhead than sampling approaches but offers complete accuracy for small workloads.
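A minimal ruby-prof sketch; RubyProf::Profile.profile reflects the 1.x API (older releases expose RubyProf.profile):
require 'ruby-prof'

result = RubyProf::Profile.profile do
  1_000.times { "a,b,c".split(',') }
end

# Flat report: time per method, sorted by self time
RubyProf::FlatPrinter.new(result).print(STDOUT)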
The derailed_benchmarks gem specializes in Rails application benchmarking, measuring memory usage, object allocations, and request performance. Integration with stackprof and memory_profiler enables detailed analysis.
# Rakefile
require 'derailed_benchmarks'
require 'derailed_benchmarks/tasks'
# Run with: bundle exec derailed bundle:mem
# or: bundle exec derailed exec perf:mem
Application Performance Monitoring (APM) solutions provide production-grade metric collection and analysis. New Relic instruments Ruby applications automatically, tracking transactions, database queries, and external calls. Datadog offers infrastructure monitoring alongside APM, correlating application metrics with system resources. Scout APM focuses on Rails applications with minimal overhead, highlighting N+1 queries and memory bloat.
The prometheus-client gem exposes metrics in Prometheus format via HTTP endpoints. Prometheus scrapes these endpoints periodically, storing time-series data and enabling PromQL queries.
require 'prometheus/client'
require 'prometheus/middleware/exporter'

# Create registry
registry = Prometheus::Client.registry

# Define metrics
http_requests = Prometheus::Client::Counter.new(
  :http_requests_total,
  docstring: 'Total HTTP requests',
  labels: [:method, :path, :status]
)
registry.register(http_requests)

# Increment counter
http_requests.increment(labels: { method: 'GET', path: '/users', status: 200 })

# Expose metrics at /metrics for scraping
use Prometheus::Middleware::Exporter, registry: registry
StatsD provides a simple protocol for metric aggregation. Applications send metrics via UDP to a StatsD daemon, which aggregates and forwards to time-series databases. The statsd-ruby gem offers basic Ruby integration; histogram and tag support come from the Datadog variant, dogstatsd-ruby.
require 'statsd'

statsd = Statsd.new('localhost', 8125)

# Counter
statsd.increment('page.views')

# Gauge
statsd.gauge('queue.size', 247)

# Histogram
statsd.histogram('api.response_time', 156)

# Timing
statsd.time('db.query') do
  User.where(active: true).count
end
Real-World Applications
Production monitoring systems collect metrics continuously, alerting on anomalies and trends. Metrics feed into dashboards showing system health, capacity utilization, and business KPIs. Alert thresholds trigger notifications when metrics exceed acceptable ranges.
A typical Rails production setup instruments multiple layers:
# config/initializers/metrics.rb
class ApplicationMetrics
  def self.setup
    # Request metrics
    ActiveSupport::Notifications.subscribe('process_action.action_controller') do |*args|
      event = ActiveSupport::Notifications::Event.new(*args)
      statsd.histogram('http.response_time', event.duration, tags: {
        controller: event.payload[:controller],
        action: event.payload[:action],
        status: event.payload[:status]
      })
    end

    # Database metrics
    ActiveSupport::Notifications.subscribe('sql.active_record') do |*args|
      event = ActiveSupport::Notifications::Event.new(*args)
      next if event.payload[:name] == 'SCHEMA'
      statsd.histogram('db.query_time', event.duration)
      statsd.increment('db.query_count')
    end

    # Background job metrics
    Sidekiq.configure_server do |config|
      config.server_middleware do |chain|
        chain.add JobMetricsMiddleware
      end
    end

    # System metrics
    Thread.new do
      loop do
        memory_mb = `ps -o rss= -p #{Process.pid}`.to_i / 1024
        statsd.gauge('process.memory_mb', memory_mb)
        gc_stats = GC.stat
        statsd.gauge('ruby.heap_live_slots', gc_stats[:heap_live_slots])
        # GC count is cumulative, so report it as a gauge, not a counter
        statsd.gauge('ruby.gc_count', gc_stats[:count])
        sleep 60
      end
    end
  end

  def self.statsd
    @statsd ||= Statsd.new(ENV['STATSD_HOST'], 8125)
  end
end
Performance regression testing incorporates metrics into CI/CD pipelines. Automated tests measure execution time and resource usage, failing builds when metrics degrade beyond thresholds. This prevents performance regressions from reaching production.
# spec/performance/checkout_spec.rb
require 'benchmark'

RSpec.describe 'Checkout performance', type: :performance do
  it 'completes checkout within threshold' do
    user = create(:user)
    cart = create(:cart, user: user, items_count: 5)

    result = Benchmark.measure do
      CheckoutService.new(cart).process
    end

    expect(result.real).to be < 2.0, "Checkout took #{result.real}s (threshold: 2.0s)"
  end

  it 'executes reasonable number of queries' do
    user = create(:user)
    cart = create(:cart, user: user, items_count: 5)

    query_count = 0
    counter = ->(*) { query_count += 1 }

    # Scope the subscription to the block so it cannot leak into other tests
    ActiveSupport::Notifications.subscribed(counter, 'sql.active_record') do
      CheckoutService.new(cart).process
    end

    expect(query_count).to be <= 10, "Checkout executed #{query_count} queries (threshold: 10)"
  end
end
Capacity planning uses historical metric trends to forecast resource requirements. Analysis of throughput, latency, and resource utilization under various load levels informs scaling decisions. Metric correlation identifies bottlenecks—if CPU utilization remains low while latency increases, database or external services likely constrain performance.
SLA compliance monitoring tracks metrics against service level objectives. A 99.9% availability SLA tolerates 43 minutes downtime monthly. Error rate metrics, calculated as failed requests divided by total requests, indicate whether services meet reliability targets. Latency percentiles verify response time commitments.
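The arithmetic behind these targets is simple enough to check inline; the request counts below are illustrative:
# Error budget for a 99.9% monthly availability target
slo = 0.999
minutes_per_month = 30 * 24 * 60               # 43,200
budget_minutes = minutes_per_month * (1 - slo) # => ~43.2 minutes of downtime

# Error rate against the SLO
failed = 1_204
total = 2_000_000
error_rate = failed.to_f / total               # => 0.000602 (0.06%), within a 0.1% target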
Cost optimization leverages metrics to identify inefficiencies. Database query metrics reveal expensive operations that indexing could accelerate. Memory allocation metrics highlight object retention issues causing excessive GC. External API metrics show redundant calls that caching could eliminate.
class CostOptimizationAnalyzer
  def analyze_period(start_date, end_date)
    # Identify expensive database queries
    slow_queries = MetricsDB.query(
      "SELECT sql, AVG(duration) as avg_duration, COUNT(*) as call_count
       FROM query_metrics
       WHERE timestamp BETWEEN ? AND ?
       GROUP BY sql
       HAVING avg_duration > 100
       ORDER BY avg_duration * call_count DESC",
      start_date, end_date
    )

    # Calculate potential savings from caching
    cacheable_api_calls = MetricsDB.query(
      "SELECT endpoint, COUNT(*) as call_count, AVG(duration) as avg_duration
       FROM api_metrics
       WHERE timestamp BETWEEN ? AND ?
       GROUP BY endpoint
       HAVING call_count > 1000
       ORDER BY call_count DESC",
      start_date, end_date
    )

    # Find memory allocation hot spots
    allocation_sources = MetricsDB.query(
      "SELECT location, SUM(allocations) as total_allocations
       FROM memory_metrics
       WHERE timestamp BETWEEN ? AND ?
       GROUP BY location
       ORDER BY total_allocations DESC
       LIMIT 20",
      start_date, end_date
    )

    {
      slow_queries: slow_queries,
      cacheable_endpoints: cacheable_api_calls,
      allocation_hotspots: allocation_sources
    }
  end
end
Reference
Metric Categories
| Category | Examples | Use Case |
|---|---|---|
| Latency | Response time, processing duration, queue wait | Measure time-based operations |
| Throughput | Requests per second, transactions per minute | Quantify work volume |
| Resource | CPU utilization, memory usage, disk I/O | Track consumption |
| Error | Error rate, timeout count, retry frequency | Monitor failures |
| Business | Sign-ups, purchases, active users | Track KPIs |
Statistical Measures
| Measure | Calculation | Interpretation |
|---|---|---|
| Mean | Sum divided by count | Average value, skewed by outliers |
| Median | Middle value (p50) | Typical experience, resistant to outliers |
| p95 | 95th percentile | Threshold for most operations |
| p99 | 99th percentile | Near-worst case, excludes extreme outliers |
| Standard deviation | Square root of variance | Spread around mean |
| Coefficient of variation | Std dev divided by mean | Relative variability |
Ruby Benchmark Methods
| Method | Purpose | Returns |
|---|---|---|
| Benchmark.measure | Time single execution | Benchmark::Tms object |
| Benchmark.bm | Compare multiple implementations | Array of Benchmark::Tms |
| Benchmark.bmbm | Rehearsal run before measurement | Array of Benchmark::Tms |
| Benchmark.realtime | Wall-clock time only | Float seconds |
GC Statistics
| Metric | GC.stat Key | Meaning |
|---|---|---|
| Collection count | :count | Total GC runs since start |
| Live objects | :heap_live_slots | Objects surviving collection |
| Free slots | :heap_free_slots | Available object slots |
| Total allocated | :total_allocated_objects | Cumulative allocations |
| Major GC count | :major_gc_count | Full mark-and-sweep collections |
| Minor GC count | :minor_gc_count | Quick young generation collections |
Prometheus Metric Types
| Type | Purpose | Ruby Method |
|---|---|---|
| Counter | Monotonically increasing value | increment (accepts by: for larger steps) |
| Gauge | Arbitrary value that goes up/down | set, increment, decrement |
| Histogram | Sample observations in buckets | observe |
| Summary | Sample observations with quantiles | observe |
Common Thresholds
| Metric | Threshold | Impact |
|---|---|---|
| Web response time p95 | 500ms | User perceives slowness |
| API response time p95 | 200ms | Client timeouts increase |
| Database query p95 | 100ms | Request queuing begins |
| Memory growth | 10% per hour | Possible leak |
| Error rate | 0.1% | SLA violation risk |
| CPU utilization | 70% sustained | Capacity constraint |
Time Measurement Precision
| Method | Precision | Use Case |
|---|---|---|
| Time.now | Microseconds | General timing |
| Process.clock_gettime(MONOTONIC) | Nanoseconds | Accurate intervals |
| Process.clock_gettime(REALTIME) | Nanoseconds | Wall-clock timestamps |
| Process.times | Centiseconds | CPU time tracking |
Percentile Calculation
For sorted array of N values:
- p50 index: N * 0.50
- p95 index: N * 0.95
- p99 index: N * 0.99
Interpolate between adjacent values when index is not integer.
Aggregation Window Sizes
| Window | Granularity | Retention | Storage per metric |
|---|---|---|---|
| Real-time | 1 second | 1 day | 86,400 points |
| Short-term | 1 minute | 7 days | 10,080 points |
| Medium-term | 5 minutes | 30 days | 8,640 points |
| Long-term | 1 hour | 1 year | 8,760 points |