Overview
Capacity planning determines the infrastructure resources required to meet application performance goals under anticipated load conditions. The process involves measuring current resource usage, forecasting future demand, identifying bottlenecks, and allocating resources to maintain acceptable response times and throughput.
Software systems consume computational resources including CPU cycles, memory, disk I/O, network bandwidth, and database connections. Capacity planning quantifies these resource requirements across different load scenarios to prevent performance degradation, service outages, and resource exhaustion. The practice applies to single servers, distributed systems, databases, message queues, caches, and all infrastructure components.
The planning process follows a cycle: baseline current performance, model resource consumption patterns, project future growth, provision infrastructure, monitor actual usage, and refine predictions. This cycle repeats as application behavior changes through feature additions, traffic growth, or architectural modifications.
Capacity planning differs from performance optimization. Performance optimization reduces resource consumption for a given workload, while capacity planning ensures sufficient resources exist for expected workloads. Both practices complement each other in maintaining system reliability.
# Capacity planning tracks resource usage over time
class CapacityMetrics
  def initialize
    @snapshots = []
  end

  def record_snapshot
    @snapshots << {
      timestamp: Time.now,
      cpu_percent: cpu_usage,
      memory_mb: memory_usage,
      active_connections: connection_count
    }
  end

  # cpu_usage, memory_usage, connection_count, and calculate_trend are
  # environment-specific helpers assumed to be defined elsewhere.
  def forecast_exhaustion(resource, threshold)
    return nil if @snapshots.size < 2

    trend = calculate_trend(@snapshots, resource)
    return nil if trend <= 0 # usage is flat or declining; no exhaustion expected

    current = @snapshots.last[resource]
    time_to_threshold = (threshold - current) / trend
    Time.now + time_to_threshold
  end
end
Organizations implement capacity planning at multiple scales. Small applications might track CPU and memory on a single server. Large distributed systems require planning across hundreds of services, each with distinct resource profiles. Cloud environments add complexity through auto-scaling, spot instances, and variable performance characteristics.
Key Principles
Resource Consumption Modeling establishes the relationship between workload and resource usage. Models range from simple linear relationships to complex multi-variable regressions. A linear model might express CPU usage as a function of requests per second, while sophisticated models account for request type, payload size, cache hit rates, and concurrent operations.
# Linear capacity model
class LinearCapacityModel
  def initialize(baseline_load, baseline_cpu)
    @baseline_load = baseline_load.to_f
    @baseline_cpu = baseline_cpu.to_f
  end

  def predict_cpu(load)
    (@baseline_cpu / @baseline_load) * load
  end

  def max_load_for_cpu(cpu_limit)
    (@baseline_load * cpu_limit) / @baseline_cpu
  end
end

model = LinearCapacityModel.new(1000, 45.0)
model.predict_cpu(2500)      # => 112.5% CPU at 2500 req/s
model.max_load_for_cpu(80.0) # => 1777.78 req/s at 80% CPU
Headroom Management maintains buffer capacity above anticipated peak load. Headroom accounts for traffic spikes, measurement errors, and unexpected load patterns. Common practice reserves 20-50% headroom depending on traffic volatility and scaling agility. Systems with rapid auto-scaling capability require less headroom than manually-scaled infrastructure.
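The headroom arithmetic can be sketched in a few lines; the function name and percentages below are illustrative, not a standard API. Integer percentages sidestep floating-point rounding surprises.

```ruby
# Hypothetical sketch: provision anticipated peak load plus a headroom buffer.
def provisioned_capacity(peak_load, headroom_percent)
  (peak_load * (100 + headroom_percent) / 100.0).round
end

provisioned_capacity(4_000, 30) # => 5200 req/s for a volatile workload
provisioned_capacity(4_000, 15) # => 4600 req/s with rapid auto-scaling
```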
Growth Projection extrapolates future resource requirements from historical trends. Simple projections use linear extrapolation. Sophisticated approaches model seasonal patterns, cyclical behavior, and exponential growth curves. Projection accuracy improves with longer historical datasets and accounts for known future events like marketing campaigns or feature launches.
class GrowthProjector
  def initialize(historical_data)
    @data = historical_data.sort_by { |point| point[:timestamp] }
  end

  # Assumes one data point per day; calculate_slope, calculate_intercept,
  # and calculate_compound_growth_rate are helpers defined elsewhere.
  def linear_projection(months_ahead)
    return nil if @data.size < 2

    x_values = @data.map.with_index { |_, i| i }
    y_values = @data.map { |point| point[:value] }
    slope = calculate_slope(x_values, y_values)
    intercept = calculate_intercept(x_values, y_values, slope)
    future_index = @data.size + (months_ahead * 30)
    slope * future_index + intercept
  end

  def exponential_projection(months_ahead)
    return nil if @data.size < 3

    log_values = @data.map { |point| Math.log(point[:value]) }
    growth_rate = calculate_compound_growth_rate(log_values)
    current_value = @data.last[:value]
    current_value * ((1 + growth_rate) ** months_ahead)
  end
end
Bottleneck Identification locates the resource that limits system capacity. The bottleneck determines maximum achievable throughput regardless of other resource availability. Common bottlenecks include database connection pools, CPU saturation, memory exhaustion, disk I/O limits, and network bandwidth constraints. Capacity planning prioritizes bottleneck resources.
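As a rough illustration (the resource data and helper names are hypothetical), the bottleneck is the resource running closest to its practical limit, and its remaining margin bounds how far throughput can grow:

```ruby
# Each resource: current usage vs. its practical limit.
def bottleneck(usages)
  usages.max_by { |_, u| u[:current].to_f / u[:limit] }.first
end

# Throughput can grow only until the bottleneck resource saturates.
def max_scale_factor(usages)
  usages.map { |_, u| u[:limit].to_f / u[:current] }.min
end

usages = {
  cpu_percent:    { current: 45, limit: 80 },
  db_connections: { current: 90, limit: 100 },
  memory_mb:      { current: 9_000, limit: 14_000 }
}
bottleneck(usages)       # => :db_connections
max_scale_factor(usages) # ~1.11x before the connection pool saturates
```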
Workload Characterization categorizes requests by resource consumption profile. Not all requests consume identical resources. Read operations differ from writes, cached requests differ from cache misses, and background jobs differ from API requests. Accurate capacity planning requires understanding the mix of workload types and their relative frequencies.
class WorkloadProfile
  attr_reader :request_type, :avg_cpu_ms, :avg_memory_mb, :avg_db_queries

  def initialize(request_type, cpu_ms, memory_mb, db_queries)
    @request_type = request_type
    @avg_cpu_ms = cpu_ms
    @avg_memory_mb = memory_mb
    @avg_db_queries = db_queries
  end

  def total_cpu_seconds(request_count)
    (request_count * @avg_cpu_ms) / 1000.0
  end
end

class CapacityCalculator
  def initialize(profiles, request_distribution)
    @profiles = profiles
    @distribution = request_distribution
  end

  def required_cpu_cores(total_requests, target_duration_hours)
    total_cpu_seconds = @profiles.sum do |profile|
      request_count = total_requests * @distribution[profile.request_type]
      profile.total_cpu_seconds(request_count)
    end
    available_seconds = target_duration_hours * 3600
    (total_cpu_seconds / available_seconds).ceil
  end
end
Measurement Precision affects planning accuracy. Resource metrics contain noise from system fluctuations, measurement overhead, and sampling intervals. Capacity planning aggregates measurements over meaningful time windows to reduce noise. Percentile-based metrics provide more reliable signals than averages, as they account for variance in resource consumption patterns.
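A small sketch of why percentiles resist noise better than averages (the sample values are made up):

```ruby
# Nearest-rank style percentile over a measurement window.
def percentile(samples, pct)
  sorted = samples.sort
  sorted[((pct / 100.0) * (sorted.size - 1)).round]
end

# One outlier skews the mean but barely moves the median.
window = [40, 42, 41, 43, 95, 44, 42, 41, 43, 42]
window.sum / window.size.to_f # => 47.3, pulled up by the spike
percentile(window, 50)        # => 42, the typical request
percentile(window, 95)        # => 95, the tail worth planning for
```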
Scaling Strategies determine how capacity increases with demand. Vertical scaling adds resources to existing instances; horizontal scaling adds more instances. Each strategy has different capacity characteristics and planning implications. Vertical scaling has upper limits based on hardware constraints. Horizontal scaling requires workload distribution mechanisms and must account for coordination overhead.
Design Considerations
Capacity planning strategies differ based on system architecture, traffic patterns, and business constraints. The choice between reactive and proactive planning affects operational burden, infrastructure costs, and reliability characteristics.
Reactive vs Proactive Planning represents the fundamental strategic choice. Reactive planning responds to observed capacity constraints after they occur or approach critical thresholds. Proactive planning anticipates requirements before constraints manifest. Reactive planning minimizes infrastructure costs but increases incident risk. Proactive planning maintains higher reliability at the cost of over-provisioning.
Reactive planning suits systems with unpredictable traffic patterns, high tolerance for temporary degradation, and rapid scaling capabilities. Proactive planning fits systems with predictable growth, strict uptime requirements, and slow scaling processes. Most organizations blend both approaches, maintaining proactive capacity for baseline load while reserving reactive measures for unexpected spikes.
Planning Horizon determines how far into the future capacity projections extend. Short horizons of weeks or months require less sophisticated forecasting but demand frequent re-evaluation. Long horizons of quarters or years enable better procurement planning but accumulate forecast errors. The optimal horizon balances forecast accuracy against planning stability.
class CapacityPlan
  attr_reader :horizon_months, :review_frequency

  def initialize(horizon_months:, review_frequency:)
    @horizon_months = horizon_months
    @review_frequency = review_frequency
    @capacity_points = []
  end

  def add_checkpoint(month, cpu_cores, memory_gb, storage_tb)
    @capacity_points << {
      month: month,
      cpu_cores: cpu_cores,
      memory_gb: memory_gb,
      storage_tb: storage_tb
    }
    # Keep checkpoints ordered so interpolation can rely on month order
    @capacity_points.sort_by! { |point| point[:month] }
  end

  def interpolate_capacity(target_month)
    return nil if @capacity_points.empty?

    before = @capacity_points.reverse.find { |p| p[:month] <= target_month }
    after = @capacity_points.find { |p| p[:month] >= target_month }
    return before if before && !after
    return after if after && !before
    return before if before[:month] == target_month

    # Linear interpolation between points
    ratio = (target_month - before[:month]).to_f / (after[:month] - before[:month])
    {
      cpu_cores: interpolate(before[:cpu_cores], after[:cpu_cores], ratio),
      memory_gb: interpolate(before[:memory_gb], after[:memory_gb], ratio),
      storage_tb: interpolate(before[:storage_tb], after[:storage_tb], ratio)
    }
  end

  private

  def interpolate(start_val, end_val, ratio)
    start_val + ((end_val - start_val) * ratio)
  end
end
Cost Optimization balances performance requirements against infrastructure expenses. Over-provisioning wastes budget on unused capacity. Under-provisioning causes performance degradation or outages. The optimal point depends on the cost of resources versus the cost of performance problems.
Cloud infrastructure enables fine-grained capacity adjustments but introduces complexity in instance type selection, commitment strategies, and multi-region deployment. Reserved instances reduce costs for predictable baseline capacity. Spot instances provide cost-effective burst capacity for fault-tolerant workloads. The capacity plan should specify resource allocation across different procurement models.
Multi-Tier Planning accounts for dependencies between system layers. Application servers depend on database capacity, which depends on storage capacity. Cache layers affect database load. Message queues buffer load between producers and consumers. Capacity planning must model these interdependencies to avoid creating bottlenecks in downstream systems.
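A sketch of propagating front-end load through dependent tiers; the rates and ratios here are illustrative assumptions, not measured values:

```ruby
# Cache hit rate shields the database; database queries drive storage IOPS.
def downstream_load(app_rps, cache_hit_rate:, queries_per_miss:, iops_per_query:)
  db_qps = app_rps * (1 - cache_hit_rate) * queries_per_miss
  { db_qps: db_qps, storage_iops: db_qps * iops_per_query }
end

downstream_load(2_000, cache_hit_rate: 0.9, queries_per_miss: 3, iops_per_query: 4)
# roughly 600 queries/s to the database and 2400 IOPS to storage;
# a drop to an 80% cache hit rate would double both downstream loads
```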
Implementation Approaches
Capacity planning implementations range from manual spreadsheet tracking to automated prediction systems. The sophistication level should match organizational scale, growth velocity, and operational maturity.
Spreadsheet-Based Planning records periodic resource snapshots in spreadsheets. Analysts manually calculate trends, project future requirements, and document capacity decisions. This approach works for small deployments with stable growth patterns; it requires minimal tooling but scales poorly with system complexity.
# Generate capacity report for spreadsheet import
require 'csv'

class CapacityReporter
  def initialize(metrics_source)
    @metrics = metrics_source
  end

  def generate_csv_report(start_date, end_date)
    CSV.generate do |csv|
      csv << ["Date", "Avg CPU %", "Peak CPU %", "Avg Memory GB",
              "Peak Memory GB", "Request Count", "P95 Response Time"]
      (start_date..end_date).each do |date|
        daily_metrics = @metrics.for_date(date)
        csv << [
          date.to_s,
          daily_metrics.avg_cpu.round(2),
          daily_metrics.peak_cpu.round(2),
          daily_metrics.avg_memory.round(2),
          daily_metrics.peak_memory.round(2),
          daily_metrics.total_requests,
          daily_metrics.p95_response_time.round(0)
        ]
      end
    end
  end
end
Threshold-Based Alerting triggers notifications when resource usage exceeds defined thresholds. Alerts prompt manual capacity evaluation and scaling decisions. This reactive approach responds to immediate capacity constraints but provides no advance warning. Threshold configuration requires understanding normal usage patterns to avoid false positives.
Multiple threshold levels create escalation paths. A warning threshold at 60% utilization triggers investigation. A critical threshold at 80% triggers immediate action. This layered approach balances responsiveness with operational burden.
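The escalation levels described above might be encoded as follows; the thresholds are the example values from the text, not universal constants:

```ruby
def alert_level(utilization_percent)
  case utilization_percent
  when 0...60 then :ok        # normal operating range
  when 60...80 then :warning  # triggers investigation
  else :critical              # triggers immediate action
  end
end

alert_level(55) # => :ok
alert_level(72) # => :warning
alert_level(85) # => :critical
```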
Time-Series Forecasting applies statistical methods to historical metrics for future projection. Linear regression, moving averages, exponential smoothing, and seasonal decomposition models extract trends from historical data. More sophisticated approaches use machine learning models trained on historical capacity data.
class TimeSeriesForecaster
  def initialize(historical_points)
    @points = historical_points.sort_by { |p| p[:timestamp] }
  end

  def simple_moving_average(window_size, periods_ahead)
    recent_values = @points.last(window_size).map { |p| p[:value] }
    average = recent_values.sum / recent_values.size.to_f
    Array.new(periods_ahead, average)
  end

  def linear_regression_forecast(periods_ahead)
    return [] if @points.size < 2

    x_values = @points.map.with_index { |_, i| i.to_f }
    y_values = @points.map { |p| p[:value].to_f }
    n = x_values.size
    sum_x = x_values.sum
    sum_y = y_values.sum
    sum_xy = x_values.zip(y_values).map { |x, y| x * y }.sum
    sum_x_squared = x_values.map { |x| x * x }.sum
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x_squared - sum_x * sum_x)
    intercept = (sum_y - slope * sum_x) / n
    future_indices = (n..(n + periods_ahead - 1)).to_a
    future_indices.map { |x| slope * x + intercept }
  end

  def exponential_smoothing_forecast(alpha, periods_ahead)
    return [] if @points.empty?

    smoothed = [@points.first[:value]]
    @points[1..-1].each do |point|
      smoothed << alpha * point[:value] + (1 - alpha) * smoothed.last
    end
    forecast_value = smoothed.last
    Array.new(periods_ahead, forecast_value)
  end
end
Load Testing Capacity Discovery measures system capacity through controlled load generation. Load tests identify breaking points, characterize resource consumption under various load levels, and validate capacity models. Regular load testing updates capacity parameters as code changes affect resource consumption.
Load tests should cover multiple workload scenarios representing production traffic patterns. Testing only the best-case scenario produces optimistic capacity estimates. Include cache-cold scenarios, database query patterns, and concurrent operation mixes that reflect production behavior.
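One way to structure such a test is a stepped ramp that holds each load level long enough to observe steady-state resource consumption. This plan generator is a hypothetical sketch, not part of any load-testing tool:

```ruby
def ramp_schedule(start_rps:, step_rps:, steps:, hold_seconds:)
  (0...steps).map do |i|
    { target_rps: start_rps + i * step_rps, hold_seconds: hold_seconds }
  end
end

# Ramp from 500 to 1250 req/s in 250 req/s steps, holding each for 5 minutes
ramp_schedule(start_rps: 500, step_rps: 250, steps: 4, hold_seconds: 300)
```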
Auto-Scaling Integration couples capacity monitoring with automatic resource provisioning. Systems automatically add or remove instances based on observed metrics. This approach maintains target utilization levels without manual intervention. Auto-scaling requires well-tuned triggers, graceful instance addition/removal, and sufficient lead time for provisioning.
class AutoScalingPolicy
  def initialize(
    min_instances:,
    max_instances:,
    scale_up_threshold:,
    scale_down_threshold:,
    cooldown_period:
  )
    @min_instances = min_instances
    @max_instances = max_instances
    @scale_up_threshold = scale_up_threshold
    @scale_down_threshold = scale_down_threshold
    @cooldown_period = cooldown_period
    @last_scaling_action = nil
  end

  def evaluate(current_instances, current_cpu_percent)
    return nil if in_cooldown?

    if current_cpu_percent >= @scale_up_threshold
      desired = [current_instances + scale_increment(current_instances),
                 @max_instances].min
      return desired if desired > current_instances
    elsif current_cpu_percent <= @scale_down_threshold
      desired = [current_instances - 1, @min_instances].max
      return desired if desired < current_instances
    end
    nil
  end

  def record_scaling_action
    @last_scaling_action = Time.now
  end

  private

  def in_cooldown?
    return false unless @last_scaling_action
    Time.now - @last_scaling_action < @cooldown_period
  end

  def scale_increment(current_count)
    # Scale faster at lower counts, more gradually at higher counts
    current_count < 10 ? 2 : (current_count * 0.2).ceil
  end
end
Capacity Reservation Systems allocate resources to different workload classes to prevent resource contention. Database connection pools reserve connections for critical transactions. CPU quotas allocate processing capacity across tenants. This approach prevents any single workload from exhausting shared resources.
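A minimal sketch of such partitioning, assuming a shared connection pool split by workload class (the class names and shares are illustrative):

```ruby
# Split a shared connection pool into per-class reservations; any rounding
# remainder goes to the first (highest-priority) class.
def partition_pool(total_connections, shares)
  allocated = shares.transform_values { |share| (total_connections * share).floor }
  allocated[allocated.keys.first] += total_connections - allocated.values.sum
  allocated
end

partition_pool(100, critical: 0.5, batch: 0.25, reporting: 0.25)
# => { critical: 50, batch: 25, reporting: 25 }
```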
Ruby Implementation
Ruby applications require capacity planning for application servers, background job processors, and ancillary services. Ruby's Global Interpreter Lock affects CPU capacity calculations, as a single Ruby process cannot fully utilize multiple CPU cores without forking or threading.
Process-Based Concurrency remains the primary scaling model for Ruby web applications. Application servers like Puma, Unicorn, and Passenger spawn multiple worker processes. Each worker handles requests independently. Capacity planning must account for per-process memory overhead and CPU allocation across workers.
# Calculate worker capacity for a Puma server
class PumaCapacityPlanner
  def initialize(available_memory_mb:, available_cpu_cores:)
    @available_memory_mb = available_memory_mb
    @available_cpu_cores = available_cpu_cores
  end

  def recommended_workers(per_process_memory_mb:, threads_per_worker:)
    # Reserve memory for OS and overhead
    usable_memory = @available_memory_mb * 0.75
    # Memory-constrained worker count
    memory_limited = (usable_memory / per_process_memory_mb).floor
    # CPU-constrained worker count (account for blocking I/O)
    cpu_limited = (@available_cpu_cores * 1.5).ceil
    # Take the more conservative limit
    worker_count = [memory_limited, cpu_limited].min
    {
      workers: worker_count,
      threads: threads_per_worker,
      total_concurrency: worker_count * threads_per_worker,
      memory_usage_mb: worker_count * per_process_memory_mb,
      reasoning: memory_limited < cpu_limited ? :memory_bound : :cpu_bound
    }
  end
end

planner = PumaCapacityPlanner.new(
  available_memory_mb: 16_000,
  available_cpu_cores: 8
)
config = planner.recommended_workers(
  per_process_memory_mb: 512,
  threads_per_worker: 5
)
# => {
#   workers: 12,
#   threads: 5,
#   total_concurrency: 60,
#   memory_usage_mb: 6144,
#   reasoning: :cpu_bound
# }
Memory Profiling identifies per-request memory allocation and retention. Ruby's garbage collector reclaims memory, but retained objects accumulate across requests. Memory profiling tools like memory_profiler and derailed_benchmarks measure allocation patterns.
require 'memory_profiler'

class EndpointCapacityAnalyzer
  # simulate_request is an application-specific helper assumed to exist;
  # it exercises the controller action outside the HTTP stack.
  def analyze_endpoint(controller, action)
    report = MemoryProfiler.report do
      10.times { simulate_request(controller, action) }
    end
    {
      total_allocated_mb: report.total_allocated_memsize / 1024.0 / 1024.0,
      total_retained_mb: report.total_retained_memsize / 1024.0 / 1024.0,
      allocated_objects: report.total_allocated,
      retained_objects: report.total_retained,
      avg_per_request_mb: (report.total_allocated_memsize / 10.0) / 1024.0 / 1024.0
    }
  end

  def project_memory_requirements(metrics, requests_per_second, workers)
    per_request_mb = metrics[:avg_per_request_mb]
    baseline_memory_mb = 256 # Base process memory
    # Assume each worker handles requests_per_second / workers requests
    requests_per_worker = requests_per_second.to_f / workers
    # Memory accumulates during request lifetime (assume 100ms avg)
    concurrent_requests_per_worker = requests_per_worker * 0.1
    concurrent_memory = concurrent_requests_per_worker * per_request_mb
    total_per_worker = baseline_memory_mb + concurrent_memory
    total_system_memory = total_per_worker * workers
    {
      per_worker_mb: total_per_worker.ceil,
      total_system_mb: total_system_memory.ceil,
      concurrent_requests_per_worker: concurrent_requests_per_worker.ceil
    }
  end
end
Database Connection Pooling limits concurrent database connections per process. Connection pool size affects request concurrency and database capacity. Under-sized pools cause request queueing. Over-sized pools exhaust database connection limits.
# Calculate database connection requirements
class DatabaseCapacityPlanner
  def initialize(db_max_connections:, app_server_count:)
    @db_max_connections = db_max_connections
    @app_server_count = app_server_count
  end

  def calculate_pool_size(workers_per_server:, threads_per_worker:)
    # Total potential connections across all servers
    total_threads = @app_server_count * workers_per_server * threads_per_worker
    # Reserve connections for background jobs, admin, monitoring
    reserved_connections = 20
    available_for_app = @db_max_connections - reserved_connections
    # Calculate pool size per worker
    if total_threads <= available_for_app
      # Plenty of connections available
      recommended_pool_size = threads_per_worker
    else
      # Need to limit pool size
      connections_per_server = available_for_app / @app_server_count
      connections_per_worker = connections_per_server / workers_per_server
      recommended_pool_size = [connections_per_worker, threads_per_worker].min
    end
    total_app_connections = recommended_pool_size * workers_per_server * @app_server_count
    {
      recommended_pool_size: recommended_pool_size,
      total_app_connections: total_app_connections,
      utilization_percent: ((total_app_connections * 100.0) / available_for_app).round(1),
      constraint: recommended_pool_size < threads_per_worker ? :connection_limited : :thread_matched
    }
  end
end
Background Job Capacity requires separate capacity planning from web request handling. Job processors like Sidekiq use threading for concurrency. Job types have varying resource profiles. Long-running jobs consume workers for extended periods, reducing available capacity for other jobs.
class SidekiqCapacityPlanner
  def initialize(concurrency:, avg_memory_per_thread_mb:)
    @concurrency = concurrency
    @avg_memory_per_thread = avg_memory_per_thread_mb
  end

  # job_mix: array of { frequency:, avg_duration_seconds: } hashes
  def max_throughput(job_mix)
    # Throughput = concurrency / weighted average job duration
    avg_duration = weighted_avg_duration(job_mix)
    jobs_per_second = @concurrency.to_f / avg_duration
    {
      jobs_per_second: jobs_per_second.round(2),
      jobs_per_minute: (jobs_per_second * 60).round(0),
      jobs_per_hour: (jobs_per_second * 3600).round(0),
      avg_duration_seconds: avg_duration.round(2)
    }
  end

  def required_concurrency(target_jobs_per_hour, job_mix)
    jobs_per_second = target_jobs_per_hour / 3600.0
    required = (jobs_per_second * weighted_avg_duration(job_mix)).ceil
    {
      required_concurrency: required,
      memory_estimate_mb: (required * @avg_memory_per_thread).ceil
    }
  end

  private

  def weighted_avg_duration(job_mix)
    total_weight = job_mix.sum { |job| job[:frequency] }
    job_mix.sum { |job| (job[:avg_duration_seconds] * job[:frequency]) / total_weight }
  end
end
Tools & Ecosystem
Capacity planning relies on monitoring infrastructure, metric storage, and analysis tools. Ruby applications integrate with various monitoring platforms for metric collection and visualization.
Application Performance Monitoring platforms like New Relic, Datadog, and Scout provide pre-built capacity analytics. These services collect application metrics, infrastructure metrics, and distributed traces. Built-in dashboards track resource utilization trends and forecast capacity needs.
Ruby instrumentation occurs through agent gems that hook into web frameworks and libraries. Agents report metrics to external platforms for aggregation and analysis. APM platforms provide alerting, anomaly detection, and capacity trend visualization.
Metrics Libraries export custom application metrics to time-series databases. The prometheus-client gem exposes metrics in Prometheus format. The statsd-instrument gem sends metrics to StatsD-compatible collectors. Custom metrics supplement infrastructure metrics with application-specific capacity indicators.
require 'prometheus/client'

class CapacityMetricsExporter
  def initialize
    @registry = Prometheus::Client.registry
    @cpu_usage = @registry.gauge(
      :app_cpu_utilization_percent,
      docstring: 'Current CPU utilization percentage'
    )
    @memory_usage = @registry.gauge(
      :app_memory_usage_bytes,
      docstring: 'Current memory usage in bytes'
    )
    @active_connections = @registry.gauge(
      :app_active_database_connections,
      docstring: 'Number of active database connections'
    )
    @request_duration = @registry.histogram(
      :app_request_duration_seconds,
      docstring: 'Request processing duration',
      buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
    )
  end

  def record_request(duration_seconds)
    @request_duration.observe(duration_seconds)
  end

  def update_resource_usage
    process_info = process_metrics
    @cpu_usage.set(process_info[:cpu_percent])
    @memory_usage.set(process_info[:memory_bytes])
    @active_connections.set(ActiveRecord::Base.connection_pool.connections.count)
  end

  private

  def process_metrics
    # Platform-specific process metrics
    # Simplified for demonstration
    {
      cpu_percent: 45.2,
      memory_bytes: 512 * 1024 * 1024
    }
  end
end
Load Testing Tools validate capacity plans through controlled load generation. Apache JMeter, Gatling, and k6 generate traffic patterns for capacity verification. Command-line tools such as wrk and siege exercise HTTP endpoints directly.
Load testing confirms capacity models by measuring actual resource consumption under various load levels. Tests should ramp load gradually to identify breaking points and characterize resource consumption curves. Testing at multiple load levels validates model accuracy across the capacity range.
Time-Series Databases store capacity metrics for historical analysis and forecasting. Prometheus, InfluxDB, and TimescaleDB optimize for high-cardinality time-series data. These databases support retention policies, downsampling, and efficient range queries necessary for capacity analysis.
require 'influxdb'

class CapacityMetricsStore
  def initialize(host:, database:)
    @client = InfluxDB::Client.new(
      host: host,
      database: database,
      time_precision: 's'
    )
  end

  def write_capacity_snapshot(tags:, values:)
    @client.write_point(
      'capacity_metrics',
      tags: tags,
      values: values,
      timestamp: Time.now.to_i
    )
  end

  def query_historical_usage(start_time:, end_time:, metric:)
    query = <<~INFLUXQL
      SELECT mean(#{metric}) as avg_value, max(#{metric}) as max_value
      FROM capacity_metrics
      WHERE time >= '#{start_time.iso8601}' AND time <= '#{end_time.iso8601}'
      GROUP BY time(1h)
    INFLUXQL
    result = @client.query(query)
    # symbolize_keys requires ActiveSupport
    result[0]['values'].map { |point| point.symbolize_keys }
  end

  def calculate_growth_rate(metric:, days:)
    end_time = Time.now
    start_time = end_time - (days * 24 * 3600)
    query = <<~INFLUXQL
      SELECT mean(#{metric}) as value
      FROM capacity_metrics
      WHERE time >= '#{start_time.iso8601}' AND time <= '#{end_time.iso8601}'
      GROUP BY time(1d)
    INFLUXQL
    result = @client.query(query)
    # Time buckets with no data return nil values; drop them before computing
    values = result[0]['values'].map { |p| p['value'] }.compact
    return 0 if values.size < 2

    start_value = values.first
    end_value = values.last
    ((end_value - start_value) / start_value * 100).round(2)
  end
end
Profiling Tools identify code-level resource consumption patterns. The ruby-prof gem profiles CPU time and memory allocation. Rack-mini-profiler adds profiling to web requests. Stackprof provides statistical CPU profiling with minimal overhead.
Profiling results inform capacity planning by revealing resource-intensive code paths. Optimization efforts target high-resource operations identified through profiling. Regular profiling detects performance regressions that affect capacity requirements.
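Even without a full profiler, the stdlib Benchmark module can rank candidate code paths by per-call cost; the sketch below is illustrative and the measured paths are placeholders, not real application code:

```ruby
require 'benchmark'

# Average wall-clock cost per call in milliseconds for the given block.
def per_call_cost_ms(iterations)
  seconds = Benchmark.realtime { iterations.times { yield } }
  seconds * 1000 / iterations
end

costs = {
  serialization: per_call_cost_ms(10_000) { { id: 1, name: 'x' }.inspect },
  hashing:       per_call_cost_ms(10_000) { 'payload'.hash }
}
hottest = costs.max_by { |_, ms| ms }.first # the path worth optimizing first
```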
Real-World Applications
Production capacity planning scenarios demonstrate the practical application of capacity planning principles across different system architectures and business contexts.
E-Commerce Platform Scaling requires capacity planning for traffic spikes during sales events. A typical e-commerce site handles 1000 requests per second during normal operation but faces 10x traffic during flash sales. Capacity planning accounts for this variance while controlling infrastructure costs.
class EcommerceCapacityPlanner
  def initialize(baseline_rps:, baseline_instances:)
    @baseline_rps = baseline_rps
    @baseline_instances = baseline_instances
  end

  def plan_for_sale_event(expected_traffic_multiplier:, duration_hours:)
    expected_rps = @baseline_rps * expected_traffic_multiplier
    # Account for uneven traffic distribution
    peak_rps = expected_rps * 1.3
    # Calculate required instances (with 30% headroom)
    rps_per_instance = @baseline_rps.to_f / @baseline_instances
    required_capacity = peak_rps / rps_per_instance
    required_instances = (required_capacity * 1.3).ceil
    # Database connection planning
    threads_per_instance = 5
    db_connections = required_instances * threads_per_instance
    # Cache requirements (assuming 80% cache hit rate during sale)
    cache_hit_rate = 0.8
    database_rps = peak_rps * (1 - cache_hit_rate)
    {
      event_duration_hours: duration_hours,
      baseline_rps: @baseline_rps,
      expected_peak_rps: peak_rps.round(0),
      recommended_instances: required_instances,
      database_connections_needed: db_connections,
      estimated_database_rps: database_rps.round(0),
      cost_multiplier: (required_instances.to_f / @baseline_instances).round(1)
    }
  end

  def pre_warm_caches(product_ids)
    # Pre-load hot products into cache before event
    product_ids.each_slice(100) do |batch|
      Rails.cache.fetch_multi(*batch.map { |id| "product_#{id}" }) do |key|
        Product.find(key.split('_').last)
      end
    end
  end
end
Multi-Tenant SaaS Capacity Allocation distributes resources across customer organizations. Large customers consume more resources than small customers. Capacity planning balances resource allocation to prevent any single tenant from affecting others while maintaining cost efficiency.
Resource quotas enforce capacity limits per tenant. CPU shares, memory limits, and rate limiting prevent resource monopolization. Capacity planning determines quota levels based on pricing tiers and resource costs.
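Quota levels per pricing tier might be represented as a simple lookup; the tier names and limits below are invented for illustration:

```ruby
TIER_QUOTAS = {
  free:       { requests_per_minute: 60,    cpu_shares: 1 },
  standard:   { requests_per_minute: 600,   cpu_shares: 4 },
  enterprise: { requests_per_minute: 6_000, cpu_shares: 16 }
}.freeze

def within_quota?(tier, observed_rpm)
  observed_rpm <= TIER_QUOTAS.fetch(tier)[:requests_per_minute]
end

within_quota?(:standard, 450) # => true
within_quota?(:free, 120)     # => false, throttle or prompt an upgrade
```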
Background Processing Pipeline Capacity scales job processing capacity to meet SLA requirements. A video encoding service might need to process 10,000 videos daily with 4-hour maximum processing time. Capacity planning determines required worker count and instance specifications.
class VideoProcessingCapacityPlanner
  def initialize(encoding_profiles)
    @profiles = encoding_profiles
  end

  def calculate_required_capacity(daily_video_count:, max_sla_hours:)
    # Calculate weighted average encoding time
    total_weight = @profiles.sum { |p| p[:percentage] }
    avg_encoding_minutes = @profiles.sum do |profile|
      (profile[:avg_minutes] * profile[:percentage]) / total_weight
    end
    # Total encoding time needed per day
    total_minutes_needed = daily_video_count * avg_encoding_minutes
    # Available processing time per worker (accounting for 80% utilization)
    available_minutes_per_worker = max_sla_hours * 60 * 0.8
    # Required workers
    required_workers = (total_minutes_needed / available_minutes_per_worker).ceil
    # Memory requirements (video encoding is memory-intensive)
    memory_per_worker_gb = 8
    total_memory_gb = required_workers * memory_per_worker_gb
    # Storage requirements for queue
    avg_video_size_gb = 2
    peak_queue_size = daily_video_count * 0.3 # Assume 30% arrive during peak hours
    required_storage_gb = (peak_queue_size * avg_video_size_gb * 1.5).ceil
    {
      daily_video_count: daily_video_count,
      avg_encoding_minutes: avg_encoding_minutes.round(1),
      required_workers: required_workers,
      total_memory_gb: total_memory_gb,
      required_storage_gb: required_storage_gb,
      max_throughput_per_day: (required_workers * available_minutes_per_worker / avg_encoding_minutes).round(0)
    }
  end
end
Database Capacity Planning prevents database bottlenecks as application traffic grows. Database capacity constraints include connection limits, CPU capacity, memory for buffer pools, and storage I/O throughput. Capacity planning models database growth and determines appropriate instance sizing.
Read replicas distribute read load across multiple database instances. Write capacity remains constrained by the primary database. Capacity planning accounts for read/write ratios and determines replica count requirements.
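A sketch of sizing the replica fleet from the read/write split; the throughput figures are assumptions for illustration:

```ruby
# Reads spread across replicas; writes stay on the primary.
def replica_count(total_qps:, read_fraction:, replica_capacity_qps:, headroom: 0.3)
  read_qps = total_qps * read_fraction
  ((read_qps * (1 + headroom)) / replica_capacity_qps).ceil
end

replica_count(total_qps: 12_000, read_fraction: 0.9, replica_capacity_qps: 5_000)
# => 3 replicas to serve ~10,800 reads/s with 30% headroom
```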
Geographic Distribution Planning allocates capacity across multiple regions for latency and availability. Multi-region deployment requires capacity planning for each region while accounting for traffic distribution, failover scenarios, and data replication overhead.
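For example, an N+1 sizing rule for regional failover can be sketched as follows, assuming an even traffic split across regions:

```ruby
# With N regions, the survivors of a single-region failure must absorb
# all traffic, so each region is sized for total load over (N - 1).
def per_region_capacity(total_rps, regions:)
  (total_rps.to_f / (regions - 1)).ceil
end

per_region_capacity(9_000, regions: 3) # => 4500 req/s per region
# Normal load per region is 3000 req/s, so each region carries 50% spare
```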
Reference
Core Capacity Metrics
| Metric | Description | Typical Threshold |
|---|---|---|
| CPU Utilization | Percentage of CPU capacity consumed | 70-80% sustained |
| Memory Usage | RAM consumption as percentage of total | 85% sustained |
| Disk I/O | Read/write operations per second | 80% of rated IOPS |
| Network Bandwidth | Data transfer rate as percentage of link capacity | 70% sustained |
| Connection Pool | Active database connections vs pool size | 80% of pool size |
| Queue Depth | Pending items in processing queues | Varies by SLA |
| Error Rate | Failed requests as percentage of total | >1% of requests |
| Response Time | Request processing latency | P95 >500ms |
Capacity Planning Formulas
| Calculation | Formula | Use Case |
|---|---|---|
| Required Instances | (Target RPS × Avg Response Time) ÷ (Instance Concurrency × Target Utilization) | Web server capacity |
| Database Connections | (App Instances × Workers per Instance × Threads per Worker) + Reserved | Connection pool sizing |
| Memory Requirement | (Worker Count × Base Memory) + (Concurrent Requests × Request Memory) | Process memory planning |
| Storage Growth | Current Size × (1 + Daily Growth Rate) ^ Days | Storage provisioning |
| Queue Processing Time | (Queue Depth × Avg Processing Time) ÷ Worker Count | SLA validation |
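The instance-count formula is Little's law in disguise: concurrent requests equal arrival rate times response time, divided by each instance's effective concurrency at the target utilization. A worked sketch with illustrative numbers:

```ruby
def required_instances(target_rps:, avg_response_s:, instance_concurrency:, target_utilization:)
  concurrent = target_rps * avg_response_s # Little's law: in-flight requests
  (concurrent / (instance_concurrency * target_utilization)).ceil
end

required_instances(target_rps: 3_000, avg_response_s: 0.25,
                   instance_concurrency: 20, target_utilization: 0.75)
# 750 concurrent requests / 15 effective slots per instance => 50 instances
```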
Ruby Process Sizing Guidelines
| Server Type | Memory per Worker | CPU per Worker | Threads per Worker |
|---|---|---|---|
| Puma (small app) | 256-512 MB | 0.5-1 core | 3-5 |
| Puma (large app) | 512-1024 MB | 1-2 cores | 5-10 |
| Unicorn | 512-1024 MB | 1 core | 1 |
| Sidekiq | 256-512 MB | 0.5-1 core | 10-25 |
| Passenger | 256-512 MB | 0.5-1 core | 1 |
Headroom Recommendations
| Traffic Pattern | Recommended Headroom | Rationale |
|---|---|---|
| Stable, predictable | 20-30% | Handles minor spikes |
| Variable, seasonal | 30-50% | Accommodates peaks |
| Unpredictable, viral | 50-100% | Rapid growth protection |
| Auto-scaling enabled | 10-20% | System scales automatically |
| Manual scaling only | 40-60% | Buffer for provisioning delay |
Monitoring Collection Intervals
| Metric Type | Collection Interval | Retention Period |
|---|---|---|
| Infrastructure (CPU, memory) | 10-60 seconds | 1-3 months full, 1 year aggregated |
| Application (request rate) | 10-60 seconds | 1-3 months full, 1 year aggregated |
| Database (queries, connections) | 10-60 seconds | 1-3 months full, 1 year aggregated |
| Business (signups, purchases) | 1-5 minutes | 2 years full |
| Capacity snapshots | 1-24 hours | 3-5 years |
Scaling Decision Matrix
| Current Utilization | Growth Rate | Action Required | Timeline |
|---|---|---|---|
| <50% | <5% monthly | Monitor | Quarterly review |
| 50-70% | <5% monthly | Plan | 2-3 month horizon |
| 50-70% | 5-10% monthly | Plan | 1-2 month horizon |
| 70-85% | Any | Scale soon | 2-4 weeks |
| >85% | Any | Scale immediately | 1 week |
| >95% | Any | Emergency scaling | 24-48 hours |