Overview
Capacity planning determines the infrastructure resources required to meet application performance goals under anticipated load conditions. The process involves measuring current resource usage, forecasting future demand, identifying bottlenecks, and allocating resources to maintain acceptable response times and throughput.
Software systems consume computational resources including CPU cycles, memory, disk I/O, network bandwidth, and database connections. Capacity planning quantifies these resource requirements across different load scenarios to prevent performance degradation, service outages, and resource exhaustion. The practice applies to single servers, distributed systems, databases, message queues, caches, and all infrastructure components.
The planning process follows a cycle: baseline current performance, model resource consumption patterns, project future growth, provision infrastructure, monitor actual usage, and refine predictions. This cycle repeats as application behavior changes through feature additions, traffic growth, or architectural modifications.
Capacity planning differs from performance optimization. Performance optimization reduces resource consumption for a given workload, while capacity planning ensures sufficient resources exist for expected workloads. Both practices complement each other in maintaining system reliability.
# Capacity planning tracks resource usage over time
class CapacityMetrics
  def initialize
    @snapshots = []
  end

  def record_snapshot
    @snapshots << {
      timestamp: Time.now,
      cpu_percent: cpu_usage,
      memory_mb: memory_usage,
      active_connections: connection_count
    }
  end

  # cpu_usage, memory_usage, connection_count, and calculate_trend are
  # environment-specific helpers assumed to be defined elsewhere.
  def forecast_exhaustion(resource, threshold)
    return nil if @snapshots.size < 2

    trend = calculate_trend(@snapshots, resource)
    return nil if trend <= 0 # usage is flat or declining; no exhaustion expected

    current = @snapshots.last[resource]
    time_to_threshold = (threshold - current) / trend
    Time.now + time_to_threshold
  end
end
Organizations implement capacity planning at multiple scales. Small applications might track CPU and memory on a single server. Large distributed systems require planning across hundreds of services, each with distinct resource profiles. Cloud environments add complexity through auto-scaling, spot instances, and variable performance characteristics.
Key Principles
Resource Consumption Modeling establishes the relationship between workload and resource usage. Models range from simple linear relationships to complex multi-variable regressions. A linear model might express CPU usage as a function of requests per second, while sophisticated models account for request type, payload size, cache hit rates, and concurrent operations.
# Linear capacity model
class LinearCapacityModel
  def initialize(baseline_load, baseline_cpu)
    @baseline_load = baseline_load.to_f
    @baseline_cpu = baseline_cpu.to_f
  end

  def predict_cpu(load)
    (@baseline_cpu / @baseline_load) * load
  end

  def max_load_for_cpu(cpu_limit)
    (@baseline_load * cpu_limit) / @baseline_cpu
  end
end

model = LinearCapacityModel.new(1000, 45.0)
model.predict_cpu(2500)      # => 112.5% CPU at 2500 req/s
model.max_load_for_cpu(80.0) # => 1777.78 req/s at 80% CPU
Headroom Management maintains buffer capacity above anticipated peak load. Headroom accounts for traffic spikes, measurement errors, and unexpected load patterns. Common practice reserves 20-50% headroom depending on traffic volatility and scaling agility. Systems with rapid auto-scaling capability require less headroom than manually-scaled infrastructure.
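The headroom arithmetic can be sketched in a few lines; the function name and percentages below are illustrative, not a standard API. Integer percentages sidestep floating-point rounding surprises.

```ruby
# Hypothetical sketch: provision anticipated peak load plus a headroom buffer.
def provisioned_capacity(peak_load, headroom_percent)
  (peak_load * (100 + headroom_percent) / 100.0).round
end

provisioned_capacity(4_000, 30) # => 5200 req/s for a volatile workload
provisioned_capacity(4_000, 15) # => 4600 req/s with rapid auto-scaling
```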
Growth Projection extrapolates future resource requirements from historical trends. Simple projections use linear extrapolation. Sophisticated approaches model seasonal patterns, cyclical behavior, and exponential growth curves. Projection accuracy improves with longer historical datasets and accounts for known future events like marketing campaigns or feature launches.
class GrowthProjector
  def initialize(historical_data)
    @data = historical_data.sort_by { |point| point[:timestamp] }
  end

  # Assumes one data point per day; calculate_slope, calculate_intercept,
  # and calculate_compound_growth_rate are helpers defined elsewhere.
  def linear_projection(months_ahead)
    return nil if @data.size < 2

    x_values = @data.map.with_index { |_, i| i }
    y_values = @data.map { |point| point[:value] }
    slope = calculate_slope(x_values, y_values)
    intercept = calculate_intercept(x_values, y_values, slope)
    future_index = @data.size + (months_ahead * 30)
    slope * future_index + intercept
  end

  def exponential_projection(months_ahead)
    return nil if @data.size < 3

    log_values = @data.map { |point| Math.log(point[:value]) }
    growth_rate = calculate_compound_growth_rate(log_values)
    current_value = @data.last[:value]
    current_value * ((1 + growth_rate) ** months_ahead)
  end
end
Bottleneck Identification locates the resource that limits system capacity. The bottleneck determines maximum achievable throughput regardless of other resource availability. Common bottlenecks include database connection pools, CPU saturation, memory exhaustion, disk I/O limits, and network bandwidth constraints. Capacity planning prioritizes bottleneck resources.
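As a rough illustration (the resource data and helper names are hypothetical), the bottleneck is the resource running closest to its practical limit, and its remaining margin bounds how far throughput can grow:

```ruby
# Each resource: current usage vs. its practical limit.
def bottleneck(usages)
  usages.max_by { |_, u| u[:current].to_f / u[:limit] }.first
end

# Throughput can grow only until the bottleneck resource saturates.
def max_scale_factor(usages)
  usages.map { |_, u| u[:limit].to_f / u[:current] }.min
end

usages = {
  cpu_percent:    { current: 45, limit: 80 },
  db_connections: { current: 90, limit: 100 },
  memory_mb:      { current: 9_000, limit: 14_000 }
}
bottleneck(usages)       # => :db_connections
max_scale_factor(usages) # ~1.11x before the connection pool saturates
```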
Workload Characterization categorizes requests by resource consumption profile. Not all requests consume identical resources. Read operations differ from writes, cached requests differ from cache misses, and background jobs differ from API requests. Accurate capacity planning requires understanding the mix of workload types and their relative frequencies.
class WorkloadProfile
  attr_reader :request_type, :avg_cpu_ms, :avg_memory_mb, :avg_db_queries

  def initialize(request_type, cpu_ms, memory_mb, db_queries)
    @request_type = request_type
    @avg_cpu_ms = cpu_ms
    @avg_memory_mb = memory_mb
    @avg_db_queries = db_queries
  end

  def total_cpu_seconds(request_count)
    (request_count * @avg_cpu_ms) / 1000.0
  end
end

class CapacityCalculator
  def initialize(profiles, request_distribution)
    @profiles = profiles
    @distribution = request_distribution
  end

  def required_cpu_cores(total_requests, target_duration_hours)
    total_cpu_seconds = @profiles.sum do |profile|
      request_count = total_requests * @distribution[profile.request_type]
      profile.total_cpu_seconds(request_count)
    end
    available_seconds = target_duration_hours * 3600
    (total_cpu_seconds / available_seconds).ceil
  end
end
Measurement Precision affects planning accuracy. Resource metrics contain noise from system fluctuations, measurement overhead, and sampling intervals. Capacity planning aggregates measurements over meaningful time windows to reduce noise. Percentile-based metrics provide more reliable signals than averages, as they account for variance in resource consumption patterns.
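A small sketch of why percentiles resist noise better than averages (the sample values are made up):

```ruby
# Nearest-rank style percentile over a measurement window.
def percentile(samples, pct)
  sorted = samples.sort
  sorted[((pct / 100.0) * (sorted.size - 1)).round]
end

# One outlier skews the mean but barely moves the median.
window = [40, 42, 41, 43, 95, 44, 42, 41, 43, 42]
window.sum / window.size.to_f # => 47.3, pulled up by the spike
percentile(window, 50)        # => 42, the typical request
percentile(window, 95)        # => 95, the tail worth planning for
```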
Scaling Strategies determine how capacity increases with demand. Vertical scaling adds resources to existing instances; horizontal scaling adds more instances. Each strategy has different capacity characteristics and planning implications. Vertical scaling has upper limits based on hardware constraints. Horizontal scaling requires workload distribution mechanisms and must account for coordination overhead.
Design Considerations
Capacity planning strategies differ based on system architecture, traffic patterns, and business constraints. The choice between reactive and proactive planning affects operational burden, infrastructure costs, and reliability characteristics.
Reactive vs Proactive Planning represents the fundamental strategic choice. Reactive planning responds to observed capacity constraints after they occur or approach critical thresholds. Proactive planning anticipates requirements before constraints manifest. Reactive planning minimizes infrastructure costs but increases incident risk. Proactive planning maintains higher reliability at the cost of over-provisioning.
Reactive planning suits systems with unpredictable traffic patterns, high tolerance for temporary degradation, and rapid scaling capabilities. Proactive planning fits systems with predictable growth, strict uptime requirements, and slow scaling processes. Most organizations blend both approaches, maintaining proactive capacity for baseline load while reserving reactive measures for unexpected spikes.
Planning Horizon determines how far into the future capacity projections extend. Short horizons of weeks or months require less sophisticated forecasting but demand frequent re-evaluation. Long horizons of quarters or years enable better procurement planning but accumulate forecast errors. The optimal horizon balances forecast accuracy against planning stability.
class CapacityPlan
  attr_reader :horizon_months, :review_frequency

  def initialize(horizon_months:, review_frequency:)
    @horizon_months = horizon_months
    @review_frequency = review_frequency
    @capacity_points = []
  end

  def add_checkpoint(month, cpu_cores, memory_gb, storage_tb)
    @capacity_points << {
      month: month,
      cpu_cores: cpu_cores,
      memory_gb: memory_gb,
      storage_tb: storage_tb
    }
    # Keep checkpoints ordered so interpolation can rely on month order
    @capacity_points.sort_by! { |point| point[:month] }
  end

  def interpolate_capacity(target_month)
    return nil if @capacity_points.empty?

    before = @capacity_points.reverse.find { |p| p[:month] <= target_month }
    after = @capacity_points.find { |p| p[:month] >= target_month }
    return before if before && !after
    return after if after && !before
    return before if before[:month] == target_month

    # Linear interpolation between points
    ratio = (target_month - before[:month]).to_f / (after[:month] - before[:month])
    {
      cpu_cores: interpolate(before[:cpu_cores], after[:cpu_cores], ratio),
      memory_gb: interpolate(before[:memory_gb], after[:memory_gb], ratio),
      storage_tb: interpolate(before[:storage_tb], after[:storage_tb], ratio)
    }
  end

  private

  def interpolate(start_val, end_val, ratio)
    start_val + ((end_val - start_val) * ratio)
  end
end
Cost Optimization balances performance requirements against infrastructure expenses. Over-provisioning wastes budget on unused capacity. Under-provisioning causes performance degradation or outages. The optimal point depends on the cost of resources versus the cost of performance problems.
Cloud infrastructure enables fine-grained capacity adjustments but introduces complexity in instance type selection, commitment strategies, and multi-region deployment. Reserved instances reduce costs for predictable baseline capacity. Spot instances provide cost-effective burst capacity for fault-tolerant workloads. The capacity plan should specify resource allocation across different procurement models.
Multi-Tier Planning accounts for dependencies between system layers. Application servers depend on database capacity, which depends on storage capacity. Cache layers affect database load. Message queues buffer load between producers and consumers. Capacity planning must model these interdependencies to avoid creating bottlenecks in downstream systems.
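A sketch of propagating front-end load through dependent tiers; the rates and ratios here are illustrative assumptions, not measured values:

```ruby
# Cache hit rate shields the database; database queries drive storage IOPS.
def downstream_load(app_rps, cache_hit_rate:, queries_per_miss:, iops_per_query:)
  db_qps = app_rps * (1 - cache_hit_rate) * queries_per_miss
  { db_qps: db_qps, storage_iops: db_qps * iops_per_query }
end

downstream_load(2_000, cache_hit_rate: 0.9, queries_per_miss: 3, iops_per_query: 4)
# roughly 600 queries/s to the database and 2400 IOPS to storage;
# a drop to an 80% cache hit rate would double both downstream loads
```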
Implementation Approaches
Capacity planning implementations range from manual spreadsheet tracking to automated prediction systems. The sophistication level should match organizational scale, growth velocity, and operational maturity.
Spreadsheet-Based Planning records periodic resource snapshots in spreadsheets. Analysts manually calculate trends, project future requirements, and document capacity decisions. This approach works for small deployments with stable growth patterns; it requires minimal tooling but scales poorly with system complexity.
# Generate capacity report for spreadsheet import
require 'csv'

class CapacityReporter
  def initialize(metrics_source)
    @metrics = metrics_source
  end

  def generate_csv_report(start_date, end_date)
    CSV.generate do |csv|
      csv << ["Date", "Avg CPU %", "Peak CPU %", "Avg Memory GB",
              "Peak Memory GB", "Request Count", "P95 Response Time"]
      (start_date..end_date).each do |date|
        daily_metrics = @metrics.for_date(date)
        csv << [
          date.to_s,
          daily_metrics.avg_cpu.round(2),
          daily_metrics.peak_cpu.round(2),
          daily_metrics.avg_memory.round(2),
          daily_metrics.peak_memory.round(2),
          daily_metrics.total_requests,
          daily_metrics.p95_response_time.round(0)
        ]
      end
    end
  end
end
Threshold-Based Alerting triggers notifications when resource usage exceeds defined thresholds. Alerts prompt manual capacity evaluation and scaling decisions. This reactive approach responds to immediate capacity constraints but provides no advance warning. Threshold configuration requires understanding normal usage patterns to avoid false positives.
Multiple threshold levels create escalation paths. A warning threshold at 60% utilization triggers investigation. A critical threshold at 80% triggers immediate action. This layered approach balances responsiveness with operational burden.
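The escalation levels described above might be encoded as follows; the thresholds are the example values from the text, not universal constants:

```ruby
def alert_level(utilization_percent)
  case utilization_percent
  when 0...60 then :ok        # normal operating range
  when 60...80 then :warning  # triggers investigation
  else :critical              # triggers immediate action
  end
end

alert_level(55) # => :ok
alert_level(72) # => :warning
alert_level(85) # => :critical
```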
Time-Series Forecasting applies statistical methods to historical metrics for future projection. Linear regression, moving averages, exponential smoothing, and seasonal decomposition models extract trends from historical data. More sophisticated approaches use machine learning models trained on historical capacity data.
class TimeSeriesForecaster
  def initialize(historical_points)
    @points = historical_points.sort_by { |p| p[:timestamp] }
  end

  def simple_moving_average(window_size, periods_ahead)
    recent_values = @points.last(window_size).map { |p| p[:value] }
    average = recent_values.sum / recent_values.size.to_f
    Array.new(periods_ahead, average)
  end

  def linear_regression_forecast(periods_ahead)
    return [] if @points.size < 2

    x_values = @points.map.with_index { |_, i| i.to_f }
    y_values = @points.map { |p| p[:value].to_f }
    n = x_values.size
    sum_x = x_values.sum
    sum_y = y_values.sum
    sum_xy = x_values.zip(y_values).map { |x, y| x * y }.sum
    sum_x_squared = x_values.map { |x| x * x }.sum
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x_squared - sum_x * sum_x)
    intercept = (sum_y - slope * sum_x) / n
    future_indices = (n..(n + periods_ahead - 1)).to_a
    future_indices.map { |x| slope * x + intercept }
  end

  def exponential_smoothing_forecast(alpha, periods_ahead)
    return [] if @points.empty?

    smoothed = [@points.first[:value]]
    @points[1..-1].each do |point|
      smoothed << alpha * point[:value] + (1 - alpha) * smoothed.last
    end
    forecast_value = smoothed.last
    Array.new(periods_ahead, forecast_value)
  end
end
Load Testing Capacity Discovery measures system capacity through controlled load generation. Load tests identify breaking points, characterize resource consumption under various load levels, and validate capacity models. Regular load testing updates capacity parameters as code changes affect resource consumption.
Load tests should cover multiple workload scenarios representing production traffic patterns. Testing only the best-case scenario produces optimistic capacity estimates. Include cache-cold scenarios, database query patterns, and concurrent operation mixes that reflect production behavior.
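One way to structure such a test is a stepped ramp that holds each load level long enough to observe steady-state resource consumption. This plan generator is a hypothetical sketch, not part of any load-testing tool:

```ruby
def ramp_schedule(start_rps:, step_rps:, steps:, hold_seconds:)
  (0...steps).map do |i|
    { target_rps: start_rps + i * step_rps, hold_seconds: hold_seconds }
  end
end

# Ramp from 500 to 1250 req/s in 250 req/s steps, holding each for 5 minutes
ramp_schedule(start_rps: 500, step_rps: 250, steps: 4, hold_seconds: 300)
```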
Auto-Scaling Integration couples capacity monitoring with automatic resource provisioning. Systems automatically add or remove instances based on observed metrics. This approach maintains target utilization levels without manual intervention. Auto-scaling requires well-tuned triggers, graceful instance addition/removal, and sufficient lead time for provisioning.
class AutoScalingPolicy
  def initialize(
    min_instances:,
    max_instances:,
    scale_up_threshold:,
    scale_down_threshold:,
    cooldown_period:
  )
    @min_instances = min_instances
    @max_instances = max_instances
    @scale_up_threshold = scale_up_threshold
    @scale_down_threshold = scale_down_threshold
    @cooldown_period = cooldown_period
    @last_scaling_action = nil
  end

  def evaluate(current_instances, current_cpu_percent)
    return nil if in_cooldown?

    if current_cpu_percent >= @scale_up_threshold
      desired = [current_instances + scale_increment(current_instances),
                 @max_instances].min
      return desired if desired > current_instances
    elsif current_cpu_percent <= @scale_down_threshold
      desired = [current_instances - 1, @min_instances].max
      return desired if desired < current_instances
    end
    nil
  end

  def record_scaling_action
    @last_scaling_action = Time.now
  end

  private

  def in_cooldown?
    return false unless @last_scaling_action
    Time.now - @last_scaling_action < @cooldown_period
  end

  def scale_increment(current_count)
    # Scale faster at lower counts, more gradually at higher counts
    current_count < 10 ? 2 : (current_count * 0.2).ceil
  end
end
Capacity Reservation Systems allocate resources to different workload classes to prevent resource contention. Database connection pools reserve connections for critical transactions. CPU quotas allocate processing capacity across tenants. This approach prevents any single workload from exhausting shared resources.
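A minimal sketch of such partitioning, assuming a shared connection pool split by workload class (the class names and shares are illustrative):

```ruby
# Split a shared connection pool into per-class reservations; any rounding
# remainder goes to the first (highest-priority) class.
def partition_pool(total_connections, shares)
  allocated = shares.transform_values { |share| (total_connections * share).floor }
  allocated[allocated.keys.first] += total_connections - allocated.values.sum
  allocated
end

partition_pool(100, critical: 0.5, batch: 0.25, reporting: 0.25)
# => { critical: 50, batch: 25, reporting: 25 }
```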
Ruby Implementation
Ruby applications require capacity planning for application servers, background job processors, and ancillary services. Ruby's Global Interpreter Lock affects CPU capacity calculations, as a single Ruby process cannot fully utilize multiple CPU cores without forking or threading.
Process-Based Concurrency remains the primary scaling model for Ruby web applications. Application servers like Puma, Unicorn, and Passenger spawn multiple worker processes. Each worker handles requests independently. Capacity planning must account for per-process memory overhead and CPU allocation across workers.
# Calculate worker capacity for a Puma server
class PumaCapacityPlanner
  def initialize(available_memory_mb:, available_cpu_cores:)
    @available_memory_mb = available_memory_mb
    @available_cpu_cores = available_cpu_cores
  end

  def recommended_workers(per_process_memory_mb:, threads_per_worker:)
    # Reserve memory for OS and overhead
    usable_memory = @available_memory_mb * 0.75
    # Memory-constrained worker count
    memory_limited = (usable_memory / per_process_memory_mb).floor
    # CPU-constrained worker count (account for blocking I/O)
    cpu_limited = (@available_cpu_cores * 1.5).ceil
    # Take the more conservative limit
    worker_count = [memory_limited, cpu_limited].min
    {
      workers: worker_count,
      threads: threads_per_worker,
      total_concurrency: worker_count * threads_per_worker,
      memory_usage_mb: worker_count * per_process_memory_mb,
      reasoning: memory_limited < cpu_limited ? :memory_bound : :cpu_bound
    }
  end
end

planner = PumaCapacityPlanner.new(
  available_memory_mb: 16_000,
  available_cpu_cores: 8
)
config = planner.recommended_workers(
  per_process_memory_mb: 512,
  threads_per_worker: 5
)
# => {
#   workers: 12,
#   threads: 5,
#   total_concurrency: 60,
#   memory_usage_mb: 6144,
#   reasoning: :cpu_bound
# }
Memory Profiling identifies per-request memory allocation and retention. Ruby's garbage collector reclaims memory, but retained objects accumulate across requests. Memory profiling tools like memory_profiler and derailed_benchmarks measure allocation patterns.
require 'memory_profiler'

class EndpointCapacityAnalyzer
  # simulate_request is an application-specific helper assumed to exist;
  # it exercises the controller action outside the HTTP stack.
  def analyze_endpoint(controller, action)
    report = MemoryProfiler.report do
      10.times { simulate_request(controller, action) }
    end
    {
      total_allocated_mb: report.total_allocated_memsize / 1024.0 / 1024.0,
      total_retained_mb: report.total_retained_memsize / 1024.0 / 1024.0,
      allocated_objects: report.total_allocated,
      retained_objects: report.total_retained,
      avg_per_request_mb: (report.total_allocated_memsize / 10.0) / 1024.0 / 1024.0
    }
  end

  def project_memory_requirements(metrics, requests_per_second, workers)
    per_request_mb = metrics[:avg_per_request_mb]
    baseline_memory_mb = 256 # Base process memory
    # Assume each worker handles requests_per_second / workers requests
    requests_per_worker = requests_per_second.to_f / workers
    # Memory accumulates during request lifetime (assume 100ms avg)
    concurrent_requests_per_worker = requests_per_worker * 0.1
    concurrent_memory = concurrent_requests_per_worker * per_request_mb
    total_per_worker = baseline_memory_mb + concurrent_memory
    total_system_memory = total_per_worker * workers
    {
      per_worker_mb: total_per_worker.ceil,
      total_system_mb: total_system_memory.ceil,
      concurrent_requests_per_worker: concurrent_requests_per_worker.ceil
    }
  end
end
Database Connection Pooling limits concurrent database connections per process. Connection pool size affects request concurrency and database capacity. Under-sized pools cause request queueing. Over-sized pools exhaust database connection limits.
# Calculate database connection requirements
class DatabaseCapacityPlanner
  def initialize(db_max_connections:, app_server_count:)
    @db_max_connections = db_max_connections
    @app_server_count = app_server_count
  end

  def calculate_pool_size(workers_per_server:, threads_per_worker:)
    # Total potential connections across all servers
    total_threads = @app_server_count * workers_per_server * threads_per_worker
    # Reserve connections for background jobs, admin, monitoring
    reserved_connections = 20
    available_for_app = @db_max_connections - reserved_connections
    # Calculate pool size per worker
    if total_threads <= available_for_app
      # Plenty of connections available
      recommended_pool_size = threads_per_worker
    else
      # Need to limit pool size
      connections_per_server = available_for_app / @app_server_count
      connections_per_worker = connections_per_server / workers_per_server
      recommended_pool_size = [connections_per_worker, threads_per_worker].min
    end
    total_app_connections = recommended_pool_size * workers_per_server * @app_server_count
    {
      recommended_pool_size: recommended_pool_size,
      total_app_connections: total_app_connections,
      utilization_percent: ((total_app_connections * 100.0) / available_for_app).round(1),
      constraint: recommended_pool_size < threads_per_worker ? :connection_limited : :thread_matched
    }
  end
end
Background Job Capacity requires separate capacity planning from web request handling. Job processors like Sidekiq use threading for concurrency. Job types have varying resource profiles. Long-running jobs consume workers for extended periods, reducing available capacity for other jobs.
class SidekiqCapacityPlanner
  def initialize(concurrency:, avg_memory_per_thread_mb:)
    @concurrency = concurrency
    @avg_memory_per_thread = avg_memory_per_thread_mb
  end

  # job_mix: array of { frequency:, avg_duration_seconds: } hashes
  def max_throughput(job_mix)
    # Throughput = concurrency / weighted average job duration
    avg_duration = weighted_avg_duration(job_mix)
    jobs_per_second = @concurrency.to_f / avg_duration
    {
      jobs_per_second: jobs_per_second.round(2),
      jobs_per_minute: (jobs_per_second * 60).round(0),
      jobs_per_hour: (jobs_per_second * 3600).round(0),
      avg_duration_seconds: avg_duration.round(2)
    }
  end

  def required_concurrency(target_jobs_per_hour, job_mix)
    jobs_per_second = target_jobs_per_hour / 3600.0
    required = (jobs_per_second * weighted_avg_duration(job_mix)).ceil
    {
      required_concurrency: required,
      memory_estimate_mb: (required * @avg_memory_per_thread).ceil
    }
  end

  private

  def weighted_avg_duration(job_mix)
    total_weight = job_mix.sum { |job| job[:frequency] }
    job_mix.sum { |job| (job[:avg_duration_seconds] * job[:frequency]) / total_weight }
  end
end
Tools & Ecosystem
Capacity planning relies on monitoring infrastructure, metric storage, and analysis tools. Ruby applications integrate with various monitoring platforms for metric collection and visualization.
Application Performance Monitoring platforms like New Relic, Datadog, and Scout provide pre-built capacity analytics. These services collect application metrics, infrastructure metrics, and distributed traces. Built-in dashboards track resource utilization trends and forecast capacity needs.
Ruby instrumentation occurs through agent gems that hook into web frameworks and libraries. Agents report metrics to external platforms for aggregation and analysis. APM platforms provide alerting, anomaly detection, and capacity trend visualization.
Metrics Libraries export custom application metrics to time-series databases. The prometheus-client gem exposes metrics in Prometheus format. The statsd-instrument gem sends metrics to StatsD-compatible collectors. Custom metrics supplement infrastructure metrics with application-specific capacity indicators.
require 'prometheus/client'

class CapacityMetricsExporter
  def initialize
    @registry = Prometheus::Client.registry
    @cpu_usage = @registry.gauge(
      :app_cpu_utilization_percent,
      docstring: 'Current CPU utilization percentage'
    )
    @memory_usage = @registry.gauge(
      :app_memory_usage_bytes,
      docstring: 'Current memory usage in bytes'
    )
    @active_connections = @registry.gauge(
      :app_active_database_connections,
      docstring: 'Number of active database connections'
    )
    @request_duration = @registry.histogram(
      :app_request_duration_seconds,
      docstring: 'Request processing duration',
      buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
    )
  end

  def record_request(duration_seconds)
    @request_duration.observe(duration_seconds)
  end

  def update_resource_usage
    process_info = process_metrics
    @cpu_usage.set(process_info[:cpu_percent])
    @memory_usage.set(process_info[:memory_bytes])
    @active_connections.set(ActiveRecord::Base.connection_pool.connections.count)
  end

  private

  def process_metrics
    # Platform-specific process metrics
    # Simplified for demonstration
    {
      cpu_percent: 45.2,
      memory_bytes: 512 * 1024 * 1024
    }
  end
end
Load Testing Tools validate capacity plans through controlled load generation. Apache JMeter, Gatling, and k6 generate traffic patterns for capacity verification. Command-line tools such as wrk and siege exercise HTTP endpoints directly.
Load testing confirms capacity models by measuring actual resource consumption under various load levels. Tests should ramp load gradually to identify breaking points and characterize resource consumption curves. Testing at multiple load levels validates model accuracy across the capacity range.
Time-Series Databases store capacity metrics for historical analysis and forecasting. Prometheus, InfluxDB, and TimescaleDB optimize for high-cardinality time-series data. These databases support retention policies, downsampling, and efficient range queries necessary for capacity analysis.
require 'influxdb'

class CapacityMetricsStore
  def initialize(host:, database:)
    @client = InfluxDB::Client.new(
      host: host,
      database: database,
      time_precision: 's'
    )
  end

  def write_capacity_snapshot(tags:, values:)
    @client.write_point(
      'capacity_metrics',
      tags: tags,
      values: values,
      timestamp: Time.now.to_i
    )
  end

  def query_historical_usage(start_time:, end_time:, metric:)
    query = <<~INFLUXQL
      SELECT mean(#{metric}) as avg_value, max(#{metric}) as max_value
      FROM capacity_metrics
      WHERE time >= '#{start_time.iso8601}' AND time <= '#{end_time.iso8601}'
      GROUP BY time(1h)
    INFLUXQL
    result = @client.query(query)
    # symbolize_keys requires ActiveSupport
    result[0]['values'].map { |point| point.symbolize_keys }
  end

  def calculate_growth_rate(metric:, days:)
    end_time = Time.now
    start_time = end_time - (days * 24 * 3600)
    query = <<~INFLUXQL
      SELECT mean(#{metric}) as value
      FROM capacity_metrics
      WHERE time >= '#{start_time.iso8601}' AND time <= '#{end_time.iso8601}'
      GROUP BY time(1d)
    INFLUXQL
    result = @client.query(query)
    # Time buckets with no data return nil values; drop them before computing
    values = result[0]['values'].map { |p| p['value'] }.compact
    return 0 if values.size < 2

    start_value = values.first
    end_value = values.last
    ((end_value - start_value) / start_value * 100).round(2)
  end
end
Profiling Tools identify code-level resource consumption patterns. The ruby-prof gem profiles CPU time and memory allocation. Rack-mini-profiler adds profiling to web requests. Stackprof provides statistical CPU profiling with minimal overhead.
Profiling results inform capacity planning by revealing resource-intensive code paths. Optimization efforts target high-resource operations identified through profiling. Regular profiling detects performance regressions that affect capacity requirements.
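Even without a full profiler, the stdlib Benchmark module can rank candidate code paths by per-call cost; the sketch below is illustrative and the measured paths are placeholders, not real application code:

```ruby
require 'benchmark'

# Average wall-clock cost per call in milliseconds for the given block.
def per_call_cost_ms(iterations)
  seconds = Benchmark.realtime { iterations.times { yield } }
  seconds * 1000 / iterations
end

costs = {
  serialization: per_call_cost_ms(10_000) { { id: 1, name: 'x' }.inspect },
  hashing:       per_call_cost_ms(10_000) { 'payload'.hash }
}
hottest = costs.max_by { |_, ms| ms }.first # the path worth optimizing first
```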
Real-World Applications
Production capacity planning scenarios demonstrate the practical application of capacity planning principles across different system architectures and business contexts.
E-Commerce Platform Scaling requires capacity planning for traffic spikes during sales events. A typical e-commerce site handles 1000 requests per second during normal operation but faces 10x traffic during flash sales. Capacity planning accounts for this variance while controlling infrastructure costs.
class EcommerceCapacityPlanner
  def initialize(baseline_rps:, baseline_instances:)
    @baseline_rps = baseline_rps
    @baseline_instances = baseline_instances
  end

  def plan_for_sale_event(expected_traffic_multiplier:, duration_hours:)
    expected_rps = @baseline_rps * expected_traffic_multiplier
    # Account for uneven traffic distribution
    peak_rps = expected_rps * 1.3
    # Calculate required instances (with 30% headroom)
    rps_per_instance = @baseline_rps.to_f / @baseline_instances
    required_capacity = peak_rps / rps_per_instance
    required_instances = (required_capacity * 1.3).ceil
    # Database connection planning
    threads_per_instance = 5
    db_connections = required_instances * threads_per_instance
    # Cache requirements (assuming 80% cache hit rate during sale)
    cache_hit_rate = 0.8
    database_rps = peak_rps * (1 - cache_hit_rate)
    {
      event_duration_hours: duration_hours,
      baseline_rps: @baseline_rps,
      expected_peak_rps: peak_rps.round(0),
      recommended_instances: required_instances,
      database_connections_needed: db_connections,
      estimated_database_rps: database_rps.round(0),
      cost_multiplier: (required_instances.to_f / @baseline_instances).round(1)
    }
  end

  def pre_warm_caches(product_ids)
    # Pre-load hot products into cache before event
    product_ids.each_slice(100) do |batch|
      Rails.cache.fetch_multi(*batch.map { |id| "product_#{id}" }) do |key|
        Product.find(key.split('_').last)
      end
    end
  end
end
Multi-Tenant SaaS Capacity Allocation distributes resources across customer organizations. Large customers consume more resources than small customers. Capacity planning balances resource allocation to prevent any single tenant from affecting others while maintaining cost efficiency.
Resource quotas enforce capacity limits per tenant. CPU shares, memory limits, and rate limiting prevent resource monopolization. Capacity planning determines quota levels based on pricing tiers and resource costs.
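Quota levels per pricing tier might be represented as a simple lookup; the tier names and limits below are invented for illustration:

```ruby
TIER_QUOTAS = {
  free:       { requests_per_minute: 60,    cpu_shares: 1 },
  standard:   { requests_per_minute: 600,   cpu_shares: 4 },
  enterprise: { requests_per_minute: 6_000, cpu_shares: 16 }
}.freeze

def within_quota?(tier, observed_rpm)
  observed_rpm <= TIER_QUOTAS.fetch(tier)[:requests_per_minute]
end

within_quota?(:standard, 450) # => true
within_quota?(:free, 120)     # => false, throttle or prompt an upgrade
```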
Background Processing Pipeline Capacity scales job processing capacity to meet SLA requirements. A video encoding service might need to process 10,000 videos daily with 4-hour maximum processing time. Capacity planning determines required worker count and instance specifications.
class VideoProcessingCapacityPlanner
  def initialize(encoding_profiles)
    @profiles = encoding_profiles
  end

  def calculate_required_capacity(daily_video_count:, max_sla_hours:)
    # Calculate weighted average encoding time
    total_weight = @profiles.sum { |p| p[:percentage] }
    avg_encoding_minutes = @profiles.sum do |profile|
      (profile[:avg_minutes] * profile[:percentage]) / total_weight
    end
    # Total encoding time needed per day
    total_minutes_needed = daily_video_count * avg_encoding_minutes
    # Available processing time per worker (accounting for 80% utilization)
    available_minutes_per_worker = max_sla_hours * 60 * 0.8
    # Required workers
    required_workers = (total_minutes_needed / available_minutes_per_worker).ceil
    # Memory requirements (video encoding is memory-intensive)
    memory_per_worker_gb = 8
    total_memory_gb = required_workers * memory_per_worker_gb
    # Storage requirements for queue
    avg_video_size_gb = 2
    peak_queue_size = daily_video_count * 0.3 # Assume 30% arrive during peak hours
    required_storage_gb = (peak_queue_size * avg_video_size_gb * 1.5).ceil
    {
      daily_video_count: daily_video_count,
      avg_encoding_minutes: avg_encoding_minutes.round(1),
      required_workers: required_workers,
      total_memory_gb: total_memory_gb,
      required_storage_gb: required_storage_gb,
      max_throughput_per_day: (required_workers * available_minutes_per_worker / avg_encoding_minutes).round(0)
    }
  end
end
Database Capacity Planning prevents database bottlenecks as application traffic grows. Database capacity constraints include connection limits, CPU capacity, memory for buffer pools, and storage I/O throughput. Capacity planning models database growth and determines appropriate instance sizing.
Read replicas distribute read load across multiple database instances. Write capacity remains constrained by the primary database. Capacity planning accounts for read/write ratios and determines replica count requirements.
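A sketch of sizing the replica fleet from the read/write split; the throughput figures are assumptions for illustration:

```ruby
# Reads spread across replicas; writes stay on the primary.
def replica_count(total_qps:, read_fraction:, replica_capacity_qps:, headroom: 0.3)
  read_qps = total_qps * read_fraction
  ((read_qps * (1 + headroom)) / replica_capacity_qps).ceil
end

replica_count(total_qps: 12_000, read_fraction: 0.9, replica_capacity_qps: 5_000)
# => 3 replicas to serve ~10,800 reads/s with 30% headroom
```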
Geographic Distribution Planning allocates capacity across multiple regions for latency and availability. Multi-region deployment requires capacity planning for each region while accounting for traffic distribution, failover scenarios, and data replication overhead.
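For example, an N+1 sizing rule for regional failover can be sketched as follows, assuming an even traffic split across regions:

```ruby
# With N regions, the survivors of a single-region failure must absorb
# all traffic, so each region is sized for total load over (N - 1).
def per_region_capacity(total_rps, regions:)
  (total_rps.to_f / (regions - 1)).ceil
end

per_region_capacity(9_000, regions: 3) # => 4500 req/s per region
# Normal load per region is 3000 req/s, so each region carries 50% spare
```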
Reference
Core Capacity Metrics
| Metric | Description | Typical Threshold |
|---|---|---|
| CPU Utilization | Percentage of CPU capacity consumed | 70-80% sustained |
| Memory Usage | RAM consumption as percentage of total | 85% sustained |
| Disk I/O | Read/write operations per second | 80% of rated IOPS |
| Network Bandwidth | Data transfer rate as percentage of link capacity | 70% sustained |
| Connection Pool | Active database connections vs pool size | 80% of pool size |
| Queue Depth | Pending items in processing queues | Varies by SLA |
| Error Rate | Failed requests as percentage of total | >1% of requests |
| Response Time | Request processing latency | P95 >500ms |
Capacity Planning Formulas
| Calculation | Formula | Use Case |
|---|---|---|
| Required Instances | (Target RPS × Avg Response Time) ÷ (Instance Concurrency × Target Utilization) | Web server capacity |
| Database Connections | (App Instances × Workers per Instance × Threads per Worker) + Reserved | Connection pool sizing |
| Memory Requirement | (Worker Count × Base Memory) + (Concurrent Requests × Request Memory) | Process memory planning |
| Storage Growth | Current Size × (1 + Daily Growth Rate) ^ Days | Storage provisioning |
| Queue Processing Time | (Queue Depth × Avg Processing Time) ÷ Worker Count | SLA validation |
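The instance-count formula is Little's law in disguise: concurrent requests equal arrival rate times response time, divided by each instance's effective concurrency at the target utilization. A worked sketch with illustrative numbers:

```ruby
def required_instances(target_rps:, avg_response_s:, instance_concurrency:, target_utilization:)
  concurrent = target_rps * avg_response_s # Little's law: in-flight requests
  (concurrent / (instance_concurrency * target_utilization)).ceil
end

required_instances(target_rps: 3_000, avg_response_s: 0.25,
                   instance_concurrency: 20, target_utilization: 0.75)
# 750 concurrent requests / 15 effective slots per instance => 50 instances
```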
Ruby Process Sizing Guidelines
| Server Type | Memory per Worker | CPU per Worker | Threads per Worker |
|---|---|---|---|
| Puma (small app) | 256-512 MB | 0.5-1 core | 3-5 |
| Puma (large app) | 512-1024 MB | 1-2 cores | 5-10 |
| Unicorn | 512-1024 MB | 1 core | 1 |
| Sidekiq | 256-512 MB | 0.5-1 core | 10-25 |
| Passenger | 256-512 MB | 0.5-1 core | 1 |
Headroom Recommendations
| Traffic Pattern | Recommended Headroom | Rationale |
|---|---|---|
| Stable, predictable | 20-30% | Handles minor spikes |
| Variable, seasonal | 30-50% | Accommodates peaks |
| Unpredictable, viral | 50-100% | Rapid growth protection |
| Auto-scaling enabled | 10-20% | System scales automatically |
| Manual scaling only | 40-60% | Buffer for provisioning delay |
Monitoring Collection Intervals
| Metric Type | Collection Interval | Retention Period |
|---|---|---|
| Infrastructure (CPU, memory) | 10-60 seconds | 1-3 months full, 1 year aggregated |
| Application (request rate) | 10-60 seconds | 1-3 months full, 1 year aggregated |
| Database (queries, connections) | 10-60 seconds | 1-3 months full, 1 year aggregated |
| Business (signups, purchases) | 1-5 minutes | 2 years full |
| Capacity snapshots | 1-24 hours | 3-5 years |
Scaling Decision Matrix
| Current Utilization | Growth Rate | Action Required | Timeline |
|---|---|---|---|
| <50% | <5% monthly | Monitor | Quarterly review |
| 50-70% | <5% monthly | Plan | 2-3 month horizon |
| 50-70% | 5-10% monthly | Plan | 1-2 month horizon |
| 70-85% | Any | Scale soon | 2-4 weeks |
| >85% | Any | Scale immediately | 1 week |
| >95% | Any | Emergency scaling | 24-48 hours |