CrackedRuby

Overview

Cost optimization in software development focuses on reducing infrastructure and operational expenses while maintaining or improving system performance and reliability. The practice emerged as cloud computing shifted infrastructure from capital expenditure to operational expenditure, making cost management a continuous operational concern rather than a one-time purchasing decision.

Modern applications run on infrastructure where every CPU cycle, memory allocation, network transfer, and storage operation incurs measurable cost. A database query that takes 100ms instead of 10ms costs more in compute resources. An API that returns 10MB of data instead of 1MB costs more in network transfer. An application that keeps 50GB in memory instead of 5GB costs more in hosting fees. These differences compound across millions of requests per day.

Cost optimization operates across multiple dimensions. Infrastructure costs include compute instances, databases, storage systems, load balancers, and network traffic. Operational costs include monitoring services, logging platforms, security tools, and third-party APIs. Development costs include CI/CD pipeline usage, development environments, and testing infrastructure. Each dimension requires different optimization strategies and measurement approaches.

The discipline differs from simple cost cutting. Cost optimization maintains or improves system capabilities while reducing expenses. An optimized system might actually increase spending in one area (caching infrastructure) to reduce costs in another (database queries). The goal is maximum value per dollar spent, not minimum absolute spending.

# Unoptimized API call with excessive data transfer
def get_user_profile(user_id)
  user = User.find(user_id)
  # Returns entire user object with all associations loaded
  user.to_json(include: [:posts, :comments, :followers, :following])
end

# Optimized API call returning only required fields
def get_user_profile(user_id)
  User.select(:id, :name, :email, :avatar_url)
      .find(user_id)
      .to_json
end
# Reduces data transfer from ~50KB to ~500 bytes per request

Key Principles

Cost optimization follows several fundamental principles that guide decision-making and implementation strategies.

Resource right-sizing matches infrastructure capacity to actual workload requirements. Systems frequently over-provision resources as a safety margin, running applications on instances far larger than necessary. A web application that peaks at 20% CPU and 20% memory on an 8-core, 32GB instance wastes resources. Right-sizing moves the application to a 2-core, 8GB instance, reducing costs by 75% while maintaining performance headroom.
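The arithmetic behind a right-sizing decision can be sketched in a few lines. This is an illustrative helper, not a provider tool; the 20% headroom default is an assumption to tune against your tolerance for load spikes.

```ruby
# Hypothetical right-sizing helper: given observed peak utilization,
# estimate the smallest capacity that still leaves a safety margin.
def right_sized_capacity(current_capacity, peak_utilization, headroom: 0.2)
  needed = current_capacity * peak_utilization  # resources actually used at peak
  (needed / (1.0 - headroom)).ceil              # add headroom, round up
end

# The 8-core, 32GB instance above, peaking at 20% on both dimensions:
right_sized_capacity(8, 0.20)   # => 2 (cores needed with 20% headroom)
right_sized_capacity(32, 0.20)  # => 8 (GB of the 32GB actually required)
```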

Usage-based scaling adjusts resources based on demand patterns. Most applications experience variable load throughout the day, week, or season. An e-commerce site might handle 1,000 requests per minute during business hours but only 100 requests per minute overnight. Running the same infrastructure 24/7 wastes money during low-traffic periods. Scaling resources up during peak demand and down during quiet periods reduces average resource consumption.
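A rough comparison of fixed versus demand-based capacity, using assumed traffic figures like the e-commerce example above (requests-per-minute numbers and per-instance capacity are illustrative):

```ruby
# Compare instance-hours for fixed 24/7 capacity vs. demand-based scaling.
# hourly_rpm holds requests per minute for each hour of the day.
def instance_hours(hourly_rpm, rpm_per_instance:)
  fixed  = hourly_rpm.size * (hourly_rpm.max.to_f / rpm_per_instance).ceil
  scaled = hourly_rpm.sum { |rpm| (rpm.to_f / rpm_per_instance).ceil }
  { fixed: fixed, scaled: scaled }
end

# 12 business hours at 1,000 rpm, 12 overnight hours at 100 rpm,
# with each instance handling 250 rpm:
demand = [1000] * 12 + [100] * 12
instance_hours(demand, rpm_per_instance: 250)
# => { fixed: 96, scaled: 60 } — 37.5% fewer instance-hours
```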

Efficient data storage minimizes storage costs through compression, lifecycle management, and appropriate storage tier selection. Data has different access patterns and retention requirements. Application logs older than 30 days might be needed for compliance but rarely accessed. Moving this data from high-performance SSD storage to cheaper archival storage reduces costs without impacting operations.
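A quick cost comparison for tiering the log data described above. The per-GB-month prices are illustrative assumptions, not any provider's actual rates:

```ruby
# Monthly storage cost when a fraction of data stays on hot storage
# and the rest moves to an archival tier. Prices are assumed.
HOT_PRICE     = 0.10   # per GB-month, e.g. SSD-backed storage
ARCHIVE_PRICE = 0.004  # per GB-month, e.g. cold/archival tier

def tiered_monthly_cost(total_gb, hot_fraction)
  hot_gb = total_gb * hot_fraction
  (hot_gb * HOT_PRICE) + ((total_gb - hot_gb) * ARCHIVE_PRICE)
end

tiered_monthly_cost(1000, 1.0).round(2)  # => 100.0 (everything on hot storage)
tiered_monthly_cost(1000, 0.1).round(2)  # => 13.6  (only last 30 days hot)
```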

Caching strategies reduce repeated computation and data retrieval costs. Every database query, API call, or computation consumes resources. When the same data is requested multiple times, caching the result eliminates redundant work. A product catalog that changes daily but is queried thousands of times per hour benefits from caching the query results in memory.

Asynchronous processing moves work outside the request-response cycle, reducing the need for always-available resources. Immediate response requirements force systems to maintain capacity for peak loads. Tasks that tolerate delays can run during off-peak hours on cheaper spot instances or shared resources. Report generation, data aggregation, and batch processing often fit this pattern.

Reserved capacity planning commits to predictable resource usage in exchange for significant discounts. Cloud providers offer 40-70% discounts for 1-3 year resource commitments. Applications with stable baseline workloads benefit from reserving that baseline capacity at discounted rates while using on-demand resources for variable load.
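A simple break-even check for a reservation decision. The prices below are assumptions for illustration (a 40% discount on a 730-hour month at $0.10/hour):

```ruby
# Does a reserved commitment cost less than paying on demand
# for the hours actually used?
def reservation_saves?(hours_used_per_month, on_demand_hourly, reserved_monthly)
  hours_used_per_month * on_demand_hourly > reserved_monthly
end

# Assumed prices: $0.10/hour on demand, $43.80/month reserved
reservation_saves?(730, 0.10, 43.80)  # => true  (always-on baseline workload)
reservation_saves?(300, 0.10, 43.80)  # => false (too little usage to commit)
```

The pattern the paragraph describes follows directly: reserve the always-on baseline, leave the variable portion on demand.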

Monitoring and attribution tracks costs to specific features, teams, or customers. Without visibility into where money is spent, optimization efforts target guesses rather than data. Cost allocation reveals that a minor feature consumes 30% of database resources or that one large customer generates 50% of API costs. This information drives prioritization and pricing decisions.
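A minimal attribution sketch, assuming cost line items carry a team tag. The record shape is hypothetical, not a real billing-export format:

```ruby
# Group itemized costs by owning team and compute each team's share.
def costs_by_team(line_items)
  totals = line_items.group_by { |item| item[:team] || 'untagged' }
                     .transform_values { |items| items.sum { |i| i[:cost] } }
  grand_total = totals.values.sum
  totals.transform_values do |cost|
    { cost: cost, share: (100.0 * cost / grand_total).round(1) }
  end
end

items = [
  { team: 'search',   cost: 300.0 },
  { team: 'search',   cost: 150.0 },
  { team: 'checkout', cost: 450.0 },
  { team: nil,        cost: 100.0 }  # untagged resources surface explicitly
]
costs_by_team(items)
# => { "search"   => { cost: 450.0, share: 45.0 },
#      "checkout" => { cost: 450.0, share: 45.0 },
#      "untagged" => { cost: 100.0, share: 10.0 } }
```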

# Demonstrating caching to reduce database costs
class ProductCatalog
  def self.featured_products
    # Cache key includes today's date so a fresh entry is built each day
    cache_key = "featured_products:#{Date.today}"
    
    Rails.cache.fetch(cache_key, expires_in: 24.hours) do
      # Expensive database query runs once per day instead of per request
      Product.includes(:images, :reviews)
             .where(featured: true)
             .order(popularity: :desc)
             .limit(20)
             .to_a
    end
  end
end
# First request: database query takes 200ms, costs compute + database time
# Subsequent requests: cache hit takes 2ms, costs only memory access

Implementation Approaches

Cost optimization implementations follow distinct strategies based on system architecture, traffic patterns, and business requirements.

Reactive optimization responds to cost spikes and inefficiencies after they occur. Teams monitor spending patterns, identify anomalies, and investigate root causes. A sudden increase in database costs triggers analysis revealing an unindexed query added in the latest deployment. This approach works well for unpredictable cost issues but requires strong monitoring and rapid response capabilities.

Proactive optimization embeds cost awareness into design and development processes. Architecture reviews include cost analysis. Code reviews examine query efficiency. Deployment pipelines reject changes that exceed performance budgets. This approach prevents cost problems rather than fixing them after deployment but requires more upfront investment in tooling and training.

Automated optimization uses systems that adjust resources without manual intervention. Auto-scaling groups add instances when CPU exceeds 70% and remove them when it drops below 30%. Database query analyzers identify slow queries and suggest indexes. Kubernetes horizontal pod autoscalers adjust replica counts based on metrics. Automation handles routine optimization decisions, freeing engineers for complex optimizations.
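The core threshold logic such an autoscaler applies can be sketched in a few lines, using the 70%/30% thresholds from the paragraph above (the min/max bounds are assumed example values):

```ruby
# Decide the next instance count from current CPU utilization:
# scale out above 70%, scale in below 30%, hold steady in between.
def scaling_decision(cpu_percent, current_instances, min: 2, max: 20)
  if cpu_percent > 70 && current_instances < max
    current_instances + 1
  elsif cpu_percent < 30 && current_instances > min
    current_instances - 1
  else
    current_instances
  end
end

scaling_decision(85, 4)  # => 5 (scale out)
scaling_decision(20, 4)  # => 3 (scale in)
scaling_decision(50, 4)  # => 4 (steady)
scaling_decision(20, 2)  # => 2 (respects the minimum)
```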

Scheduled optimization aligns resource allocation with known demand patterns. Business hours schedules scale up development environments at 8 AM and down at 6 PM. Weekend schedules reduce production capacity for B2B applications. Seasonal schedules prepare for holiday traffic months in advance. This approach works when demand follows predictable patterns.

Architectural optimization restructures systems to fundamentally reduce resource requirements. Replacing synchronous API calls with event-driven architecture. Moving from monolithic databases to distributed caches. Implementing edge computing to reduce origin server load. These changes require significant engineering effort but can produce step-function cost reductions.

# Scheduled optimization for development environments
class EnvironmentScheduler
  BUSINESS_HOURS = (8...18)  # weekdays, 08:00-18:00

  def self.apply_business_hours_schedule
    current_time = Time.current

    if current_time.on_weekend? || !BUSINESS_HOURS.cover?(current_time.hour)
      stop_environment('development')
    else
      start_environment('development')
    end
  end
  
  def self.stop_environment(env_name)
    # Stops EC2 instances and RDS databases for the environment
    # (AWS::EnvironmentManager stands in for your own provisioning layer)
    # Saves ~70% of costs during off-hours
    AWS::EnvironmentManager.stop(env_name)
  end
  
  def self.start_environment(env_name)
    AWS::EnvironmentManager.start(env_name)
  end
end

The optimal implementation approach combines multiple strategies. Production systems use automated scaling for predictable load patterns while maintaining manual oversight for cost anomalies. Development environments follow scheduled optimization. New features undergo architectural review for cost implications. The combination provides defense in depth against cost overruns.

Ruby Implementation

Ruby applications running on cloud infrastructure have specific optimization opportunities through language features, frameworks, and ecosystem tools.

Database query optimization represents the most common cost reduction opportunity in Ruby applications. ActiveRecord makes database access convenient but can generate inefficient queries. The N+1 query problem occurs when code loads a collection then accesses an association for each record, executing hundreds of queries instead of one.

# N+1 query problem - extremely inefficient
def show_user_posts
  @user = User.find(params[:id])
  # This generates one query per post to load comments
  @posts = @user.posts.map do |post|
    {
      title: post.title,
      comment_count: post.comments.count  # Separate query per post!
    }
  end
end
# 1 query for user + 1 for posts + 1 per post for comments = 2 + N queries

# Optimized version using a single grouped query
def show_user_posts
  @user = User.find(params[:id])
  # One query returns every post with its comment count;
  # left_joins keeps posts that have zero comments
  @posts = @user.posts
                .select('posts.*, COUNT(comments.id) AS comment_count')
                .left_joins(:comments)
                .group('posts.id')
end
# 2 total queries regardless of post count
# A counter_cache column on posts avoids even the join for simple counts

Background job processing moves expensive operations outside the request-response cycle, allowing web servers to handle more requests per instance. Sidekiq, Resque, and Delayed Job provide Ruby implementations of background processing. Jobs run on separate workers that can use smaller, cheaper instances than web servers.

# Expensive operation in request - blocks web worker
class ReportsController < ApplicationController
  def generate
    report = GenerateMonthlyReport.new(params[:month])
    result = report.execute  # Takes 30-60 seconds
    render json: result
  end
end
# Web worker tied up for 30-60 seconds per report
# Requires large instances to handle concurrent requests

# Moved to background job - immediate response
class ReportsController < ApplicationController
  def generate
    ReportGenerationJob.perform_later(params[:month], current_user.id)
    render json: { status: 'queued', check_status_url: status_path }
  end
end

class ReportGenerationJob < ApplicationJob
  queue_as :reports
  
  def perform(month, user_id)
    report = GenerateMonthlyReport.new(month)
    result = report.execute
    # Store result, send email notification
    ReportMailer.completed(user_id, result).deliver_now
  end
end
# Web workers respond immediately
# Background workers can run on spot instances at 70% discount

Memory management impacts costs directly since memory is a primary pricing factor for cloud instances. Ruby's garbage collector manages memory automatically, but application code controls memory allocation patterns. Large object allocations, memory leaks, and retained references increase memory footprint.

# Memory-inefficient CSV processing
def process_large_csv(file_path)
  # Loads entire file into memory at once
  csv_data = CSV.read(file_path)
  csv_data.each do |row|
    process_row(row)
  end
end
# 1GB CSV file requires 1GB+ of memory

# Memory-efficient streaming approach
def process_large_csv(file_path)
  # Processes one row at a time
  CSV.foreach(file_path) do |row|
    process_row(row)
  end
end
# Same 1GB CSV uses ~1MB of memory
# Allows using smaller instances with less memory

Connection pooling reduces database connection costs by reusing connections across requests. Each database connection consumes memory on both application and database servers. Opening a new connection for each request wastes resources. Connection pools maintain a fixed number of connections shared across requests.

# Database connection pool configuration
# config/database.yml
production:
  adapter: postgresql
  pool: <%= ENV.fetch("DB_POOL_SIZE", 5) %>
  checkout_timeout: 5  # seconds to wait for a free connection from the pool

# Application balances connection count with concurrency needs
# 20 web workers × 5 connections = 100 database connections
# vs. 20 workers × 20 connections = 400 connections
# Reduces database instance size requirements by 75%

Caching with Rails.cache reduces computational costs by storing expensive operation results. Rails provides a unified caching interface supporting memory stores, Redis, Memcached, and other backends.

class ProductsController < ApplicationController
  def index
    @products = Rails.cache.fetch('products/all', expires_in: 1.hour) do
      # Expensive database query with multiple joins
      Product.includes(:manufacturer, :reviews, :images)
             .where(active: true)
             .order(created_at: :desc)
             .to_a
    end
  end
  
  # Fragment caching for expensive view rendering
  def show
    @product = Product.find(params[:id])
    # Cached in view template:
    # <%= cache @product do %>
    #   <%= render @product %>
    # <% end %>
  end
end

Tools & Ecosystem

The Ruby ecosystem includes gems and services specifically designed for cost monitoring and optimization.

aws-sdk-ruby provides programmatic access to AWS services for cost management. The gem supports querying Cost Explorer, managing Reserved Instances, and controlling resource lifecycle.

require 'aws-sdk-costexplorer'

class CostAnalyzer
  def initialize
    @client = Aws::CostExplorer::Client.new(region: 'us-east-1')
  end
  
  def monthly_costs_by_service
    resp = @client.get_cost_and_usage({
      time_period: {
        start: Date.today.beginning_of_month.to_s,
        end: Date.today.to_s
      },
      granularity: 'MONTHLY',
      metrics: ['UnblendedCost'],
      group_by: [{
        type: 'DIMENSION',
        key: 'SERVICE'
      }]
    })
    
    resp.results_by_time.first.groups.map do |group|
      {
        service: group.keys.first,
        cost: group.metrics['UnblendedCost'].amount.to_f
      }
    end.sort_by { |item| -item[:cost] }
  end
end

# Returns: [
#   { service: 'Amazon RDS', cost: 1245.67 },
#   { service: 'Amazon EC2', cost: 892.34 },
#   ...
# ]

rack-mini-profiler identifies performance bottlenecks in Ruby web applications. The gem displays detailed request breakdowns showing database query times, view rendering costs, and memory allocations. Performance problems directly correlate with infrastructure costs.

bullet detects N+1 queries and unused eager loading in Rails applications. The gem monitors ActiveRecord queries during development and test runs, alerting developers to inefficient database access patterns before they reach production.
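A minimal development-environment setup for bullet might look like this; the settings come from the gem's documented configuration, so adjust them to your workflow:

```ruby
# config/environments/development.rb
config.after_initialize do
  Bullet.enable        = true   # turn N+1 detection on
  Bullet.bullet_logger = true   # write findings to log/bullet.log
  Bullet.rails_logger  = true   # also surface them in the Rails log
  Bullet.add_footer    = true   # show warnings in the browser footer
end
```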

derailed_benchmarks provides tools for identifying memory bloat and performance issues in Rails applications. The gem includes commands for memory profiling, allocation tracking, and request benchmarking.

# Gemfile
gem 'derailed_benchmarks', group: :development

# Run memory profiling
# $ bundle exec derailed bundle:mem
# Shows memory usage of each gem at boot time

# Profile a specific endpoint
# $ PATH_TO_HIT=/products/1 bundle exec derailed exec perf:mem
# Reveals memory allocations during request processing

prometheus-client enables metric collection for cost attribution and optimization. The gem exports custom metrics to Prometheus, allowing teams to correlate application behavior with infrastructure costs.

require 'prometheus/client'

class ApplicationMetrics
  def self.prometheus
    @prometheus ||= Prometheus::Client.registry
  end
  
  def self.setup
    @api_call_counter = prometheus.counter(
      :api_calls_total,
      docstring: 'Total API calls by endpoint',
      labels: [:endpoint, :customer_id]
    )
    
    @database_query_duration = prometheus.histogram(
      :database_query_duration_seconds,
      docstring: 'Database query execution time',
      labels: [:query_type]
    )
  end
  
  def self.record_api_call(endpoint, customer_id)
    @api_call_counter.increment(
      labels: { endpoint: endpoint, customer_id: customer_id }
    )
  end
end

# Correlate high-cost customers with API usage patterns
# Identify endpoints driving database costs

Scout APM, New Relic, and Datadog provide commercial application performance monitoring with cost analysis features. These services identify slow transactions, expensive database queries, and memory leaks while correlating performance data with infrastructure costs.

aws-sdk-ec2 and aws-sdk-autoscaling enable programmatic instance management. Applications can implement custom scaling logic, start/stop schedules, and spot instance bidding strategies.

Performance Considerations

Performance optimization and cost optimization form two sides of the same coin. Improving performance reduces resource consumption, which directly reduces costs.

Database query performance has the highest impact on infrastructure costs for most Ruby applications. Inefficient queries force database instances to work harder, require more memory and CPU, and necessitate larger instance types. A query taking 5 seconds instead of 50ms requires 100x more database resources per execution.

Index optimization reduces query execution time dramatically. A table scan on 10 million rows takes seconds. An index lookup on the same table takes milliseconds. The difference compounds across thousands of queries per minute.

# Unindexed query - table scan on large table
class OrdersController < ApplicationController
  def recent_by_customer
    # Full table scan on orders table
    @orders = Order.where(customer_email: params[:email])
                   .order(created_at: :desc)
                   .limit(10)
  end
end
# Query time: 4,500ms on 5M rows
# Database CPU: 95%
# Requires db.m5.2xlarge instance ($560/month)

# After adding index on customer_email
# add_index :orders, :customer_email
# Query time: 12ms
# Database CPU: 25%
# Can downgrade to db.m5.large ($140/month)
# Savings: $420/month

Memory allocation patterns determine instance size requirements. Ruby allocates objects frequently. Applications that allocate excessive objects or retain references unnecessarily require more memory, forcing larger instance types.

Object pooling reuses objects instead of allocating new ones. String allocation, in particular, can be reduced through frozen string literals and string caching.

# High memory allocation - creates new strings repeatedly
def format_prices(products)
  products.map do |product|
    "$#{product.price.round(2)}"  # New string every iteration
  end
end
# 10,000 products = 10,000 string allocations

# Reduced allocation using memoization
def format_prices(products)
  products.map do |product|
    format_price(product.price)
  end
end

def format_price(price)
  # Cache formatted strings for common prices
  @price_cache ||= Hash.new do |h, k|
    h[k] = "$#{k.round(2)}".freeze
  end
  @price_cache[price]
end
# Reuses cached strings, reduces allocations by 90%+

Network transfer costs scale with response payload size. APIs that return excessive data waste bandwidth and increase transfer fees. Compressed responses reduce transfer by 70-90%.

# Rack::Deflater compresses responses whenever the client's
# Accept-Encoding header allows gzip — no controller code is needed
# config/application.rb
config.middleware.use Rack::Deflater

# 100KB JSON response compresses to ~10KB
# 90% reduction in transfer costs

Background job efficiency impacts worker instance requirements. Jobs that process data inefficiently run longer, requiring more workers to maintain throughput. Optimizing job performance reduces worker count.

# Inefficient job - processes one record at a time
class EmailNotificationJob < ApplicationJob
  def perform(user_id)
    user = User.find(user_id)
    user.notifications.each do |notification|
      NotificationMailer.send_notification(notification).deliver_now
    end
  end
end
# 1,000 users with 10 notifications each = 1,000 jobs
# Each job takes ~5 seconds = ~5,000 seconds of work total
# Per-job setup overhead and a per-user notification query add up quickly

# Optimized job - batches users and eager-loads notifications
class BatchEmailNotificationJob < ApplicationJob
  def perform(user_ids)
    users = User.where(id: user_ids).includes(:notifications)

    users.each do |user|
      user.notifications.each do |notification|
        NotificationMailer.send_notification(notification).deliver_now
      end
    end
  end
end
# 1,000 users in 10 batches of 100 = 10 jobs instead of 1,000
# Eager loading eliminates the per-user notification query
# Fewer, larger jobs cut queue and setup overhead, so fewer workers are needed

Real-World Applications

Production systems implement cost optimization through layered strategies addressing different cost drivers.

Multi-tier caching architecture reduces database load by serving requests from progressively cheaper storage layers. A content platform implements browser caching, CDN caching, application caching, and database query caching.

class ArticlesController < ApplicationController
  def show
    @article = find_article(params[:id])
    
    # Browser cache: 5 minutes for logged-out users
    expires_in 5.minutes, public: true unless user_signed_in?
  end
  
  private
  
  def find_article(article_id)
    # Application cache: 1 hour
    Rails.cache.fetch("article:#{article_id}", expires_in: 1.hour) do
      # Database with query result caching
      Article.find(article_id)
    end
  end
end

# CDN caching handled at infrastructure layer
# Traffic pattern: 10,000 requests/minute for popular articles
# Without caching: 10,000 database queries/minute
# With caching: ~3 database queries/minute (on cache misses)
# Database cost reduction: 99.97%

Spot instance integration for batch processing workloads reduces compute costs by 70-90%. A data processing pipeline uses spot instances for non-time-critical jobs with checkpointing for interruption handling.

require 'net/http'
require 'redis'

class SpotInstanceProcessor
  def initialize(job_id, redis: Redis.new)
    @job_id = job_id
    @redis = redis
  end

  def process_batch(data_batch)
    # Process in chunks with progress tracking
    data_batch.each_slice(100).with_index do |chunk, index|
      process_chunk(chunk)
      save_checkpoint(index)

      # Check for spot instance interruption warning
      if spot_instance_terminating?
        save_state_and_exit
        return
      end
    end
  end

  def spot_instance_terminating?
    # AWS publishes a 2-minute warning via the instance metadata service;
    # the endpoint returns 404 until an interruption is scheduled
    uri = URI('http://169.254.169.254/latest/meta-data/spot/instance-action')
    response = Net::HTTP.get_response(uri)
    response.code != '404'
  rescue StandardError
    false
  end

  def save_checkpoint(index)
    @redis.set("batch_progress:#{@job_id}", index)
  end

  def resume_from_checkpoint
    last_index = @redis.get("batch_progress:#{@job_id}").to_i
    # Continue processing from last_index when a replacement instance starts
  end
end

# Cost comparison for processing 100M records:
# On-demand instances: $450/month
# Spot instances: $90/month (80% savings)

Serverless architecture for event-driven workloads eliminates idle resource costs. An image processing service uses AWS Lambda invoked by S3 uploads, paying only for actual processing time.

# AWS Lambda function handler
require 'aws-sdk-s3'
require 'mini_magick'

def lambda_handler(event:, context:)
  s3 = Aws::S3::Client.new
  
  event['Records'].each do |record|
    bucket = record['s3']['bucket']['name']
    key = record['s3']['object']['key']
    
    # Download original image
    obj = s3.get_object(bucket: bucket, key: key)
    
    # Process image
    image = MiniMagick::Image.read(obj.body)
    image.resize '800x600'
    image.quality '85'
    
    # Upload processed version
    processed_key = key.sub('uploads/', 'processed/')
    s3.put_object(
      bucket: bucket,
      key: processed_key,
      body: image.to_blob
    )
  end
  
  { statusCode: 200, body: 'Processed' }
end

# Cost model:
# Traditional: EC2 instance running 24/7 = $75/month
# Serverless: 10,000 executions at 1s each = $2.50/month
# Savings: 97%

Database read replica scaling distributes read traffic across multiple database instances, allowing the primary database to use a smaller instance type.

# config/database.yml
production:
  primary:
    adapter: postgresql
    host: primary.db.internal
    pool: 5
  
  replica:
    adapter: postgresql
    host: replica.db.internal
    replica: true
    pool: 20

# Application router
class ApplicationRecord < ActiveRecord::Base
  connects_to database: { writing: :primary, reading: :replica }
end

# Queries automatically route to appropriate database
class ProductsController < ApplicationController
  def index
    # Reads from replica
    @products = Product.all
  end
  
  def create
    # Writes to primary
    @product = Product.create(product_params)
  end
end

# Cost impact:
# Single large primary: db.m5.4xlarge = $1,120/month
# Smaller primary + 2 replicas: db.m5.xlarge + 2×db.m5.large = $700/month
# Savings: $420/month (37%)

Reference

Cost Optimization Strategies

| Strategy | Description | Typical Savings | Implementation Complexity |
|----------|-------------|-----------------|----------------------------|
| Right-sizing | Match instance size to workload | 30-50% | Low |
| Auto-scaling | Adjust capacity based on demand | 20-40% | Medium |
| Reserved instances | Commit to 1-3 year usage | 40-70% | Low |
| Spot instances | Use interruptible instances | 70-90% | Medium-High |
| Serverless | Pay per execution | 50-90% | Medium |
| Caching | Store computed results | 60-95% | Medium |
| Database optimization | Improve query efficiency | 40-80% | Medium |
| Storage tiering | Move data to cheaper storage | 60-90% | Low-Medium |

Ruby Performance Optimization Techniques

| Technique | Impact | Use Case |
|-----------|--------|----------|
| Eager loading | Eliminates N+1 queries | ActiveRecord associations |
| Counter caches | Avoids COUNT queries | Association counts |
| Fragment caching | Reduces view rendering | Expensive partials |
| Query result caching | Reduces database load | Repeated queries |
| Background jobs | Frees web workers | Long-running tasks |
| Connection pooling | Reduces connection overhead | Database connections |
| Batch processing | Reduces per-item overhead | Bulk operations |
| Streaming responses | Reduces memory usage | Large datasets |

Database Optimization Checklist

| Item | Action | Expected Improvement |
|------|--------|----------------------|
| Indexes | Add indexes for WHERE, JOIN, ORDER BY columns | 10-1000x query speedup |
| N+1 queries | Use includes, eager_load, preload | 90%+ query reduction |
| Query selection | Select only needed columns | 50-80% data transfer reduction |
| Connection pool | Size pool for concurrency needs | 50-75% connection reduction |
| Read replicas | Route reads to replicas | 40-60% primary load reduction |
| Vacuum | Regular VACUUM on PostgreSQL | 20-40% space reclamation |
| Statistics | Update table statistics | 10-30% query plan improvement |
| Partitioning | Partition large tables | 50-90% query improvement |

AWS Cost Optimization Resources

| Resource Type | Discount Method | Savings Range | Best For |
|---------------|-----------------|---------------|----------|
| EC2 Reserved | 1 or 3 year commitment | 40-70% | Predictable baseline load |
| EC2 Spot | Interruptible instances | 70-90% | Batch processing, fault-tolerant workloads |
| RDS Reserved | 1 or 3 year commitment | 40-65% | Production databases |
| S3 Intelligent-Tiering | Automatic tier movement | 40-95% | Variable access patterns |
| S3 Glacier | Manual archival | 80-95% | Long-term retention |
| Lambda | Pay per execution | N/A | Event-driven, sporadic workloads |
| CloudFront | CDN caching | 60-90% | Static content delivery |
| EBS gp3 | Latest generation storage | 20% | General purpose storage |

Common Cost Drivers

| Driver | Typical Percentage of Costs | Optimization Priority |
|--------|------------------------------|-----------------------|
| Compute instances | 35-50% | High |
| Database services | 20-35% | High |
| Data transfer | 10-20% | Medium |
| Storage | 5-15% | Medium |
| Load balancers | 3-8% | Low |
| Monitoring/logging | 2-5% | Low |
| Backup/disaster recovery | 2-5% | Low |
| Development environments | 5-15% | Medium |

Monitoring Metrics

| Metric | Threshold | Action |
|--------|-----------|--------|
| CPU utilization | Consistently under 40% | Downsize instance |
| Memory utilization | Consistently under 50% | Reduce memory allocation |
| Database connections | Pool exhaustion | Increase pool or find leaks |
| Cache hit rate | Below 80% | Increase cache size or TTL |
| Query execution time | Above 100ms average | Add indexes or optimize |
| Background job queue | Growing backlog | Add workers or optimize jobs |
| API response time | Above 200ms p95 | Profile and optimize |
| Error rate | Above 1% | Investigate and fix |