CrackedRuby

Overview

Deployment strategies define how software transitions from development to production environments. Each strategy balances competing concerns: minimizing downtime, reducing deployment risk, enabling fast rollback, and managing infrastructure costs. The choice of deployment strategy affects application availability, operational complexity, and the team's ability to deliver updates safely.

Traditional deployments involved taking systems offline, replacing code, and restarting services. Modern deployment strategies eliminate or minimize downtime by orchestrating multiple instances, routing traffic intelligently, and validating new versions before full rollout. These strategies emerged from the need to deploy frequently while maintaining high availability in production systems.

Deployment strategies operate on several core concepts. Instances are running copies of the application. Traffic routing directs user requests to specific instances. Health checks verify instance readiness. Rollback reverts to the previous version when problems occur. The deployment window is the period during which changes are applied.

# Health check endpoint example
class HealthController < ApplicationController
  def show
    database_healthy = database_up?
    cache_healthy = cache_up?

    status = database_healthy && cache_healthy ? :ok : :service_unavailable

    render json: {
      status: status,
      database: database_healthy,
      cache: cache_healthy,
      version: ENV['APP_VERSION']
    }, status: status
  end

  private

  # Rescue so a down dependency yields a 503 rather than a raised 500
  def database_up?
    ActiveRecord::Base.connection.active?
  rescue StandardError
    false
  end

  def cache_up?
    Rails.cache.redis.ping == "PONG"
  rescue StandardError
    false
  end
end

The deployment strategy determines application behavior during updates. A web application with 1000 requests per second cannot tolerate strategies that cause dropped requests or extended downtime. Different applications require different strategies based on their availability requirements, traffic patterns, and architectural constraints.

Key Principles

Deployment strategies share fundamental principles that govern their operation. Availability measures the percentage of time the application serves requests successfully. Zero-downtime deployment maintains availability during updates by ensuring some instances always serve traffic. Atomicity means deployments either complete fully or roll back entirely, avoiding partial states.

Risk mitigation limits the impact of defective releases. Strategies that expose new code to small traffic percentages detect problems before they affect all users. Rollback capability enables rapid reversion when deployments introduce bugs or performance problems. Fast rollback requires preserving the previous version and maintaining the ability to redirect traffic.

Validation confirms that newly deployed code functions correctly before serving production traffic. Validation includes health checks, smoke tests, and metric monitoring. Health checks verify basic functionality like database connectivity. Smoke tests execute critical paths through the application. Metric monitoring detects anomalies in error rates, response times, or throughput.

# Deployment validation script
require 'net/http'
require 'json'
require 'uri'

class DeploymentValidator
  CRITICAL_PATHS = ['/login', '/checkout'].freeze  # adjust per application

  def initialize(endpoint)
    uri = URI(endpoint)
    @http = Net::HTTP.new(uri.host, uri.port)
    @http.use_ssl = uri.scheme == 'https'
  end

  def validate
    health_check && smoke_tests && metric_checks
  end

  def health_check
    response = @http.get('/health')
    response.code == '200' && JSON.parse(response.body)['status'] == 'ok'
  end

  def smoke_tests
    CRITICAL_PATHS.all? do |path|
      response = @http.get(path)
      (200..299).include?(response.code.to_i)
    end
  end

  def metric_checks
    # error_rate and p95_latency would query the monitoring system
    error_rate < 0.01 && p95_latency < 500
  end
end

State management handles application and database state during deployments. Stateless applications simplify deployments because instances can start and stop independently. Stateful applications require coordination to avoid data loss or corruption. Database schema changes must be compatible with both old and new application versions during transitions.

Traffic shaping controls request routing during deployments. Load balancers direct traffic based on instance health, deployment stage, or request characteristics. Traffic shaping enables gradual rollouts where new code serves increasing percentages of requests.
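Gradual rollouts work best when a given user consistently lands on the same version. A minimal sketch of deterministic, hash-based traffic splitting (the function name and bucketing scheme are illustrative, not taken from any particular load balancer):

```ruby
require 'digest'

# Deterministic traffic split: hash the user id into a 0-99 bucket so
# the same user consistently sees the same version during a rollout.
def version_for(user_id, canary_percentage:)
  bucket = Digest::MD5.hexdigest(user_id.to_s).to_i(16) % 100
  bucket < canary_percentage ? :canary : :stable
end

version_for(42, canary_percentage: 0)   # => :stable
version_for(42, canary_percentage: 100) # => :canary
```

Hash-based bucketing avoids the flapping that per-request random routing causes, where one user alternates between versions across requests.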

Monitoring and observability provide visibility into deployment progress and application health. Metrics track error rates, latencies, throughput, and resource utilization. Logs capture detailed information about request processing. Distributed tracing shows request flow across services. These signals enable teams to detect problems quickly and make informed rollback decisions.

Implementation Approaches

Recreate Deployment

Recreate deployment stops all running instances, deploys new code, and starts new instances. This strategy causes downtime equal to the duration of the stop-deploy-start cycle. Its simplicity appeals to teams running applications that tolerate downtime or deploy during maintenance windows.

The deployment process terminates all instances simultaneously, updates code on each server, and launches the new version. Load balancers mark the application as unavailable during this window. Users receive error responses until instances restart and pass health checks.

Recreate deployments require minimal infrastructure. The application needs only production instances without spare capacity. No traffic routing logic handles multiple versions simultaneously. Database migrations run before starting new instances, knowing old code no longer executes.
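The stop-migrate-start cycle can be sketched as a small orchestrator. `Instance` here is a hypothetical handle exposing stop/deploy/start operations, and the migrator callable stands in for something like `rake db:migrate`:

```ruby
# Sketch of a recreate deployment: every instance stops before any new
# code runs, so migrations execute with no old code live.
class RecreateDeployment
  def initialize(instances, migrator: nil)
    @instances = instances
    @migrator = migrator
  end

  def deploy(version)
    @instances.each(&:stop)      # application is now fully offline
    @migrator&.call(version)     # safe: no old code is executing
    @instances.each do |instance|
      instance.deploy(version)
      instance.start
    end
    @instances.all?(&:healthy?)  # downtime ends once health checks pass
  end
end
```

The downtime window spans everything between the first `stop` and the last passing health check, which is why startup time dominates this strategy's cost.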

This strategy fits applications with scheduled maintenance windows, internal tools with limited users, or systems where deployment simplicity outweighs availability requirements. It fails for customer-facing services requiring 24/7 availability or applications with long startup times.

Rolling Deployment

Rolling deployment updates instances incrementally, replacing a subset of instances at a time while others continue serving traffic. The deployment proceeds in waves: stop instances, update code, start instances, verify health, then proceed to the next wave. This maintains partial availability throughout the deployment.

The wave size determines deployment characteristics. Small waves (updating one instance at a time) minimize risk by limiting exposure to defects but extend deployment duration. Large waves speed deployment but increase the number of users affected by defects. A common approach updates 25% of instances per wave.

# Rolling deployment orchestration
class RollingDeployment
  def initialize(instances:, wave_size:)
    @instances = instances
    @wave_size = wave_size
  end

  def deploy(version)
    waves.each do |wave_instances|
      deploy_wave(wave_instances, version)
      next if verify_wave(wave_instances)

      rollback_wave(wave_instances)  # redeploy the previous version (elided)
      raise 'Deployment halted: wave failed verification'
    end
  end

  private

  def waves
    @instances.each_slice(@wave_size).to_a
  end

  def deploy_wave(instances, version)
    instances.each do |instance|
      instance.mark_unhealthy  # drain the instance from the load balancer
      instance.deploy(version)
      instance.restart
      wait_for_health(instance)
    end
  end

  def verify_wave(instances)
    sleep 60 # Observation period
    instances.all? { |i| i.error_rate < 0.01 && i.healthy? }
  end
end

Rolling deployments reduce capacity during deployment because some instances are offline or starting. If normal capacity is 10 instances and waves update 2 instances, capacity drops to 80% during wave updates. Applications must handle this reduced capacity without degrading performance.
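The capacity arithmetic is simple enough to state as a helper (a hypothetical name, just to make the calculation concrete):

```ruby
# Serving capacity while one wave is offline: with `total` instances
# and `wave_size` instances updating at a time, the remaining fraction
# must absorb full production traffic.
def serving_capacity(total:, wave_size:)
  (total - wave_size) / total.to_f
end

serving_capacity(total: 10, wave_size: 2) # => 0.8
```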

Database compatibility becomes critical. Both old and new code versions run simultaneously during deployment. Schema changes must be backward-compatible, typically requiring multi-phase deployments: add columns in phase one, deploy code using new columns in phase two, remove old columns in phase three.

Blue-Green Deployment

Blue-green deployment maintains two identical production environments. One environment (blue) serves live traffic while the other (green) remains idle. Deployment updates the idle environment, validates it, then switches traffic from blue to green. The previous environment remains available for immediate rollback.

The deployment process deploys new code to the green environment while blue continues serving all traffic. Automated tests and manual validation confirm green functions correctly. Traffic switches from blue to green via load balancer reconfiguration. If problems occur, traffic switches back to blue.

# Blue-green traffic switch
class BlueGreenDeployment
  attr_reader :current_env

  def initialize(load_balancer:, blue_env:, green_env:)
    @lb = load_balancer
    @blue = blue_env
    @green = green_env
    @current_env = @blue  # blue serves live traffic initially
    @previous_env = @green
  end

  def deploy(version)
    inactive_env = current_env == @blue ? @green : @blue

    inactive_env.deploy(version)
    inactive_env.start_all_instances

    return false unless validate_environment(inactive_env)

    switch_to(inactive_env)
    monitor_metrics(inactive_env, duration: 600)
  end

  def validate_environment(env)
    env.all_instances_healthy? &&
      run_smoke_tests(env) &&
      performance_acceptable?(env)
  end

  def rollback
    switch_to(@previous_env)
  end

  private

  def switch_to(env)
    @lb.switch_traffic(from: current_env, to: env)
    @previous_env = current_env
    @current_env = env
  end
end

Blue-green deployment requires double infrastructure capacity because both environments must handle full production load. This increases costs but provides the fastest rollback capability: rollback requires only redirecting traffic. The approach works well with containerized applications where spinning up duplicate environments is automated.

Database handling complicates blue-green deployments. Both environments typically share the same database, requiring schema compatibility between versions. Separate databases per environment enable true isolation but complicate data synchronization and increase storage costs.

Canary Deployment

Canary deployment gradually shifts traffic from the old version to the new version while monitoring metrics for problems. The deployment starts by routing a small traffic percentage (typically 5-10%) to the new version. If metrics remain healthy, traffic increases incrementally until 100% reaches the new version.

Traffic routing uses load balancer rules, service mesh configuration, or application-level routing. Requests can be routed randomly based on percentage, by specific user cohorts, or by request characteristics. Geographic routing sends traffic from one region to the new version while others remain on the old version.

# Canary routing with Rack middleware
require 'rack/proxy'  # rack-proxy gem

class CanaryRouter
  def initialize(app, canary_percentage:)
    @app = app
    @canary_percentage = canary_percentage
    # Build the proxy once rather than on every request
    @canary_proxy = Rack::Proxy.new(backend: ENV['CANARY_BACKEND'])
  end

  def call(env)
    if route_to_canary?
      env['HTTP_X_CANARY_VERSION'] = 'new'
      @canary_proxy.call(env)  # forward to the canary instance pool
    else
      env['HTTP_X_CANARY_VERSION'] = 'stable'
      @app.call(env)
    end
  end

  private

  def route_to_canary?
    rand(100) < @canary_percentage
  end
end

Monitoring during canary deployment compares metrics between canary and stable versions. Key metrics include error rates, response latencies, throughput, and business metrics like conversion rates. Significant deviations trigger automatic rollback or halt traffic increases.

A canary progression typically follows a schedule: 5% for 10 minutes, 25% for 20 minutes, 50% for 30 minutes, then 100%. Schedules balance risk (faster progression exposes more users to defects) against deployment speed (slower progression delays feature delivery). Automated systems adjust progression based on metric health.
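The promote-or-rollback decision driving such a progression can be sketched as a pure comparison against the stable baseline. Metric names and thresholds here are illustrative, not tied to any particular monitoring system:

```ruby
# Decide whether one canary step may proceed: roll back on an absolute
# or relative error-rate breach, hold on a latency regression,
# otherwise promote to the next traffic percentage.
def canary_decision(baseline:, canary:, max_error_rate: 0.01, max_latency_ratio: 1.2)
  return :rollback if canary[:error_rate] > max_error_rate
  return :rollback if canary[:error_rate] > baseline[:error_rate] * 2
  return :hold if canary[:p95_latency] > baseline[:p95_latency] * max_latency_ratio
  :promote
end

canary_decision(
  baseline: { error_rate: 0.002, p95_latency: 180 },
  canary:   { error_rate: 0.003, p95_latency: 190 }
) # => :promote
```

Comparing against the live baseline rather than fixed thresholds alone distinguishes a defective canary from a site-wide incident affecting both versions.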

Canary deployments excel at detecting problems that only manifest under production load or with real user data. Staging environments cannot replicate production diversity, making canary validation valuable. The approach requires metric infrastructure and automation to compare versions and control traffic routing.

Feature Flag Deployment

Feature flag deployment deploys new code to production with features disabled by default. Flags control whether features activate for specific users, percentages of traffic, or globally. This separates deployment from feature release, enabling testing in production before widespread activation.

Flags can be boolean (on/off), percentage-based (active for X% of users), or targeted (active for specific user IDs, roles, or attributes). Complex flags combine multiple conditions: active for 10% of premium users in the US region.

# Feature flag implementation
require 'digest'

class FeatureFlags
  def initialize(user)
    @user = user
    @flags = FlagStore.new  # flag storage backend (Redis, database, ...)
  end

  def enabled?(feature)
    flag = @flags.get(feature)

    return false unless flag&.active?

    case flag.rollout_type
    when :boolean
      flag.value
    when :percentage
      user_hash % 100 < flag.percentage
    when :targeted
      flag.user_ids.include?(@user.id) ||
        flag.roles.include?(@user.role)
    else
      false
    end
  end

  private

  # Stable per-user hash so a user stays in the same rollout bucket
  def user_hash
    Digest::MD5.hexdigest(@user.id.to_s).to_i(16)
  end
end

# Usage in application code
def checkout_process
  if feature_flags.enabled?(:new_payment_flow)
    render :new_checkout
  else
    render :legacy_checkout
  end
end

Feature flags enable gradual rollouts analogous to canary deployments but at the application level rather than the infrastructure level. Deploy code with flags disabled, enable for 5% of users, monitor metrics, increase to 25%, and so on. This provides fine-grained control without complex infrastructure routing.

Flag technical debt accumulates when old flags remain in code after full rollout. Teams must remove flags once features are fully enabled, treating flags as temporary constructs. Long-lived flags complicate code, increase test surface area, and create confusion about system behavior.

Ruby Implementation

Capistrano Deployment Automation

Capistrano provides Ruby-based deployment automation, defining deployment workflows as Ruby code. It connects to servers via SSH, executes commands, manages releases, and handles rollback. Capistrano suits traditional server deployments where applications run on VMs or bare metal.

# Capistrano deployment configuration (config/deploy.rb)
lock '~> 3.18.0'

set :application, 'my_app'
set :repo_url, 'git@github.com:username/my_app.git'
set :deploy_to, '/var/www/my_app'
set :keep_releases, 5

namespace :deploy do
  desc 'Restart application'
  task :restart do
    on roles(:app) do
      execute :touch, release_path.join('tmp/restart.txt')
    end
  end
  
  desc 'Run database migrations'
  task :migrate do
    on primary(:db) do
      within release_path do
        with rails_env: fetch(:rails_env) do
          execute :rake, 'db:migrate'
        end
      end
    end
  end
  
  after :publishing, :restart
  before :restart, :migrate
end

# Rolling restart implementation: Capistrano's standard deploy flow
# updates code on every server first; this task then restarts one
# server at a time, verifying health before moving on
namespace :rolling do
  task :restart do
    on roles(:app), in: :sequence, wait: 30 do |host|
      execute :touch, current_path.join('tmp/restart.txt')

      # Wait and verify before restarting the next server
      sleep 60
      validate_instance(host.hostname) ||
        raise('Health check failed')
    end
  end
end

require 'net/http'

def validate_instance(host)
  uri = URI("https://#{host}/health")
  response = Net::HTTP.get_response(uri)
  response.is_a?(Net::HTTPSuccess)
rescue StandardError
  false
end

Capistrano organizes deployments into releases stored in separate directories. The current release symlinks to the active version. Rollback changes the symlink to the previous release. This structure enables fast rollback without redeploying code.

Health Check Implementation

Applications must provide health check endpoints for deployment orchestration and load balancer integration. Health checks verify database connectivity, cache availability, and critical service dependencies.

# Comprehensive health check
class HealthCheck
  class << self
    def status
      checks = {
        database: database_check,
        cache: cache_check,
        storage: storage_check,
        job_queue: queue_check
      }

      healthy = checks.values.all? { |check| check[:healthy] }

      {
        status: healthy ? 'healthy' : 'unhealthy',
        timestamp: Time.now.iso8601,
        version: ENV['APP_VERSION'],
        checks: checks
      }
    end

    private

    # Defined inside `class << self` so `private` actually applies;
    # `private` has no effect on methods defined with `def self.`
    def database_check
      start = Time.now
      ActiveRecord::Base.connection.execute('SELECT 1')
      {
        healthy: true,
        response_time: Time.now - start
      }
    rescue => e
      {
        healthy: false,
        error: e.message
      }
    end

    def cache_check
      start = Time.now
      Rails.cache.write('health_check', Time.now.to_i)
      value = Rails.cache.read('health_check')
      {
        healthy: value.is_a?(Integer),
        response_time: Time.now - start
      }
    rescue => e
      {
        healthy: false,
        error: e.message
      }
    end
  end
end

Deployment Hooks and Callbacks

Ruby applications integrate deployment logic through hooks that execute at specific deployment phases. These hooks handle tasks like asset compilation, cache warming, and service notifications.

# Rails deployment hooks (config/deploy.rb)
namespace :deploy do
  after :updated, :compile_assets do
    on roles(:app) do
      within release_path do
        execute :rake, 'assets:precompile'
      end
    end
  end
  
  after :publishing, :warm_cache do
    on roles(:app) do
      execute :curl, '-s', "http://localhost/cache/warm"
    end
  end
  
  after :restart, :notify_deployment do
    on roles(:app) do
      execute :curl, '-X POST', ENV['SLACK_WEBHOOK'],
        '-d', %Q{{"text": "Deployed #{fetch(:current_revision)} to production"}}
    end
  end
  
  after :rollback, :notify_rollback do
    on roles(:app) do
      execute :curl, '-X POST', ENV['PAGERDUTY_WEBHOOK'],
        '-d', %Q{{"incident_key": "deployment", "event_type": "trigger"}}
    end
  end
end

Container Deployment with Ruby

Containerized Ruby applications deploy through orchestration platforms like Kubernetes. The deployment manifest defines rolling update parameters, health checks, and resource requirements.

# Dockerfile for Ruby application
FROM ruby:3.2-alpine

# curl is needed for the HEALTHCHECK below; build-base compiles native gems
RUN apk add --no-cache curl build-base

WORKDIR /app

COPY Gemfile Gemfile.lock ./
RUN bundle config set --local without 'development test' && bundle install

COPY . .

RUN RAILS_ENV=production bundle exec rake assets:precompile

EXPOSE 3000

HEALTHCHECK --interval=30s --timeout=3s \
  CMD curl -f http://localhost:3000/health || exit 1

CMD ["bundle", "exec", "puma", "-C", "config/puma.rb"]

The Kubernetes deployment manifest configures rolling update strategy and health checks:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ruby-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: ruby-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: ruby-app
    spec:
      containers:
      - name: app
        image: myapp:v2.0
        ports:
        - containerPort: 3000
        readinessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 10
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10

Design Considerations

Selecting a Deployment Strategy

Application characteristics determine appropriate deployment strategies. Availability requirements, traffic patterns, infrastructure costs, and team capabilities all influence selection.

Applications requiring 99.99% uptime cannot tolerate recreate deployments. Rolling or blue-green deployments maintain availability during updates. Applications with flexible availability requirements or maintenance windows can use simpler strategies.

Traffic volume affects strategy selection. Low-traffic applications tolerate brief outages or reduced capacity during rolling deployments. High-traffic applications need strategies that maintain full capacity or support rapid rollback when problems occur.

Infrastructure costs scale with deployment complexity. Blue-green deployments double infrastructure requirements. Rolling deployments temporarily reduce capacity. Recreate deployments use minimal resources. Organizations balance availability requirements against infrastructure costs.

Team operational capabilities constrain deployment strategies. Blue-green and canary deployments require automated orchestration, monitoring, and rollback procedures. Small teams may lack resources to build and maintain complex deployment infrastructure.

Trade-offs Between Strategies

Deployment strategies trade simplicity, speed, safety, and cost. Recreate deployment offers maximum simplicity but provides no safety measures and causes downtime. Blue-green deployment provides maximum safety and instant rollback but requires double infrastructure.

Rolling deployment balances many concerns: maintains availability, limits risk exposure through incremental updates, uses minimal extra infrastructure, and supports rollback by redeploying previous versions. The gradual rollout increases deployment duration compared to strategies that switch all traffic simultaneously.

Canary deployment provides maximum safety through gradual traffic shifting and automated monitoring but requires sophisticated traffic routing and metric collection. Teams must build automation to compare version metrics and control traffic percentages.

Feature flags offer maximum flexibility, enabling deployment and feature release decoupling, but accumulate technical debt when flags remain in code indefinitely. Applications with many feature flags become harder to test because each flag combination creates a different code path.

Database Migration Strategies

Database schema changes complicate deployments because schema updates affect all application versions. Several approaches handle schema changes during deployments.

Backward-compatible migrations deploy in multiple phases. Phase one adds new columns without removing old columns. Phase two deploys application code using new columns while maintaining old column compatibility. Phase three removes old columns after confirming all instances use new columns. This approach supports all deployment strategies but extends deployment timelines.

# Phase 1: Add new column
class AddEmailVerifiedToUsers < ActiveRecord::Migration[7.0]
  def change
    add_column :users, :email_verified_at, :datetime
    add_index :users, :email_verified_at
  end
end

# Old code still works during this phase
# Phase 2: Update code to use new column
class User < ApplicationRecord
  def email_verified?
    email_verified_at.present?
  end
end

# Phase 3: Remove old implementation after full deployment
class RemoveEmailVerifiedFromUsers < ActiveRecord::Migration[7.0]
  def change
    remove_column :users, :email_verified
  end
end

Database deployment coordination runs migrations before or after application deployment depending on compatibility. Adding columns runs before deployment so new code finds expected schema. Removing columns runs after deployment so old code does not reference missing columns.

Blue-green with separate databases maintains separate databases for blue and green environments. This eliminates schema compatibility concerns but requires data replication or shared read replicas. The approach suits applications where environments can temporarily diverge.

Rollback Strategies

Effective rollback procedures restore service when deployments introduce defects. Rollback speed and reliability determine blast radius when problems occur.

Blue-green deployments provide instant rollback by redirecting traffic to the previous environment. Rolling deployments roll back by redeploying the previous version, which takes longer but requires no spare infrastructure. Canary deployments roll back by reducing canary traffic to zero.

Database rollbacks complicate application rollbacks. Rolling back application code without rolling back schema changes causes errors when old code expects old schema. Teams must consider database state when executing rollbacks, potentially requiring schema rollback migrations.

# Automated rollback trigger
class DeploymentMonitor
  THRESHOLDS = { error_rate: 0.01, p95_latency: 500 }.freeze

  def monitor(deployment_id, duration: 600)
    start_time = Time.now

    while Time.now - start_time < duration
      metrics = fetch_metrics(deployment_id)

      if metrics[:error_rate] > THRESHOLDS[:error_rate]
        trigger_rollback(deployment_id, reason: 'error_rate')
        return false
      end

      if metrics[:p95_latency] > THRESHOLDS[:p95_latency]
        trigger_rollback(deployment_id, reason: 'latency')
        return false
      end

      sleep 30
    end

    true
  end

  def trigger_rollback(deployment_id, reason:)
    deployment = Deployment.find(deployment_id)
    deployment.rollback!

    notify_team(
      deployment: deployment,
      reason: reason,
      metrics: fetch_metrics(deployment_id)
    )
  end
end

Tools & Ecosystem

Deployment Automation Tools

Capistrano automates Ruby application deployments to traditional servers. It connects via SSH, executes deployment commands, manages release directories, and handles rollback. Capistrano suits applications deployed to VMs or bare metal servers.

Ansible provides general-purpose automation including deployment workflows. It uses YAML playbooks to define deployment steps and supports idempotent operations. Ansible handles infrastructure provisioning, configuration management, and application deployment.

Terraform manages infrastructure as code but integrates with deployment workflows. Teams use Terraform to provision infrastructure then trigger application deployments through other tools. The combination enables complete environment reproduction.

Container Orchestration

Kubernetes orchestrates containerized applications with built-in rolling deployment support. Deployment manifests define desired state, and Kubernetes automatically handles instance updates, health checks, and rollback when health checks fail.

Docker Swarm provides simpler container orchestration than Kubernetes with rolling update support. Swarm suits smaller deployments requiring less complexity than Kubernetes provides.

Amazon ECS offers managed container orchestration on AWS with rolling deployment and blue-green deployment support through integration with Application Load Balancer.

Traffic Management

HAProxy provides high-performance load balancing with traffic routing rules for canary deployments. Configuration defines backend server pools and routing percentages.

NGINX offers load balancing and traffic routing through configuration or dynamic reconfiguration via API. NGINX Plus adds commercial features including advanced health checks and dynamic reconfiguration.

Service meshes (Istio, Linkerd) add traffic management to Kubernetes through sidecar proxies. They enable sophisticated traffic splitting, canary deployments, and A/B testing without application changes.

Feature Flag Platforms

LaunchDarkly provides commercial feature flag management with targeting rules, percentage rollouts, and metric integration. It offers SDKs for multiple languages including Ruby.

Flipper offers open-source feature flag management for Ruby applications. It stores flags in Redis, ActiveRecord, or other backends and supports boolean, percentage, and actor-based flags.

# Flipper usage
require 'flipper'
require 'flipper/adapters/active_record'  # flipper-active_record gem

Flipper.configure do |config|
  config.default do
    Flipper.new(Flipper::Adapters::ActiveRecord.new)
  end
end

# Enable feature for a percentage of actors
Flipper.enable_percentage_of_actors(:new_ui, 25)

# Enable for specific users (actors must respond to #flipper_id)
Flipper.enable_actor(:premium_feature, current_user)

# Check in application
if Flipper.enabled?(:new_ui, current_user)
  render :new_ui
else
  render :legacy_ui
end

Monitoring and Observability

Prometheus collects metrics from applications and infrastructure with alert rules triggering on metric thresholds. Deployment monitoring queries Prometheus for error rates and latencies.

Datadog provides commercial monitoring with deployment tracking, anomaly detection, and alert notification. It correlates deployment events with metric changes to identify deployment-related problems.

New Relic offers application performance monitoring with deployment markers. Teams compare metrics before and after deployments to detect performance regressions.

Real-World Applications

High-Traffic Web Application Deployment

A web application serving 10,000 requests per second requires deployment strategies that maintain capacity and detect problems quickly. The application uses blue-green deployment with automated validation and monitoring.

The deployment process provisions a green environment matching blue capacity. Load tests confirm green handles expected traffic. Automated smoke tests verify critical functionality. The load balancer switches 5% of traffic to green for 10 minutes while monitoring error rates and latencies. Traffic increases to 25%, 50%, then 100% if metrics remain healthy.

# Production deployment orchestration
class ProductionDeployment
  DeploymentFailed = Class.new(StandardError)

  def initialize(version)
    @version = version
    @blue_env = Environment.new('blue')
    @green_env = Environment.new('green')
    @lb = LoadBalancer.new
  end

  def deploy
    prepare_green_environment
    raise DeploymentFailed, 'validation failed' unless run_validation_suite
    execute_gradual_rollout
  end
  
  private
  
  def prepare_green_environment
    @green_env.deploy(@version)
    @green_env.scale_to(instances: 50)
    @green_env.warm_caches
    wait_for_readiness(@green_env)
  end
  
  def run_validation_suite
    load_test_results = LoadTester.run(
      target: @green_env,
      duration: 300,
      rps: 1000
    )
    
    smoke_test_results = SmokeTests.run(@green_env)
    
    load_test_results.success? && smoke_test_results.success?
  end
  
  def execute_gradual_rollout
    [5, 25, 50, 100].each do |percentage|
      @lb.route_traffic(@green_env, percentage: percentage)
      
      monitor_period = percentage == 100 ? 600 : 300
      unless monitor_metrics(duration: monitor_period)
        @lb.route_traffic(@blue_env, percentage: 100)
        raise DeploymentFailed
      end
    end
  end
  
  def monitor_metrics(duration:)
    MetricMonitor.compare(
      baseline: @blue_env,
      canary: @green_env,
      duration: duration,
      thresholds: {
        error_rate: 0.01,
        p95_latency: 500,
        p99_latency: 1000
      }
    )
  end
end

Database migrations use the expand-contract pattern. The first deployment adds new columns and dual-writes to old and new columns. After confirming new column usage, a subsequent deployment removes old columns. This maintains compatibility during the transition.

Microservice Rolling Deployment

A microservices architecture with 20 services requires coordinated deployments that maintain service contracts. Rolling deployment updates one service at a time while others continue running.

Service deployments must maintain API compatibility because dependent services may not update simultaneously. Versioned APIs enable old and new versions to coexist. Services accept requests in old and new formats, responding in the requested format.

# Versioned API controller
class Api::V2::UsersController < ApiController
  def show
    user = User.find(params[:id])
    
    render json: V2::UserSerializer.new(user).as_json
  end
end

# Backward-compatible serializer
module V2
  class UserSerializer
    def initialize(user)
      @user = user
    end
    
    def as_json
      {
        id: @user.id,
        email: @user.email,
        profile: {
          name: @user.name,
          # New field in v2
          verified_at: @user.email_verified_at
        }
      }
    end
  end
end

Service mesh configuration controls traffic routing during deployments. The deployment updates one service instance, verifies health, then proceeds to the next instance. The mesh ensures requests route only to healthy instances.

Feature Flag Rollout

A major UI redesign deploys behind a feature flag with gradual rollout based on user cohorts. The initial deployment enables the new UI for internal employees only. After validation, the rollout expands to all premium users and 5% of free users. Finally, the flag is enabled for all users.

# Cohort-based feature flag
require 'digest'

class NewUiFlag
  class << self
    def enabled?(user)
      return true if employee?(user)
      return true if premium_user?(user)
      return percentage_rollout?(user, percentage: 5) if free_user?(user)

      false
    end

    private

    # Note: `private` only applies to these methods because they are
    # defined inside `class << self`; a bare `private` above `def self.`
    # definitions has no effect.
    def employee?(user)
      user.email.end_with?('@company.com')
    end

    def premium_user?(user)
      user.subscription_tier == 'premium'
    end

    def free_user?(user)
      user.subscription_tier == 'free'
    end

    def percentage_rollout?(user, percentage:)
      # Hashing the ID gives each user a stable bucket across requests
      user_hash = Digest::MD5.hexdigest(user.id.to_s).to_i(16)
      user_hash % 100 < percentage
    end
  end
end

# Usage in controller
def dashboard
  if NewUiFlag.enabled?(current_user)
    render :new_dashboard
  else
    render :legacy_dashboard
  end
end

Metrics track conversion rates, error rates, and user engagement for both UI versions. A/B testing infrastructure compares cohorts to measure the impact of UI changes on business metrics. If new UI conversion rate drops significantly, the flag disables until issues are resolved.
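The kill-switch logic in that last sentence can be sketched as a pure function. `NewUiGuard`, the 10% threshold, and the sample rates are all illustrative; a monitoring job would feed in real cohort metrics and call the flag backend to disable the flag.

```ruby
# Sketch of an automated kill switch: signal that the flag should be
# disabled when the new UI's conversion rate drops more than a relative
# threshold below the legacy UI's rate. Names and threshold are illustrative.
class NewUiGuard
  DROP_THRESHOLD = 0.10 # 10% relative drop counts as "significant"

  # Returns true when the flag should be disabled.
  def self.disable_flag?(legacy_rate:, candidate_rate:)
    return false if legacy_rate.zero?

    relative_drop = (legacy_rate - candidate_rate) / legacy_rate
    relative_drop > DROP_THRESHOLD
  end
end

NewUiGuard.disable_flag?(legacy_rate: 0.040, candidate_rate: 0.041) # => false
NewUiGuard.disable_flag?(legacy_rate: 0.040, candidate_rate: 0.030) # => true
```

Keeping the decision a pure function of the two rates makes the guard easy to test independently of the metrics and flag backends.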

Database Migration Deployment

A critical database schema change requires careful coordination between application and database updates. The deployment uses a three-phase approach to maintain zero downtime.

Phase one deploys application code that writes to both old and new columns while reading from old columns. This deployment uses rolling strategy, updating instances incrementally. Database migration adds new columns but does not remove old columns.
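Assuming the email-verification example used in this section, the phase-one migration is purely additive. This is a sketch; the migration name and Rails version tag are illustrative:

```ruby
# Phase 1 migration sketch: add the new column, keep the old one.
# Old code continues to read and write the boolean column untouched.
class AddEmailVerifiedAtToUsers < ActiveRecord::Migration[7.0]
  def change
    add_column :users, :email_verified_at, :datetime
    # The legacy boolean email_verified column is removed only in
    # phase three, after all instances read from the new column.
  end
end
```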

# Phase 1: Dual-write application code
class User < ApplicationRecord
  before_save :sync_email_verified
  
  def email_verified?
    # Read from old column during transition
    read_attribute(:email_verified)
  end
  
  def email_verified=(value)
    # Write to both columns
    write_attribute(:email_verified, value)
    write_attribute(:email_verified_at, value ? Time.now : nil)
  end
  
  private
  
  def sync_email_verified
    if email_verified_at_changed?
      # Write the column directly; routing through the email_verified=
      # setter would overwrite an explicitly assigned timestamp with Time.now
      write_attribute(:email_verified, email_verified_at.present?)
    end
  end
end

Phase two deploys code reading from new columns. Background job backfills new columns for existing rows. Once backfill completes and all instances run new code, phase three removes old columns.
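The backfill mentioned above might be sketched as a background job. The job name, queue, batch size, and throttle are illustrative, and this assumes the Rails environment from the surrounding examples:

```ruby
# Backfill sketch for phase two: populate the new column for existing
# rows in batches, outside the deploy itself.
class BackfillEmailVerifiedAtJob < ApplicationJob
  queue_as :low_priority

  def perform
    User.where(email_verified: true, email_verified_at: nil)
        .in_batches(of: 1000) do |batch|
      # The original verification timestamp was never stored, so the
      # current time is a common compromise for old rows.
      batch.update_all(email_verified_at: Time.current)
      sleep(0.1) # throttle to limit write pressure on the primary
    end
  end
end
```

Filtering on `email_verified_at: nil` makes the job idempotent: rerunning it after a failure skips rows already backfilled.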

This approach maintains database compatibility throughout deployment. Old code works with old columns. New code works with new columns. The transition period supports both until migration completes.

Reference

Strategy Comparison

| Strategy | Downtime | Rollback Speed | Infrastructure Cost | Complexity | Risk Level |
|---|---|---|---|---|---|
| Recreate | Minutes | Slow (redeploy) | Low (1x) | Low | High |
| Rolling | None | Medium (redeploy) | Low (1.1-1.2x) | Medium | Medium |
| Blue-Green | None | Instant (traffic switch) | High (2x) | Medium | Low |
| Canary | None | Fast (reduce traffic) | Medium (1.2-1.5x) | High | Very Low |
| Feature Flag | None | Instant (disable flag) | Low (1x) | High | Very Low |

Decision Matrix

| Requirement | Recommended Strategy |
|---|---|
| Zero downtime required | Rolling, Blue-Green, Canary, Feature Flag |
| Instant rollback needed | Blue-Green, Feature Flag |
| Cost optimization priority | Rolling, Recreate, Feature Flag |
| Maximum safety required | Canary, Feature Flag |
| Simple infrastructure | Recreate, Rolling |
| Gradual user exposure | Canary, Feature Flag |
| Scheduled maintenance window | Recreate |
| Database schema changes | Rolling with multi-phase migrations |
| High traffic volume | Blue-Green, Canary |
| Microservices architecture | Rolling, Canary |

Deployment Checklist

| Phase | Task | Validation |
|---|---|---|
| Pre-Deployment | Run test suite | All tests pass |
| Pre-Deployment | Review schema changes | Backward compatible |
| Pre-Deployment | Check dependency updates | No breaking changes |
| Pre-Deployment | Verify rollback procedure | Documented and tested |
| Deployment | Deploy to staging | Smoke tests pass |
| Deployment | Run load tests | Performance acceptable |
| Deployment | Update production | Health checks pass |
| Deployment | Monitor error rates | Below threshold |
| Post-Deployment | Verify critical paths | Business functions work |
| Post-Deployment | Check metric dashboards | No anomalies detected |
| Post-Deployment | Monitor for 1 hour | Metrics stable |
| Post-Deployment | Document issues | Incident log updated |

Health Check Response Format

| Field | Type | Description |
|---|---|---|
| status | string | "healthy" or "unhealthy" |
| timestamp | ISO8601 | Check execution time |
| version | string | Application version identifier |
| checks | object | Individual check results |
| checks.database | object | Database connectivity status |
| checks.cache | object | Cache system status |
| checks.storage | object | File storage status |
| checks.response_time | number | Check duration in milliseconds |

Monitoring Metrics

| Metric | Description | Alert Threshold |
|---|---|---|
| Error Rate | Percentage of failed requests | Above 1% |
| P95 Latency | 95th percentile response time | Above 500ms |
| P99 Latency | 99th percentile response time | Above 1000ms |
| Throughput | Requests per second | Below 80% of baseline |
| CPU Usage | Percentage of CPU utilized | Above 80% |
| Memory Usage | Percentage of memory utilized | Above 90% |
| Database Connections | Active database connections | Above 80% of pool |
| Queue Depth | Pending background jobs | Above 1000 |

Common Deployment Commands

| Task | Capistrano Command |
|---|---|
| Deploy current branch | cap production deploy |
| Rollback to previous release | cap production deploy:rollback |
| Check deployment status | cap production deploy:check |
| Run database migrations | cap production deploy:migrate |
| Restart application | cap production deploy:restart |
| View deployed releases | cap production releases |
| Clean old releases | cap production deploy:cleanup |
| Deploy specific branch | cap production deploy BRANCH=feature-x |