CrackedRuby

Overview

Container orchestration automates the operational tasks of running containerized applications across multiple machines. When applications run in containers, they need mechanisms to start, stop, scale, network, and monitor containers across a cluster of hosts. Orchestration platforms handle these operational concerns, allowing applications to scale from single containers to thousands of instances across hundreds of machines.

The need for orchestration emerged from container adoption patterns. Running a single container on one machine is straightforward, but production applications require running many containers across multiple hosts with coordination between them. Manual management becomes infeasible at scale. Orchestration platforms solve this by treating a cluster of machines as a single deployment target.

Container orchestration platforms manage several fundamental aspects of containerized applications. They handle scheduling decisions about which machines run which containers based on resource requirements and constraints. They monitor container health and restart failed containers automatically. They scale applications up or down based on demand or defined rules. They manage networking to enable containers to communicate across different hosts. They handle service discovery so containers can find and connect to each other. They manage configuration and secrets distribution to containers.

The orchestration layer abstracts away the underlying infrastructure. Applications declare desired state - how many instances should run, what resources they need, how they should network together. The orchestrator continuously works to maintain that desired state regardless of failures or changes in the underlying infrastructure.

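# Conceptual representation of desired state
desired_state = {
  service: "web-api",
  replicas: 3,
  image: "myapp:v1.2.3",
  resources: {
    cpu: "500m",      # 500 millicores = half a CPU core
    memory: "512Mi"   # 512 mebibytes
  },
  ports: [8080],
  health_check: "/health"
}

# The orchestrator maintains this state. `orchestrator` is a conceptual
# placeholder for the platform's control loop, not a real Ruby object.
current_state = orchestrator.get_state("web-api")
difference = desired_state[:replicas] - current_state[:replicas]
if difference.positive?
  orchestrator.scale_up(difference)
elsif difference.negative?
  orchestrator.scale_down(-difference)
end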

Key Principles

Container orchestration operates on several foundational principles that define how these systems function.

Declarative Configuration: Orchestration platforms use declarative configuration rather than imperative commands. Instead of specifying the steps to deploy an application, users declare the desired end state. The orchestrator determines the necessary actions to reach that state. This approach makes configurations reproducible and simplifies version control. Changes involve updating the desired state declaration, not executing a sequence of commands.

Desired State Reconciliation: Orchestrators continuously monitor actual system state and compare it to desired state. When discrepancies exist, the orchestrator takes corrective action. If a container crashes, the orchestrator starts a replacement. If a node fails, containers are rescheduled elsewhere. This self-healing behavior creates resilient systems without manual intervention.
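The reconciliation idea fits in a few lines of Ruby. The sketch below is a simplified, self-contained illustration that assumes an in-memory list of container records ({ id:, healthy: }) in place of a real cluster API. reconcile is a hypothetical helper; real controllers watch the API server and act asynchronously rather than returning a list of actions.

```ruby
# One pass of a desired-state reconciliation loop (illustrative only).
# `actual` is a hypothetical list of container records: { id:, healthy: }.
def reconcile(desired_replicas, actual)
  actions = []

  # Self-healing: remove containers that are failing their health checks
  healthy = actual.select { |c| c[:healthy] }
  (actual - healthy).each { |c| actions << [:remove, c[:id]] }

  # Scale the healthy set toward the declared replica count
  diff = desired_replicas - healthy.size
  if diff.positive?
    diff.times { actions << [:start, nil] }
  elsif diff.negative?
    healthy.last(-diff).each { |c| actions << [:stop, c[:id]] }
  end

  actions
end

actual = [
  { id: "web-1", healthy: true },
  { id: "web-2", healthy: false }
]
reconcile(3, actual)
# => [[:remove, "web-2"], [:start, nil], [:start, nil]]
```

Running this comparison continuously, not once, is what lets the system converge again after crashes or node failures.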

Scheduling and Placement: The scheduler decides which physical or virtual machines run which containers. Scheduling algorithms consider multiple factors: resource requirements (CPU, memory, disk), hardware constraints (GPU availability, storage types), affinity rules (co-locate related containers), anti-affinity rules (spread containers across failure domains), and resource availability across the cluster. The scheduler balances workload distribution while respecting constraints.
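Kubernetes' scheduler, for example, works in two phases: filtering out nodes that cannot run a pod, then scoring the remaining candidates. The following is a toy version of that structure, not the real scheduler plugins. The Node fields, the pod hash layout, and the SPREAD_WEIGHT constant are hypothetical; scoring simply prefers CPU headroom and penalizes zones that already run a replica (a soft anti-affinity rule).

```ruby
Node = Struct.new(:name, :free_cpu_m, :free_mem_mi, :labels, :zone, keyword_init: true)

# Crude weight that makes zone spreading dominate CPU headroom (hypothetical)
SPREAD_WEIGHT = 1000

# Choose a node for a pod, or nil if none fits.
# pod: { cpu_m:, mem_mi:, node_selector: { label => value }, app_zones: [...] }
def schedule(pod, nodes)
  # Filter: drop nodes lacking capacity or required labels (hard constraints)
  feasible = nodes.select do |n|
    n.free_cpu_m >= pod[:cpu_m] &&
      n.free_mem_mi >= pod[:mem_mi] &&
      pod[:node_selector].all? { |key, value| n.labels[key] == value }
  end
  return nil if feasible.empty?

  # Score: prefer CPU headroom after placement, penalize zones that already
  # run a replica of this app
  feasible.max_by do |n|
    (n.free_cpu_m - pod[:cpu_m]) - SPREAD_WEIGHT * pod[:app_zones].count(n.zone)
  end
end

nodes = [
  Node.new(name: "a", free_cpu_m: 2000, free_mem_mi: 4096, labels: { "disk" => "ssd" }, zone: "z1"),
  Node.new(name: "b", free_cpu_m: 1500, free_mem_mi: 4096, labels: { "disk" => "hdd" }, zone: "z2"),
  Node.new(name: "c", free_cpu_m: 8000, free_mem_mi: 256, labels: { "disk" => "ssd" }, zone: "z3")
]
pod = { cpu_m: 500, mem_mi: 512, node_selector: {}, app_zones: ["z1"] }
schedule(pod, nodes).name # => "b"
```

Node c has the most CPU but fails the memory filter, and node b outranks node a because a replica already runs in zone z1.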

Service Discovery and Load Balancing: Containers are ephemeral - they start, stop, and move between hosts. Applications need mechanisms to discover and connect to other services despite this fluidity. Orchestrators provide service discovery through DNS or environment variables. Internal load balancers distribute traffic across container instances. As containers scale up or down, the load balancer pool updates automatically.
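From a client's perspective, the effect is a pool of endpoints that changes as instances come and go. This sketch, using a hypothetical EndpointPool class, rotates through the currently ready endpoints in round-robin order and accepts pool updates at any time. Round-robin is only one policy; kube-proxy in its iptables mode, for example, picks backends at random.

```ruby
# Round-robin balancing over a pool of endpoints that can change at any time
# (a stand-in for what kube-proxy or a Swarm routing mesh does).
class EndpointPool
  def initialize
    @endpoints = []
    @index = 0
    @lock = Mutex.new
  end

  # Called whenever the orchestrator reports the current set of ready endpoints
  def update(endpoints)
    @lock.synchronize { @endpoints = endpoints.dup }
  end

  # Returns the next endpoint, or nil when no instance is ready
  def next_endpoint
    @lock.synchronize do
      return nil if @endpoints.empty?

      endpoint = @endpoints[@index % @endpoints.size]
      @index += 1
      endpoint
    end
  end
end

pool = EndpointPool.new
pool.update(["10.0.1.5:8080", "10.0.2.7:8080"])
3.times.map { pool.next_endpoint }
# => ["10.0.1.5:8080", "10.0.2.7:8080", "10.0.1.5:8080"]
```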

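Resource Management: Orchestrators allocate cluster resources to containers based on requests and limits. A resource request reserves a minimum amount of CPU or memory for a container; a resource limit caps its maximum consumption. The scheduler uses requests for placement decisions, placing a container only on a node with enough unreserved capacity. The runtime enforces limits: a container that exceeds its CPU limit is throttled, and one that exceeds its memory limit is terminated (OOM-killed). Together, requests and limits reduce resource contention and keep sharing fair across workloads.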
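Kubernetes expresses requests and limits as quantity strings such as 500m (500 millicores, half a CPU) and 512Mi (512 mebibytes). This sketch handles only a few common suffixes; parse_cpu, parse_memory, and fits? are hypothetical helpers, not part of any gem. The fit check looks at requests only, matching how the scheduler places containers.

```ruby
# Convert CPU quantities to millicores: "500m" => 500, "2" => 2000, "0.5" => 500
def parse_cpu(q)
  q.end_with?("m") ? q.chomp("m").to_i : (q.to_f * 1000).round
end

# Binary (Ki, Mi, Gi) and decimal (K, M, G) suffixes; binary ones are checked first
MEMORY_UNITS = {
  "Ki" => 1024, "Mi" => 1024**2, "Gi" => 1024**3,
  "K" => 1000, "M" => 1000**2, "G" => 1000**3
}.freeze

# Convert memory quantities to bytes: "512Mi" => 536_870_912
def parse_memory(q)
  unit = MEMORY_UNITS.keys.find { |u| q.end_with?(u) }
  unit ? (q.chomp(unit).to_f * MEMORY_UNITS[unit]).round : q.to_i
end

# A container fits only if the summed requests stay within the node's capacity
def fits?(node_allocatable, existing_requests, new_request)
  used_cpu = existing_requests.sum { |r| parse_cpu(r[:cpu]) }
  used_mem = existing_requests.sum { |r| parse_memory(r[:memory]) }

  used_cpu + parse_cpu(new_request[:cpu]) <= parse_cpu(node_allocatable[:cpu]) &&
    used_mem + parse_memory(new_request[:memory]) <= parse_memory(node_allocatable[:memory])
end

node = { cpu: "2", memory: "4Gi" }
running = [{ cpu: "1500m", memory: "1Gi" }]
fits?(node, running, { cpu: "500m", memory: "512Mi" }) # => true
fits?(node, running, { cpu: "600m", memory: "512Mi" }) # => false
```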

Health Monitoring: Orchestrators continuously check container health through probes. Liveness probes detect containers that are running but no longer working, such as a deadlocked process. Readiness probes determine whether a container can accept traffic. Startup probes give slow-starting containers time to initialize before liveness checks begin. Failed health checks trigger automatic remediation: a failing liveness probe causes the container to be restarted, and a failing readiness probe removes it from load balancer pools until it recovers.
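Probes act on consecutive results rather than a single check; Kubernetes exposes this as the failureThreshold probe field (default 3). The ProbeTracker class below is a hypothetical sketch of that counting logic, not kubelet code: after failure_threshold consecutive failures, a liveness probe leads to a restart and a readiness probe takes the container out of rotation until it succeeds again.

```ruby
# Tracks consecutive probe results and decides on a remediation action.
class ProbeTracker
  def initialize(kind, failure_threshold: 3)
    raise ArgumentError, "kind must be :liveness or :readiness" unless %i[liveness readiness].include?(kind)

    @kind = kind
    @failure_threshold = failure_threshold
    @failures = 0
    @removed = false
  end

  # Record one probe result; returns the action an orchestrator would take
  def record(success)
    if success
      @failures = 0
      if @removed
        @removed = false
        return :add_to_endpoints   # readiness recovered
      end
      return :no_action
    end

    @failures += 1
    return :no_action if @failures < @failure_threshold

    if @kind == :liveness
      @failures = 0                # the restarted container starts fresh
      :restart_container
    elsif !@removed
      @removed = true
      :remove_from_endpoints       # stop routing traffic to this container
    else
      :no_action                   # already out of rotation
    end
  end
end

liveness = ProbeTracker.new(:liveness)
[true, false, false, false].map { |ok| liveness.record(ok) }
# => [:no_action, :no_action, :no_action, :restart_container]
```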

Rolling Updates and Rollbacks: Orchestrators support zero-downtime deployments through rolling updates. New versions gradually replace old versions. The orchestrator starts new containers, waits for health checks to pass, then terminates old containers. If problems occur, rollbacks restore the previous version. This enables continuous deployment with minimal risk.
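Kubernetes Deployments tune this process with maxSurge (extra pods allowed above the desired count) and maxUnavailable (pods allowed to be unavailable during the update). The simplified simulation below assumes absolute counts rather than percentages and assumes every new pod passes its readiness check immediately; rolling_update_steps is a hypothetical helper, not the Deployment controller's code. It records the [old, new] pod counts after each scale-up or scale-down step.

```ruby
# Simulates [old_pods, new_pods] counts during a rolling update.
def rolling_update_steps(replicas, max_surge:, max_unavailable:)
  if max_surge.zero? && max_unavailable.zero?
    raise ArgumentError, "max_surge and max_unavailable cannot both be 0"
  end

  old, new = replicas, 0
  steps = [[old, new]]
  while old.positive? || new < replicas
    # Scale up: never run more than replicas + max_surge pods in total
    start = [replicas + max_surge - (old + new), replicas - new].min
    if start.positive?
      new += start
      steps << [old, new]
    end

    # Scale down: keep at least replicas - max_unavailable pods available
    # (new pods are assumed ready at this point)
    stop = [old + new - (replicas - max_unavailable), old].min
    if stop.positive?
      old -= stop
      steps << [old, new]
    end
  end
  steps
end

rolling_update_steps(3, max_surge: 1, max_unavailable: 0)
# => [[3, 0], [3, 1], [2, 1], [2, 2], [1, 2], [1, 3], [0, 3]]
```

With maxSurge 1 and maxUnavailable 0, the cluster briefly runs one extra pod but never drops below three available pods. Using maxSurge 0 and maxUnavailable 1 instead avoids the extra pod at the cost of temporarily reduced capacity.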

Storage Orchestration: Container filesystems are ephemeral: data written inside a container is lost when the container is replaced, yet applications often need persistent storage. Orchestrators manage storage volumes and attach them to containers as needed. Volumes persist beyond the lifecycle of any single container. The orchestrator mounts volumes on the hosts where their containers are scheduled and manages volume lifecycles.

Implementation Approaches

Container orchestration can be implemented through several architectural strategies, each suited to different operational requirements and organizational constraints.

Cluster-Based Orchestration: This approach treats multiple machines as a unified cluster. A control plane manages cluster state and scheduling decisions. Worker nodes run containerized workloads. The control plane remains separate from workload execution, preventing resource contention. Kubernetes exemplifies this architecture with its control plane components (API server, scheduler, controller manager) separate from worker nodes running kubelet and container runtime. This separation enables high availability through control plane replication. The cluster abstraction allows treating hundreds of machines as one logical deployment target.

Implementation involves deploying control plane components on dedicated infrastructure for reliability. Worker nodes authenticate to the control plane to join the cluster. The control plane stores cluster state in a distributed data store (etcd, in Kubernetes). Workload scheduling happens through API interactions: users submit desired state to the API server, controllers watch for changes and trigger actions, and the scheduler assigns containers to nodes. This architecture scales to thousands of nodes but requires operating the control plane itself.

Serverless Container Orchestration: Cloud providers offer managed services that orchestrate containers without exposing the underlying cluster. AWS Fargate, Google Cloud Run, and Azure Container Instances exemplify this approach. Users submit container configurations and the platform handles all orchestration concerns. No node management, patching, or capacity planning is required. The platform automatically scales capacity based on workload.

This approach significantly reduces operational overhead. The provider manages control planes, worker nodes, networking, and upgrades, so users focus solely on application containers. Scaling happens automatically without pre-provisioning capacity. Billing is typically based on the CPU and memory allocated to containers rather than on provisioned instances. Trade-offs include less control over infrastructure, potential vendor lock-in, and sometimes higher costs at scale compared to self-managed clusters.

Swarm-Mode Orchestration: Docker Swarm provides orchestration integrated into the Docker engine. Nodes run standard Docker with swarm mode enabled. One or more nodes act as managers maintaining cluster state. Worker nodes run containers. Swarm uses a simpler architecture than Kubernetes with fewer moving parts. Service definitions specify desired state. The swarm manager schedules tasks across nodes. Built-in load balancing routes requests to healthy containers.

This approach works well for simpler deployments or teams already using Docker. Setup involves initializing swarm mode and adding nodes. Services are deployed through Docker commands or compose files. Swarm handles scheduling, scaling, and health checks. The architecture is less feature-rich than Kubernetes but significantly simpler to operate. Fewer components means less complexity but also less flexibility for advanced use cases.

Hybrid and Multi-Cluster Orchestration: Large organizations often run multiple orchestration clusters across regions, cloud providers, or environments. Federation approaches coordinate across these clusters. Workloads can span multiple clusters for geographic distribution, disaster recovery, or cloud migration. A federation layer provides unified APIs across clusters while each cluster maintains autonomy.

Implementation requires additional abstraction layers. Tools like Kubernetes Federation (KubeFed, since archived by the Kubernetes project) or service mesh solutions enable multi-cluster coordination. Challenges include state synchronization across clusters, network connectivity between clusters, and consistent configuration management. This approach suits organizations with compliance requirements for data locality, needs for disaster recovery across regions, or strategies to avoid cloud provider lock-in.

Ruby Implementation

Ruby applications interact with container orchestrators primarily through client libraries and SDKs. While orchestrators themselves are typically written in Go, Ruby provides several options for managing containerized applications and interacting with orchestration platforms.

Kubernetes Ruby Client: The kubeclient gem provides a Ruby interface to the Kubernetes API. It handles authentication, API versioning, and resource manipulation.

require 'kubeclient'

# Connect to Kubernetes cluster
config = Kubeclient::Config.read('/path/to/kubeconfig')
client = Kubeclient::Client.new(
  config.context.api_endpoint,
  'v1',
  ssl_options: config.context.ssl_options,
  auth_options: config.context.auth_options
)

# List pods in a namespace
pods = client.get_pods(namespace: 'production')
pods.each do |pod|
  puts "#{pod.metadata.name}: #{pod.status.phase}"
end

# Create a deployment
deployment = Kubeclient::Resource.new({
  metadata: {
    name: 'ruby-app',
    namespace: 'production'
  },
  spec: {
    replicas: 3,
    selector: {
      matchLabels: { app: 'ruby-app' }
    },
    template: {
      metadata: {
        labels: { app: 'ruby-app' }
      },
      spec: {
        containers: [{
          name: 'web',
          image: 'myregistry/ruby-app:v1.2.3',
          ports: [{ containerPort: 3000 }],
          env: [
            { name: 'RAILS_ENV', value: 'production' },
            { name: 'DATABASE_URL', valueFrom: {
              secretKeyRef: { name: 'db-credentials', key: 'url' }
            }}
          ],
          resources: {
            requests: { cpu: '100m', memory: '256Mi' },
            limits: { cpu: '500m', memory: '512Mi' }
          },
          livenessProbe: {
            httpGet: { path: '/health', port: 3000 },
            initialDelaySeconds: 30,
            periodSeconds: 10
          }
        }]
      }
    }
  }
})

apps_client = Kubeclient::Client.new(
  config.context.api_endpoint + '/apis/apps',
  'v1',
  ssl_options: config.context.ssl_options,
  auth_options: config.context.auth_options
)
apps_client.create_deployment(deployment)

Managing Application Lifecycle: Ruby applications can orchestrate their own updates and scaling through the Kubernetes API. This enables custom deployment strategies or application-aware scaling.

require 'kubeclient'

class KubernetesDeployer
  def initialize(namespace:)
    @namespace = namespace
    @client = build_kubernetes_client
  end

  def rolling_update(deployment_name, new_image)
    deployment = @client.get_deployment(deployment_name, @namespace)
    
    # Update image
    deployment.spec.template.spec.containers.first.image = new_image
    
    # Configure rolling update strategy
    deployment.spec.strategy = {
      type: 'RollingUpdate',
      rollingUpdate: {
        maxSurge: 1,
        maxUnavailable: 0
      }
    }
    
    # Apply update
    @client.update_deployment(deployment)
    
    # Monitor rollout
    wait_for_rollout(deployment_name)
  end

  def scale(deployment_name, replicas)
    deployment = @client.get_deployment(deployment_name, @namespace)
    deployment.spec.replicas = replicas
    @client.update_deployment(deployment)
  end

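  def wait_for_rollout(deployment_name, timeout: 300)
    deadline = Time.now + timeout

    loop do
      deployment = @client.get_deployment(deployment_name, @namespace)
      status = deployment.status

      desired = deployment.spec.replicas
      observed = status.observedGeneration || 0
      updated = status.updatedReplicas || 0
      total = status.replicas || 0
      available = status.availableReplicas || 0

      # Similar to the checks kubectl rollout status performs: the controller
      # has seen the new spec, every replica runs the new template, no old
      # replicas remain, and all updated replicas are available.
      done = observed >= deployment.metadata.generation &&
             updated >= desired &&
             total <= updated &&
             available >= updated
      return true if done

      raise "Rollout timeout" if Time.now > deadline
      sleep 5
    end
  end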

  private

  def build_kubernetes_client
    config = Kubeclient::Config.read(ENV['KUBECONFIG'])
    Kubeclient::Client.new(
      config.context.api_endpoint + '/apis/apps',
      'v1',
      ssl_options: config.context.ssl_options,
      auth_options: config.context.auth_options
    )
  end
end

# Usage
deployer = KubernetesDeployer.new(namespace: 'production')
deployer.rolling_update('web-api', 'myregistry/web-api:v2.0.0')
deployer.scale('worker', 10)

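Docker API Integration: The docker-api gem provides Ruby bindings for the Docker Engine API. Its wrapper classes cover containers, images, networks, and volumes but not Swarm services, so Swarm operations go through the gem's raw connection to the Engine API's service endpoints.

require 'docker'
require 'json'

# Connect to the Docker daemon on a swarm manager over TLS
Docker.url = 'tcp://swarm-manager:2376'
Docker.options = {
  client_cert: '/path/to/cert.pem',
  client_key: '/path/to/key.pem',
  ssl_ca_file: '/path/to/ca.pem',
  scheme: 'https'
}

# Service definition in the Engine API's ServiceSpec format
service_spec = {
  'Name' => 'ruby-worker',
  'TaskTemplate' => {
    'ContainerSpec' => {
      'Image' => 'myregistry/worker:latest',
      'Env' => [
        'REDIS_URL=redis://redis:6379',
        'QUEUE=critical,default'
      ],
      'Mounts' => [{
        'Type' => 'volume',
        'Source' => 'worker-data',
        'Target' => '/data'
      }]
    },
    'Resources' => {
      # NanoCPUs 500_000_000 = 0.5 CPU; MemoryBytes 536_870_912 = 512 MiB
      'Limits' => { 'NanoCPUs' => 500_000_000, 'MemoryBytes' => 536_870_912 },
      'Reservations' => { 'NanoCPUs' => 100_000_000, 'MemoryBytes' => 268_435_456 }
    },
    'RestartPolicy' => {
      'Condition' => 'on-failure',
      'MaxAttempts' => 3
    }
  },
  'Mode' => { 'Replicated' => { 'Replicas' => 5 } },
  'UpdateConfig' => {
    'Parallelism' => 1,
    'Delay' => 10_000_000_000 # nanoseconds (10 seconds)
  }
}

# Create the service (POST /services/create returns the new service ID)
response = Docker.connection.post('/services/create', {}, body: service_spec.to_json)
service_id = JSON.parse(response)['ID']

# Scale the service: an update sends the full spec plus the current version index
service = JSON.parse(Docker.connection.get("/services/#{service_id}"))
spec = service['Spec']
spec['Mode']['Replicated']['Replicas'] = 10
Docker.connection.post(
  "/services/#{service_id}/update",
  { version: service['Version']['Index'] },
  body: spec.to_json
)

# List the service's tasks
filters = { service: [service_id] }.to_json
tasks = JSON.parse(Docker.connection.get('/tasks', filters: filters))
tasks.each do |task|
  puts "Task #{task['ID']}: #{task['Status']['State']}"
end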

Custom Controllers: Ruby can implement Kubernetes controllers that watch for resource changes and take actions. This enables custom automation logic.

require 'kubeclient'
require 'logger'

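class CustomController
  def initialize(namespace:)
    @namespace = namespace
    @client = build_kubernetes_client
    @logger = Logger.new(STDOUT)
  end

  def watch_deployments
    loop do
      watcher = @client.watch_deployments(namespace: @namespace)

      watcher.each do |notice|
        case notice.type
        when 'ADDED', 'MODIFIED'
          handle_deployment_change(notice.object)
        when 'DELETED'
          handle_deployment_deletion(notice.object)
        when 'ERROR'
          raise "Watch returned an error: #{notice.object.message}"
        end
      end
      # The API server closes watches periodically; reopen when that happens
    end
  rescue => e
    @logger.error("Watch error: #{e.message}")
    sleep 5
    retry
  end

  private

  def handle_deployment_change(deployment)
    name = deployment.metadata.name
    replicas = deployment.spec.replicas
    ready = deployment.status.readyReplicas || 0

    @logger.info("Deployment #{name}: #{ready}/#{replicas} replicas ready")

    # Custom logic: auto-scale based on custom metrics. Scaling triggers
    # another MODIFIED event, so these checks must be idempotent to avoid
    # runaway scaling.
    if should_scale_up?(deployment)
      scale_deployment(deployment, replicas + 1)
    elsif should_scale_down?(deployment)
      scale_deployment(deployment, [replicas - 1, 1].max)
    end
  end

  def handle_deployment_deletion(deployment)
    @logger.info("Deployment #{deployment.metadata.name} deleted")
  end

  def should_scale_up?(deployment)
    # Implement custom scaling logic
    # Could check external metrics, queue depths, custom resources
    false
  end

  def should_scale_down?(deployment)
    false
  end

  def scale_deployment(deployment, new_replicas)
    deployment.spec.replicas = new_replicas
    @client.update_deployment(deployment)
    @logger.info("Scaled #{deployment.metadata.name} to #{new_replicas}")
  end

  def build_kubernetes_client
    # Deployments are served by the apps/v1 API group
    config = Kubeclient::Config.read(ENV['KUBECONFIG'])
    Kubeclient::Client.new(
      config.context.api_endpoint + '/apis/apps',
      'v1',
      ssl_options: config.context.ssl_options,
      auth_options: config.context.auth_options
    )
  end
end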

Helm Integration: Helm charts package Kubernetes applications. Ruby can interact with Helm through shell commands or by parsing Helm chart structures.

require 'open3'
require 'tempfile'
require 'yaml'

class HelmDeployer
  def initialize(namespace:)
    @namespace = namespace
  end

  def install_chart(release_name, chart, values = {})
    run_helm('install', release_name, chart, values)
  end

  def upgrade_chart(release_name, chart, values = {})
    # --atomic rolls the release back automatically if the upgrade fails
    run_helm('upgrade', release_name, chart, values, '--atomic')
  end

  private

  def run_helm(action, release_name, chart, values, *extra_flags)
    # Tempfile.create with a block removes the file once the block exits,
    # after helm has finished reading it
    Tempfile.create(['values', '.yaml']) do |file|
      file.write(values.to_yaml)
      file.flush

      cmd = [
        'helm', action, release_name, chart,
        '--namespace', @namespace,
        '--values', file.path,
        '--wait',
        '--timeout', '5m',
        *extra_flags
      ]

      # Passing an argument array (not a string) bypasses the shell, so
      # release names and chart paths cannot inject shell commands
      output, status = Open3.capture2e(*cmd)
      raise "Helm #{action} failed: #{output}" unless status.success?

      output
    end
  end
end

# Usage
deployer = HelmDeployer.new(namespace: 'production')
deployer.install_chart('my-app', 'charts/ruby-app', {
  'image' => {
    'repository' => 'myregistry/app',
    'tag' => 'v1.0.0'
  },
  'replicaCount' => 3,
  'resources' => {
    'requests' => { 'cpu' => '100m', 'memory' => '256Mi' }
  }
})

Tools & Ecosystem

Container orchestration relies on an ecosystem of tools that handle different aspects of the orchestration lifecycle.

Kubernetes: The dominant orchestration platform. Kubernetes provides comprehensive orchestration features including scheduling, scaling, service discovery, configuration management, and storage orchestration. The platform runs on various infrastructures from on-premises data centers to public clouds. Major cloud providers offer managed Kubernetes services (GKE, EKS, AKS) that handle control plane management.

Kubernetes architecture separates control plane from worker nodes. The control plane includes the API server (handles all API requests), etcd (distributed key-value store for cluster state), scheduler (assigns pods to nodes), and controller manager (runs controllers that maintain desired state). Worker nodes run kubelet (manages pod lifecycle), kube-proxy (maintains the network rules that route Service traffic to pods), and a container runtime (containerd or CRI-O).

The API-driven design makes Kubernetes extensible. Custom Resource Definitions (CRDs) extend the API with custom resources. Operators use custom controllers to manage complex applications. The large ecosystem includes tools for networking (Calico, Cilium), service mesh (Istio, Linkerd), ingress (Nginx, Traefik), storage (Rook, Longhorn), and monitoring (Prometheus, Grafana).

Docker Swarm: Integrated into Docker Engine, Swarm provides simpler orchestration for Docker containers. Swarm deploys services from Docker Compose files (via docker stack deploy), the same format many teams already use for local development. The architecture includes manager nodes (maintain cluster state, schedule services) and worker nodes (run containers). Built-in features include overlay networking, service discovery, and rolling updates.

Swarm suits smaller deployments or teams preferring Docker-native tooling. Setup requires fewer components than Kubernetes. Service definition syntax is familiar to Docker users. Trade-offs include a smaller ecosystem and fewer advanced features compared to Kubernetes. Swarm remains viable for straightforward orchestration needs without Kubernetes complexity.

Amazon ECS: AWS Elastic Container Service provides AWS-native container orchestration. ECS integrates deeply with AWS services like IAM, CloudWatch, and Application Load Balancers. The two main launch types are EC2 (containers run on EC2 instances you manage) and Fargate (serverless container execution). Task definitions specify container configurations. Services maintain desired task counts and handle load balancing.

ECS suits AWS-centric architectures. The service handles scheduling, placement, and scaling. Integration with AWS services simplifies authentication, logging, and monitoring. Task definitions use JSON format. The ECS CLI and CloudFormation provide infrastructure-as-code options. AWS manages the control plane, reducing operational overhead.

HashiCorp Nomad: A simpler alternative to Kubernetes, Nomad orchestrates containers and other workload types (VMs, standalone executables). Nomad's architecture includes servers (maintain state, schedule) and clients (run workloads). Job specifications declare desired state. Nomad handles scheduling, service discovery through Consul, and secrets management through Vault.

Nomad works well for heterogeneous workloads beyond containers. The learning curve is gentler than Kubernetes. Single binary deployment simplifies operations. Integration with Consul and Vault provides service mesh and secrets management. Trade-offs include a smaller ecosystem and community compared to Kubernetes.

Container Runtimes: Orchestrators rely on container runtimes to execute containers. containerd, the industry-standard runtime, implements the Container Runtime Interface (CRI). CRI-O provides a lightweight CRI implementation specifically for Kubernetes. Both runtimes support OCI (Open Container Initiative) images and runtime specifications. The runtime handles image pulling, container creation, networking setup, and resource isolation.

Service Mesh: Service mesh tools manage service-to-service communication in orchestrated environments. Istio provides traffic management, security, and observability through sidecar proxies injected into each pod. Linkerd offers similar capabilities with lower resource overhead. Consul Connect integrates with Nomad for service mesh functionality. Service meshes handle load balancing, circuit breaking, mutual TLS, and distributed tracing without application code changes.

GitOps Tools: GitOps applies Git workflow to infrastructure and application deployment. Flux and ArgoCD continuously synchronize Git repositories with Kubernetes clusters. Configurations in Git represent desired state. The GitOps operator detects drift and applies changes automatically. This approach provides audit trails, rollback capability, and declarative infrastructure management.

CI/CD Integration: Container orchestration integrates with continuous integration and deployment pipelines. Jenkins X provides Kubernetes-native CI/CD. Tekton offers cloud-native pipeline building blocks. Spinnaker handles multi-cloud continuous delivery. These tools automate building container images, running tests, and deploying to orchestration platforms. Integration with orchestrators enables automated deployments, canary releases, and blue-green deployments.

Real-World Applications

Container orchestration enables patterns that shape how modern applications deploy and operate in production.

Multi-Tier Application Deployment: Production applications typically consist of multiple tiers: web servers, application servers, background workers, caching layers, and databases. Orchestration platforms deploy these tiers as separate services with dependencies and networking between them.

# Example: Rails application with multiple components
require 'kubeclient'

class ProductionDeployment
  def initialize(namespace)
    @namespace = namespace
    # Services live in the core API; StatefulSets and Deployments in apps/v1
    @client = build_kubernetes_client
    @apps_client = build_kubernetes_client('/apis/apps')
  end

  def deploy_full_stack
    # Deploy PostgreSQL StatefulSet
    deploy_database
    
    # Deploy Redis for caching and job queue
    deploy_redis
    
    # Deploy Rails web application
    deploy_web_app
    
    # Deploy Sidekiq workers
    deploy_workers
    
    # Configure ingress for external traffic
    deploy_ingress
  end

  private

  def deploy_database
    statefulset = {
      metadata: { name: 'postgres', namespace: @namespace },
      spec: {
        serviceName: 'postgres',
        replicas: 1,
        selector: { matchLabels: { app: 'postgres' } },
        template: {
          metadata: { labels: { app: 'postgres' } },
          spec: {
            containers: [{
              name: 'postgres',
              image: 'postgres:14',
              env: [
                { name: 'POSTGRES_DB', value: 'production' },
                { name: 'POSTGRES_USER', valueFrom: { 
                  secretKeyRef: { name: 'db-creds', key: 'username' }
                }},
                { name: 'POSTGRES_PASSWORD', valueFrom: {
                  secretKeyRef: { name: 'db-creds', key: 'password' }
                }},
                # Keep data in a subdirectory: a freshly formatted volume may
                # contain lost+found, and initdb refuses a non-empty directory
                { name: 'PGDATA', value: '/var/lib/postgresql/data/pgdata' }
              ],
              ports: [{ containerPort: 5432 }],
              volumeMounts: [{
                name: 'data',
                mountPath: '/var/lib/postgresql/data'
              }]
            }]
          }
        },
        volumeClaimTemplates: [{
          metadata: { name: 'data' },
          spec: {
            accessModes: ['ReadWriteOnce'],
            resources: { requests: { storage: '100Gi' } }
          }
        }]
      }
    }
    @apps_client.create_statefulset(Kubeclient::Resource.new(statefulset))
    
    # Headless service (clusterIP: None) gives the StatefulSet pod a stable DNS name
    service = {
      metadata: { name: 'postgres', namespace: @namespace },
      spec: {
        selector: { app: 'postgres' },
        ports: [{ port: 5432 }],
        clusterIP: 'None'
      }
    }
    @client.create_service(Kubeclient::Resource.new(service))
  end

  def deploy_redis
    # Omitted for brevity: a Deployment plus a Service named 'redis'
  end

  def deploy_web_app
    deployment = {
      metadata: { name: 'web', namespace: @namespace },
      spec: {
        replicas: 5,
        selector: { matchLabels: { app: 'web' } },
        template: {
          metadata: { labels: { app: 'web' } },
          spec: {
            containers: [{
              name: 'rails',
              image: 'myregistry/rails-app:latest',
              command: ['bundle', 'exec', 'puma', '-C', 'config/puma.rb'],
              env: [
                { name: 'RAILS_ENV', value: 'production' },
                # The URL must include credentials, so read it from the db-creds secret
                { name: 'DATABASE_URL', valueFrom: {
                  secretKeyRef: { name: 'db-creds', key: 'url' }
                }},
                { name: 'REDIS_URL', value: 'redis://redis:6379/0' },
                { name: 'SECRET_KEY_BASE', valueFrom: {
                  secretKeyRef: { name: 'rails-secrets', key: 'secret_key_base' }
                }}
              ],
              ports: [{ containerPort: 3000 }],
              resources: {
                requests: { cpu: '200m', memory: '512Mi' },
                limits: { cpu: '1000m', memory: '1Gi' }
              },
              livenessProbe: {
                httpGet: { path: '/health', port: 3000 },
                initialDelaySeconds: 30,
                periodSeconds: 10
              },
              readinessProbe: {
                httpGet: { path: '/ready', port: 3000 },
                initialDelaySeconds: 10,
                periodSeconds: 5
              }
            }]
          }
        }
      }
    }
    @apps_client.create_deployment(Kubeclient::Resource.new(deployment))
  end

  def deploy_workers
    # Omitted for brevity: a Deployment running Sidekiq with the same env as 'web'
  end

  def deploy_ingress
    # Omitted for brevity: a Service for 'web' and an Ingress routing external traffic to it
  end

  def build_kubernetes_client(path = '')
    config = Kubeclient::Config.read(ENV['KUBECONFIG'])
    Kubeclient::Client.new(
      config.context.api_endpoint + path,
      'v1',
      ssl_options: config.context.ssl_options,
      auth_options: config.context.auth_options
    )
  end
end

Autoscaling Patterns: Production systems scale automatically based on metrics. Horizontal Pod Autoscaling (HPA) adjusts replica counts based on CPU, memory, or custom metrics. Cluster Autoscaling adds nodes when pods cannot be scheduled for lack of resources and removes nodes that stay underutilized.

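require 'kubeclient'

class AutoscalingManager
  def initialize(namespace:)
    @namespace = namespace
    config = Kubeclient::Config.read(ENV['KUBECONFIG'])
    # HPA specs with `target:` blocks use the autoscaling/v2 API (Kubernetes 1.23+)
    @autoscaling_client = Kubeclient::Client.new(
      config.context.api_endpoint + '/apis/autoscaling',
      'v2',
      ssl_options: config.context.ssl_options,
      auth_options: config.context.auth_options
    )
  end

  def configure_hpa(deployment_name, min_replicas:, max_replicas:, target_cpu:)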
    hpa = {
      metadata: { name: deployment_name, namespace: @namespace },
      spec: {
        scaleTargetRef: {
          apiVersion: 'apps/v1',
          kind: 'Deployment',
          name: deployment_name
        },
        minReplicas: min_replicas,
        maxReplicas: max_replicas,
        metrics: [{
          type: 'Resource',
          resource: {
            name: 'cpu',
            target: {
              type: 'Utilization',
              averageUtilization: target_cpu
            }
          }
        }]
      }
    }
    
    @autoscaling_client.create_horizontal_pod_autoscaler(
      Kubeclient::Resource.new(hpa)
    )
  end

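  def configure_custom_metric_scaling(deployment_name, metric:, target:)
    # Scale based on custom metrics (queue depth, request rate, etc.).
    # Pods metrics require a custom metrics adapter in the cluster, and
    # target is a quantity string such as '30'. Avoid attaching two HPAs to
    # the same Deployment; to combine CPU and custom metrics, list both in a
    # single HPA's metrics array.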
    hpa = {
      metadata: { name: "#{deployment_name}-custom", namespace: @namespace },
      spec: {
        scaleTargetRef: {
          apiVersion: 'apps/v1',
          kind: 'Deployment',
          name: deployment_name
        },
        minReplicas: 2,
        maxReplicas: 50,
        metrics: [{
          type: 'Pods',
          pods: {
            metric: { name: metric },
            target: {
              type: 'AverageValue',
              averageValue: target
            }
          }
        }]
      }
    }
    
    @autoscaling_client.create_horizontal_pod_autoscaler(
      Kubeclient::Resource.new(hpa)
    )
  end
end
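Kubernetes documents the HPA's core calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default). The sketch below reproduces that formula and clamps the result to the configured bounds. hpa_desired_replicas is a hypothetical helper; the real controller also applies stabilization windows (300 seconds for scale-down by default) and special handling for missing or not-yet-ready pods.

```ruby
# Replica count an HPA-style controller would choose for one metric.
def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         min_replicas:, max_replicas:, tolerance: 0.1)
  ratio = current_metric.to_f / target_metric
  desired =
    if (ratio - 1.0).abs <= tolerance
      current_replicas                 # close enough: leave the count unchanged
    else
      (current_replicas * ratio).ceil
    end
  desired.clamp(min_replicas, max_replicas)
end

hpa_desired_replicas(4, 90, 60, min_replicas: 2, max_replicas: 10) # => 6
hpa_desired_replicas(4, 63, 60, min_replicas: 2, max_replicas: 10) # => 4 (within tolerance)
hpa_desired_replicas(4, 10, 60, min_replicas: 2, max_replicas: 10) # => 2 (clamped)
```

For example, four pods averaging 90% CPU against a 60% target scale to six, since 4 * 90 / 60 = 6.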

Blue-Green Deployments: This pattern maintains two identical production environments. Traffic routes to one environment (blue) while the other (green) remains idle. New versions deploy to the idle environment. After validation, traffic switches to the new version. If issues occur, traffic switches back instantly.

Orchestrators facilitate blue-green deployments through service selectors. Services route to pods based on labels. Changing the service selector switches traffic between environments. Both environments run simultaneously during the switch, requiring double resources temporarily.
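A sketch of the selector switch, assuming pods carry an app label plus a hypothetical color label (blue or green), that the Service shares its name with the app label, and that core_client is a kubeclient client for the core v1 API. switch_traffic is a hypothetical helper; kubeclient's patch_service sends a patch (by default a strategic merge patch), so only the selector keys shown change.

```ruby
# Point a Service at the blue or green set of pods by patching its selector.
def switch_traffic(core_client, service_name, namespace, color)
  unless %w[blue green].include?(color)
    raise ArgumentError, "color must be blue or green"
  end

  patch = { spec: { selector: { app: service_name, color: color } } }
  core_client.patch_service(service_name, patch, namespace)
end

# Usage (client built as in the earlier kubeclient examples):
#   switch_traffic(client, 'web', 'production', 'green')  # cut over
#   switch_traffic(client, 'web', 'production', 'blue')   # roll back
```

Rolling back is the same call with 'blue', which is why it is fast: the old pods never stopped running.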

Canary Releases: Canary deployments gradually shift traffic from old to new versions. A small percentage of traffic routes to the new version initially. Monitoring verifies that the new version behaves correctly, and the traffic percentage increases gradually until all traffic uses the new version. When the rollout is automated, failed checks trigger a rollback to the previous version.

Service mesh tools like Istio provide fine-grained traffic splitting: they can route an exact percentage of requests, or specific user segments, to the canary version for testing.
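Without a service mesh, a plain Service spreads requests roughly evenly across ready pods, so a canary's share of traffic can be approximated by its share of replicas. The hypothetical canary_split helper below sizes the stable and canary Deployments for a target percentage and reports the share actually achieved.

```ruby
# Size stable and canary Deployments behind one Service so the canary gets
# roughly canary_percent of requests.
def canary_split(total_replicas, canary_percent)
  raise ArgumentError, "total_replicas must be at least 1" if total_replicas < 1

  # Round up so any positive percentage gets at least one canary pod
  canary = (total_replicas * canary_percent / 100.0).ceil.clamp(0, total_replicas)
  {
    stable: total_replicas - canary,
    canary: canary,
    effective_percent: (100.0 * canary / total_replicas).round(1)
  }
end

canary_split(10, 10) # => { stable: 9, canary: 1, effective_percent: 10.0 }
canary_split(4, 10)  # => { stable: 3, canary: 1, effective_percent: 25.0 }
```

With only four replicas, a 10% target becomes 25% in practice. This coarse granularity is why mesh-based traffic splitting suits small canary percentages better.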

Job Scheduling: Orchestrators handle batch jobs and cron-like scheduled tasks. Jobs run containers to completion rather than as long-running services. CronJobs execute on defined schedules. This pattern suits data processing, report generation, database backups, and maintenance tasks.

# Kubernetes Job for one-time data migration
migration_job = {
  metadata: { name: 'db-migration-v2', namespace: 'production' },
  spec: {
    template: {
      spec: {
        containers: [{
          name: 'migrate',
          image: 'myapp:v2.0.0',
          command: ['bundle', 'exec', 'rake', 'db:migrate'],
          env: [
            { name: 'DATABASE_URL', valueFrom: {
              secretKeyRef: { name: 'db-creds', key: 'url' }
            }}
          ]
        }],
        restartPolicy: 'OnFailure'
      }
    },
    backoffLimit: 3
  }
}

# CronJob for nightly reports
report_cronjob = {
  metadata: { name: 'nightly-report', namespace: 'production' },
  spec: {
    schedule: '0 2 * * *',
    jobTemplate: {
      spec: {
        template: {
          spec: {
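            containers: [{
              name: 'reporter',
              image: 'myapp:latest',
              # Kubernetes does not run env values through a shell, so a
              # '$(date ...)' env value would be passed literally. Compute the
              # date in a shell wrapper instead (assumes GNU date in the image).
              command: [
                'sh', '-c',
                'REPORT_DATE="$(date -d yesterday +%Y-%m-%d)" bundle exec rake reports:generate'
              ]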
            }],
            restartPolicy: 'OnFailure'
          }
        }
      }
    }
  }
}
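Kubernetes does not evaluate shell syntax such as $(date ...) in env values (only $(VAR_NAME) references to other variables are expanded), so one way to supply the report date is to compute it inside the Ruby task itself. The report_date helper below is hypothetical; it honors an explicit REPORT_DATE override and otherwise defaults to yesterday.

```ruby
require 'date'

# Returns the date to report on: REPORT_DATE (YYYY-MM-DD) if set and non-empty,
# otherwise the day before `today`.
def report_date(env: ENV, today: Date.today)
  value = env['REPORT_DATE']
  return Date.iso8601(value) if value && !value.empty?

  today - 1
end

puts report_date(env: {}, today: Date.new(2024, 3, 1)) # 2024-02-29
puts report_date(env: { 'REPORT_DATE' => '2024-01-15' }) # 2024-01-15
```

Computing the date in Ruby also avoids depending on which date implementation the container image ships.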

Disaster Recovery: Orchestration platforms enable disaster recovery through cluster federation or multi-region deployments. Applications replicate across geographic regions. If one region fails, traffic shifts to healthy regions. Database replication keeps data synchronized across regions, although cross-region replication is often asynchronous, so a failover may lose the most recent writes. DNS or global load balancers route traffic to healthy endpoints.
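The routing decision a DNS failover policy or global load balancer makes can be reduced to "pick the most preferred healthy region". This is a minimal sketch; the Region struct, priority scheme, and region names are hypothetical, and real health data would come from the load balancer's health checks.

```ruby
# Hypothetical region table: a lower priority number means a more preferred region.
Region = Struct.new(:name, :priority, :healthy, keyword_init: true)

# Returns the preferred healthy region, or nil if every region is down.
def active_region(regions)
  regions.select(&:healthy).min_by(&:priority)
end

regions = [
  Region.new(name: 'us-east-1', priority: 1, healthy: false),
  Region.new(name: 'eu-west-1', priority: 2, healthy: true),
  Region.new(name: 'ap-south-1', priority: 3, healthy: true)
]

puts active_region(regions)&.name # eu-west-1
```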

State management becomes critical in disaster recovery scenarios. Stateless services recover easily by starting new containers. Stateful services require data replication strategies. Object storage replication, database streaming replication, and volume snapshots provide options for state recovery.

Zero-Downtime Maintenance: Orchestrators enable cluster upgrades and node maintenance without downtime. Draining a node marks it unschedulable and evicts its pods, whose controllers then recreate them on other nodes. PodDisruptionBudgets limit how many replicas such voluntary disruptions can take down at once, so a minimum number stay available; they do not protect against unplanned node failures. Rolling node updates upgrade one node at a time while workloads keep running on the other nodes.
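A PodDisruptionBudget and the arithmetic behind it can be sketched as follows. The manifest uses the policy/v1 API; the names and labels are hypothetical, and disruptions_allowed is a simplified stand-in for the check Kubernetes performs (it handles only an integer minAvailable, not percentages).

```ruby
# PodDisruptionBudget keeping at least two web-api pods running during drains.
pdb = {
  apiVersion: 'policy/v1',
  kind: 'PodDisruptionBudget',
  metadata: { name: 'web-api-pdb', namespace: 'production' },
  spec: {
    minAvailable: 2,
    selector: { matchLabels: { app: 'web-api' } }
  }
}

# Simplified: an eviction is allowed only while healthy pods exceed minAvailable.
def disruptions_allowed(healthy_pods:, min_available:)
  [healthy_pods - min_available, 0].max
end

puts disruptions_allowed(healthy_pods: 3, min_available: pdb[:spec][:minAvailable]) # 1
puts disruptions_allowed(healthy_pods: 2, min_available: pdb[:spec][:minAvailable]) # 0
```

With three healthy replicas, a drain can evict one pod and must wait for its replacement to become healthy before evicting the next.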

Reference

Orchestration Platform Comparison

| Platform | Architecture | Use Case | Complexity | Ecosystem |
|---|---|---|---|---|
| Kubernetes | Distributed control plane, multiple worker nodes | Large-scale, complex deployments | High | Extensive |
| Docker Swarm | Manager nodes, worker nodes | Simple deployments, Docker-native | Low | Limited |
| Amazon ECS | AWS-managed control plane | AWS-centric applications | Medium | AWS services |
| Nomad | Server/client architecture | Multi-workload orchestration | Medium | HashiCorp stack |
| Fargate | Serverless compute for ECS and EKS, no servers to manage | Serverless containers | Low | AWS services |

Kubernetes Resource Types

| Resource | Purpose | Scope | Lifecycle |
|---|---|---|---|
| Pod | One or more containers | Namespaced | Ephemeral |
| Deployment | Manages pod replicas | Namespaced | Persistent |
| StatefulSet | Stateful applications | Namespaced | Persistent |
| DaemonSet | One pod per node | Namespaced | Persistent |
| Job | Run to completion | Namespaced | Ephemeral |
| CronJob | Scheduled jobs | Namespaced | Persistent |
| Service | Network access to pods | Namespaced | Persistent |
| Ingress | HTTP/HTTPS routing | Namespaced | Persistent |
| ConfigMap | Configuration data | Namespaced | Persistent |
| Secret | Sensitive data | Namespaced | Persistent |
| PersistentVolume | Storage resource | Cluster | Persistent |
| PersistentVolumeClaim | Storage request | Namespaced | Persistent |
| Namespace | Resource isolation | Cluster | Persistent |

Scheduling Constraints

| Constraint Type | Description | Use Case |
|---|---|---|
| Resource requests | Minimum guaranteed resources | Scheduling decisions |
| Resource limits | Maximum allowed resources | Runtime enforcement |
| Node selector | Schedule only on nodes with matching labels | Hardware requirements |
| Node affinity | Require or prefer nodes matching label expressions | Flexible placement |
| Pod affinity | Co-locate related pods | Performance, data locality |
| Pod anti-affinity | Spread pods across nodes | High availability |
| Taints and tolerations | Taints repel pods from a node; tolerations let specific pods schedule there | Dedicated nodes |
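The requests row is the one the scheduler uses for its fit check: a pod fits a node only if its requests do not exceed what remains allocatable. The sketch below is simplified and hypothetical; it parses only the quantity suffixes used in this article ("m" for CPU millicores, Mi and Gi for memory) and ignores the other filters in this table.

```ruby
# Converts a CPU quantity ('500m' or '2') to millicores.
def cpu_millicores(quantity)
  quantity.end_with?('m') ? quantity.delete_suffix('m').to_i : (Float(quantity) * 1000).round
end

MEMORY_UNITS = { 'Mi' => 1024**2, 'Gi' => 1024**3 }.freeze

# Converts a memory quantity ('512Mi', '2Gi', or plain bytes) to bytes.
def memory_bytes(quantity)
  unit = MEMORY_UNITS.keys.find { |u| quantity.end_with?(u) }
  unit ? quantity.delete_suffix(unit).to_i * MEMORY_UNITS[unit] : Integer(quantity)
end

# Compares a pod's requests (not its limits) with a node's free capacity.
def fits?(node_free, requests)
  cpu_millicores(requests[:cpu]) <= cpu_millicores(node_free[:cpu]) &&
    memory_bytes(requests[:memory]) <= memory_bytes(node_free[:memory])
end

node_free = { cpu: '1500m', memory: '2Gi' }
puts fits?(node_free, { cpu: '500m', memory: '512Mi' }) # true
puts fits?(node_free, { cpu: '2', memory: '512Mi' })    # false
```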

Health Check Types

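| Probe Type | Purpose | Action on Failure | Timing |
|---|---|---|---|
| Liveness | Container is still healthy (not hung or deadlocked) | Restart container | Throughout lifetime |
| Readiness | Container can serve traffic | Remove pod from Service endpoints | Throughout lifetime |
| Startup | Application has finished starting; liveness and readiness probes wait until it succeeds | Restart container | Initial startup only |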
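The three probes can be configured together on one container. This sketch reuses the /health endpoint and port 8080 from the desired-state example; the /ready path and the timing values are hypothetical. The helper shows the approximate window before a probe's failure triggers its action, which is why a startup probe with a high failureThreshold tolerates slow boots.

```ruby
# Probe settings for one container, as a Ruby hash.
probes = {
  startupProbe: {
    httpGet: { path: '/health', port: 8080 },
    periodSeconds: 10,
    failureThreshold: 30
  },
  livenessProbe: {
    httpGet: { path: '/health', port: 8080 },
    periodSeconds: 10,
    failureThreshold: 3
  },
  readinessProbe: {
    httpGet: { path: '/ready', port: 8080 },
    periodSeconds: 5,
    failureThreshold: 1
  }
}

# Approximate seconds of consecutive failures before the probe's action fires
# (ignores initialDelaySeconds and timeoutSeconds).
def seconds_until_action(probe)
  probe[:periodSeconds] * probe[:failureThreshold]
end

puts seconds_until_action(probes[:startupProbe])  # 300
puts seconds_until_action(probes[:livenessProbe]) # 30
```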

Update Strategies

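| Strategy | Behavior | Downtime | Use Case |
|---|---|---|---|
| RollingUpdate | Gradual replacement | None | Standard deployments |
| Recreate | Delete all, create new | Yes | Versions that cannot run side by side, such as incompatible database schema changes |
| Blue-Green (pattern) | Two full environments | None | Instant rollback |
| Canary (pattern) | Gradual traffic shift | None | Risk mitigation |

RollingUpdate and Recreate are the built-in Kubernetes Deployment strategies; blue-green and canary are patterns built on Services, labels, or a service mesh.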
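RollingUpdate is tuned with maxSurge and maxUnavailable. When these are percentages, Kubernetes converts them to pod counts by rounding maxSurge up and maxUnavailable down. The sketch below reproduces that arithmetic; the resolve helper is hypothetical.

```ruby
# RollingUpdate parameters for a Deployment, as a Ruby hash.
strategy = {
  type: 'RollingUpdate',
  rollingUpdate: { maxSurge: '25%', maxUnavailable: '25%' }
}

# Converts an integer or percentage string to a pod count.
def resolve(value, replicas, round:)
  return value if value.is_a?(Integer)

  exact = replicas * value.delete_suffix('%').to_i / 100.0
  round == :up ? exact.ceil : exact.floor
end

replicas = 10
surge = resolve(strategy[:rollingUpdate][:maxSurge], replicas, round: :up)
unavailable = resolve(strategy[:rollingUpdate][:maxUnavailable], replicas, round: :down)

puts "at most #{replicas + surge} pods total"            # at most 13 pods total
puts "at least #{replicas - unavailable} pods available" # at least 8 pods available
```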

Service Types

| Type | Accessibility | Load Balancing | Use Case |
|---|---|---|---|
| ClusterIP | Internal cluster only | Yes | Internal services |
| NodePort | External via a port on every node | Yes | Development, testing |
| LoadBalancer | External via cloud load balancer | Yes | Production external access |
| ExternalName | DNS CNAME to an external host | No | DNS alias for an external service |

Volume Types

| Volume Type | Lifecycle | Use Case | Persistence |
|---|---|---|---|
| emptyDir | Pod lifetime | Temporary storage | No |
| hostPath | Node filesystem | Node-specific data | Yes, but only on that node |
| persistentVolumeClaim | Independent | Databases, files | Yes |
| configMap | Independent | Configuration | Yes |
| secret | Independent | Credentials | Yes |
| nfs | External | Shared storage | Yes |

Ruby Orchestration Gems

| Gem | Purpose | Compatibility |
|---|---|---|
| kubeclient | Kubernetes API client | Kubernetes 1.10+ |
| docker-api | Docker Engine API | Docker 1.6+ |
| kubernetes-deploy (now published as krane) | Kubernetes deployment tool | Kubernetes 1.14+ |
| helm-rb | Helm chart operations | Helm 3.x |

Common kubectl Commands

| Command | Purpose | Example |
|---|---|---|
| get | List resources | kubectl get pods |
| describe | Detailed resource info | kubectl describe pod nginx |
| logs | Container logs | kubectl logs -f pod-name |
| exec | Execute in container | kubectl exec -it pod-name -- bash |
| apply | Create/update from file | kubectl apply -f deployment.yaml |
| delete | Remove resources | kubectl delete deployment nginx |
| scale | Change replica count | kubectl scale deployment nginx --replicas=5 |
| rollout | Manage rollouts | kubectl rollout status deployment/nginx |
| port-forward | Forward local port | kubectl port-forward pod-name 8080:80 |
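When Ruby scripts drive kubectl, building the command as an argument array rather than a single string avoids shell quoting problems. This is a minimal sketch; the kubectl and run! helpers are hypothetical, and run! uses Open3.capture3 from the standard library to execute against the current kubeconfig context.

```ruby
require 'open3'

# Builds a kubectl argument array; keyword flags become --flag-name=value.
def kubectl(*args, namespace: nil, **flags)
  argv = ['kubectl', *args.map(&:to_s)]
  argv += ['--namespace', namespace] if namespace
  flags.each { |name, value| argv << "--#{name.to_s.tr('_', '-')}=#{value}" }
  argv
end

# Runs the command and raises on a non-zero exit status.
def run!(argv)
  stdout, stderr, status = Open3.capture3(*argv)
  raise "#{argv.join(' ')} failed: #{stderr}" unless status.success?

  stdout
end

scale = kubectl('scale', 'deployment', 'nginx', namespace: 'production', replicas: 5)
puts scale.join(' ')
# run!(scale) would execute it against the cluster.
```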

Deployment Checklist

| Task | Consideration |
|---|---|
| Resource sizing | CPU and memory requests/limits set appropriately |
| Health checks | Liveness and readiness probes configured |
| Scaling | Horizontal autoscaling configured for variable load |
| Updates | Rolling update strategy and parameters defined |
| Networking | Service type and port configuration correct |
| Storage | Persistent volumes configured for stateful components |
| Configuration | ConfigMaps and Secrets created and referenced |
| Security | RBAC permissions and Pod Security Admission policies applied (PodSecurityPolicy was removed in Kubernetes 1.25) |
| Monitoring | Logging and metrics collection configured |
| Backup | Disaster recovery procedures documented |
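The first two checklist rows lend themselves to an automated pre-deploy check. This sketch works on a Deployment represented as a Ruby hash, as elsewhere in this article; the checklist_problems helper and the REQUIRED_KEYS table are hypothetical, and a real pipeline would more likely run such checks as a policy engine or CI step.

```ruby
# Keys every container should declare, per the checklist.
REQUIRED_KEYS = {
  resources: %i[requests limits],
  probes: %i[livenessProbe readinessProbe]
}.freeze

# Returns a list of human-readable problems, empty if the Deployment passes.
def checklist_problems(deployment)
  containers = deployment.dig(:spec, :template, :spec, :containers) || []
  containers.flat_map do |c|
    problems = []
    resources = c[:resources] || {}
    REQUIRED_KEYS[:resources].each do |key|
      problems << "#{c[:name]}: missing resources.#{key}" unless resources[key]
    end
    REQUIRED_KEYS[:probes].each do |key|
      problems << "#{c[:name]}: missing #{key}" unless c[key]
    end
    problems
  end
end

deployment = {
  spec: { template: { spec: { containers: [
    { name: 'web', image: 'myapp:v1.2.3',
      resources: { requests: { cpu: '500m', memory: '512Mi' } },
      readinessProbe: { httpGet: { path: '/health', port: 8080 } } }
  ] } } }
}

puts checklist_problems(deployment)
```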