CrackedRuby - Container Orchestration

Overview

Container orchestration automates the operational tasks of running containerized applications across multiple machines. When applications run in containers, they need mechanisms to start, stop, scale, network, and monitor containers across a cluster of hosts. Orchestration platforms handle these operational concerns, allowing applications to scale from single containers to thousands of instances across hundreds of machines.

The need for orchestration emerged from container adoption patterns. Running a single container on one machine is straightforward, but production applications require running many containers across multiple hosts with coordination between them. Manual management becomes infeasible at scale. Orchestration platforms solve this by treating a cluster of machines as a single deployment target.

Container orchestration platforms manage several fundamental aspects of containerized applications. They handle scheduling decisions about which machines run which containers based on resource requirements and constraints. They monitor container health and restart failed containers automatically. They scale applications up or down based on demand or defined rules. They manage networking to enable containers to communicate across different hosts. They handle service discovery so containers can find and connect to each other. They manage configuration and secrets distribution to containers.

The orchestration layer abstracts away the underlying infrastructure. Applications declare desired state - how many instances should run, what resources they need, how they should network together. The orchestrator continuously works to maintain that desired state regardless of failures or changes in the underlying infrastructure.

# Conceptual representation of desired state
desired_state = {
  service: "web-api",
  replicas: 3,
  image: "myapp:v1.2.3",
  resources: {
    cpu: "500m",
    memory: "512Mi"
  },
  ports: [8080],
  health_check: "/health"
}

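# Orchestrator maintains this state (conceptual: `orchestrator` stands in
# for the platform's control loop, not a real library object)
current_state = orchestrator.get_state("web-api")
difference = desired_state[:replicas] - current_state[:replicas]
if difference.positive?
  orchestrator.scale_up(difference)
elsif difference.negative?
  orchestrator.scale_down(-difference)
end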

Key Principles

Container orchestration operates on several foundational principles that define how these systems function.

Declarative Configuration: Orchestration platforms use declarative configuration rather than imperative commands. Instead of specifying the steps to deploy an application, users declare the desired end state. The orchestrator determines the necessary actions to reach that state. This approach makes configurations reproducible and simplifies version control. Changes involve updating the desired state declaration, not executing a sequence of commands.

Desired State Reconciliation: Orchestrators continuously monitor actual system state and compare it to desired state. When discrepancies exist, the orchestrator takes corrective action. If a container crashes, the orchestrator starts a replacement. If a node fails, containers are rescheduled elsewhere. This self-healing behavior creates resilient systems without manual intervention.
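
The reconciliation idea can be sketched in a few lines of plain Ruby. This is a minimal illustration, not any platform's implementation: FakeCluster, reconcile, and the instance naming scheme are hypothetical names invented for this example, and the cluster is just an in-memory hash.

```ruby
# Hypothetical in-memory cluster, used only for illustration.
class FakeCluster
  attr_reader :running

  def initialize
    @running = Hash.new { |hash, service| hash[service] = [] }
    @next_id = 0
  end

  def start(service)
    @next_id += 1
    @running[service] << "#{service}-#{@next_id}"
  end

  def stop(service)
    @running[service].pop
  end

  # Simulate a crash: an instance disappears without any API call.
  def crash(service)
    @running[service].shift
  end
end

# One reconciliation pass: compare actual state to desired state
# and correct the difference in either direction.
def reconcile(cluster, desired)
  desired.each do |service, replicas|
    diff = replicas - cluster.running[service].size
    diff.times { cluster.start(service) } if diff.positive?
    (-diff).times { cluster.stop(service) } if diff.negative?
  end
end

cluster = FakeCluster.new
desired = { "web-api" => 3 }

reconcile(cluster, desired)   # initial rollout starts three instances
cluster.crash("web-api")      # one instance dies
reconcile(cluster, desired)   # the next pass starts a replacement
puts cluster.running["web-api"].inspect
# prints ["web-api-2", "web-api-3", "web-api-4"]
```

A real orchestrator runs passes like this continuously and reacts to events, but the core comparison between desired and actual state is the same.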

Scheduling and Placement: The scheduler decides which physical or virtual machines run which containers. Scheduling algorithms consider multiple factors: resource requirements (CPU, memory, disk), hardware constraints (GPU availability, storage types), affinity rules (co-locate related containers), anti-affinity rules (spread containers across failure domains), and resource availability across the cluster. The scheduler balances workload distribution while respecting constraints.
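
A rough sketch of the filter-then-score pattern described above, under simplifying assumptions: Node, PodSpec, feasible_nodes, and schedule are hypothetical names, resources are tracked as plain integers (millicores and MiB), and the only scoring rules are zone spreading followed by most free CPU. Production schedulers weigh many more factors.

```ruby
Node = Struct.new(:name, :zone, :free_cpu_m, :free_mem_mi, :gpu, :pods, keyword_init: true)
PodSpec = Struct.new(:app, :cpu_m, :mem_mi, :needs_gpu, keyword_init: true)

# Filter step: keep only nodes that satisfy the hard constraints.
def feasible_nodes(nodes, pod)
  nodes.select do |node|
    node.free_cpu_m >= pod.cpu_m &&
      node.free_mem_mi >= pod.mem_mi &&
      (!pod.needs_gpu || node.gpu)
  end
end

# Score step: prefer the zone running the fewest replicas of the same app
# (a simple anti-affinity rule), then the node with the most free CPU.
def schedule(nodes, pod)
  candidates = feasible_nodes(nodes, pod)
  return nil if candidates.empty?

  per_zone = Hash.new(0)
  nodes.each { |node| per_zone[node.zone] += node.pods.count(pod.app) }

  chosen = candidates.min_by { |node| [per_zone[node.zone], -node.free_cpu_m] }
  chosen.free_cpu_m -= pod.cpu_m
  chosen.free_mem_mi -= pod.mem_mi
  chosen.pods << pod.app
  chosen
end

nodes = [
  Node.new(name: "node-a", zone: "zone-1", free_cpu_m: 2000, free_mem_mi: 4096, gpu: false, pods: []),
  Node.new(name: "node-b", zone: "zone-1", free_cpu_m: 1000, free_mem_mi: 2048, gpu: false, pods: []),
  Node.new(name: "node-c", zone: "zone-2", free_cpu_m: 1500, free_mem_mi: 4096, gpu: true, pods: [])
]

web = PodSpec.new(app: "web", cpu_m: 500, mem_mi: 512, needs_gpu: false)
placements = 3.times.map { schedule(nodes, web).name }
puts placements.inspect
# prints ["node-a", "node-c", "node-a"]
```

The second replica lands in zone-2 even though node-a still has more free CPU, because spreading across zones is scored ahead of free capacity.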

Service Discovery and Load Balancing: Containers are ephemeral - they start, stop, and move between hosts. Applications need mechanisms to discover and connect to other services despite this fluidity. Orchestrators provide service discovery through DNS or environment variables. Internal load balancers distribute traffic across container instances. As containers scale up or down, the load balancer pool updates automatically.
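
To make the moving parts concrete, here is a small in-memory sketch of a registry with round-robin selection. ServiceRegistry and its methods are hypothetical names for illustration; real orchestrators expose this through DNS records or virtual IPs rather than a Ruby object.

```ruby
# Hypothetical registry: maps a service name to its current endpoints.
class ServiceRegistry
  def initialize
    @endpoints = Hash.new { |hash, service| hash[service] = [] }
    @cursor = Hash.new(0)
  end

  # Called when a container becomes ready.
  def register(service, address)
    @endpoints[service] << address unless @endpoints[service].include?(address)
  end

  # Called when a container stops or fails its health checks.
  def deregister(service, address)
    @endpoints[service].delete(address)
  end

  # Round-robin over whichever endpoints exist at call time.
  def resolve(service)
    pool = @endpoints[service]
    raise KeyError, "no endpoints for #{service}" if pool.empty?

    address = pool[@cursor[service] % pool.size]
    @cursor[service] += 1
    address
  end
end

registry = ServiceRegistry.new
registry.register("web", "10.0.0.1:8080")
registry.register("web", "10.0.0.2:8080")

first_three = 3.times.map { registry.resolve("web") }
puts first_three.inspect
# prints ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.1:8080"]

registry.deregister("web", "10.0.0.1:8080")
puts registry.resolve("web")
# prints 10.0.0.2:8080
```

Callers only ever ask for "web"; the pool behind that name changes as containers come and go.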

Resource Management: Orchestrators allocate cluster resources to containers based on requests and limits. A request is the amount the scheduler reserves for a container and uses for placement decisions. A limit caps maximum consumption and is enforced by the runtime, which stops one container from starving others on the same node. Together, requests and limits reduce resource contention and support fair sharing.
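
The quantity strings used throughout this guide ("500m", "512Mi") follow the Kubernetes convention of millicores and binary memory suffixes. The sketch below converts them to numbers and checks whether a new request fits on a node. The helper names (cpu_millicores, memory_bytes, fits?) are hypothetical, and only the Ki, Mi, and Gi suffixes are handled.

```ruby
# Convert a CPU quantity such as "500m" or "2" into millicores.
def cpu_millicores(quantity)
  if quantity.end_with?("m")
    Integer(quantity.chomp("m"), 10)
  else
    (Float(quantity) * 1000).round
  end
end

MEMORY_UNITS = { "Ki" => 1024, "Mi" => 1024**2, "Gi" => 1024**3 }.freeze

# Convert a memory quantity such as "512Mi" into bytes.
def memory_bytes(quantity)
  match = quantity.match(/\A(\d+)(Ki|Mi|Gi)?\z/)
  raise ArgumentError, "unsupported quantity: #{quantity}" unless match

  Integer(match[1], 10) * MEMORY_UNITS.fetch(match[2], 1)
end

# Placement is based on requests: a new container fits when the sum of
# all CPU requests stays within the node's allocatable CPU.
def fits?(allocatable_cpu, existing_requests, new_request)
  total = (existing_requests + [new_request]).sum { |q| cpu_millicores(q) }
  total <= cpu_millicores(allocatable_cpu)
end

puts cpu_millicores("500m")               # prints 500
puts memory_bytes("512Mi")                # prints 536870912
puts fits?("2", ["500m", "1"], "500m")    # prints true
puts fits?("2", ["500m", "1"], "750m")    # prints false
```

Limits do not appear in fits? because they are enforced at runtime, not at placement time.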

Health Monitoring: Orchestrators continuously check container health through probes. Liveness probes determine whether a container is still functioning or needs a restart. Readiness probes determine whether a container can accept traffic. Startup probes give slow-starting containers time to initialize before the other probes take effect. Failed health checks trigger automatic remediation: a failed liveness probe restarts the container, and a failed readiness probe removes it from load balancer pools.
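
The difference between the probe types is easiest to see as a small state machine. The following sketch assumes consecutive-failure thresholds similar in spirit to those orchestrators expose; the Probe class, the remediation function, and the threshold values are illustrative assumptions.

```ruby
# Tracks consecutive probe results against configurable thresholds.
class Probe
  def initialize(failure_threshold: 3, success_threshold: 1)
    @failure_threshold = failure_threshold
    @success_threshold = success_threshold
    @failures = 0
    @successes = 0
    @healthy = true
  end

  def healthy?
    @healthy
  end

  # Record one probe result and return the resulting health state.
  def record(success)
    if success
      @successes += 1
      @failures = 0
      @healthy = true if @successes >= @success_threshold
    else
      @failures += 1
      @successes = 0
      @healthy = false if @failures >= @failure_threshold
    end
    @healthy
  end
end

# Liveness failures restart the container; readiness failures only
# take it out of the load balancer pool.
def remediation(liveness:, readiness:)
  return :restart_container unless liveness.healthy?
  return :remove_from_load_balancer unless readiness.healthy?

  :serve_traffic
end

liveness = Probe.new(failure_threshold: 3)
readiness = Probe.new(failure_threshold: 1)

puts remediation(liveness: liveness, readiness: readiness)  # prints serve_traffic
readiness.record(false)
puts remediation(liveness: liveness, readiness: readiness)  # prints remove_from_load_balancer
3.times { liveness.record(false) }
puts remediation(liveness: liveness, readiness: readiness)  # prints restart_container
```

Requiring several consecutive failures before acting keeps a single slow response from triggering a restart.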

Rolling Updates and Rollbacks: Orchestrators support zero-downtime deployments through rolling updates. New versions gradually replace old versions. The orchestrator starts new containers, waits for health checks to pass, then terminates old containers. If problems occur, rollbacks restore the previous version. This enables continuous deployment with minimal risk.
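
The pacing of a rolling update can be simulated with two knobs: how many extra instances may exist during the update (surge) and how many may be missing (unavailable). The sketch below is a simplified model that assumes new instances pass their health checks immediately; simulate_rolling_update and its parameter names are hypothetical.

```ruby
# Returns the instance counts after each step of a rolling update.
def simulate_rolling_update(replicas:, max_surge: 1, max_unavailable: 0)
  old_count = replicas
  new_count = 0
  steps = []

  while old_count.positive? || new_count < replicas
    # Start new instances without exceeding replicas + max_surge in total.
    can_start = [replicas + max_surge - (old_count + new_count), replicas - new_count].min
    new_count += can_start if can_start.positive?

    # Stop old instances while keeping at least replicas - max_unavailable running.
    can_stop = [old_count + new_count - (replicas - max_unavailable), old_count].min
    old_count -= can_stop if can_stop.positive?

    raise "no progress possible" if can_start <= 0 && can_stop <= 0

    steps << { old: old_count, new: new_count }
  end

  steps
end

steps = simulate_rolling_update(replicas: 3, max_surge: 1, max_unavailable: 0)
steps.each { |step| puts step.inspect }
# prints three steps: 2 old + 1 new, 1 old + 2 new, 0 old + 3 new
```

With both knobs set to zero the simulation raises, since no instance can be started or stopped without breaking one of the two bounds.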

Storage Orchestration: A container's filesystem is ephemeral by default, so data written inside it is lost when the container is replaced, but applications often need persistent storage. Orchestrators manage storage volumes, attaching them to containers as needed. Volumes persist beyond the container lifecycle. The orchestrator handles mounting volumes on the correct hosts and managing volume lifecycles.

Implementation Approaches

Container orchestration can be implemented through several architectural strategies, each suited to different operational requirements and organizational constraints.

Cluster-Based Orchestration: This approach treats multiple machines as a unified cluster. A control plane manages cluster state and scheduling decisions. Worker nodes run containerized workloads. The control plane remains separate from workload execution, preventing resource contention. Kubernetes exemplifies this architecture with its control plane components (API server, scheduler, controller manager) separate from worker nodes running kubelet and container runtime. This separation enables high availability through control plane replication. The cluster abstraction allows treating hundreds of machines as one logical deployment target.

Implementation involves deploying control plane components on dedicated infrastructure for reliability. Worker nodes join the cluster through authentication. The control plane maintains cluster state in a distributed data store. Workload scheduling happens through API interactions - users submit desired state to the API server, controllers watch for changes and trigger actions, the scheduler assigns containers to nodes. This architecture scales to thousands of nodes but requires managing the control plane itself.

Serverless Container Orchestration: Cloud providers offer managed services that orchestrate containers without exposing the underlying cluster. AWS Fargate, Google Cloud Run, and Azure Container Instances exemplify this approach. Users submit container configurations and the platform handles all orchestration concerns. No node management, patching, or capacity planning is required. The platform automatically scales capacity based on workload.

This approach reduces operational overhead significantly. The provider manages control planes, worker nodes, networking, and upgrades. Users focus solely on application containers. Scaling happens automatically without pre-provisioning capacity. Billing is typically based on the resources each container consumes rather than on provisioned instances. Trade-offs include less control over infrastructure, potential vendor lock-in, and sometimes higher costs at scale compared to self-managed clusters.

Swarm-Mode Orchestration: Docker Swarm provides orchestration integrated into the Docker engine. Nodes run standard Docker with swarm mode enabled. One or more nodes act as managers maintaining cluster state. Worker nodes run containers. Swarm uses a simpler architecture than Kubernetes with fewer moving parts. Service definitions specify desired state. The swarm manager schedules tasks across nodes. Built-in load balancing routes requests to healthy containers.

This approach works well for simpler deployments or teams already using Docker. Setup involves initializing swarm mode and adding nodes. Services are deployed through Docker commands or compose files. Swarm handles scheduling, scaling, and health checks. The architecture is less feature-rich than Kubernetes but significantly simpler to operate. Fewer components means less complexity but also less flexibility for advanced use cases.

Hybrid and Multi-Cluster Orchestration: Large organizations often run multiple orchestration clusters across regions, cloud providers, or environments. Federation approaches coordinate across these clusters. Workloads can span multiple clusters for geographic distribution, disaster recovery, or cloud migration. A federation layer provides unified APIs across clusters while each cluster maintains autonomy.

Implementation requires additional abstraction layers. Tools like Kubernetes Federation or service mesh solutions enable multi-cluster coordination. Challenges include state synchronization across clusters, network connectivity between clusters, and consistent configuration management. This approach suits organizations with compliance requirements for data locality, needs for disaster recovery across regions, or strategies to avoid cloud provider lock-in.

Ruby Implementation

Ruby applications interact with container orchestrators primarily through client libraries and SDKs. While orchestrators themselves are typically written in Go, Ruby provides several options for managing containerized applications and interacting with orchestration platforms.

Kubernetes Ruby Client: The kubeclient gem provides a Ruby interface to the Kubernetes API. It handles authentication, API versioning, and resource manipulation.

require 'kubeclient'

# Connect to Kubernetes cluster
config = Kubeclient::Config.read('/path/to/kubeconfig')
client = Kubeclient::Client.new(
  config.context.api_endpoint,
  'v1',
  ssl_options: config.context.ssl_options,
  auth_options: config.context.auth_options
)

# List pods in a namespace
pods = client.get_pods(namespace: 'production')
pods.each do |pod|
  puts "#{pod.metadata.name}: #{pod.status.phase}"
end

# Create a deployment
deployment = Kubeclient::Resource.new({
  metadata: {
    name: 'ruby-app',
    namespace: 'production'
  },
  spec: {
    replicas: 3,
    selector: {
      matchLabels: { app: 'ruby-app' }
    },
    template: {
      metadata: {
        labels: { app: 'ruby-app' }
      },
      spec: {
        containers: [{
          name: 'web',
          image: 'myregistry/ruby-app:v1.2.3',
          ports: [{ containerPort: 3000 }],
          env: [
            { name: 'RAILS_ENV', value: 'production' },
            { name: 'DATABASE_URL', valueFrom: {
              secretKeyRef: { name: 'db-credentials', key: 'url' }
            }}
          ],
          resources: {
            requests: { cpu: '100m', memory: '256Mi' },
            limits: { cpu: '500m', memory: '512Mi' }
          },
          livenessProbe: {
            httpGet: { path: '/health', port: 3000 },
            initialDelaySeconds: 30,
            periodSeconds: 10
          }
        }]
      }
    }
  }
})

apps_client = Kubeclient::Client.new(
  config.context.api_endpoint + '/apis/apps',
  'v1',
  ssl_options: config.context.ssl_options,
  auth_options: config.context.auth_options
)
apps_client.create_deployment(deployment)

Managing Application Lifecycle: Ruby applications can orchestrate their own updates and scaling through the Kubernetes API. This enables custom deployment strategies or application-aware scaling.

class KubernetesDeployer
  def initialize(namespace:)
    @namespace = namespace
    @client = build_kubernetes_client
  end

  def rolling_update(deployment_name, new_image)
    deployment = @client.get_deployment(deployment_name, @namespace)
    
    # Update image
    deployment.spec.template.spec.containers.first.image = new_image
    
    # Configure rolling update strategy
    deployment.spec.strategy = {
      type: 'RollingUpdate',
      rollingUpdate: {
        maxSurge: 1,
        maxUnavailable: 0
      }
    }
    
    # Apply update
    @client.update_deployment(deployment)
    
    # Monitor rollout
    wait_for_rollout(deployment_name)
  end

  def scale(deployment_name, replicas)
    deployment = @client.get_deployment(deployment_name, @namespace)
    deployment.spec.replicas = replicas
    @client.update_deployment(deployment)
  end

  def wait_for_rollout(deployment_name, timeout: 300)
    deadline = Time.now + timeout
    
    loop do
      deployment = @client.get_deployment(deployment_name, @namespace)
      
      desired = deployment.spec.replicas
      updated = deployment.status.updatedReplicas || 0
      available = deployment.status.availableReplicas || 0
      
      if updated == desired && available == desired
        return true
      end
      
      raise "Rollout timeout" if Time.now > deadline
      sleep 5
    end
  end

  private

  def build_kubernetes_client
    config = Kubeclient::Config.read(ENV['KUBECONFIG'])
    Kubeclient::Client.new(
      config.context.api_endpoint + '/apis/apps',
      'v1',
      ssl_options: config.context.ssl_options,
      auth_options: config.context.auth_options
    )
  end
end

# Usage
deployer = KubernetesDeployer.new(namespace: 'production')
deployer.rolling_update('web-api', 'myregistry/web-api:v2.0.0')
deployer.scale('worker', 10)

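Docker API Integration: The docker-api gem provides Ruby bindings for the Docker Engine API for container management. The Swarm example below assumes a Docker::Service wrapper with create, scale, and tasks methods; verify that the gem version in use (or an extension to it) provides that class before relying on it.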

require 'docker'

# Connect to Docker daemon
Docker.url = 'tcp://swarm-manager:2376'
Docker.options = {
  client_cert: '/path/to/cert.pem',
  client_key: '/path/to/key.pem',
  ssl_ca_file: '/path/to/ca.pem',
  scheme: 'https'
}

# Create a service in Docker Swarm
service = Docker::Service.create(
  'Name' => 'ruby-worker',
  'TaskTemplate' => {
    'ContainerSpec' => {
      'Image' => 'myregistry/worker:latest',
      'Env' => [
        'REDIS_URL=redis://redis:6379',
        'QUEUE=critical,default'
      ],
      'Mounts' => [{
        'Type' => 'volume',
        'Source' => 'worker-data',
        'Target' => '/data'
      }]
    },
    'Resources' => {
      'Limits' => { 'NanoCPUs' => 500_000_000, 'MemoryBytes' => 536_870_912 },
      'Reservations' => { 'NanoCPUs' => 100_000_000, 'MemoryBytes' => 268_435_456 }
    },
    'RestartPolicy' => {
      'Condition' => 'on-failure',
      'MaxAttempts' => 3
    }
  },
  'Mode' => { 'Replicated' => { 'Replicas' => 5 } },
  'UpdateConfig' => {
    'Parallelism' => 1,
    'Delay' => 10_000_000_000
  }
)

# Scale service
service.scale(10)

# Get service tasks
tasks = service.tasks
tasks.each do |task|
  puts "Task #{task.id}: #{task.info['Status']['State']}"
end

Custom Controllers: Ruby can implement Kubernetes controllers that watch for resource changes and take actions. This enables custom automation logic.

require 'kubeclient'
require 'logger'

class CustomController
  def initialize(namespace:)
    @namespace = namespace
    @client = build_kubernetes_client
    @logger = Logger.new(STDOUT)
  end

  def watch_deployments
    watcher = @client.watch_deployments(namespace: @namespace)
    
    watcher.each do |notice|
      case notice.type
      when 'ADDED', 'MODIFIED'
        handle_deployment_change(notice.object)
      when 'DELETED'
        handle_deployment_deletion(notice.object)
      end
    end
  rescue => e
    @logger.error("Watch error: #{e.message}")
    sleep 5
    retry
  end

  private

  def handle_deployment_change(deployment)
    name = deployment.metadata.name
    replicas = deployment.spec.replicas
    ready = deployment.status.readyReplicas || 0
    
    @logger.info("Deployment #{name}: #{ready}/#{replicas} replicas ready")
    
    # Custom logic: auto-scale based on custom metrics
    if should_scale_up?(deployment)
      scale_deployment(deployment, replicas + 1)
    elsif should_scale_down?(deployment)
      scale_deployment(deployment, [replicas - 1, 1].max)
    end
  end

  def should_scale_up?(deployment)
    # Implement custom scaling logic
    # Could check external metrics, queue depths, custom resources
    false
  end

  def should_scale_down?(deployment)
    false
  end

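  def scale_deployment(deployment, new_replicas)
    deployment.spec.replicas = new_replicas
    @client.update_deployment(deployment)
    @logger.info("Scaled #{deployment.metadata.name} to #{new_replicas}")
  end

  # Deployments belong to the apps/v1 API group, so the client targets /apis/apps
  def build_kubernetes_client
    config = Kubeclient::Config.read(ENV['KUBECONFIG'])
    Kubeclient::Client.new(
      config.context.api_endpoint + '/apis/apps',
      'v1',
      ssl_options: config.context.ssl_options,
      auth_options: config.context.auth_options
    )
  end
end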

Helm Integration: Helm charts package Kubernetes applications. Ruby can interact with Helm through shell commands or by parsing Helm chart structures.

require 'tempfile'
require 'yaml'

class HelmDeployer
  def initialize(namespace:)
    @namespace = namespace
  end

  def install_chart(release_name, chart, values = {})
    values_file = write_values_file(values)
    
    cmd = [
      'helm', 'install', release_name, chart,
      '--namespace', @namespace,
      '--values', values_file,
      '--wait',
      '--timeout', '5m'
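    ]
    
    # Pass the arguments as a list so no shell is involved and values cannot be injected into the command line
    output = IO.popen(cmd, err: [:child, :out], &:read)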
    raise "Helm install failed: #{output}" unless $?.success?
    
    output
  ensure
    File.delete(values_file) if values_file && File.exist?(values_file)
  end

  def upgrade_chart(release_name, chart, values = {})
    values_file = write_values_file(values)
    
    cmd = [
      'helm', 'upgrade', release_name, chart,
      '--namespace', @namespace,
      '--values', values_file,
      '--wait',
      '--timeout', '5m',
      '--atomic'
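    ]
    
    # Pass the arguments as a list so no shell is involved and values cannot be injected into the command line
    output = IO.popen(cmd, err: [:child, :out], &:read)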
    raise "Helm upgrade failed: #{output}" unless $?.success?
    
    output
  ensure
    File.delete(values_file) if values_file && File.exist?(values_file)
  end

  private

  def write_values_file(values)
    file = Tempfile.new(['values', '.yaml'])
    file.write(values.to_yaml)
    file.close
    file.path
  end
end

# Usage
deployer = HelmDeployer.new(namespace: 'production')
deployer.install_chart('my-app', 'charts/ruby-app', {
  'image' => {
    'repository' => 'myregistry/app',
    'tag' => 'v1.0.0'
  },
  'replicaCount' => 3,
  'resources' => {
    'requests' => { 'cpu' => '100m', 'memory' => '256Mi' }
  }
})

Tools & Ecosystem

Container orchestration relies on an ecosystem of tools that handle different aspects of the orchestration lifecycle.

Kubernetes: The dominant orchestration platform. Kubernetes provides comprehensive orchestration features including scheduling, scaling, service discovery, configuration management, and storage orchestration. The platform runs on various infrastructures from on-premises data centers to public clouds. Major cloud providers offer managed Kubernetes services (GKE, EKS, AKS) that handle control plane management.

Kubernetes architecture separates control plane from worker nodes. The control plane includes the API server (handles all API requests), etcd (distributed key-value store for cluster state), scheduler (assigns pods to nodes), and controller manager (runs controllers that maintain desired state). Worker nodes run kubelet (manages pod lifecycle), kube-proxy (handles networking), and a container runtime (containerd or CRI-O).

The API-driven design makes Kubernetes extensible. Custom Resource Definitions (CRDs) extend the API with custom resources. Operators use custom controllers to manage complex applications. The large ecosystem includes tools for networking (Calico, Cilium), service mesh (Istio, Linkerd), ingress (Nginx, Traefik), storage (Rook, Longhorn), and monitoring (Prometheus, Grafana).

Docker Swarm: Integrated into Docker Engine, Swarm provides simpler orchestration for Docker containers. Swarm uses the same Docker Compose file format for service definitions. The architecture includes manager nodes (maintain cluster state, schedule services) and worker nodes (run containers). Built-in features include overlay networking, service discovery, and rolling updates.

Swarm suits smaller deployments or teams preferring Docker-native tooling. Setup requires fewer components than Kubernetes. Service definition syntax is familiar to Docker users. Trade-offs include a smaller ecosystem and fewer advanced features compared to Kubernetes. Swarm remains viable for straightforward orchestration needs without Kubernetes complexity.

Amazon ECS: AWS Elastic Container Service provides AWS-native container orchestration. ECS integrates deeply with AWS services like IAM, CloudWatch, and Application Load Balancers. Two launch types exist: EC2 (containers run on EC2 instances you manage) and Fargate (serverless container execution). Task definitions specify container configurations. Services maintain desired task counts and handle load balancing.

ECS suits AWS-centric architectures. The service handles scheduling, placement, and scaling. Integration with AWS services simplifies authentication, logging, and monitoring. Task definitions use JSON format. The ECS CLI and CloudFormation provide infrastructure-as-code options. AWS manages the control plane, reducing operational overhead.

HashiCorp Nomad: A simpler alternative to Kubernetes, Nomad orchestrates containers and other workload types (VMs, standalone executables). Nomad's architecture includes servers (maintain state, schedule) and clients (run workloads). Job specifications declare desired state. Nomad handles scheduling, service discovery through Consul, and secrets management through Vault.

Nomad works well for heterogeneous workloads beyond containers. The learning curve is gentler than Kubernetes. Single binary deployment simplifies operations. Integration with Consul and Vault provides service mesh and secrets management. Trade-offs include a smaller ecosystem and community compared to Kubernetes.

Container Runtimes: Orchestrators rely on container runtimes to execute containers. containerd, the industry-standard runtime, implements the Container Runtime Interface (CRI). CRI-O provides a lightweight CRI implementation specifically for Kubernetes. Both runtimes support OCI (Open Container Initiative) images and runtime specifications. The runtime handles image pulling, container creation, networking setup, and resource isolation.

Service Mesh: Service mesh tools manage service-to-service communication in orchestrated environments. Istio provides traffic management, security, and observability through sidecar proxies injected into each pod. Linkerd offers similar capabilities with lower resource overhead. Consul Connect integrates with Nomad for service mesh functionality. Service meshes handle load balancing, circuit breaking, mutual TLS, and distributed tracing without application code changes.

GitOps Tools: GitOps applies Git workflow to infrastructure and application deployment. Flux and ArgoCD continuously synchronize Git repositories with Kubernetes clusters. Configurations in Git represent desired state. The GitOps operator detects drift and applies changes automatically. This approach provides audit trails, rollback capability, and declarative infrastructure management.

CI/CD Integration: Container orchestration integrates with continuous integration and deployment pipelines. Jenkins X provides Kubernetes-native CI/CD. Tekton offers cloud-native pipeline building blocks. Spinnaker handles multi-cloud continuous delivery. These tools automate building container images, running tests, and deploying to orchestration platforms. Integration with orchestrators enables automated deployments, canary releases, and blue-green deployments.

Real-World Applications

Container orchestration enables patterns that shape how modern applications deploy and operate in production.

Multi-Tier Application Deployment: Production applications typically consist of multiple tiers: web servers, application servers, background workers, caching layers, and databases. Orchestration platforms deploy these tiers as separate services with dependencies and networking between them.

# Example: Rails application with multiple components
class ProductionDeployment
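  def initialize(namespace)
    @namespace = namespace
    config = Kubeclient::Config.read(ENV['KUBECONFIG'])
    options = { ssl_options: config.context.ssl_options, auth_options: config.context.auth_options }
    # Core v1 resources (Services) and apps/v1 resources (Deployments,
    # StatefulSets) are served under different API paths, so each gets its own client
    @client = Kubeclient::Client.new(config.context.api_endpoint, 'v1', **options)
    @apps_client = Kubeclient::Client.new(config.context.api_endpoint + '/apis/apps', 'v1', **options)
  end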

  def deploy_full_stack
    # Deploy PostgreSQL StatefulSet
    deploy_database
    
    # Deploy Redis for caching and job queue
    deploy_redis
    
    # Deploy Rails web application
    deploy_web_app
    
    # Deploy Sidekiq workers
    deploy_workers
    
    # Configure ingress for external traffic
    deploy_ingress
  end

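  private

  # deploy_redis, deploy_workers, and deploy_ingress follow the same pattern
  # as the methods below and are omitted here.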

  def deploy_database
    statefulset = {
      metadata: { name: 'postgres', namespace: @namespace },
      spec: {
        serviceName: 'postgres',
        replicas: 1,
        selector: { matchLabels: { app: 'postgres' } },
        template: {
          metadata: { labels: { app: 'postgres' } },
          spec: {
            containers: [{
              name: 'postgres',
              image: 'postgres:14',
              env: [
                { name: 'POSTGRES_DB', value: 'production' },
                { name: 'POSTGRES_USER', valueFrom: { 
                  secretKeyRef: { name: 'db-creds', key: 'username' }
                }},
                { name: 'POSTGRES_PASSWORD', valueFrom: {
                  secretKeyRef: { name: 'db-creds', key: 'password' }
                }}
              ],
              ports: [{ containerPort: 5432 }],
              volumeMounts: [{
                name: 'data',
                mountPath: '/var/lib/postgresql/data'
              }]
            }]
          }
        },
        volumeClaimTemplates: [{
          metadata: { name: 'data' },
          spec: {
            accessModes: ['ReadWriteOnce'],
            resources: { requests: { storage: '100Gi' } }
          }
        }]
      }
    }
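    # StatefulSets live in the apps/v1 group; kubeclient derives the method name from the kind (stateful_set)
    @apps_client.create_stateful_set(Kubeclient::Resource.new(statefulset))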
    
    # Create service
    service = {
      metadata: { name: 'postgres', namespace: @namespace },
      spec: {
        selector: { app: 'postgres' },
        ports: [{ port: 5432 }],
        clusterIP: 'None'
      }
    }
    @client.create_service(Kubeclient::Resource.new(service))
  end

  def deploy_web_app
    deployment = {
      metadata: { name: 'web', namespace: @namespace },
      spec: {
        replicas: 5,
        selector: { matchLabels: { app: 'web' } },
        template: {
          metadata: { labels: { app: 'web' } },
          spec: {
            containers: [{
              name: 'rails',
              image: 'myregistry/rails-app:latest',
              command: ['bundle', 'exec', 'puma', '-C', 'config/puma.rb'],
              env: [
                { name: 'RAILS_ENV', value: 'production' },
                { name: 'DATABASE_URL', value: 'postgresql://postgres:5432/production' },
                { name: 'REDIS_URL', value: 'redis://redis:6379/0' },
                { name: 'SECRET_KEY_BASE', valueFrom: {
                  secretKeyRef: { name: 'rails-secrets', key: 'secret_key_base' }
                }}
              ],
              ports: [{ containerPort: 3000 }],
              resources: {
                requests: { cpu: '200m', memory: '512Mi' },
                limits: { cpu: '1000m', memory: '1Gi' }
              },
              livenessProbe: {
                httpGet: { path: '/health', port: 3000 },
                initialDelaySeconds: 30,
                periodSeconds: 10
              },
              readinessProbe: {
                httpGet: { path: '/ready', port: 3000 },
                initialDelaySeconds: 10,
                periodSeconds: 5
              }
            }]
          }
        }
      }
    }
    @apps_client.create_deployment(Kubeclient::Resource.new(deployment))
  end
end

Autoscaling Patterns: Production systems scale automatically based on metrics. Horizontal Pod Autoscaling (HPA) adjusts replica counts based on CPU, memory, or custom metrics. Cluster Autoscaling adds or removes nodes based on resource demands.

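class AutoscalingManager
  def initialize(namespace:)
    @namespace = namespace
    config = Kubeclient::Config.read(ENV['KUBECONFIG'])
    # The metrics-based HPA spec used below belongs to the autoscaling/v2 API group
    @autoscaling_client = Kubeclient::Client.new(
      config.context.api_endpoint + '/apis/autoscaling',
      'v2',
      ssl_options: config.context.ssl_options,
      auth_options: config.context.auth_options
    )
  end
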
  def configure_hpa(deployment_name, min_replicas:, max_replicas:, target_cpu:)
    hpa = {
      metadata: { name: deployment_name, namespace: @namespace },
      spec: {
        scaleTargetRef: {
          apiVersion: 'apps/v1',
          kind: 'Deployment',
          name: deployment_name
        },
        minReplicas: min_replicas,
        maxReplicas: max_replicas,
        metrics: [{
          type: 'Resource',
          resource: {
            name: 'cpu',
            target: {
              type: 'Utilization',
              averageUtilization: target_cpu
            }
          }
        }]
      }
    }
    
    @autoscaling_client.create_horizontal_pod_autoscaler(
      Kubeclient::Resource.new(hpa)
    )
  end

  def configure_custom_metric_scaling(deployment_name, metric:, target:)
    # Scale based on custom metrics (queue depth, request rate, etc.)
    hpa = {
      metadata: { name: "#{deployment_name}-custom", namespace: @namespace },
      spec: {
        scaleTargetRef: {
          apiVersion: 'apps/v1',
          kind: 'Deployment',
          name: deployment_name
        },
        minReplicas: 2,
        maxReplicas: 50,
        metrics: [{
          type: 'Pods',
          pods: {
            metric: { name: metric },
            target: {
              type: 'AverageValue',
              averageValue: target
            }
          }
        }]
      }
    }
    
    @autoscaling_client.create_horizontal_pod_autoscaler(
      Kubeclient::Resource.new(hpa)
    )
  end
end
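
The replica count an HPA converges toward follows a proportional rule that the Kubernetes documentation gives as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance (0.1 by default). The sketch below applies that rule in plain Ruby; the function name and keyword arguments are illustrative, and stabilization windows and scaling policies are left out.

```ruby
# Proportional scaling rule with a tolerance band and min/max bounds.
def desired_replicas(current_replicas:, current_metric:, target_metric:, min:, max:, tolerance: 0.1)
  ratio = current_metric.to_f / target_metric

  replicas =
    if (ratio - 1.0).abs <= tolerance
      current_replicas                  # close enough to target: do nothing
    else
      (current_replicas * ratio).ceil   # scale in proportion to the overshoot
    end

  replicas.clamp(min, max)
end

# 4 replicas at 90% average CPU with a 60% target
puts desired_replicas(current_replicas: 4, current_metric: 90, target_metric: 60, min: 2, max: 20)
# prints 6

# 63% against a 60% target is inside the tolerance band
puts desired_replicas(current_replicas: 4, current_metric: 63, target_metric: 60, min: 2, max: 20)
# prints 4
```

The tolerance band is what keeps the replica count from flapping when the metric hovers near its target.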

Blue-Green Deployments: This pattern maintains two identical production environments. Traffic routes to one environment (blue) while the other (green) remains idle. New versions deploy to the idle environment. After validation, traffic switches to the new version. If issues occur, traffic switches back instantly.

Orchestrators facilitate blue-green deployments through service selectors. Services route to pods based on labels. Changing the service selector switches traffic between environments. Both environments run simultaneously during the switch, requiring double resources temporarily.
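
The selector mechanism can be modeled without a cluster. In this sketch the Pod struct, the version label, and the endpoints helper are illustrative assumptions; the point is that switching environments is a single change to the service's selector, not a change to any pod.

```ruby
Pod = Struct.new(:name, :labels)

# A pod matches when it carries every key/value pair in the selector.
def matches?(selector, labels)
  selector.all? { |key, value| labels[key] == value }
end

# The pods a service currently routes to.
def endpoints(service, pods)
  pods.select { |pod| matches?(service[:selector], pod.labels) }.map(&:name)
end

pods = [
  Pod.new("web-blue-1",  { app: "web", version: "blue" }),
  Pod.new("web-blue-2",  { app: "web", version: "blue" }),
  Pod.new("web-green-1", { app: "web", version: "green" }),
  Pod.new("web-green-2", { app: "web", version: "green" })
]

service = { name: "web", selector: { app: "web", version: "blue" } }
puts endpoints(service, pods).inspect
# prints ["web-blue-1", "web-blue-2"]

# Cut over: only the selector changes; both sets of pods keep running.
service[:selector] = service[:selector].merge(version: "green")
puts endpoints(service, pods).inspect
# prints ["web-green-1", "web-green-2"]
```

Rolling back is the same operation in reverse, which is why the blue pods are kept running until the new version has been validated.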

Canary Releases: Canary deployments gradually shift traffic from old to new versions. A small percentage of traffic routes to the new version initially. Monitoring verifies the new version behaves correctly. Traffic percentage increases gradually until all traffic uses the new version. Problems trigger automatic rollback to the previous version.

Service mesh tools like Istio provide fine-grained traffic splitting. Applications can implement percentage-based routing or route specific user segments to canary versions for testing.
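
Where the application itself makes the routing decision, one common approach is to hash a stable identifier into a bucket so that each user consistently sees the same version. This is a sketch of that idea, not how Istio implements traffic splitting; the route function and the user ID format are assumptions.

```ruby
require 'zlib'

# Map a user to a stable bucket in 0..99 and send the lowest
# canary_percent buckets to the canary version.
def route(user_id, canary_percent)
  bucket = Zlib.crc32(user_id.to_s) % 100
  bucket < canary_percent ? :canary : :stable
end

users = (1..200).map { |i| "user-#{i}" }

at_ten = users.select { |user| route(user, 10) == :canary }
at_fifty = users.select { |user| route(user, 50) == :canary }

puts "10% rollout: #{at_ten.size} of #{users.size} users on canary"
puts "50% rollout: #{at_fifty.size} of #{users.size} users on canary"

# Raising the percentage only adds users; nobody flips back to stable.
puts (at_ten - at_fifty).empty?   # prints true
```

Because the bucket depends only on the user ID, increasing the percentage widens the canary group without reshuffling users who were already in it.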

Job Scheduling: Orchestrators handle batch jobs and cron-like scheduled tasks. Jobs run containers to completion rather than as long-running services. CronJobs execute on defined schedules. This pattern suits data processing, report generation, database backups, and maintenance tasks.

# Kubernetes Job for one-time data migration
migration_job = {
  metadata: { name: 'db-migration-v2', namespace: 'production' },
  spec: {
    template: {
      spec: {
        containers: [{
          name: 'migrate',
          image: 'myapp:v2.0.0',
          command: ['bundle', 'exec', 'rake', 'db:migrate'],
          env: [
            { name: 'DATABASE_URL', valueFrom: {
              secretKeyRef: { name: 'db-creds', key: 'url' }
            }}
          ]
        }],
        restartPolicy: 'OnFailure'
      }
    },
    backoffLimit: 3
  }
}

# CronJob for nightly reports
report_cronjob = {
  metadata: { name: 'nightly-report', namespace: 'production' },
  spec: {
    schedule: '0 2 * * *',
    jobTemplate: {
      spec: {
        template: {
          spec: {
            containers: [{
              name: 'reporter',
              image: 'myapp:latest',
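              # Kubernetes does not run shell command substitution in env values,
              # so REPORT_DATE is computed in a shell when the container starts.
              # Assumes the image provides sh and GNU date (for `-d yesterday`).
              command: [
                'sh', '-c',
                'export REPORT_DATE="$(date -d yesterday +%Y-%m-%d)" && exec bundle exec rake reports:generate'
              ]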
            }],
            restartPolicy: 'OnFailure'
          }
        }
      }
    }
  }
}

Disaster Recovery: Orchestration platforms enable disaster recovery through cluster federation or multi-region deployments. Applications replicate across geographic regions. If one region fails, traffic shifts to healthy regions. Database replication keeps data synchronized across regions. DNS or global load balancers route traffic to healthy endpoints.

State management becomes critical in disaster recovery scenarios. Stateless services recover easily by starting new containers. Stateful services require data replication strategies. Object storage replication, database streaming replication, and volume snapshots provide options for state recovery.
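
The failover decision itself can be sketched in a few lines of Ruby. This is a simplified model under stated assumptions: the `Region` struct, the priority field, and the `active_region` helper are hypothetical, and in practice the health signal would come from probes run by DNS or a global load balancer rather than a boolean set by hand.

```ruby
# Hypothetical region record; 'healthy' stands in for real health probes.
Region = Struct.new(:name, :priority, :healthy)

# Picks the healthy region with the lowest priority number, or nil if none.
def active_region(regions)
  regions.select(&:healthy).min_by(&:priority)
end

regions = [
  Region.new('us-east', 1, true),
  Region.new('eu-west', 2, true)
]
puts active_region(regions).name   # us-east

regions.first.healthy = false      # simulate a regional outage
puts active_region(regions).name   # eu-west
```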

Zero-Downtime Maintenance: Orchestrators enable cluster upgrades and node maintenance without downtime. Node draining evicts pods so they are rescheduled on other nodes before maintenance. PodDisruptionBudgets ensure minimum replica counts remain available during voluntary disruptions such as drains. Rolling node updates upgrade one node at a time while workloads remain available on other nodes.
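
A PodDisruptionBudget can be written in the same hash style used elsewhere in this article. The sketch below pairs a `policy/v1` manifest with a deliberately simplified version of the budget check. The resource names and the `eviction_allowed?` helper are assumptions for illustration, not the eviction API's actual logic.

```ruby
# PodDisruptionBudget manifest as a Ruby hash; names are hypothetical.
pdb = {
  apiVersion: 'policy/v1',
  kind: 'PodDisruptionBudget',
  metadata: { name: 'web-api-pdb', namespace: 'production' },
  spec: {
    minAvailable: 2,
    selector: { matchLabels: { app: 'web-api' } }
  }
}

# Simplified model of the budget check: a voluntary eviction is allowed only
# if the healthy pod count stays at or above minAvailable afterwards.
def eviction_allowed?(healthy_pods, min_available)
  healthy_pods - 1 >= min_available
end

min = pdb[:spec][:minAvailable]
puts eviction_allowed?(3, min)  # true
puts eviction_allowed?(2, min)  # false
```

With three healthy replicas and `minAvailable: 2`, a drain can evict one pod at a time and must wait for a replacement to become healthy before evicting the next.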

Reference

Orchestration Platform Comparison

Platform	Architecture	Use Case	Complexity	Ecosystem
Kubernetes	Distributed control plane, multiple worker nodes	Large-scale, complex deployments	High	Extensive
Docker Swarm	Manager nodes, worker nodes	Simple deployments, Docker-native	Low	Limited
Amazon ECS	AWS-managed control plane	AWS-centric applications	Medium	AWS services
Nomad	Server/client architecture	Multi-workload orchestration	Medium	HashiCorp stack
AWS Fargate	Serverless compute engine for ECS and EKS, no node management	Serverless containers	Low	AWS services

Kubernetes Resource Types

Resource	Purpose	Scope	Lifecycle
Pod	One or more containers	Namespaced	Ephemeral
Deployment	Manages pod replicas	Namespaced	Persistent
StatefulSet	Stateful applications	Namespaced	Persistent
DaemonSet	One pod per node	Namespaced	Persistent
Job	Run to completion	Namespaced	Ephemeral
CronJob	Scheduled jobs	Namespaced	Persistent
Service	Network access to pods	Namespaced	Persistent
Ingress	HTTP/HTTPS routing	Namespaced	Persistent
ConfigMap	Configuration data	Namespaced	Persistent
Secret	Sensitive data	Namespaced	Persistent
PersistentVolume	Storage resource	Cluster	Persistent
PersistentVolumeClaim	Storage request	Namespaced	Persistent
Namespace	Resource isolation	Cluster	Persistent

Scheduling Constraints

Constraint Type	Description	Use Case
Resource requests	Minimum guaranteed resources	Scheduling decisions
Resource limits	Maximum allowed resources	Runtime enforcement
Node selector	Schedule on specific nodes	Hardware requirements
Node affinity	Prefer or require nodes	Flexible placement
Pod affinity	Co-locate related pods	Performance, data locality
Pod anti-affinity	Spread pods across nodes	High availability
Taints and tolerations	Prevent or allow pod placement	Dedicated nodes
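
Several of these constraints can appear together in one pod spec. The sketch below shows such a fragment as a Ruby hash, followed by a heavily simplified model of how node selectors and taints filter nodes. The label and taint names are hypothetical, and `tolerates?` and `schedulable?` are illustrative helpers, not scheduler code.

```ruby
# Hypothetical pod spec fragment combining several constraints from the table.
pod_spec = {
  nodeSelector: { 'disktype' => 'ssd' },
  tolerations: [
    { key: 'dedicated', operator: 'Equal', value: 'batch', effect: 'NoSchedule' }
  ],
  containers: [{
    name: 'worker',
    image: 'myapp:v1.2.3',
    resources: {
      requests: { cpu: '250m', memory: '256Mi' },
      limits: { cpu: '500m', memory: '512Mi' }
    }
  }]
}

# Simplified matching: the real scheduler also handles empty keys, empty
# effects, and softer effects such as PreferNoSchedule, all ignored here.
def tolerates?(toleration, taint)
  return false unless toleration[:key] == taint[:key]
  return false unless toleration[:effect] == taint[:effect]

  toleration[:operator] == 'Exists' || toleration[:value] == taint[:value]
end

# A node is eligible when its labels include every nodeSelector entry and
# every taint on it is tolerated by the pod.
def schedulable?(pod_spec, node)
  labels_ok = pod_spec[:nodeSelector].all? { |k, v| node[:labels][k] == v }
  taints_ok = node[:taints].all? do |taint|
    pod_spec[:tolerations].any? { |t| tolerates?(t, taint) }
  end
  labels_ok && taints_ok
end

batch_node = {
  labels: { 'disktype' => 'ssd' },
  taints: [{ key: 'dedicated', value: 'batch', effect: 'NoSchedule' }]
}
puts schedulable?(pod_spec, batch_node)  # true
```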

Health Check Types

Probe Type	Purpose	Action on Failure	Timing
Liveness	Container is still healthy	Restart container	Throughout lifetime
Readiness	Container can serve traffic	Remove pod from Service endpoints	Throughout lifetime
Startup	Container has started	Restart container	Initial startup only
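
The three probes are configured per container. The hash below is a minimal sketch with assumed paths, ports, and thresholds (`/healthz` and `/ready` are hypothetical endpoints). The helper shows how the startup probe bounds startup time: the container has up to `failureThreshold * periodSeconds` to pass before it is restarted.

```ruby
# Hypothetical container definition with all three probe types.
container = {
  name: 'web-api',
  image: 'myapp:v1.2.3',
  startupProbe: {
    httpGet: { path: '/healthz', port: 8080 },
    periodSeconds: 10,
    failureThreshold: 30
  },
  livenessProbe: {
    httpGet: { path: '/healthz', port: 8080 },
    periodSeconds: 10,
    failureThreshold: 3
  },
  readinessProbe: {
    httpGet: { path: '/ready', port: 8080 },
    periodSeconds: 5,
    failureThreshold: 3
  }
}

# Upper bound on how long the container may take to pass its startup probe.
def startup_budget_seconds(probe)
  probe[:periodSeconds] * probe[:failureThreshold]
end

puts startup_budget_seconds(container[:startupProbe])  # 300
```

Liveness and readiness checks do not begin until the startup probe succeeds, which lets slow-starting applications use a generous startup budget while keeping the liveness threshold tight.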

Update Strategies

Strategy	Behavior	Downtime	Use Case
RollingUpdate	Gradual replacement	None	Standard deployments
Recreate	Delete all, create new	Yes	Database schema changes
Blue-Green	Two full environments	None	Zero-risk rollback
Canary	Gradual traffic shift	None	Risk mitigation

Service Types

Type	Accessibility	Load Balancing	Use Case
ClusterIP	Internal cluster only	Yes	Internal services
NodePort	External via node ports	Yes	Development, testing
LoadBalancer	External via cloud LB	Yes	Production external access
ExternalName	DNS CNAME	No	Alias for an external DNS name

Volume Types

Volume Type	Lifecycle	Use Case	Persistence
emptyDir	Pod lifetime	Temporary storage	No
hostPath	Node filesystem	Node-specific data	Yes
persistentVolumeClaim	Independent	Databases, files	Yes
configMap	Independent	Configuration	Yes
secret	Independent	Credentials	Yes
nfs	External	Shared storage	Yes

Ruby Orchestration Gems

Gem	Purpose	Compatibility
kubeclient	Kubernetes API client	Kubernetes 1.10+
docker-api	Docker Engine API	Docker 1.6+
kubernetes-deploy (renamed krane)	Kubernetes deployment tool	Kubernetes 1.14+
helm-rb	Helm chart operations	Helm 3.x

Common kubectl Commands

Command	Purpose	Example
get	List resources	kubectl get pods
describe	Detailed resource info	kubectl describe pod nginx
logs	Container logs	kubectl logs -f pod-name
exec	Execute in container	kubectl exec -it pod-name -- bash
apply	Create/update from file	kubectl apply -f deployment.yaml
delete	Remove resources	kubectl delete deployment nginx
scale	Change replica count	kubectl scale deployment nginx --replicas=5
rollout	Manage rollouts	kubectl rollout status deployment/nginx
port-forward	Forward local port	kubectl port-forward pod-name 8080:80

Deployment Checklist

Task	Consideration
Resource sizing	CPU and memory requests/limits set appropriately
Health checks	Liveness and readiness probes configured
Scaling	Horizontal autoscaling configured for variable load
Updates	Rolling update strategy and parameters defined
Networking	Service type and port configuration correct
Storage	Persistent volumes configured for stateful components
Configuration	ConfigMaps and Secrets created and referenced
Security	RBAC permissions, Pod Security Standards applied (PodSecurityPolicy was removed in Kubernetes 1.25)
Monitoring	Logging and metrics collection configured
Backup	Disaster recovery procedures documented
