Overview
Load balancing distributes incoming requests across multiple backend servers or resources to prevent any single server from becoming a bottleneck. The technique originated in the 1990s as web traffic scaled beyond single-server capacity, requiring mechanisms to spread load across server farms.
A load balancer acts as a reverse proxy, sitting between clients and backend servers. When a client sends a request, the load balancer applies an algorithm to select which backend server receives the request. The selected server processes the request and returns the response through the load balancer to the client.
Load balancing serves several functions:
- Prevents server overload by distributing requests
- Increases application availability through redundancy
- Enables horizontal scaling by adding servers
- Facilitates zero-downtime deployments
- Provides failure isolation and automatic failover
Modern applications rarely run on single servers. Load balancing has become fundamental infrastructure, appearing at multiple layers: DNS-level geographic distribution, network-level TCP/UDP balancing, and application-level HTTP routing. Cloud platforms provide managed load balancing services, while on-premises deployments use dedicated hardware or software solutions.
# Conceptual load balancer behavior
class SimpleLoadBalancer
  def initialize(servers)
    @servers = servers
    @current_index = 0
  end

  def forward_request(request)
    server = select_server
    server.handle(request)
  end

  def select_server
    # Round-robin algorithm
    server = @servers[@current_index]
    @current_index = (@current_index + 1) % @servers.length
    server
  end
end
# Usage
servers = [Server.new('server1'), Server.new('server2'), Server.new('server3')]
lb = SimpleLoadBalancer.new(servers)
lb.forward_request(request)
# => Forwards to server1, then server2, then server3, cycling back
The example shows the basic concept: the load balancer maintains a pool of servers and applies an algorithm to distribute requests. Production implementations handle health checks, connection pooling, SSL termination, and session persistence.
Key Principles
Load balancing operates on several fundamental principles that determine effectiveness and behavior.
Distribution Algorithms determine how requests map to backend servers. The algorithm choice affects load distribution, statefulness requirements, and operational complexity. Common algorithms include round-robin (sequential rotation), least connections (fewest active connections), weighted distribution (proportional to server capacity), and hash-based routing (consistent assignment based on request attributes).
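To contrast with the round-robin example above, here is a minimal least-connections sketch. It assumes each server object exposes an `active_connections` counter; production balancers track this internally.

# Least-connections selection (illustrative sketch)
class LeastConnectionsBalancer
  def initialize(servers)
    @servers = servers
  end

  def select_server
    # Pick the server currently handling the fewest requests;
    # assumes each server exposes an active_connections accessor
    @servers.min_by(&:active_connections)
  end

  def forward_request(request)
    server = select_server
    server.active_connections += 1
    begin
      server.handle(request)
    ensure
      server.active_connections -= 1
    end
  end
end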
Health Checking monitors backend server availability. Load balancers periodically probe servers using TCP connections, HTTP requests, or custom health endpoints. Failed health checks remove servers from rotation; successful checks restore them. Health check frequency, timeout values, and failure thresholds affect failover speed versus false positive rate.
Session Persistence maintains client-server affinity when application state exists on backend servers. Techniques include source IP hashing, cookie-based routing, or application-level session identifiers. Persistence conflicts with optimal load distribution since it constrains routing decisions.
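As a minimal sketch of the first technique, hashing the client address yields a stable index into the pool, so the same client keeps reaching the same backend while pool membership is fixed:

# Source IP hashing for session persistence (sketch)
require 'zlib'

class IPHashBalancer
  def initialize(servers)
    @servers = servers
  end

  def select_server(client_ip)
    # CRC32 gives a stable integer for a given address; the same client
    # maps to the same server as long as the pool size is unchanged
    index = Zlib.crc32(client_ip) % @servers.length
    @servers[index]
  end
end

Note the caveat: adding or removing a server remaps most clients, which is one reason hash-based schemes often use consistent hashing instead.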
Connection Management handles network connections between clients and servers. Load balancers may proxy connections (terminating client connections and creating new backend connections) or pass through connections (forwarding packets without termination). Proxy mode enables protocol translation, SSL offloading, and content inspection. Pass-through mode reduces latency and resource usage.
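To make proxy mode concrete, the sketch below terminates each client connection and opens a fresh backend connection, pumping bytes in both directions. The listening port and backend host are placeholder values.

# Minimal proxy-mode TCP forwarder (sketch; host and ports are placeholders)
require 'socket'

listener = TCPServer.new(8080)
loop do
  client = listener.accept
  Thread.new(client) do |downstream|
    upstream = TCPSocket.new('backend.internal', 3000)
    # Pump bytes in both directions until either side closes
    pumps = [[downstream, upstream], [upstream, downstream]].map do |src, dst|
      Thread.new do
        IO.copy_stream(src, dst)
      rescue IOError, SystemCallError
        # Connection closed or reset; fall through to cleanup
      ensure
        dst.close rescue nil
      end
    end
    pumps.each(&:join)
  end
end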
Layer 4 vs Layer 7 distinguishes network-level from application-level load balancing. Layer 4 (transport) operates on TCP/UDP packets, routing based on IP addresses and ports. Layer 7 (application) parses HTTP requests, routing based on URLs, headers, or request content. Layer 7 provides finer control but higher processing overhead.
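A sketch of Layer 7 routing: because the balancer has parsed the HTTP request, it can route on path or headers rather than just address and port. The request object and backend pools here are assumptions for illustration.

# Layer 7 (content-based) routing sketch
class Layer7Router
  def initialize(api_pool:, static_pool:, default_pool:)
    @api_pool = api_pool
    @static_pool = static_pool
    @default_pool = default_pool
  end

  def route(request)
    # Route on attributes only visible after parsing the HTTP request
    pool =
      if request.path.start_with?('/api/')
        @api_pool
      elsif request.headers['Accept'].to_s.include?('image/')
        @static_pool
      else
        @default_pool
      end
    pool.sample # delegate the final server choice to any algorithm
  end
end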
# Health check implementation
require 'net/http'
require 'timeout'

class HealthChecker
  def initialize(server, check_interval: 5, timeout: 2)
    @server = server
    @check_interval = check_interval
    @timeout = timeout
    @healthy = true
  end

  def start_monitoring
    Thread.new do
      loop do
        @healthy = perform_health_check
        sleep @check_interval
      end
    end
  end

  def healthy?
    @healthy
  end

  private

  def perform_health_check
    Timeout.timeout(@timeout) do
      response = Net::HTTP.get_response(@server.health_uri)
      response.code == '200'
    end
  rescue StandardError
    # Timeout::Error is a StandardError, so this covers timeouts too
    false
  end
end
Stateless vs Stateful describes whether the load balancer maintains request context. Stateless balancers make independent routing decisions per request, simplifying horizontal scaling. Stateful balancers track connections, sessions, or request sequences, enabling advanced features like transaction integrity but requiring state synchronization across load balancer instances.
Active Health Checks proactively test server availability versus passive health checks that detect failures from failed requests. Active checks provide faster failure detection; passive checks avoid probe traffic overhead. Many systems combine both approaches.
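A minimal passive check can piggyback on live traffic, marking a server unhealthy after consecutive request failures. The threshold value below is arbitrary.

# Passive health checking derived from live traffic (sketch)
class PassiveHealthTracker
  def initialize(failure_threshold: 3)
    @failure_threshold = failure_threshold
    @consecutive_failures = Hash.new(0)
  end

  def record_result(server, success)
    if success
      @consecutive_failures[server] = 0
    else
      @consecutive_failures[server] += 1
    end
  end

  def healthy?(server)
    @consecutive_failures[server] < @failure_threshold
  end
end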
Implementation Approaches
Load balancing implementations vary by architectural layer, deployment model, and operational requirements.
DNS-Based Load Balancing returns multiple IP addresses for a domain name, distributing clients across servers through DNS resolution. The approach requires no dedicated load balancer infrastructure and provides geographic distribution. DNS caching limits failover speed since clients cache resolved addresses. TTL values balance caching efficiency against failover responsiveness.
# DNS-based routing simulation
class DNSLoadBalancer
  def initialize(domain, servers)
    @domain = domain
    @servers = servers
  end

  def resolve(client_ip)
    # Simulate geographic routing
    region = determine_region(client_ip)
    regional_servers = @servers.select { |s| s.region == region }
    # Return up to three A records for the client's region
    regional_servers.sample(3).map(&:ip_address)
  end

  def determine_region(ip)
    # Simplified geographic lookup
    ip.start_with?('192.168') ? :us_west : :us_east
  end
end
Hardware Load Balancers use dedicated network appliances with specialized ASICs for high-throughput packet processing. Hardware solutions handle millions of connections per second with microsecond latency. Cost and inflexibility limit adoption to large-scale deployments with predictable capacity requirements.
Software Load Balancers run on commodity servers, providing flexibility and cost efficiency. Popular options include HAProxy, Nginx, and Envoy. Software balancers handle hundreds of thousands of connections on standard hardware. Configuration changes deploy without hardware replacement, enabling rapid iteration.
Cloud Load Balancers provide managed services that abstract infrastructure management. AWS Elastic Load Balancing, Google Cloud Load Balancing, and Azure Load Balancer handle provisioning, scaling, and maintenance. Cloud balancers integrate with auto-scaling, service discovery, and health monitoring. Trade-offs include vendor lock-in and reduced low-level control.
Application-Level Load Balancing embeds distribution logic within application code or frameworks. Client-side load balancing libraries select endpoints directly, eliminating dedicated load balancer infrastructure. Service mesh architectures like Istio provide sidecar proxies that handle routing, retries, and circuit breaking at the application layer.
# Client-side load balancing with retries
# Assumes CircuitBreaker, RequestError, and MaxRetriesExceeded are defined elsewhere
class ClientSideLoadBalancer
  def initialize(endpoints)
    @endpoints = endpoints
    @circuit_breakers = Hash.new { |h, k| h[k] = CircuitBreaker.new }
  end

  def call(request, max_retries: 3)
    attempts = 0
    while attempts < max_retries
      attempts += 1
      endpoint = select_endpoint
      break if endpoint.nil? # every circuit breaker is open

      circuit_breaker = @circuit_breakers[endpoint]
      begin
        response = execute_request(endpoint, request)
        circuit_breaker.record_success
        return response
      rescue RequestError
        circuit_breaker.record_failure
      end
    end
    raise MaxRetriesExceeded
  end

  private

  def select_endpoint
    # Skip endpoints with open circuit breakers; prefer the lowest failure rate
    available = @endpoints.reject { |e| @circuit_breakers[e].open? }
    available.min_by { |e| @circuit_breakers[e].failure_rate }
  end

  def execute_request(endpoint, request)
    # HTTP request implementation
  end
end
Container Orchestration Load Balancing integrates with platforms like Kubernetes. Service definitions create virtual IP addresses that distribute traffic across pod replicas. Ingress controllers route external traffic to services based on hostnames and paths. Container platforms handle service discovery, health checking, and automatic endpoint updates as pods scale.
Design Considerations
Selecting load balancing strategies requires analyzing application characteristics, operational requirements, and infrastructure constraints.
Algorithm Selection depends on workload patterns. Round-robin works well for homogeneous servers with similar request processing times. Least connections suits workloads with highly variable request durations, preventing long-running requests from concentrating on specific servers. IP hash maintains session affinity but distributes poorly when client populations cluster in IP ranges. Random selection provides good distribution with minimal state overhead.
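Hash-based routing is usually implemented with consistent hashing, so that adding or removing a server remaps only a fraction of clients rather than nearly all of them. A compact ring sketch follows; the virtual node count is arbitrary.

# Consistent hash ring (sketch; 100 virtual nodes per server is arbitrary)
require 'digest'

class ConsistentHashRing
  def initialize(servers, vnodes: 100)
    @ring = {}
    servers.each do |server|
      vnodes.times do |i|
        @ring[hash_key("#{server}-#{i}")] = server
      end
    end
    @sorted_keys = @ring.keys.sort
  end

  def server_for(client_key)
    point = hash_key(client_key)
    # First ring position at or after the client's point, wrapping around
    key = @sorted_keys.bsearch { |k| k >= point } || @sorted_keys.first
    @ring[key]
  end

  private

  def hash_key(value)
    Digest::MD5.hexdigest(value)[0, 8].to_i(16)
  end
end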
Stateless Applications simplify load balancing by allowing requests to route to any server. Stateless design moves session data to external stores like Redis or databases, enabling load balancers to use any algorithm without session persistence concerns. Stateful applications require sticky sessions or session replication, constraining load distribution and complicating failover.
SSL/TLS Termination determines where encryption processing occurs. Terminating SSL at the load balancer centralizes certificate management and offloads CPU-intensive encryption from backend servers. Backends communicate over unencrypted connections within the trusted network. End-to-end encryption maintains SSL through to backend servers, increasing security but requiring certificate management on all servers and duplicating encryption overhead.
# Configuration modeling for SSL termination
class LoadBalancerConfig
  attr_accessor :ssl_termination, :backend_protocol

  def initialize
    @ssl_termination = :load_balancer # or :backend or :none
    @backend_protocol = :http         # or :https
  end

  def ssl_termination_at_lb?
    @ssl_termination == :load_balancer
  end

  def requires_backend_certificates?
    @ssl_termination == :backend ||
      (@ssl_termination == :load_balancer && @backend_protocol == :https)
  end

  def connection_pattern
    if ssl_termination_at_lb?
      'HTTPS -> LB -> HTTP -> Backends'
    elsif @ssl_termination == :backend
      'HTTPS -> LB (passthrough) -> HTTPS -> Backends'
    else
      'HTTP -> LB -> HTTP -> Backends'
    end
  end
end
Geographic Distribution routes users to nearby datacenters, reducing latency and improving user experience. DNS-based geographic routing resolves domains to region-specific IP addresses. Active-active multi-region deployments require data replication strategies and conflict resolution. Active-passive setups maintain cold standbys for disaster recovery, accepting longer failover times.
Health Check Design balances failure detection speed against false positive rate. Aggressive health checks (frequent probes, tight timeouts) detect failures quickly but may incorrectly remove servers during transient network issues. Conservative checks (infrequent probes, generous timeouts) reduce false positives but delay failure detection. Application-specific health endpoints verify service functionality beyond basic connectivity.
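The HealthChecker shown earlier flips state on a single probe; production balancers debounce with thresholds like those in the Reference tables. A sketch of that state logic, with arbitrary default thresholds:

# Threshold-based health state (sketch of debounced checks)
class ThresholdHealthState
  def initialize(unhealthy_threshold: 3, healthy_threshold: 2)
    @unhealthy_threshold = unhealthy_threshold
    @healthy_threshold = healthy_threshold
    @healthy = true
    @streak = 0
  end

  def record_probe(success)
    # A probe that matches the current state resets the streak
    if success == @healthy
      @streak = 0
      return @healthy
    end

    @streak += 1
    threshold = @healthy ? @unhealthy_threshold : @healthy_threshold
    if @streak >= threshold
      @healthy = success
      @streak = 0
    end
    @healthy
  end

  def healthy?
    @healthy
  end
end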
Scaling Considerations affect architecture decisions. Vertical scaling (larger servers) simplifies deployment but hits hardware limits. Horizontal scaling (more servers) provides unlimited capacity growth through load balancing but requires stateless design. Auto-scaling dynamically adjusts capacity based on metrics, requiring rapid server provisioning and deprovisioning coordination with load balancers.
Cost Trade-offs compare managed services against self-hosted solutions. Cloud load balancers eliminate operational overhead but incur per-GB and per-connection charges. Self-hosted balancers reduce variable costs but require dedicated operations teams. Hybrid approaches use cloud load balancers for edge traffic and self-hosted for internal service mesh.
Ruby Implementation
Ruby applications interact with load balancers primarily as backend servers, though Ruby can implement load balancing logic in certain contexts.
Rack Middleware for Load Balancer Integration handles forwarded headers that preserve original client information when proxied through load balancers.
# Rack middleware for X-Forwarded-For header handling
class LoadBalancerHeaders
  def initialize(app)
    @app = app
  end

  def call(env)
    # Trust forwarded headers set by the load balancer
    if (forwarded_for = env['HTTP_X_FORWARDED_FOR'])
      # Original client IP is the first address in the chain
      env['REMOTE_ADDR'] = forwarded_for.split(',').first.strip
    end

    if (forwarded_proto = env['HTTP_X_FORWARDED_PROTO'])
      env['rack.url_scheme'] = forwarded_proto
    end

    if (forwarded_port = env['HTTP_X_FORWARDED_PORT'])
      env['SERVER_PORT'] = forwarded_port
    end

    @app.call(env)
  end
end

# In config.ru
use LoadBalancerHeaders
run MyRackApp.new
Health Check Endpoints provide application-level health verification for load balancers. Rails applications typically implement health checks as lightweight controller actions.
# Rails health check controller
class HealthController < ApplicationController
  skip_before_action :verify_authenticity_token

  def check
    checks = {
      database: database_healthy?,
      cache: cache_healthy?,
      dependencies: check_dependencies
    }
    healthy = checks[:database] && checks[:cache] && checks[:dependencies].values.all?
    render json: { status: healthy ? 'healthy' : 'unhealthy' }.merge(checks),
           status: healthy ? :ok : :service_unavailable
  end

  def shallow
    # Minimal check for fast health probes
    render plain: 'OK', status: :ok
  end

  private

  def database_healthy?
    ActiveRecord::Base.connection.active?
  rescue StandardError
    false
  end

  def cache_healthy?
    Rails.cache.read('health_check')
    true
  rescue StandardError
    false
  end

  def check_dependencies
    {
      redis: check_redis,
      external_api: check_external_api
    }
  end

  def check_redis
    Redis.new.ping == 'PONG'
  rescue StandardError
    false
  end

  def check_external_api
    # Quick timeout to avoid blocking health checks
    Timeout.timeout(1) do
      response = Net::HTTP.get_response(URI(ENV['EXTERNAL_API_URL']))
      response.code == '200'
    end
  rescue StandardError
    false
  end
end
Graceful Shutdown ensures in-flight requests complete before server termination during deployments or scaling events.
# Puma graceful shutdown configuration
# config/puma.rb
workers 4
threads 5, 5

# Handle SIGTERM from the load balancer; Puma stops accepting new
# connections and drains in-flight requests
on_worker_shutdown do
  puts 'Worker shutting down, finishing requests...'
end

# Extended shutdown timeout for long-running requests
worker_shutdown_timeout 30

# In application code
class GracefulShutdown
  def initialize(app)
    @app = app
    @shutdown_requested = false
  end

  def call(env)
    return [503, {}, ['Service shutting down']] if @shutdown_requested
    @app.call(env)
  end

  def shutdown
    @shutdown_requested = true
    # Wait for active requests to complete
    sleep 1 while active_requests > 0
  end

  private

  def active_requests
    # Illustrative: counts threads flagged while processing a request
    ObjectSpace.each_object(Thread).count { |t| t[:processing_request] }
  end
end

# Signal handling; shutdown_middleware is the GracefulShutdown instance
# inserted into the Rack stack
shutdown_middleware = GracefulShutdown.new(MyRackApp.new)
Signal.trap('TERM') do
  shutdown_middleware.shutdown
  exit
end
Client-Side Load Balancing in Ruby implements distribution logic within application code for service-to-service communication.
require 'net/http'
require 'uri'

class ServiceUnavailableError < StandardError; end

class ServiceClient
  def initialize(service_endpoints)
    @endpoints = service_endpoints
    @current_index = 0
    @mutex = Mutex.new
  end

  def call_service(path, method: :get, body: nil)
    attempts = 0
    max_attempts = @endpoints.size

    while attempts < max_attempts
      endpoint = next_endpoint
      begin
        return execute_request(endpoint, path, method, body)
      rescue Errno::ECONNREFUSED, Net::OpenTimeout
        # Try the next endpoint on connection failure
        attempts += 1
      end
    end

    raise ServiceUnavailableError, 'All endpoints failed'
  end

  private

  def next_endpoint
    @mutex.synchronize do
      endpoint = @endpoints[@current_index]
      @current_index = (@current_index + 1) % @endpoints.size
      endpoint
    end
  end

  def execute_request(endpoint, path, method, body)
    uri = URI.join(endpoint, path)
    case method
    when :get
      Net::HTTP.get_response(uri)
    when :post
      Net::HTTP.post(uri, body)
    else
      raise ArgumentError, "Unsupported method: #{method}"
    end
  end
end

# Usage
client = ServiceClient.new([
  'http://service1.internal:3000',
  'http://service2.internal:3000',
  'http://service3.internal:3000'
])
response = client.call_service('/api/users', method: :get)
# => Rotates through endpoints on each call
Session Management with Load Balancers requires coordinating session storage across distributed backends.
# Redis-backed session store for stateless load balancing
# config/initializers/session_store.rb
Rails.application.config.session_store :redis_store,
  servers: ENV['REDIS_URL'],
  expire_after: 1.day,
  key: '_myapp_session',
  threadsafe: true,
  secure: Rails.env.production?

# Alternative: cookie-based sessions need no shared store, since state
# travels with the request (all servers must share secret_key_base)
# Rails.application.config.session_store :cookie_store,
#   key: '_myapp_session',
#   secure: Rails.env.production?,
#   httponly: true,
#   same_site: :lax
Tools & Ecosystem
Multiple load balancing solutions serve different use cases and operational requirements.
HAProxy provides high-performance Layer 4 and Layer 7 load balancing. The software handles millions of concurrent connections with microsecond latency. Configuration uses a declarative syntax defining frontends (client-facing), backends (server pools), and routing rules.
# HAProxy configuration generation in Ruby
class HAProxyConfig
  def initialize
    @frontends = []
    @backends = []
  end

  def add_frontend(name, bind_address, default_backend)
    @frontends << {
      name: name,
      bind: bind_address,
      default_backend: default_backend
    }
  end

  def add_backend(name, servers, balance_method: 'roundrobin')
    @backends << {
      name: name,
      balance: balance_method,
      servers: servers
    }
  end

  def generate
    config = "global\n"
    config << "  maxconn 4096\n"
    config << "  log 127.0.0.1 local0\n\n"
    config << "defaults\n"
    config << "  mode http\n"
    config << "  timeout connect 5000ms\n"
    config << "  timeout client 50000ms\n"
    config << "  timeout server 50000ms\n\n"

    @frontends.each do |frontend|
      config << "frontend #{frontend[:name]}\n"
      config << "  bind #{frontend[:bind]}\n"
      config << "  default_backend #{frontend[:default_backend]}\n\n"
    end

    @backends.each do |backend|
      config << "backend #{backend[:name]}\n"
      config << "  balance #{backend[:balance]}\n"
      backend[:servers].each_with_index do |server, i|
        config << "  server server#{i + 1} #{server[:address]} check\n"
      end
      config << "\n"
    end

    config
  end
end

# Usage
config = HAProxyConfig.new
config.add_frontend('http_front', '*:80', 'web_servers')
config.add_backend('web_servers', [
  { address: '192.168.1.10:3000' },
  { address: '192.168.1.11:3000' },
  { address: '192.168.1.12:3000' }
], balance_method: 'leastconn')
File.write('/etc/haproxy/haproxy.cfg', config.generate)
Nginx serves as both web server and reverse proxy with load balancing capabilities. The configuration supports upstream server groups, health checks, and various distribution algorithms. Nginx Plus (commercial version) adds advanced features like dynamic reconfiguration and active health checks.
AWS Elastic Load Balancing offers three types: Application Load Balancer (Layer 7 HTTP/HTTPS), Network Load Balancer (Layer 4 TCP/UDP), and Classic Load Balancer (legacy). Application Load Balancers route based on request content, support WebSockets, and integrate with AWS services. Network Load Balancers handle millions of requests per second with ultra-low latency.
Envoy provides a modern, cloud-native proxy designed for service mesh architectures. The software supports dynamic configuration updates, advanced routing, observability features, and extension through filters. Envoy forms the data plane for service meshes like Istio and Consul Connect.
Kong combines API gateway functionality with load balancing, rate limiting, authentication, and request transformation. Ruby plugins extend Kong functionality, though the core uses OpenResty (Nginx + Lua).
Ruby Gems for Load Balancer Integration:
# Using Faraday with retries (Faraday 2 moved retry into the faraday-retry gem)
require 'faraday'
require 'faraday/retry'

conn = Faraday.new do |f|
  f.request :retry, max: 3, interval: 0.5, backoff_factor: 2
  f.adapter Faraday.default_adapter
end

# Define multiple backend URLs
backends = [
  'http://api1.example.com',
  'http://api2.example.com',
  'http://api3.example.com'
]

# Simple round-robin selection
current_backend = 0
backend_url = backends[current_backend % backends.length]
current_backend += 1

response = conn.get("#{backend_url}/api/resource")
Træfik automatically discovers services in containerized environments through integration with Docker, Kubernetes, and other orchestrators. Configuration updates occur automatically as services scale, eliminating manual load balancer reconfiguration.
Real-World Applications
Production load balancing architectures demonstrate patterns for scalability, reliability, and operational efficiency.
Multi-Tier Load Balancing layers multiple load balancing levels for different purposes. DNS distributes traffic across geographic regions. Regional load balancers route to availability zones. Zone load balancers distribute across server racks. This hierarchy provides redundancy at each level while isolating failures.
# Simulating multi-tier routing decision
class MultiTierRouter
  def initialize(topology)
    @topology = topology
  end

  def route_request(client_ip, request)
    region = select_region(client_ip)
    zone = select_zone(region)
    rack = select_rack(zone)
    server = select_server(rack)
    {
      path: [region, zone, rack, server],
      selected_server: server
    }
  end

  private

  def select_region(client_ip)
    # Geographic routing based on client location
    @topology[:regions].min_by { |r| latency(client_ip, r) }
  end

  def select_zone(region)
    # Any zone with available capacity
    region[:zones].select { |z| z[:available_capacity] > 0 }.sample
  end

  def select_rack(zone)
    # Least connections algorithm
    zone[:racks].min_by { |r| r[:active_connections] }
  end

  def select_server(rack)
    # Weighted random selection based on server capacity
    weighted_selection(rack[:servers])
  end

  def weighted_selection(servers)
    total_weight = servers.sum { |s| s[:weight] }
    random_value = rand(total_weight)
    cumulative = 0
    servers.each do |server|
      cumulative += server[:weight]
      return server if random_value < cumulative
    end
  end

  def latency(client_ip, region)
    # Simplified stand-in; production systems use measured latency
    distance(client_ip, region[:location])
  end

  def distance(client_ip, location)
    # Placeholder for a geo-IP distance lookup
    client_ip.hash ^ location.hash
  end
end
Blue-Green Deployments use load balancers to switch traffic between application versions. The blue environment serves production traffic while green deploys the new version. After validation, the load balancer shifts traffic to green. Rollback occurs instantly by routing back to blue.
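A hedged sketch of the traffic switch: the balancer holds a pointer to the live pool, and cutover or rollback is a single pointer swap. The pool contents and server interface are assumptions.

# Blue-green cutover as a pool swap (sketch)
class BlueGreenSwitch
  def initialize(blue_pool, green_pool)
    @pools = { blue: blue_pool, green: green_pool }
    @live = :blue
  end

  def route_request(request)
    @pools[@live].sample.handle(request)
  end

  def cut_over!
    @live = (@live == :blue ? :green : :blue)
  end

  def rollback!
    cut_over! # switching back is the same swap
  end
end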
Canary Releases route a small percentage of traffic to new application versions for validation before full rollout. Load balancers split traffic proportionally, sending 5-10% to the canary version while monitoring error rates and performance. Gradual traffic increase follows successful canary validation.
# Canary deployment controller
class CanaryDeployment
  def initialize(stable_backend, canary_backend)
    @stable_backend = stable_backend
    @canary_backend = canary_backend
    @canary_percentage = 0
    @error_threshold = 0.05 # 5% error rate
  end

  def route_request(request)
    backend = select_backend
    begin
      response = backend.handle(request)
      record_success(backend)
      response
    rescue
      record_failure(backend)
      raise
    end
  end

  def increase_canary_traffic(increment = 10)
    return if @canary_percentage >= 100

    if canary_error_rate < @error_threshold
      @canary_percentage = [@canary_percentage + increment, 100].min
      puts "Increased canary traffic to #{@canary_percentage}%"
    else
      puts "Canary error rate too high, holding at #{@canary_percentage}%"
    end
  end

  def rollback
    @canary_percentage = 0
    puts 'Rolled back to stable backend'
  end

  private

  def select_backend
    rand(100) < @canary_percentage ? @canary_backend : @stable_backend
  end

  def record_success(backend)
    backend.stats[:successes] += 1
  end

  def record_failure(backend)
    backend.stats[:failures] += 1
  end

  def canary_error_rate
    stats = @canary_backend.stats
    total = stats[:successes] + stats[:failures]
    return 0 if total.zero?

    stats[:failures].to_f / total
  end
end
Auto-Scaling Integration coordinates load balancers with dynamic server provisioning. Cloud platforms monitor load balancer metrics (CPU, connection count, request rate) and automatically launch or terminate servers. Load balancers register new servers through service discovery, adding endpoints as instances become healthy.
WebSocket Load Balancing requires session affinity since WebSocket connections maintain long-lived, stateful connections. Load balancers hash connection identifiers to ensure related requests route to the same server. Nginx and HAProxy support WebSocket proxying with appropriate upgrade header handling.
API Gateway Patterns position load balancers behind API gateways that handle authentication, rate limiting, and request transformation. The gateway fans out requests to multiple backend services, aggregating responses. Load balancers distribute requests within each service, while the gateway handles cross-service routing.
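A sketch of that fan-out: the gateway calls each backing service in parallel while each client balances across its own backends. The service clients here are assumed instances of something like the earlier ServiceClient, and the routes are hypothetical.

# Gateway fan-out with per-service client-side balancing (sketch)
class GatewayAggregator
  def initialize(user_client, order_client)
    @user_client = user_client   # e.g. a ServiceClient instance
    @order_client = order_client
  end

  def profile_page(user_id)
    # Fan out in parallel; each client balances across its own backends
    user_thread = Thread.new { @user_client.call_service("/api/users/#{user_id}") }
    order_thread = Thread.new { @order_client.call_service("/api/orders?user=#{user_id}") }
    { user: user_thread.value, orders: order_thread.value }
  end
end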
Database Connection Pooling applies load balancing concepts to database connections. PgBouncer and ProxySQL distribute queries across database replicas, maintaining connection pools to reduce connection overhead. Read queries distribute across replicas while writes route to the primary.
# Database read replica load balancing
class DatabaseLoadBalancer
  def initialize(primary, replicas)
    @primary = primary
    @replicas = replicas
    @replica_index = 0
  end

  def execute_query(sql)
    if write_query?(sql)
      @primary.execute(sql)
    else
      select_replica.execute(sql)
    end
  end

  private

  def write_query?(sql)
    sql.match?(/\A\s*(INSERT|UPDATE|DELETE|CREATE|ALTER|DROP)/i)
  end

  def select_replica
    # Round-robin with health checking
    attempts = 0
    while attempts < @replicas.length
      replica = @replicas[@replica_index]
      @replica_index = (@replica_index + 1) % @replicas.length
      return replica if replica.healthy?
      attempts += 1
    end
    # Fall back to the primary if no replica is healthy
    @primary
  end
end

# Rails configuration (Rails 6+ multiple databases)
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to database: {
    writing: :primary,
    reading: :replica
  }
end

# With the database_selector middleware enabled, Rails splits reads and writes
User.where(active: true).to_a # Routes to replica
User.create(name: 'Alice')    # Routes to primary
Reference
Load Balancing Algorithms
| Algorithm | Description | Use Case | Session Affinity |
|---|---|---|---|
| Round Robin | Sequential rotation through servers | Uniform workloads, stateless apps | No |
| Weighted Round Robin | Rotation proportional to server capacity | Heterogeneous server specs | No |
| Least Connections | Routes to server with fewest connections | Variable request durations | No |
| Weighted Least Connections | Least connections considering capacity | Mixed capacity and variable load | No |
| IP Hash | Hash client IP to consistent server | Session persistence without cookies | Yes |
| URL Hash | Hash request URL to server | Cache optimization | Optional |
| Random | Random server selection | Stateless apps, simple distribution | No |
| Least Response Time | Routes to fastest responding server | Latency-sensitive applications | No |
Health Check Types
| Type | Layer | Check Method | Granularity | Overhead |
|---|---|---|---|---|
| TCP Connect | Layer 4 | TCP handshake completion | Basic connectivity | Low |
| HTTP GET | Layer 7 | HTTP request to endpoint | Service reachability | Medium |
| Custom Script | Application | Application-specific logic | Full validation | High |
| Passive | Application | Monitor actual traffic | Real traffic patterns | None |
Load Balancer Types Comparison
| Type | Layer | Latency | Throughput | Configuration Complexity | Cost Model |
|---|---|---|---|---|---|
| DNS | Application | High (TTL) | Unlimited | Low | Per domain |
| Hardware | Network | Microseconds | Millions req/s | High | Capital expense |
| Software | Network | Milliseconds | 100K+ req/s | Medium | Operational |
| Cloud Managed | Varies | Low | Auto-scaling | Low | Pay per use |
| Service Mesh | Application | Low | High | High | Infrastructure |
Session Persistence Methods
| Method | Mechanism | Reliability | Operational Complexity |
|---|---|---|---|
| Source IP Hash | Hash client IP address | Medium (NAT issues) | Low |
| Cookie Insertion | Load balancer sets cookie | High | Low |
| Application Cookie | Application manages session ID | High | Medium |
| SSL Session ID | TLS session identifier | Medium | Low |
| URL Parameter | Session ID in query string | High | High |
Common Configuration Parameters
| Parameter | Description | Typical Values | Impact |
|---|---|---|---|
| Connection Timeout | Max time for backend connection | 5-30 seconds | Failure detection speed |
| Request Timeout | Max request processing time | 30-300 seconds | Long request handling |
| Health Check Interval | Time between health probes | 5-30 seconds | Failure detection delay |
| Health Check Timeout | Max health check duration | 2-10 seconds | False positive rate |
| Unhealthy Threshold | Failed checks before removal | 2-5 failures | Failure sensitivity |
| Healthy Threshold | Passed checks before restoration | 2-5 successes | Recovery speed |
| Maximum Connections | Per-server connection limit | 1000-10000 | Overload protection |
| Keep-Alive Timeout | Idle connection timeout | 30-300 seconds | Connection reuse |
Load Balancer Selection Criteria
| Criterion | Consider When | Examples |
|---|---|---|
| Traffic Volume | Expected request rate | Low: Nginx, High: Hardware LB |
| Geographic Distribution | Multi-region deployment | DNS-based, Cloud global LB |
| Protocol Requirements | Application protocol | HTTP: ALB, TCP: NLB |
| Operational Expertise | Team capabilities | Managed: Cloud LB, Control: HAProxy |
| Budget Constraints | Cost considerations | Open source: Nginx/HAProxy, Cloud: ELB |
| Latency Requirements | Response time needs | Ultra-low: Hardware, Normal: Software |
| SSL Termination | Certificate management | Centralized or distributed |
| Service Discovery | Dynamic backends | Container orchestration integration |
Ruby Gems for Load Balancing
| Gem | Purpose | Integration Point |
|---|---|---|
| rack-proxy | Reverse proxy middleware | Rack applications |
| net-http-persistent | Connection pooling | HTTP clients |
| connection_pool | Generic connection pooling | Database, cache clients |
| redis-rb | Redis connection pooling | Redis operations |
| pg | PostgreSQL connection pooling | Database connections |
| faraday | HTTP client with middleware | Service-to-service calls |
| typhoeus | Parallel HTTP requests | Multiple backend calls |
Monitoring Metrics
| Metric | Description | Alert Threshold |
|---|---|---|
| Request Rate | Requests per second | Sudden spikes or drops |
| Error Rate | Failed request percentage | Above 1-5% |
| Response Time | Average request latency | P95 exceeds SLA |
| Active Connections | Current connections | Near connection limit |
| Backend Health | Available server count | Below redundancy level |
| Queue Depth | Pending requests | Sustained non-zero |
| SSL Negotiation Time | TLS handshake duration | Exceeds 100ms |
| Connection Errors | Failed backend connections | Above baseline |