CrackedRuby

Overview

Load balancing distributes incoming requests across multiple backend servers or resources to prevent any single server from becoming a bottleneck. The technique originated in the 1990s as web traffic scaled beyond single-server capacity, requiring mechanisms to spread load across server farms.

A load balancer acts as a reverse proxy, sitting between clients and backend servers. When a client sends a request, the load balancer applies an algorithm to select which backend server receives the request. The selected server processes the request and returns the response through the load balancer to the client.

Load balancing serves several functions:

  • Prevents server overload by distributing requests
  • Increases application availability through redundancy
  • Enables horizontal scaling by adding servers
  • Facilitates zero-downtime deployments
  • Provides failure isolation and automatic failover

Modern applications rarely run on single servers. Load balancing has become fundamental infrastructure, appearing at multiple layers: DNS-level geographic distribution, network-level TCP/UDP balancing, and application-level HTTP routing. Cloud platforms provide managed load balancing services, while on-premises deployments use dedicated hardware or software solutions.

# Conceptual load balancer behavior
class SimpleLoadBalancer
  def initialize(servers)
    @servers = servers
    @current_index = 0
  end
  
  def forward_request(request)
    server = select_server
    server.handle(request)
  end
  
  def select_server
    # Round-robin algorithm
    server = @servers[@current_index]
    @current_index = (@current_index + 1) % @servers.length
    server
  end
end

# Usage
servers = [Server.new('server1'), Server.new('server2'), Server.new('server3')]
lb = SimpleLoadBalancer.new(servers)
lb.forward_request(request)
# => Forwards to server1, then server2, then server3, cycling back

The example shows the basic concept: the load balancer maintains a pool of servers and applies an algorithm to distribute requests. Production implementations handle health checks, connection pooling, SSL termination, and session persistence.

Key Principles

Load balancing operates on several fundamental principles that determine effectiveness and behavior.

Distribution Algorithms determine how requests map to backend servers. The algorithm choice affects load distribution, statefulness requirements, and operational complexity. Common algorithms include round-robin (sequential rotation), least connections (fewest active connections), weighted distribution (proportional to server capacity), and hash-based routing (consistent assignment based on request attributes).
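
As a rough illustration of the least connections algorithm named above, the sketch below tracks active connections per server and always picks the least busy one. The class name `LeastConnectionsBalancer` and its `acquire`/`release` methods are hypothetical names chosen for this example, not part of any library.

```ruby
# Least-connections selection: an illustrative sketch, not a production balancer.
class LeastConnectionsBalancer
  def initialize(server_names)
    # Active connection count per server, all starting at zero
    @active = server_names.to_h { |name| [name, 0] }
    @mutex = Mutex.new
  end

  # Pick the server with the fewest active connections and reserve a slot.
  # Ties go to the server listed first.
  def acquire
    @mutex.synchronize do
      name, _count = @active.min_by { |_name, count| count }
      @active[name] += 1
      name
    end
  end

  # Call when the request finishes so the slot is freed.
  def release(name)
    @mutex.synchronize { @active[name] -= 1 if @active[name] > 0 }
  end

  def active_connections
    @mutex.synchronize { @active.dup }
  end
end

lb = LeastConnectionsBalancer.new(%w[server1 server2 server3])
first  = lb.acquire   # server1 (all tied, first wins)
second = lb.acquire   # server2
lb.release(first)     # server1 finishes early
third  = lb.acquire   # server1 again, since it is idle
```

Unlike round-robin, this approach needs the balancer to learn when each request ends, which is why it requires per-connection state.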

Health Checking monitors backend server availability. Load balancers periodically probe servers using TCP connections, HTTP requests, or custom health endpoints. Failed health checks remove servers from rotation; successful checks restore them. Health check frequency, timeout values, and failure thresholds affect failover speed versus false positive rate.

Session Persistence maintains client-server affinity when application state exists on backend servers. Techniques include source IP hashing, cookie-based routing, or application-level session identifiers. Persistence conflicts with optimal load distribution since it constrains routing decisions.
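
Source IP hashing can be sketched in a few lines. The example below is a minimal illustration under the assumption that the server pool is a fixed array; `IpHashRouter` and `server_for` are hypothetical names introduced here.

```ruby
require 'zlib'

# Source-IP affinity: the same client IP always maps to the same server
# as long as the pool does not change.
class IpHashRouter
  def initialize(servers)
    @servers = servers
  end

  # CRC32 is stable across processes, unlike String#hash, which Ruby
  # randomizes per process.
  def server_for(client_ip)
    @servers[Zlib.crc32(client_ip) % @servers.length]
  end
end

router = IpHashRouter.new(%w[server1 server2 server3])
router.server_for('203.0.113.7') == router.server_for('203.0.113.7')  # => true
```

With plain modulo hashing, adding or removing a server remaps most clients to different servers. Consistent hashing schemes reduce that churn at the cost of extra bookkeeping.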

Connection Management handles network connections between clients and servers. Load balancers may proxy connections (terminating client connections and creating new backend connections) or pass through connections (forwarding packets without termination). Proxy mode enables protocol translation, SSL offloading, and content inspection. Pass-through mode reduces latency and resource usage.

Layer 4 vs Layer 7 distinguishes network-level from application-level load balancing. Layer 4 (transport) operates on TCP/UDP packets, routing based on IP addresses and ports. Layer 7 (application) parses HTTP requests, routing based on URLs, headers, or request content. Layer 7 provides finer control but higher processing overhead.
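
To make the Layer 7 idea concrete, the sketch below routes by URL path prefix, something a Layer 4 balancer cannot do because it never parses the request. `PathRouter`, `add_route`, `pool_for`, and the pool names are all hypothetical, and the prefix match is deliberately simplistic (for instance, `/apiary` would also match `/api`).

```ruby
# Layer 7 routing sketch: choose a backend pool from the request path.
class PathRouter
  def initialize(default_pool)
    @routes = []                 # ordered list of [prefix, pool] pairs
    @default_pool = default_pool
  end

  def add_route(prefix, pool)
    @routes << [prefix, pool]
    # Longest prefix first so '/api/admin' wins over '/api'
    @routes.sort_by! { |route_prefix, _pool| -route_prefix.length }
  end

  def pool_for(path)
    match = @routes.find { |prefix, _pool| path.start_with?(prefix) }
    match ? match.last : @default_pool
  end
end

router = PathRouter.new(:web_pool)
router.add_route('/api', :api_pool)
router.add_route('/api/admin', :admin_pool)
router.add_route('/assets', :static_pool)

router.pool_for('/api/users')        # => :api_pool
router.pool_for('/api/admin/stats')  # => :admin_pool
router.pool_for('/about')            # => :web_pool
```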

# Health check implementation
require 'net/http'
require 'timeout'

class HealthChecker
  def initialize(server, check_interval: 5, timeout: 2)
    @server = server
    @check_interval = check_interval
    @timeout = timeout
    @healthy = true
  end
  
  def start_monitoring
    Thread.new do
      loop do
        @healthy = perform_health_check
        sleep @check_interval
      end
    end
  end
  
  def healthy?
    @healthy
  end
  
  private
  
  def perform_health_check
    Timeout.timeout(@timeout) do
      response = Net::HTTP.get_response(@server.health_uri)
      response.code == '200'
    end
  rescue Timeout::Error, StandardError
    false
  end
end

Stateless vs Stateful describes whether the load balancer maintains request context. Stateless balancers make independent routing decisions per request, simplifying horizontal scaling. Stateful balancers track connections, sessions, or request sequences, enabling advanced features like transaction integrity but requiring state synchronization across load balancer instances.

Active vs Passive Health Checks differ in how failures are detected. Active checks proactively probe servers on a schedule; passive checks infer failures from errors on real client requests. Active checks can detect a failure before client traffic is affected; passive checks avoid probe traffic overhead. Many systems combine both approaches.
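
A passive check can be sketched as a small tracker that is fed the outcome of real requests. Everything here is an assumption made for illustration: the `PassiveHealthTracker` name, the `max_failures` and `cooldown` parameters, and the injectable `clock` (included only so the example can run without waiting).

```ruby
# Passive health checking sketch: eject a server after consecutive
# failures on real traffic, then let it back in after a cooldown.
class PassiveHealthTracker
  MONOTONIC_CLOCK = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

  def initialize(max_failures: 3, cooldown: 30, clock: MONOTONIC_CLOCK)
    @max_failures = max_failures
    @cooldown = cooldown
    @clock = clock
    @failures = Hash.new(0)   # consecutive failures per server
    @ejected_until = {}       # server => time at which it may return
  end

  def record_success(server)
    @failures[server] = 0
  end

  def record_failure(server)
    @failures[server] += 1
    return if @failures[server] < @max_failures

    @ejected_until[server] = @clock.call + @cooldown
    @failures[server] = 0
  end

  def available?(server)
    deadline = @ejected_until[server]
    deadline.nil? || @clock.call >= deadline
  end
end

now = 0
tracker = PassiveHealthTracker.new(max_failures: 2, cooldown: 10, clock: -> { now })
tracker.record_failure('server1')
tracker.available?('server1')  # => true (one failure, below threshold)
tracker.record_failure('server1')
tracker.available?('server1')  # => false (ejected for 10 seconds)
now = 10
tracker.available?('server1')  # => true (cooldown elapsed)
```

Note that the first clients to hit a failing server still see errors, which is the cost of avoiding probe traffic.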

Implementation Approaches

Load balancing implementations vary by architectural layer, deployment model, and operational requirements.

DNS-Based Load Balancing returns multiple IP addresses for a domain name, distributing clients across servers through DNS resolution. The approach requires no dedicated load balancer infrastructure and provides geographic distribution. DNS caching limits failover speed since clients cache resolved addresses. TTL values balance caching efficiency against failover responsiveness.

# DNS-based routing simulation
require 'resolv'

class DNSLoadBalancer
  def initialize(domain, servers)
    @domain = domain
    @servers = servers
    @dns_cache = {}
  end
  
  def resolve(client_ip)
    # Simulate geographic routing
    region = determine_region(client_ip)
    regional_servers = @servers.select { |s| s.region == region }
    
    # Return multiple A records
    regional_servers.sample(3).map(&:ip_address)
  end
  
  def determine_region(ip)
    # Simplified geographic lookup
    ip.start_with?('192.168') ? :us_west : :us_east
  end
end

Hardware Load Balancers use dedicated network appliances with specialized ASICs for high-throughput packet processing. Hardware solutions handle millions of connections per second with microsecond latency. Cost and inflexibility limit adoption to large-scale deployments with predictable capacity requirements.

Software Load Balancers run on commodity servers, providing flexibility and cost efficiency. Popular options include HAProxy, Nginx, and Envoy. Software balancers handle hundreds of thousands of connections on standard hardware. Configuration changes deploy without hardware replacement, enabling rapid iteration.

Cloud Load Balancers provide managed services that abstract infrastructure management. AWS Elastic Load Balancing, Google Cloud Load Balancing, and Azure Load Balancer handle provisioning, scaling, and maintenance. Cloud balancers integrate with auto-scaling, service discovery, and health monitoring. Trade-offs include vendor lock-in and reduced low-level control.

Application-Level Load Balancing embeds distribution logic within application code or frameworks. Client-side load balancing libraries select endpoints directly, eliminating dedicated load balancer infrastructure. Service mesh architectures like Istio provide sidecar proxies that handle routing, retries, and circuit breaking at the application layer.

# Client-side load balancing with retries
class ClientSideLoadBalancer
  def initialize(endpoints)
    @endpoints = endpoints
    @circuit_breakers = Hash.new { |h, k| h[k] = CircuitBreaker.new }
  end
  
  def call(request, max_retries: 3)
    attempts = 0
    
    while attempts < max_retries
      endpoint = select_endpoint
      # select_endpoint returns nil when every circuit breaker is open;
      # stop instead of looping forever
      break if endpoint.nil?
      
      circuit_breaker = @circuit_breakers[endpoint]
      
      begin
        response = execute_request(endpoint, request)
        circuit_breaker.record_success
        return response
      rescue RequestError
        circuit_breaker.record_failure
        attempts += 1
      end
    end
    
    raise MaxRetriesExceeded
  end
  
  private
  
  def select_endpoint
    available = @endpoints.reject { |e| @circuit_breakers[e].open? }
    available.min_by { |e| @circuit_breakers[e].failure_rate }
  end
  
  def execute_request(endpoint, request)
    # HTTP request implementation
  end
end

Container Orchestration Load Balancing integrates with platforms like Kubernetes. Service definitions create virtual IP addresses that distribute traffic across pod replicas. Ingress controllers route external traffic to services based on hostnames and paths. Container platforms handle service discovery, health checking, and automatic endpoint updates as pods scale.

Design Considerations

Selecting load balancing strategies requires analyzing application characteristics, operational requirements, and infrastructure constraints.

Algorithm Selection depends on workload patterns. Round-robin works well for homogeneous servers with similar request processing times. Least connections suits workloads with highly variable request durations, preventing long-running requests from concentrating on specific servers. IP hash maintains session affinity but distributes poorly when client populations cluster in IP ranges. Random selection provides good distribution with minimal state overhead.
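
For heterogeneous servers, weighted round-robin is a common refinement. The sketch below uses the "smooth" variant, which interleaves picks rather than sending bursts to the heaviest server. The class name `WeightedRoundRobin` and the server names are hypothetical.

```ruby
# Smooth weighted round-robin sketch: on every pick each server gains its
# weight, the highest running total wins, and the winner pays back the
# sum of all weights.
class WeightedRoundRobin
  Entry = Struct.new(:name, :weight, :current)

  def initialize(weights)
    @entries = weights.map { |name, weight| Entry.new(name, weight, 0) }
    @total = @entries.sum(&:weight)
  end

  def next_server
    @entries.each { |entry| entry.current += entry.weight }
    chosen = @entries.max_by(&:current)
    chosen.current -= @total
    chosen.name
  end
end

wrr = WeightedRoundRobin.new('large' => 3, 'small' => 1)
picks = Array.new(8) { wrr.next_server }
picks.tally  # => {"large"=>6, "small"=>2}
```

Over each full cycle (here, four picks) every server is chosen in proportion to its weight, so a server with three times the capacity receives three times the requests.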

Stateless Applications simplify load balancing by allowing requests to route to any server. Stateless design moves session data to external stores like Redis or databases, enabling load balancers to use any algorithm without session persistence concerns. Stateful applications require sticky sessions or session replication, constraining load distribution and complicating failover.

SSL/TLS Termination determines where encryption processing occurs. Terminating SSL at the load balancer centralizes certificate management and offloads CPU-intensive encryption from backend servers. Backends communicate over unencrypted connections within the trusted network. End-to-end encryption maintains SSL through to backend servers, increasing security but requiring certificate management on all servers and duplicating encryption overhead.

# Configuration modeling for SSL termination
class LoadBalancerConfig
  attr_accessor :ssl_termination, :backend_protocol
  
  def initialize
    @ssl_termination = :load_balancer  # or :backend or :none
    @backend_protocol = :http          # or :https
    @backends = []
  end
  
  def ssl_termination_at_lb?
    @ssl_termination == :load_balancer
  end
  
  def requires_backend_certificates?
    @ssl_termination == :backend || 
    (@ssl_termination == :load_balancer && @backend_protocol == :https)
  end
  
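  def connection_pattern
    if ssl_termination_at_lb?
      # The LB re-encrypts toward the backends when backend_protocol is :https
      backend = @backend_protocol == :https ? "HTTPS" : "HTTP"
      "HTTPS -> LB -> #{backend} -> Backends"
    elsif @ssl_termination == :backend
      "HTTPS -> LB (passthrough) -> HTTPS -> Backends"
    else
      "HTTP -> LB -> HTTP -> Backends"
    end
  end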
end

Geographic Distribution routes users to nearby datacenters, reducing latency and improving user experience. DNS-based geographic routing resolves domains to region-specific IP addresses. Active-active multi-region deployments require data replication strategies and conflict resolution. Active-passive setups keep a standby region for disaster recovery, accepting longer failover times.

Health Check Design balances failure detection speed against false positive rate. Aggressive health checks (frequent probes, tight timeouts) detect failures quickly but may incorrectly remove servers during transient network issues. Conservative checks (infrequent probes, generous timeouts) reduce false positives but delay failure detection. Application-specific health endpoints verify service functionality beyond basic connectivity.
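
One way to damp false positives is to require several consecutive probe results before changing a server's state. The sketch below illustrates that idea; `ThresholdHealthState` and its parameter names are hypothetical and only mirror the unhealthy/healthy threshold settings found in many load balancers.

```ruby
# Threshold-based health state: a server flips state only after enough
# consecutive probe results disagree with its current state.
class ThresholdHealthState
  attr_reader :healthy

  def initialize(unhealthy_threshold: 3, healthy_threshold: 2)
    @unhealthy_threshold = unhealthy_threshold
    @healthy_threshold = healthy_threshold
    @healthy = true
    @streak = 0   # consecutive results that disagree with the current state
  end

  # Feed in each probe result (true = passed, false = failed).
  # Returns the state after taking the result into account.
  def record(probe_passed)
    if probe_passed == @healthy
      @streak = 0
    else
      @streak += 1
      threshold = @healthy ? @unhealthy_threshold : @healthy_threshold
      if @streak >= threshold
        @healthy = !@healthy
        @streak = 0
      end
    end
    @healthy
  end
end

state = ThresholdHealthState.new(unhealthy_threshold: 3, healthy_threshold: 2)
[false, false, true, false, false, false, true, true].map { |r| state.record(r) }
# => [true, true, true, true, true, false, false, true]
```

In this run the two isolated failures are ignored, the server is removed only after three failures in a row, and it returns after two consecutive successes.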

Scaling Considerations affect architecture decisions. Vertical scaling (larger servers) simplifies deployment but hits hardware limits. Horizontal scaling (more servers) grows capacity by adding machines behind the load balancer and works best when the application is stateless. Auto-scaling dynamically adjusts capacity based on metrics, requiring rapid server provisioning and deprovisioning coordination with load balancers.

Cost Trade-offs compare managed services against self-hosted solutions. Cloud load balancers eliminate operational overhead but incur per-GB and per-connection charges. Self-hosted balancers reduce variable costs but require dedicated operations teams. Hybrid approaches use cloud load balancers for edge traffic and self-hosted for internal service mesh.

Ruby Implementation

Ruby applications interact with load balancers primarily as backend servers, though Ruby can implement load balancing logic in certain contexts.

Rack Middleware for Load Balancer Integration handles forwarded headers that preserve original client information when proxied through load balancers.

# Rack middleware for X-Forwarded-For header handling
class LoadBalancerHeaders
  def initialize(app)
    @app = app
  end
  
  def call(env)
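    # Trust load balancer headers. This is only safe when every request
    # arrives via the load balancer; a client that can reach the app
    # directly can spoof these headers.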
    if forwarded_for = env['HTTP_X_FORWARDED_FOR']
      # Get original client IP (first in chain)
      env['REMOTE_ADDR'] = forwarded_for.split(',').first.strip
    end
    
    if forwarded_proto = env['HTTP_X_FORWARDED_PROTO']
      env['rack.url_scheme'] = forwarded_proto
    end
    
    if forwarded_port = env['HTTP_X_FORWARDED_PORT']
      env['SERVER_PORT'] = forwarded_port
    end
    
    @app.call(env)
  end
end

# In config.ru
use LoadBalancerHeaders
run MyRackApp.new

Health Check Endpoints provide application-level health verification for load balancers. Rails applications typically implement health checks as lightweight controller actions.

# Rails health check controller
class HealthController < ApplicationController
  skip_before_action :verify_authenticity_token
  
  def check
    checks = {
      database: database_healthy?,
      cache: cache_healthy?,
      dependencies: check_dependencies
    }
    
    # Healthy only if every top-level check and every dependency passed
    healthy = checks[:database] && checks[:cache] &&
              checks[:dependencies].values.all?
    health_status = { status: healthy ? 'healthy' : 'unhealthy' }.merge(checks)
    
    render json: health_status, status: healthy ? :ok : :service_unavailable
  end
  
  def shallow
    # Minimal check for fast health probes
    render plain: 'OK', status: :ok
  end
  
  private
  
  def database_healthy?
    ActiveRecord::Base.connection.active?
  rescue
    false
  end
  
  def cache_healthy?
    Rails.cache.read('health_check')
    true
  rescue
    false
  end
  
  def check_dependencies
    {
      redis: check_redis,
      external_api: check_external_api
    }
  end
  
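  def check_redis
    # Redis.current was removed in redis-rb 5.0, so connect explicitly
    redis = Redis.new(url: ENV['REDIS_URL'])
    redis.ping == 'PONG'
  rescue
    false
  ensure
    redis&.close
  end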
  
  def check_external_api
    # Quick timeout to avoid blocking health checks
    Timeout.timeout(1) do
      response = Net::HTTP.get_response(URI(ENV['EXTERNAL_API_URL']))
      response.code == '200'
    end
  rescue
    false
  end
end

Graceful Shutdown ensures in-flight requests complete before server termination during deployments or scaling events.

# Puma graceful shutdown configuration
# config/puma.rb
workers 4
threads 5, 5

# Runs in each worker just before it exits, for example after the
# process manager or orchestrator sends SIGTERM during a deployment
on_worker_shutdown do
  puts 'Worker shutting down, finishing requests...'
end

# Extended shutdown timeout for long-running requests
worker_shutdown_timeout 30

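# In application code
class GracefulShutdown
  def initialize(app)
    @app = app
    @shutdown_requested = false
    @active_requests = 0
    @mutex = Mutex.new
  end
  
  def call(env)
    return [503, {}, ['Service shutting down']] if @shutdown_requested
    
    # Count the request for as long as it is being processed
    @mutex.synchronize { @active_requests += 1 }
    begin
      @app.call(env)
    ensure
      @mutex.synchronize { @active_requests -= 1 }
    end
  end
  
  def shutdown
    @shutdown_requested = true
    # Wait for active requests to complete
    sleep 1 while active_requests > 0
  end
  
  private
  
  def active_requests
    @mutex.synchronize { @active_requests }
  end
end

# Signal handling
# shutdown_handler is assumed to be the GracefulShutdown instance that wraps
# the app. A Mutex cannot be locked inside a trap handler, so the drain runs
# in a separate thread.
Signal.trap('TERM') do
  Thread.new do
    shutdown_handler.shutdown
    exit
  end
end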

Client-Side Load Balancing in Ruby implements distribution logic within application code for service-to-service communication.

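require 'net/http'
require 'uri'

# Raised when every endpoint has been tried without success
class ServiceUnavailableError < StandardError; end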

class ServiceClient
  def initialize(service_endpoints)
    @endpoints = service_endpoints
    @current_index = 0
    @mutex = Mutex.new
  end
  
  def call_service(path, method: :get, body: nil)
    attempts = 0
    max_attempts = @endpoints.size
    
    while attempts < max_attempts
      endpoint = next_endpoint
      
      begin
        return execute_request(endpoint, path, method, body)
      rescue Errno::ECONNREFUSED, Net::OpenTimeout => e
        attempts += 1
        # Try next endpoint on connection failure
      end
    end
    
    raise ServiceUnavailableError, "All endpoints failed"
  end
  
  private
  
  def next_endpoint
    @mutex.synchronize do
      endpoint = @endpoints[@current_index]
      @current_index = (@current_index + 1) % @endpoints.size
      endpoint
    end
  end
  
  def execute_request(endpoint, path, method, body)
    uri = URI.join(endpoint, path)
    
    case method
    when :get
      Net::HTTP.get_response(uri)
    when :post
      Net::HTTP.post(uri, body)
    else
      raise ArgumentError, "Unsupported method: #{method}"
    end
  end
end

# Usage
client = ServiceClient.new([
  'http://service1.internal:3000',
  'http://service2.internal:3000',
  'http://service3.internal:3000'
])

response = client.call_service('/api/users', method: :get)
# => Rotates through endpoints on each call

Session Management with Load Balancers requires coordinating session storage across distributed backends.

# Redis-backed session store for stateless load balancing
# config/initializers/session_store.rb
Rails.application.config.session_store :redis_store,
  servers: ENV['REDIS_URL'],
  expire_after: 1.day,
  key: '_myapp_session',
  threadsafe: true,
  secure: Rails.env.production?

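# Alternative: cookie-based session with shared secret.
# Use only one session_store call; every backend must share the same
# secret_key_base so any server can read the cookie.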
Rails.application.config.session_store :cookie_store,
  key: '_myapp_session',
  secure: Rails.env.production?,
  httponly: true,
  same_site: :lax

Tools & Ecosystem

Multiple load balancing solutions serve different use cases and operational requirements.

HAProxy provides high-performance Layer 4 and Layer 7 load balancing. The software handles millions of concurrent connections with microsecond latency. Configuration uses a declarative syntax defining frontends (client-facing), backends (server pools), and routing rules.

# HAProxy configuration generation in Ruby
class HAProxyConfig
  def initialize
    @frontends = []
    @backends = []
  end
  
  def add_frontend(name, bind_address, default_backend)
    @frontends << {
      name: name,
      bind: bind_address,
      default_backend: default_backend
    }
  end
  
  def add_backend(name, servers, balance_method: 'roundrobin')
    @backends << {
      name: name,
      balance: balance_method,
      servers: servers
    }
  end
  
  def generate
    config = "global\n"
    config << "  maxconn 4096\n"
    config << "  log 127.0.0.1 local0\n\n"
    
    config << "defaults\n"
    config << "  mode http\n"
    config << "  timeout connect 5000ms\n"
    config << "  timeout client 50000ms\n"
    config << "  timeout server 50000ms\n\n"
    
    @frontends.each do |frontend|
      config << "frontend #{frontend[:name]}\n"
      config << "  bind #{frontend[:bind]}\n"
      config << "  default_backend #{frontend[:default_backend]}\n\n"
    end
    
    @backends.each do |backend|
      config << "backend #{backend[:name]}\n"
      config << "  balance #{backend[:balance]}\n"
      backend[:servers].each_with_index do |server, i|
        config << "  server server#{i + 1} #{server[:address]} check\n"
      end
      config << "\n"
    end
    
    config
  end
end

# Usage
config = HAProxyConfig.new
config.add_frontend('http_front', '*:80', 'web_servers')
config.add_backend('web_servers', [
  { address: '192.168.1.10:3000' },
  { address: '192.168.1.11:3000' },
  { address: '192.168.1.12:3000' }
], balance_method: 'leastconn')

File.write('/etc/haproxy/haproxy.cfg', config.generate)

Nginx serves as both web server and reverse proxy with load balancing capabilities. The configuration supports upstream server groups, passive health checks, and various distribution algorithms. Nginx Plus (commercial version) adds advanced features like dynamic reconfiguration and active health checks.

AWS Elastic Load Balancing offers several types, including Application Load Balancer (Layer 7 HTTP/HTTPS), Network Load Balancer (Layer 4 TCP/UDP), Gateway Load Balancer (Layer 3, for network appliances), and Classic Load Balancer (legacy). Application Load Balancers route based on request content, support WebSockets, and integrate with AWS services. Network Load Balancers handle millions of requests per second with ultra-low latency.

Envoy provides a modern, cloud-native proxy designed for service mesh architectures. The software supports dynamic configuration updates, advanced routing, observability features, and extension through filters. Envoy forms the data plane for service meshes like Istio and Consul Connect.

Kong combines API gateway functionality with load balancing, rate limiting, authentication, and request transformation. Plugins, written primarily in Lua, extend Kong functionality; the core uses OpenResty (Nginx + Lua).

Ruby Gems for Load Balancer Integration:

# Using faraday with retry middleware
require 'faraday'
require 'faraday/retry'  # from the faraday-retry gem (needed on Faraday 2.x)

conn = Faraday.new do |f|
  f.request :retry, max: 3, interval: 0.5, backoff_factor: 2
  f.adapter Faraday.default_adapter
end

# Define multiple backend URLs
backends = [
  'http://api1.example.com',
  'http://api2.example.com',
  'http://api3.example.com'
]

# Simple round-robin selection
@current_backend ||= 0
backend_url = backends[@current_backend % backends.length]
@current_backend += 1

response = conn.get("#{backend_url}/api/resource")

Traefik automatically discovers services in containerized environments through integration with Docker, Kubernetes, and other orchestrators. Configuration updates occur automatically as services scale, eliminating manual load balancer reconfiguration.

Real-World Applications

Production load balancing architectures demonstrate patterns for scalability, reliability, and operational efficiency.

Multi-Tier Load Balancing layers multiple load balancing levels for different purposes. DNS distributes traffic across geographic regions. Regional load balancers route to availability zones. Zone load balancers distribute across server racks. This hierarchy provides redundancy at each level while isolating failures.

# Simulating multi-tier routing decision
class MultiTierRouter
  def initialize(topology)
    @topology = topology
  end
  
  def route_request(client_ip, request)
    region = select_region(client_ip)
    zone = select_zone(region)
    rack = select_rack(zone)
    server = select_server(rack)
    
    {
      path: [region, zone, rack, server],
      selected_server: server
    }
  end
  
  private
  
  def select_region(client_ip)
    # Geographic routing based on client location
    @topology[:regions].min_by { |r| latency(client_ip, r) }
  end
  
  def select_zone(region)
    # Zone with available capacity
    region[:zones].select { |z| z[:available_capacity] > 0 }.sample
  end
  
  def select_rack(zone)
    # Least connections algorithm
    zone[:racks].min_by { |r| r[:active_connections] }
  end
  
  def select_server(rack)
    # Weighted random selection based on server capacity
    weighted_selection(rack[:servers])
  end
  
  def weighted_selection(servers)
    total_weight = servers.sum { |s| s[:weight] }
    random_value = rand(total_weight)
    
    cumulative = 0
    servers.each do |server|
      cumulative += server[:weight]
      return server if random_value < cumulative
    end
  end
  
  def latency(client_ip, region)
    # Simplified latency calculation
    # Production systems use actual latency measurements
    distance(client_ip, region[:location])
  end
end

Blue-Green Deployments use load balancers to switch traffic between application versions. The blue environment serves production traffic while green deploys the new version. After validation, the load balancer shifts traffic to green. Rollback occurs instantly by routing back to blue.
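
The cut-over described above amounts to flipping a pointer between two server pools. The sketch below illustrates that under the assumption that validation is supplied by the caller as a block; `BlueGreenSwitch`, `cut_over`, and the server names are hypothetical.

```ruby
# Blue-green switch sketch: traffic follows a single "live" pointer.
class BlueGreenSwitch
  attr_reader :live

  def initialize(blue:, green:)
    @pools = { blue: blue, green: green }
    @live = :blue
  end

  # The environment that is not currently serving production traffic
  def idle
    @live == :blue ? :green : :blue
  end

  # Servers currently receiving production traffic
  def live_pool
    @pools[@live]
  end

  # Promote the idle environment only if the caller's validation passes.
  def cut_over
    candidate = idle
    return false unless yield(@pools[candidate])

    @live = candidate
    true
  end

  # Rollback is the same pointer flip in the other direction.
  def rollback
    @live = idle
  end
end

switch = BlueGreenSwitch.new(blue: %w[blue1 blue2], green: %w[green1 green2])
switch.cut_over { |servers| servers.all? { |s| s.start_with?('green') } }  # => true
switch.live_pool  # => ["green1", "green2"]
switch.rollback
switch.live_pool  # => ["blue1", "blue2"]
```

Because both environments stay provisioned, rollback needs no redeployment, which is what makes it fast. The trade-off is running double the capacity during the transition.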

Canary Releases route a small percentage of traffic to new application versions for validation before full rollout. Load balancers split traffic proportionally, sending 5-10% to the canary version while monitoring error rates and performance. Gradual traffic increase follows successful canary validation.

# Canary deployment controller
class CanaryDeployment
  def initialize(stable_backend, canary_backend)
    @stable_backend = stable_backend
    @canary_backend = canary_backend
    @canary_percentage = 0
    @error_threshold = 0.05  # 5% error rate
  end
  
  def route_request(request)
    backend = select_backend
    
    begin
      response = backend.handle(request)
      record_success(backend)
      response
    rescue => e
      record_failure(backend)
      raise
    end
  end
  
  def increase_canary_traffic(increment = 10)
    return if @canary_percentage >= 100
    
    if canary_error_rate < @error_threshold
      @canary_percentage = [@canary_percentage + increment, 100].min
      puts "Increased canary traffic to #{@canary_percentage}%"
    else
      puts "Canary error rate too high, holding at #{@canary_percentage}%"
    end
  end
  
  def rollback
    @canary_percentage = 0
    puts "Rolled back to stable backend"
  end
  
  private
  
  def select_backend
    rand(100) < @canary_percentage ? @canary_backend : @stable_backend
  end
  
  def record_success(backend)
    backend.stats[:successes] += 1
  end
  
  def record_failure(backend)
    backend.stats[:failures] += 1
  end
  
  def canary_error_rate
    stats = @canary_backend.stats
    total = stats[:successes] + stats[:failures]
    return 0 if total.zero?
    
    stats[:failures].to_f / total
  end
end

Auto-Scaling Integration coordinates load balancers with dynamic server provisioning. Cloud platforms monitor metrics such as instance CPU utilization, connection count, and request rate, and automatically launch or terminate servers. Load balancers register new servers through service discovery, adding endpoints once instances pass their health checks.

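WebSocket Load Balancing must account for long-lived, stateful connections. Once the upgrade handshake completes, all frames on a connection flow to the same backend for the connection's lifetime, so load can drift out of balance as servers are added or removed. Session affinity still matters when clients reconnect or fall back to HTTP polling and expect server-local state. Nginx and HAProxy support WebSocket proxying with appropriate upgrade header handling.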

API Gateway Patterns position load balancers behind API gateways that handle authentication, rate limiting, and request transformation. The gateway fans out requests to multiple backend services, aggregating responses. Load balancers distribute requests within each service, while the gateway handles cross-service routing.

Database Connection Pooling applies load balancing concepts to database connections. PgBouncer pools PostgreSQL connections to reduce connection overhead, while proxies such as ProxySQL (MySQL) and Pgpool-II (PostgreSQL) can also distribute queries across database replicas. Read queries distribute across replicas while writes route to the primary.

# Database read replica load balancing
class DatabaseLoadBalancer
  def initialize(primary, replicas)
    @primary = primary
    @replicas = replicas
    @replica_index = 0
  end
  
  def execute_query(sql)
    if write_query?(sql)
      @primary.execute(sql)
    else
      select_replica.execute(sql)
    end
  end
  
  private
  
  def write_query?(sql)
    sql.match?(/\A\s*(INSERT|UPDATE|DELETE|CREATE|ALTER|DROP)/i)
  end
  
  def select_replica
    # Round-robin with health checking
    attempts = 0
    
    while attempts < @replicas.length
      replica = @replicas[@replica_index]
      @replica_index = (@replica_index + 1) % @replicas.length
      
      return replica if replica.healthy?
      
      attempts += 1
    end
    
    # Fallback to primary if no replicas healthy
    @primary
  end
end

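# Rails configuration
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true
  
  connects_to database: { 
    writing: :primary, 
    reading: :replica 
  }
end

# Explicit read/write splitting (automatic switching per request
# additionally requires the DatabaseSelector middleware)
ActiveRecord::Base.connected_to(role: :reading) do
  User.where(active: true).to_a  # Routes to replica
end
User.create(name: 'Alice')       # Routes to primary (default writing role)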

Reference

Load Balancing Algorithms

Algorithm | Description | Use Case | Session Affinity
Round Robin | Sequential rotation through servers | Uniform workloads, stateless apps | No
Weighted Round Robin | Rotation proportional to server capacity | Heterogeneous server specs | No
Least Connections | Routes to server with fewest connections | Variable request durations | No
Weighted Least Connections | Least connections considering capacity | Mixed capacity and variable load | No
IP Hash | Hash client IP to consistent server | Session persistence without cookies | Yes
URL Hash | Hash request URL to server | Cache optimization | Optional
Random | Random server selection | Stateless apps, simple distribution | No
Least Response Time | Routes to fastest responding server | Latency-sensitive applications | No

Health Check Types

Type | Layer | Check Method | Granularity | Overhead
TCP Connect | Layer 4 | TCP handshake completion | Basic connectivity | Low
HTTP GET | Layer 7 | HTTP request to endpoint | Service reachability | Medium
Custom Script | Application | Application-specific logic | Full validation | High
Passive | Application | Monitor actual traffic | Real traffic patterns | None

Load Balancer Types Comparison

Type | Layer | Latency | Throughput | Configuration Complexity | Cost Model
DNS | Application | High (TTL) | Unlimited | Low | Per domain
Hardware | Network | Microseconds | Millions req/s | High | Capital expense
Software | Network | Milliseconds | 100K+ req/s | Medium | Operational
Cloud Managed | Varies | Low | Auto-scaling | Low | Pay per use
Service Mesh | Application | Low | High | High | Infrastructure

Session Persistence Methods

Method | Mechanism | Reliability | Operational Complexity
Source IP Hash | Hash client IP address | Medium (NAT issues) | Low
Cookie Insertion | Load balancer sets cookie | High | Low
Application Cookie | Application manages session ID | High | Medium
SSL Session ID | TLS session identifier | Medium | Low
URL Parameter | Session ID in query string | High | High

Common Configuration Parameters

Parameter | Description | Typical Values | Impact
Connection Timeout | Max time for backend connection | 5-30 seconds | Failure detection speed
Request Timeout | Max request processing time | 30-300 seconds | Long request handling
Health Check Interval | Time between health probes | 5-30 seconds | Failure detection delay
Health Check Timeout | Max health check duration | 2-10 seconds | False positive rate
Unhealthy Threshold | Failed checks before removal | 2-5 failures | Failure sensitivity
Healthy Threshold | Passed checks before restoration | 2-5 successes | Recovery speed
Maximum Connections | Per-server connection limit | 1000-10000 | Overload protection
Keep-Alive Timeout | Idle connection timeout | 30-300 seconds | Connection reuse

Load Balancer Selection Criteria

Criterion | Consider When | Examples
Traffic Volume | Expected request rate | Low: Nginx, High: Hardware LB
Geographic Distribution | Multi-region deployment | DNS-based, Cloud global LB
Protocol Requirements | Application protocol | HTTP: ALB, TCP: NLB
Operational Expertise | Team capabilities | Managed: Cloud LB, Control: HAProxy
Budget Constraints | Cost considerations | Open source: Nginx/HAProxy, Cloud: ELB
Latency Requirements | Response time needs | Ultra-low: Hardware, Normal: Software
SSL Termination | Certificate management | Centralized or distributed
Service Discovery | Dynamic backends | Container orchestration integration

Ruby Gems for Load Balancing

Gem | Purpose | Integration Point
rack-proxy | Reverse proxy middleware | Rack applications
net-http-persistent | Persistent HTTP connections | HTTP clients
connection_pool | Generic connection pooling | Database, cache clients
redis-rb | Redis client (pool with connection_pool) | Redis operations
pg | PostgreSQL driver (pooling via ActiveRecord or PgBouncer) | Database connections
faraday | HTTP client with middleware | Service-to-service calls
typhoeus | Parallel HTTP requests | Multiple backend calls

Monitoring Metrics

Metric | Description | Alert Threshold
Request Rate | Requests per second | Sudden spikes or drops
Error Rate | Failed request percentage | Above 1-5%
Response Time | Average request latency | P95 exceeds SLA
Active Connections | Current connections | Near connection limit
Backend Health | Available server count | Below redundancy level
Queue Depth | Pending requests | Sustained non-zero
SSL Negotiation Time | TLS handshake duration | Exceeds 100ms
Connection Errors | Failed backend connections | Above baseline