Overview
Load balancing distributes incoming requests across multiple backend servers or resources to prevent any single server from becoming a bottleneck. The technique originated in the 1990s as web traffic scaled beyond single-server capacity, requiring mechanisms to spread load across server farms.
A load balancer acts as a reverse proxy, sitting between clients and backend servers. When a client sends a request, the load balancer applies an algorithm to select which backend server receives the request. The selected server processes the request and returns the response through the load balancer to the client.
Load balancing serves several functions:
- Prevents server overload by distributing requests
- Increases application availability through redundancy
- Enables horizontal scaling by adding servers
- Facilitates zero-downtime deployments
- Provides failure isolation and automatic failover
Modern applications rarely run on single servers. Load balancing has become fundamental infrastructure, appearing at multiple layers: DNS-level geographic distribution, network-level TCP/UDP balancing, and application-level HTTP routing. Cloud platforms provide managed load balancing services, while on-premises deployments use dedicated hardware or software solutions.
# Conceptual load balancer behavior
class SimpleLoadBalancer
  def initialize(servers)
    @servers = servers
    @current_index = 0
  end

  def forward_request(request)
    server = select_server
    server.handle(request)
  end

  def select_server
    # Round-robin algorithm
    server = @servers[@current_index]
    @current_index = (@current_index + 1) % @servers.length
    server
  end
end
# Usage
servers = [Server.new('server1'), Server.new('server2'), Server.new('server3')]
lb = SimpleLoadBalancer.new(servers)
lb.forward_request(request)
# => Forwards to server1, then server2, then server3, cycling back
The example shows the basic concept: the load balancer maintains a pool of servers and applies an algorithm to distribute requests. Production implementations handle health checks, connection pooling, SSL termination, and session persistence.
Key Principles
Load balancing operates on several fundamental principles that determine effectiveness and behavior.
Distribution Algorithms determine how requests map to backend servers. The algorithm choice affects load distribution, statefulness requirements, and operational complexity. Common algorithms include round-robin (sequential rotation), least connections (fewest active connections), weighted distribution (proportional to server capacity), and hash-based routing (consistent assignment based on request attributes).
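To contrast with the round-robin example above, here is a minimal least-connections sketch. It assumes each server object exposes an `active_connections` counter; production balancers track this internally.

# Least-connections selection (illustrative sketch)
class LeastConnectionsBalancer
  def initialize(servers)
    @servers = servers
  end

  def select_server
    # Pick the server currently handling the fewest requests;
    # assumes each server exposes an active_connections accessor
    @servers.min_by(&:active_connections)
  end

  def forward_request(request)
    server = select_server
    server.active_connections += 1
    begin
      server.handle(request)
    ensure
      server.active_connections -= 1
    end
  end
end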
Health Checking monitors backend server availability. Load balancers periodically probe servers using TCP connections, HTTP requests, or custom health endpoints. Failed health checks remove servers from rotation; successful checks restore them. Health check frequency, timeout values, and failure thresholds affect failover speed versus false positive rate.
Session Persistence maintains client-server affinity when application state exists on backend servers. Techniques include source IP hashing, cookie-based routing, or application-level session identifiers. Persistence conflicts with optimal load distribution since it constrains routing decisions.
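As a minimal sketch of the first technique, hashing the client address yields a stable index into the pool, so the same client keeps reaching the same backend while pool membership is fixed:

# Source IP hashing for session persistence (sketch)
require 'zlib'

class IPHashBalancer
  def initialize(servers)
    @servers = servers
  end

  def select_server(client_ip)
    # CRC32 gives a stable integer for a given address; the same client
    # maps to the same server as long as the pool size is unchanged
    index = Zlib.crc32(client_ip) % @servers.length
    @servers[index]
  end
end

Note the caveat: adding or removing a server remaps most clients, which is one reason hash-based schemes often use consistent hashing instead.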
Connection Management handles network connections between clients and servers. Load balancers may proxy connections (terminating client connections and creating new backend connections) or pass through connections (forwarding packets without termination). Proxy mode enables protocol translation, SSL offloading, and content inspection. Pass-through mode reduces latency and resource usage.
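To make proxy mode concrete, the sketch below terminates each client connection and opens a fresh backend connection, pumping bytes in both directions. The listening port and backend host are placeholder values.

# Minimal proxy-mode TCP forwarder (sketch; host and ports are placeholders)
require 'socket'

listener = TCPServer.new(8080)
loop do
  client = listener.accept
  Thread.new(client) do |downstream|
    upstream = TCPSocket.new('backend.internal', 3000)
    # Pump bytes in both directions until either side closes
    pumps = [[downstream, upstream], [upstream, downstream]].map do |src, dst|
      Thread.new do
        IO.copy_stream(src, dst)
      rescue IOError, SystemCallError
        # Connection closed or reset; fall through to cleanup
      ensure
        dst.close rescue nil
      end
    end
    pumps.each(&:join)
  end
end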
Layer 4 vs Layer 7 distinguishes network-level from application-level load balancing. Layer 4 (transport) operates on TCP/UDP packets, routing based on IP addresses and ports. Layer 7 (application) parses HTTP requests, routing based on URLs, headers, or request content. Layer 7 provides finer control but higher processing overhead.
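A sketch of Layer 7 routing: because the balancer has parsed the HTTP request, it can route on path or headers rather than just address and port. The request object and backend pools here are assumptions for illustration.

# Layer 7 (content-based) routing sketch
class Layer7Router
  def initialize(api_pool:, static_pool:, default_pool:)
    @api_pool = api_pool
    @static_pool = static_pool
    @default_pool = default_pool
  end

  def route(request)
    # Route on attributes only visible after parsing the HTTP request
    pool =
      if request.path.start_with?('/api/')
        @api_pool
      elsif request.headers['Accept'].to_s.include?('image/')
        @static_pool
      else
        @default_pool
      end
    pool.sample # delegate the final server choice to any algorithm
  end
end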
# Health check implementation
require 'net/http'
require 'timeout'

class HealthChecker
  def initialize(server, check_interval: 5, timeout: 2)
    @server = server
    @check_interval = check_interval
    @timeout = timeout
    @healthy = true
  end

  def start_monitoring
    Thread.new do
      loop do
        @healthy = perform_health_check
        sleep @check_interval
      end
    end
  end

  def healthy?
    @healthy
  end

  private

  def perform_health_check
    Timeout.timeout(@timeout) do
      response = Net::HTTP.get_response(@server.health_uri)
      response.code == '200'
    end
  rescue StandardError
    # Timeout::Error is a StandardError, so this covers timeouts too
    false
  end
end
Stateless vs Stateful describes whether the load balancer maintains request context. Stateless balancers make independent routing decisions per request, simplifying horizontal scaling. Stateful balancers track connections, sessions, or request sequences, enabling advanced features like transaction integrity but requiring state synchronization across load balancer instances.
Active Health Checks proactively test server availability versus passive health checks that detect failures from failed requests. Active checks provide faster failure detection; passive checks avoid probe traffic overhead. Many systems combine both approaches.
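A minimal passive check can piggyback on live traffic, marking a server unhealthy after consecutive request failures. The threshold value below is arbitrary.

# Passive health checking derived from live traffic (sketch)
class PassiveHealthTracker
  def initialize(failure_threshold: 3)
    @failure_threshold = failure_threshold
    @consecutive_failures = Hash.new(0)
  end

  def record_result(server, success)
    if success
      @consecutive_failures[server] = 0
    else
      @consecutive_failures[server] += 1
    end
  end

  def healthy?(server)
    @consecutive_failures[server] < @failure_threshold
  end
end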
Implementation Approaches
Load balancing implementations vary by architectural layer, deployment model, and operational requirements.
DNS-Based Load Balancing returns multiple IP addresses for a domain name, distributing clients across servers through DNS resolution. The approach requires no dedicated load balancer infrastructure and provides geographic distribution. DNS caching limits failover speed since clients cache resolved addresses. TTL values balance caching efficiency against failover responsiveness.
# DNS-based routing simulation
class DNSLoadBalancer
  def initialize(domain, servers)
    @domain = domain
    @servers = servers
  end

  def resolve(client_ip)
    # Simulate geographic routing
    region = determine_region(client_ip)
    regional_servers = @servers.select { |s| s.region == region }
    # Return up to three A records for the client's region
    regional_servers.sample(3).map(&:ip_address)
  end

  def determine_region(ip)
    # Simplified geographic lookup
    ip.start_with?('192.168') ? :us_west : :us_east
  end
end
Hardware Load Balancers use dedicated network appliances with specialized ASICs for high-throughput packet processing. Hardware solutions handle millions of connections per second with microsecond latency. Cost and inflexibility limit adoption to large-scale deployments with predictable capacity requirements.
Software Load Balancers run on commodity servers, providing flexibility and cost efficiency. Popular options include HAProxy, Nginx, and Envoy. Software balancers handle hundreds of thousands of connections on standard hardware. Configuration changes deploy without hardware replacement, enabling rapid iteration.
Cloud Load Balancers provide managed services that abstract infrastructure management. AWS Elastic Load Balancing, Google Cloud Load Balancing, and Azure Load Balancer handle provisioning, scaling, and maintenance. Cloud balancers integrate with auto-scaling, service discovery, and health monitoring. Trade-offs include vendor lock-in and reduced low-level control.
Application-Level Load Balancing embeds distribution logic within application code or frameworks. Client-side load balancing libraries select endpoints directly, eliminating dedicated load balancer infrastructure. Service mesh architectures like Istio provide sidecar proxies that handle routing, retries, and circuit breaking at the application layer.
# Client-side load balancing with retries
# Assumes CircuitBreaker, RequestError, and MaxRetriesExceeded are defined elsewhere
class ClientSideLoadBalancer
  def initialize(endpoints)
    @endpoints = endpoints
    @circuit_breakers = Hash.new { |h, k| h[k] = CircuitBreaker.new }
  end

  def call(request, max_retries: 3)
    attempts = 0
    while attempts < max_retries
      attempts += 1
      endpoint = select_endpoint
      break if endpoint.nil? # every circuit breaker is open

      circuit_breaker = @circuit_breakers[endpoint]
      begin
        response = execute_request(endpoint, request)
        circuit_breaker.record_success
        return response
      rescue RequestError
        circuit_breaker.record_failure
      end
    end
    raise MaxRetriesExceeded
  end

  private

  def select_endpoint
    # Skip endpoints with open circuit breakers; prefer the lowest failure rate
    available = @endpoints.reject { |e| @circuit_breakers[e].open? }
    available.min_by { |e| @circuit_breakers[e].failure_rate }
  end

  def execute_request(endpoint, request)
    # HTTP request implementation
  end
end
Container Orchestration Load Balancing integrates with platforms like Kubernetes. Service definitions create virtual IP addresses that distribute traffic across pod replicas. Ingress controllers route external traffic to services based on hostnames and paths. Container platforms handle service discovery, health checking, and automatic endpoint updates as pods scale.
Design Considerations
Selecting load balancing strategies requires analyzing application characteristics, operational requirements, and infrastructure constraints.
Algorithm Selection depends on workload patterns. Round-robin works well for homogeneous servers with similar request processing times. Least connections suits workloads with highly variable request durations, preventing long-running requests from concentrating on specific servers. IP hash maintains session affinity but distributes poorly when client populations cluster in IP ranges. Random selection provides good distribution with minimal state overhead.
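Hash-based routing is usually implemented with consistent hashing, so that adding or removing a server remaps only a fraction of clients rather than nearly all of them. A compact ring sketch follows; the virtual node count is arbitrary.

# Consistent hash ring (sketch; 100 virtual nodes per server is arbitrary)
require 'digest'

class ConsistentHashRing
  def initialize(servers, vnodes: 100)
    @ring = {}
    servers.each do |server|
      vnodes.times do |i|
        @ring[hash_key("#{server}-#{i}")] = server
      end
    end
    @sorted_keys = @ring.keys.sort
  end

  def server_for(client_key)
    point = hash_key(client_key)
    # First ring position at or after the client's point, wrapping around
    key = @sorted_keys.bsearch { |k| k >= point } || @sorted_keys.first
    @ring[key]
  end

  private

  def hash_key(value)
    Digest::MD5.hexdigest(value)[0, 8].to_i(16)
  end
end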
Stateless Applications simplify load balancing by allowing requests to route to any server. Stateless design moves session data to external stores like Redis or databases, enabling load balancers to use any algorithm without session persistence concerns. Stateful applications require sticky sessions or session replication, constraining load distribution and complicating failover.
SSL/TLS Termination determines where encryption processing occurs. Terminating SSL at the load balancer centralizes certificate management and offloads CPU-intensive encryption from backend servers. Backends communicate over unencrypted connections within the trusted network. End-to-end encryption maintains SSL through to backend servers, increasing security but requiring certificate management on all servers and duplicating encryption overhead.
# Configuration modeling for SSL termination
class LoadBalancerConfig
  attr_accessor :ssl_termination, :backend_protocol

  def initialize
    @ssl_termination = :load_balancer # or :backend or :none
    @backend_protocol = :http         # or :https
  end

  def ssl_termination_at_lb?
    @ssl_termination == :load_balancer
  end

  def requires_backend_certificates?
    @ssl_termination == :backend ||
      (@ssl_termination == :load_balancer && @backend_protocol == :https)
  end

  def connection_pattern
    if ssl_termination_at_lb?
      'HTTPS -> LB -> HTTP -> Backends'
    elsif @ssl_termination == :backend
      'HTTPS -> LB (passthrough) -> HTTPS -> Backends'
    else
      'HTTP -> LB -> HTTP -> Backends'
    end
  end
end
Geographic Distribution routes users to nearby datacenters, reducing latency and improving user experience. DNS-based geographic routing resolves domains to region-specific IP addresses. Active-active multi-region deployments require data replication strategies and conflict resolution. Active-passive setups maintain cold standbys for disaster recovery, accepting longer failover times.
Health Check Design balances failure detection speed against false positive rate. Aggressive health checks (frequent probes, tight timeouts) detect failures quickly but may incorrectly remove servers during transient network issues. Conservative checks (infrequent probes, generous timeouts) reduce false positives but delay failure detection. Application-specific health endpoints verify service functionality beyond basic connectivity.
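The HealthChecker shown earlier flips state on a single probe; production balancers debounce with thresholds like those in the Reference tables. A sketch of that state logic, with arbitrary default thresholds:

# Threshold-based health state (sketch of debounced checks)
class ThresholdHealthState
  def initialize(unhealthy_threshold: 3, healthy_threshold: 2)
    @unhealthy_threshold = unhealthy_threshold
    @healthy_threshold = healthy_threshold
    @healthy = true
    @streak = 0
  end

  def record_probe(success)
    # A probe that matches the current state resets the streak
    if success == @healthy
      @streak = 0
      return @healthy
    end

    @streak += 1
    threshold = @healthy ? @unhealthy_threshold : @healthy_threshold
    if @streak >= threshold
      @healthy = success
      @streak = 0
    end
    @healthy
  end

  def healthy?
    @healthy
  end
end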
Scaling Considerations affect architecture decisions. Vertical scaling (larger servers) simplifies deployment but hits hardware limits. Horizontal scaling (more servers) provides unlimited capacity growth through load balancing but requires stateless design. Auto-scaling dynamically adjusts capacity based on metrics, requiring rapid server provisioning and deprovisioning coordination with load balancers.
Cost Trade-offs compare managed services against self-hosted solutions. Cloud load balancers eliminate operational overhead but incur per-GB and per-connection charges. Self-hosted balancers reduce variable costs but require dedicated operations teams. Hybrid approaches use cloud load balancers for edge traffic and self-hosted for internal service mesh.
Ruby Implementation
Ruby applications interact with load balancers primarily as backend servers, though Ruby can implement load balancing logic in certain contexts.
Rack Middleware for Load Balancer Integration handles forwarded headers that preserve original client information when proxied through load balancers.
# Rack middleware for X-Forwarded-For header handling
class LoadBalancerHeaders
  def initialize(app)
    @app = app
  end

  def call(env)
    # Trust forwarded headers set by the load balancer
    if (forwarded_for = env['HTTP_X_FORWARDED_FOR'])
      # Original client IP is the first address in the chain
      env['REMOTE_ADDR'] = forwarded_for.split(',').first.strip
    end

    if (forwarded_proto = env['HTTP_X_FORWARDED_PROTO'])
      env['rack.url_scheme'] = forwarded_proto
    end

    if (forwarded_port = env['HTTP_X_FORWARDED_PORT'])
      env['SERVER_PORT'] = forwarded_port
    end

    @app.call(env)
  end
end

# In config.ru
use LoadBalancerHeaders
run MyRackApp.new
Health Check Endpoints provide application-level health verification for load balancers. Rails applications typically implement health checks as lightweight controller actions.
# Rails health check controller
class HealthController < ApplicationController
  skip_before_action :verify_authenticity_token

  def check
    checks = {
      database: database_healthy?,
      cache: cache_healthy?,
      dependencies: check_dependencies
    }
    healthy = checks[:database] && checks[:cache] && checks[:dependencies].values.all?
    render json: { status: healthy ? 'healthy' : 'unhealthy' }.merge(checks),
           status: healthy ? :ok : :service_unavailable
  end

  def shallow
    # Minimal check for fast health probes
    render plain: 'OK', status: :ok
  end

  private

  def database_healthy?
    ActiveRecord::Base.connection.active?
  rescue StandardError
    false
  end

  def cache_healthy?
    Rails.cache.read('health_check')
    true
  rescue StandardError
    false
  end

  def check_dependencies
    {
      redis: check_redis,
      external_api: check_external_api
    }
  end

  def check_redis
    Redis.new.ping == 'PONG'
  rescue StandardError
    false
  end

  def check_external_api
    # Quick timeout to avoid blocking health checks
    Timeout.timeout(1) do
      response = Net::HTTP.get_response(URI(ENV['EXTERNAL_API_URL']))
      response.code == '200'
    end
  rescue StandardError
    false
  end
end
Graceful Shutdown ensures in-flight requests complete before server termination during deployments or scaling events.
# Puma graceful shutdown configuration
# config/puma.rb
workers 4
threads 5, 5

# Handle SIGTERM from the load balancer; Puma stops accepting new
# connections and drains in-flight requests
on_worker_shutdown do
  puts 'Worker shutting down, finishing requests...'
end

# Extended shutdown timeout for long-running requests
worker_shutdown_timeout 30

# In application code
class GracefulShutdown
  def initialize(app)
    @app = app
    @shutdown_requested = false
  end

  def call(env)
    return [503, {}, ['Service shutting down']] if @shutdown_requested
    @app.call(env)
  end

  def shutdown
    @shutdown_requested = true
    # Wait for active requests to complete
    sleep 1 while active_requests > 0
  end

  private

  def active_requests
    # Illustrative: counts threads flagged while processing a request
    ObjectSpace.each_object(Thread).count { |t| t[:processing_request] }
  end
end

# Signal handling; shutdown_middleware is the GracefulShutdown instance
# inserted into the Rack stack
shutdown_middleware = GracefulShutdown.new(MyRackApp.new)
Signal.trap('TERM') do
  shutdown_middleware.shutdown
  exit
end
Client-Side Load Balancing in Ruby implements distribution logic within application code for service-to-service communication.
require 'net/http'
require 'uri'

class ServiceUnavailableError < StandardError; end

class ServiceClient
  def initialize(service_endpoints)
    @endpoints = service_endpoints
    @current_index = 0
    @mutex = Mutex.new
  end

  def call_service(path, method: :get, body: nil)
    attempts = 0
    max_attempts = @endpoints.size

    while attempts < max_attempts
      endpoint = next_endpoint
      begin
        return execute_request(endpoint, path, method, body)
      rescue Errno::ECONNREFUSED, Net::OpenTimeout
        # Try the next endpoint on connection failure
        attempts += 1
      end
    end

    raise ServiceUnavailableError, 'All endpoints failed'
  end

  private

  def next_endpoint
    @mutex.synchronize do
      endpoint = @endpoints[@current_index]
      @current_index = (@current_index + 1) % @endpoints.size
      endpoint
    end
  end

  def execute_request(endpoint, path, method, body)
    uri = URI.join(endpoint, path)
    case method
    when :get
      Net::HTTP.get_response(uri)
    when :post
      Net::HTTP.post(uri, body)
    else
      raise ArgumentError, "Unsupported method: #{method}"
    end
  end
end

# Usage
client = ServiceClient.new([
  'http://service1.internal:3000',
  'http://service2.internal:3000',
  'http://service3.internal:3000'
])
response = client.call_service('/api/users', method: :get)
# => Rotates through endpoints on each call
Session Management with Load Balancers requires coordinating session storage across distributed backends.
# Redis-backed session store for stateless load balancing
# config/initializers/session_store.rb
Rails.application.config.session_store :redis_store,
  servers: ENV['REDIS_URL'],
  expire_after: 1.day,
  key: '_myapp_session',
  threadsafe: true,
  secure: Rails.env.production?

# Alternative: cookie-based sessions need no shared store, since state
# travels with the request (all servers must share secret_key_base)
# Rails.application.config.session_store :cookie_store,
#   key: '_myapp_session',
#   secure: Rails.env.production?,
#   httponly: true,
#   same_site: :lax
Tools & Ecosystem
Multiple load balancing solutions serve different use cases and operational requirements.
HAProxy provides high-performance Layer 4 and Layer 7 load balancing. The software handles millions of concurrent connections with microsecond latency. Configuration uses a declarative syntax defining frontends (client-facing), backends (server pools), and routing rules.
# HAProxy configuration generation in Ruby
class HAProxyConfig
  def initialize
    @frontends = []
    @backends = []
  end

  def add_frontend(name, bind_address, default_backend)
    @frontends << {
      name: name,
      bind: bind_address,
      default_backend: default_backend
    }
  end

  def add_backend(name, servers, balance_method: 'roundrobin')
    @backends << {
      name: name,
      balance: balance_method,
      servers: servers
    }
  end

  def generate
    config = "global\n"
    config << "  maxconn 4096\n"
    config << "  log 127.0.0.1 local0\n\n"
    config << "defaults\n"
    config << "  mode http\n"
    config << "  timeout connect 5000ms\n"
    config << "  timeout client 50000ms\n"
    config << "  timeout server 50000ms\n\n"

    @frontends.each do |frontend|
      config << "frontend #{frontend[:name]}\n"
      config << "  bind #{frontend[:bind]}\n"
      config << "  default_backend #{frontend[:default_backend]}\n\n"
    end

    @backends.each do |backend|
      config << "backend #{backend[:name]}\n"
      config << "  balance #{backend[:balance]}\n"
      backend[:servers].each_with_index do |server, i|
        config << "  server server#{i + 1} #{server[:address]} check\n"
      end
      config << "\n"
    end

    config
  end
end

# Usage
config = HAProxyConfig.new
config.add_frontend('http_front', '*:80', 'web_servers')
config.add_backend('web_servers', [
  { address: '192.168.1.10:3000' },
  { address: '192.168.1.11:3000' },
  { address: '192.168.1.12:3000' }
], balance_method: 'leastconn')
File.write('/etc/haproxy/haproxy.cfg', config.generate)
Nginx serves as both web server and reverse proxy with load balancing capabilities. The configuration supports upstream server groups, health checks, and various distribution algorithms. Nginx Plus (commercial version) adds advanced features like dynamic reconfiguration and active health checks.
AWS Elastic Load Balancing offers three types: Application Load Balancer (Layer 7 HTTP/HTTPS), Network Load Balancer (Layer 4 TCP/UDP), and Classic Load Balancer (legacy). Application Load Balancers route based on request content, support WebSockets, and integrate with AWS services. Network Load Balancers handle millions of requests per second with ultra-low latency.
Envoy provides a modern, cloud-native proxy designed for service mesh architectures. The software supports dynamic configuration updates, advanced routing, observability features, and extension through filters. Envoy forms the data plane for service meshes like Istio and Consul Connect.
Kong combines API gateway functionality with load balancing, rate limiting, authentication, and request transformation. Ruby plugins extend Kong functionality, though the core uses OpenResty (Nginx + Lua).
Ruby Gems for Load Balancer Integration:
# Using Faraday with retries (Faraday 2 moved retry into the faraday-retry gem)
require 'faraday'
require 'faraday/retry'

conn = Faraday.new do |f|
  f.request :retry, max: 3, interval: 0.5, backoff_factor: 2
  f.adapter Faraday.default_adapter
end

# Define multiple backend URLs
backends = [
  'http://api1.example.com',
  'http://api2.example.com',
  'http://api3.example.com'
]

# Simple round-robin selection
current_backend = 0
backend_url = backends[current_backend % backends.length]
current_backend += 1

response = conn.get("#{backend_url}/api/resource")
Træfik automatically discovers services in containerized environments through integration with Docker, Kubernetes, and other orchestrators. Configuration updates occur automatically as services scale, eliminating manual load balancer reconfiguration.
Real-World Applications
Production load balancing architectures demonstrate patterns for scalability, reliability, and operational efficiency.
Multi-Tier Load Balancing layers multiple load balancing levels for different purposes. DNS distributes traffic across geographic regions. Regional load balancers route to availability zones. Zone load balancers distribute across server racks. This hierarchy provides redundancy at each level while isolating failures.
# Simulating multi-tier routing decision
class MultiTierRouter
  def initialize(topology)
    @topology = topology
  end

  def route_request(client_ip, request)
    region = select_region(client_ip)
    zone = select_zone(region)
    rack = select_rack(zone)
    server = select_server(rack)
    {
      path: [region, zone, rack, server],
      selected_server: server
    }
  end

  private

  def select_region(client_ip)
    # Geographic routing based on client location
    @topology[:regions].min_by { |r| latency(client_ip, r) }
  end

  def select_zone(region)
    # Any zone with available capacity
    region[:zones].select { |z| z[:available_capacity] > 0 }.sample
  end

  def select_rack(zone)
    # Least connections algorithm
    zone[:racks].min_by { |r| r[:active_connections] }
  end

  def select_server(rack)
    # Weighted random selection based on server capacity
    weighted_selection(rack[:servers])
  end

  def weighted_selection(servers)
    total_weight = servers.sum { |s| s[:weight] }
    random_value = rand(total_weight)
    cumulative = 0
    servers.each do |server|
      cumulative += server[:weight]
      return server if random_value < cumulative
    end
  end

  def latency(client_ip, region)
    # Simplified stand-in; production systems use measured latency
    distance(client_ip, region[:location])
  end

  def distance(client_ip, location)
    # Placeholder for a geo-IP distance lookup
    client_ip.hash ^ location.hash
  end
end
Blue-Green Deployments use load balancers to switch traffic between application versions. The blue environment serves production traffic while green deploys the new version. After validation, the load balancer shifts traffic to green. Rollback occurs instantly by routing back to blue.
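A hedged sketch of the traffic switch: the balancer holds a pointer to the live pool, and cutover or rollback is a single pointer swap. The pool contents and server interface are assumptions.

# Blue-green cutover as a pool swap (sketch)
class BlueGreenSwitch
  def initialize(blue_pool, green_pool)
    @pools = { blue: blue_pool, green: green_pool }
    @live = :blue
  end

  def route_request(request)
    @pools[@live].sample.handle(request)
  end

  def cut_over!
    @live = (@live == :blue ? :green : :blue)
  end

  def rollback!
    cut_over! # switching back is the same swap
  end
end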
Canary Releases route a small percentage of traffic to new application versions for validation before full rollout. Load balancers split traffic proportionally, sending 5-10% to the canary version while monitoring error rates and performance. Gradual traffic increase follows successful canary validation.
# Canary deployment controller
class CanaryDeployment
  def initialize(stable_backend, canary_backend)
    @stable_backend = stable_backend
    @canary_backend = canary_backend
    @canary_percentage = 0
    @error_threshold = 0.05 # 5% error rate
  end

  def route_request(request)
    backend = select_backend
    begin
      response = backend.handle(request)
      record_success(backend)
      response
    rescue
      record_failure(backend)
      raise
    end
  end

  def increase_canary_traffic(increment = 10)
    return if @canary_percentage >= 100

    if canary_error_rate < @error_threshold
      @canary_percentage = [@canary_percentage + increment, 100].min
      puts "Increased canary traffic to #{@canary_percentage}%"
    else
      puts "Canary error rate too high, holding at #{@canary_percentage}%"
    end
  end

  def rollback
    @canary_percentage = 0
    puts 'Rolled back to stable backend'
  end

  private

  def select_backend
    rand(100) < @canary_percentage ? @canary_backend : @stable_backend
  end

  def record_success(backend)
    backend.stats[:successes] += 1
  end

  def record_failure(backend)
    backend.stats[:failures] += 1
  end

  def canary_error_rate
    stats = @canary_backend.stats
    total = stats[:successes] + stats[:failures]
    return 0 if total.zero?

    stats[:failures].to_f / total
  end
end
Auto-Scaling Integration coordinates load balancers with dynamic server provisioning. Cloud platforms monitor load balancer metrics (CPU, connection count, request rate) and automatically launch or terminate servers. Load balancers register new servers through service discovery, adding endpoints as instances become healthy.
WebSocket Load Balancing requires session affinity since WebSocket connections maintain long-lived, stateful connections. Load balancers hash connection identifiers to ensure related requests route to the same server. Nginx and HAProxy support WebSocket proxying with appropriate upgrade header handling.
API Gateway Patterns position load balancers behind API gateways that handle authentication, rate limiting, and request transformation. The gateway fans out requests to multiple backend services, aggregating responses. Load balancers distribute requests within each service, while the gateway handles cross-service routing.
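A sketch of that fan-out: the gateway calls each backing service in parallel while each client balances across its own backends. The service clients here are assumed instances of something like the earlier ServiceClient, and the routes are hypothetical.

# Gateway fan-out with per-service client-side balancing (sketch)
class GatewayAggregator
  def initialize(user_client, order_client)
    @user_client = user_client   # e.g. a ServiceClient instance
    @order_client = order_client
  end

  def profile_page(user_id)
    # Fan out in parallel; each client balances across its own backends
    user_thread = Thread.new { @user_client.call_service("/api/users/#{user_id}") }
    order_thread = Thread.new { @order_client.call_service("/api/orders?user=#{user_id}") }
    { user: user_thread.value, orders: order_thread.value }
  end
end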
Database Connection Pooling applies load balancing concepts to database connections. PgBouncer and ProxySQL distribute queries across database replicas, maintaining connection pools to reduce connection overhead. Read queries distribute across replicas while writes route to the primary.
# Database read replica load balancing
class DatabaseLoadBalancer
  def initialize(primary, replicas)
    @primary = primary
    @replicas = replicas
    @replica_index = 0
  end

  def execute_query(sql)
    if write_query?(sql)
      @primary.execute(sql)
    else
      select_replica.execute(sql)
    end
  end

  private

  def write_query?(sql)
    sql.match?(/\A\s*(INSERT|UPDATE|DELETE|CREATE|ALTER|DROP)/i)
  end

  def select_replica
    # Round-robin with health checking
    attempts = 0
    while attempts < @replicas.length
      replica = @replicas[@replica_index]
      @replica_index = (@replica_index + 1) % @replicas.length
      return replica if replica.healthy?
      attempts += 1
    end
    # Fall back to the primary if no replica is healthy
    @primary
  end
end

# Rails configuration (Rails 6+ multiple databases)
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to database: {
    writing: :primary,
    reading: :replica
  }
end

# With the database_selector middleware enabled, Rails splits reads and writes
User.where(active: true).to_a # Routes to replica
User.create(name: 'Alice')    # Routes to primary
Reference
Load Balancing Algorithms
| Algorithm | Description | Use Case | Session Affinity |
|---|---|---|---|
| Round Robin | Sequential rotation through servers | Uniform workloads, stateless apps | No |
| Weighted Round Robin | Rotation proportional to server capacity | Heterogeneous server specs | No |
| Least Connections | Routes to server with fewest connections | Variable request durations | No |
| Weighted Least Connections | Least connections considering capacity | Mixed capacity and variable load | No |
| IP Hash | Hash client IP to consistent server | Session persistence without cookies | Yes |
| URL Hash | Hash request URL to server | Cache optimization | Optional |
| Random | Random server selection | Stateless apps, simple distribution | No |
| Least Response Time | Routes to fastest responding server | Latency-sensitive applications | No |
Health Check Types
| Type | Layer | Check Method | Granularity | Overhead |
|---|---|---|---|---|
| TCP Connect | Layer 4 | TCP handshake completion | Basic connectivity | Low |
| HTTP GET | Layer 7 | HTTP request to endpoint | Service reachability | Medium |
| Custom Script | Application | Application-specific logic | Full validation | High |
| Passive | Application | Monitor actual traffic | Real traffic patterns | None |
Load Balancer Types Comparison
| Type | Layer | Latency | Throughput | Configuration Complexity | Cost Model |
|---|---|---|---|---|---|
| DNS | Application | High (TTL) | Unlimited | Low | Per domain |
| Hardware | Network | Microseconds | Millions req/s | High | Capital expense |
| Software | Network | Milliseconds | 100K+ req/s | Medium | Operational |
| Cloud Managed | Varies | Low | Auto-scaling | Low | Pay per use |
| Service Mesh | Application | Low | High | High | Infrastructure |
Session Persistence Methods
| Method | Mechanism | Reliability | Operational Complexity |
|---|---|---|---|
| Source IP Hash | Hash client IP address | Medium (NAT issues) | Low |
| Cookie Insertion | Load balancer sets cookie | High | Low |
| Application Cookie | Application manages session ID | High | Medium |
| SSL Session ID | TLS session identifier | Medium | Low |
| URL Parameter | Session ID in query string | High | High |
Common Configuration Parameters
| Parameter | Description | Typical Values | Impact |
|---|---|---|---|
| Connection Timeout | Max time for backend connection | 5-30 seconds | Failure detection speed |
| Request Timeout | Max request processing time | 30-300 seconds | Long request handling |
| Health Check Interval | Time between health probes | 5-30 seconds | Failure detection delay |
| Health Check Timeout | Max health check duration | 2-10 seconds | False positive rate |
| Unhealthy Threshold | Failed checks before removal | 2-5 failures | Failure sensitivity |
| Healthy Threshold | Passed checks before restoration | 2-5 successes | Recovery speed |
| Maximum Connections | Per-server connection limit | 1000-10000 | Overload protection |
| Keep-Alive Timeout | Idle connection timeout | 30-300 seconds | Connection reuse |
Load Balancer Selection Criteria
| Criterion | Consider When | Examples |
|---|---|---|
| Traffic Volume | Expected request rate | Low: Nginx, High: Hardware LB |
| Geographic Distribution | Multi-region deployment | DNS-based, Cloud global LB |
| Protocol Requirements | Application protocol | HTTP: ALB, TCP: NLB |
| Operational Expertise | Team capabilities | Managed: Cloud LB, Control: HAProxy |
| Budget Constraints | Cost considerations | Open source: Nginx/HAProxy, Cloud: ELB |
| Latency Requirements | Response time needs | Ultra-low: Hardware, Normal: Software |
| SSL Termination | Certificate management | Centralized or distributed |
| Service Discovery | Dynamic backends | Container orchestration integration |
Ruby Gems for Load Balancing
| Gem | Purpose | Integration Point |
|---|---|---|
| rack-proxy | Reverse proxy middleware | Rack applications |
| net-http-persistent | Connection pooling | HTTP clients |
| connection_pool | Generic connection pooling | Database, cache clients |
| redis-rb | Redis connection pooling | Redis operations |
| pg | PostgreSQL connection pooling | Database connections |
| faraday | HTTP client with middleware | Service-to-service calls |
| typhoeus | Parallel HTTP requests | Multiple backend calls |
Monitoring Metrics
| Metric | Description | Alert Threshold |
|---|---|---|
| Request Rate | Requests per second | Sudden spikes or drops |
| Error Rate | Failed request percentage | Above 1-5% |
| Response Time | Average request latency | P95 exceeds SLA |
| Active Connections | Current connections | Near connection limit |
| Backend Health | Available server count | Below redundancy level |
| Queue Depth | Pending requests | Sustained non-zero |
| SSL Negotiation Time | TLS handshake duration | Exceeds 100ms |
| Connection Errors | Failed backend connections | Above baseline |