Overview
Web servers and application servers serve distinct but complementary roles in handling HTTP requests. A web server accepts HTTP requests, serves static files, and forwards dynamic requests to appropriate handlers. An application server executes application code to generate dynamic responses. The terms often cause confusion because modern servers frequently combine both capabilities, and the boundary between them has blurred with evolving architectures.
Web servers originated to serve static HTML files over HTTP. NCSA HTTPd and its successor, Apache HTTP Server, pioneered this model in the 1990s. They excelled at handling many concurrent connections, serving files efficiently, and managing HTTP protocol details. As web applications grew more complex, the need for dynamic content generation led to CGI (Common Gateway Interface), which allowed web servers to execute external programs, spawning a new process per request. This model proved inefficient for high-traffic applications.
Application servers emerged to address these limitations. They maintain running application processes, manage application state, provide connection pooling, and handle application-level concerns like session management and transaction coordination. J2EE application servers in the enterprise Java world exemplified this approach, managing complex application lifecycles and providing extensive middleware services.
In Ruby deployments, this distinction manifests through the separation of concerns between HTTP handling and application logic execution. A production Ruby application typically runs behind Nginx (web server) which forwards requests to Puma (application server running Rack-based Ruby code). This architecture separates protocol handling from business logic execution.
Client Request Flow:
HTTP Request → Web Server (Nginx)
↓
Static content? → Yes → Serve file directly
↓ No
Forward to Application Server (Puma)
↓
Execute Ruby Application Code
↓
Generate Dynamic Response
↓
Return through Web Server → Client
Key Principles
The fundamental distinction between web servers and application servers rests on their primary responsibilities and operational models. Web servers focus on HTTP protocol handling, connection management, and serving static resources. Application servers focus on executing application code, managing application state, and generating dynamic content.
Static vs Dynamic Content Handling
Web servers efficiently serve static content through direct file system access and memory mapping. When a request arrives for /images/logo.png, the web server reads the file from disk (or cache), adds appropriate HTTP headers, and sends the response. This operation requires no application code execution.
Dynamic content requires code execution. A request for /users/123 cannot be satisfied by reading a file—the application must query a database, apply business logic, render a view, and construct an HTTP response. This requires an application runtime environment with access to libraries, database connections, and application state.
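As a toy illustration of the two paths, the following Ruby sketch (the handler names are hypothetical, not a real server API) contrasts a response read from disk with one computed per request:

```ruby
require 'tmpdir'

# Hypothetical sketch: a static response is just bytes read from disk,
# while a dynamic response must be computed for each request.
static_dir = Dir.mktmpdir
File.write(File.join(static_dir, 'logo.txt'), 'static bytes')

serve_static = lambda do |path|
  body = File.read(File.join(static_dir, path))
  [200, { 'Content-Type' => 'text/plain', 'Content-Length' => body.bytesize.to_s }, [body]]
end

serve_dynamic = lambda do |user_id|
  # Stand-in for a database query, business logic, and view rendering
  body = "user #{user_id}"
  [200, { 'Content-Type' => 'text/plain' }, [body]]
end
```

The static path touches no application logic at all, which is why web servers can optimize it so aggressively.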
Process and Threading Models
Web servers typically use event-driven architectures or worker pool models to handle many concurrent connections efficiently. Nginx uses an event-driven model with asynchronous I/O, allowing a single process to manage thousands of connections. Apache supports both prefork (process-per-connection) and worker (thread-per-connection) models.
Application servers must balance concurrency with thread safety. Ruby's Global Interpreter Lock (GIL) complicates threading, leading to different concurrency models. Puma uses threads within multiple worker processes. Unicorn uses forked processes without threading. Passenger supports both models. The application server must provide an execution environment where application code runs safely.
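The practical effect of the GIL on I/O can be sketched in a few lines of Ruby: sleeping threads stand in for blocked I/O, which releases the lock so waits overlap:

```ruby
require 'benchmark'

# Sketch: blocking I/O (simulated here with sleep) releases Ruby's GIL,
# so threads overlap their waits even though Ruby bytecode runs serially.
serial = Benchmark.realtime { 5.times { sleep 0.1 } }

threaded = Benchmark.realtime do
  5.times.map { Thread.new { sleep 0.1 } }.each(&:join)
end
# serial is roughly 0.5s; threaded is roughly 0.1s because the waits overlap
```

This is why thread-based servers like Puma pay off for I/O-bound request handling despite the GIL.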
Request Lifecycle Management
A web server manages the HTTP request lifecycle: connection establishment, request parsing, header processing, request routing, response buffering, and connection termination. It handles HTTP/1.1 keep-alive connections, HTTP/2 multiplexing, SSL/TLS termination, and protocol upgrades.
An application server manages the application request lifecycle: request deserialization into application objects, middleware chain execution, routing to controller actions, session management, database transaction handling, and response serialization. It provides the Rack interface in Ruby applications, converting HTTP primitives into Ruby objects.
Resource Management
Web servers optimize for connection handling and static file serving. They use sendfile system calls to avoid copying data through userspace, maintain in-memory caches for frequently accessed files, and implement sophisticated timeout mechanisms to prevent resource exhaustion.
Application servers optimize for code execution efficiency. They maintain persistent database connection pools, cache compiled code, manage memory allocation for long-running processes, and handle graceful worker restarts under memory pressure. In Ruby, application servers must also manage gem loading, autoloading, and class reloading in development.
Security Boundaries
Web servers implement security at the HTTP protocol level: request size limits, rate limiting, IP filtering, URL filtering, and SSL/TLS configuration. They act as the first line of defense against malformed requests and common attacks.
Application servers implement security at the application level: authentication, authorization, CSRF protection, SQL injection prevention through parameterized queries, and XSS prevention through output escaping. The application server provides the environment where these security measures execute.
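A minimal sketch of one such application-level measure, output escaping against XSS, using the standard library's CGI module:

```ruby
require 'cgi'

# Untrusted input is escaped before interpolation into HTML output,
# so markup in the input renders as text instead of executing.
untrusted = '<script>alert(1)</script>'
safe_html = "<p>Hello, #{CGI.escapeHTML(untrusted)}</p>"
```

Frameworks like Rails apply this escaping automatically in templates; the principle is the same.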
Separation of Concerns Benefits
Running a web server in front of an application server provides several architectural advantages. The web server buffers slow clients, preventing application server processes from blocking on network I/O. It serves static assets without touching application code, reducing application server load. It handles SSL termination, allowing application servers to focus on business logic. It provides a layer for load balancing, request routing, and graceful deployments.
Design Considerations
Choosing between different deployment architectures requires understanding the trade-offs between simplicity, performance, scalability, and operational complexity. The decision depends on application characteristics, traffic patterns, and operational requirements.
Single Application Server
Running only an application server (Puma listening directly on port 80/443) offers maximum simplicity. The application handles all requests directly without an intermediary. This approach works for low-traffic applications, internal tools, or development environments where simplicity outweighs other concerns.
This configuration exposes limitations quickly. The application server must handle slow clients, which ties up worker processes. Static assets flow through application code, consuming resources unnecessarily. SSL termination happens in the application process, complicating certificate management. No request buffering exists between clients and application processes.
Development environments commonly use this approach. Running rails server starts Puma listening on port 3000, handling all requests directly. This configuration prioritizes fast iteration over production concerns.
Web Server with Application Server
The standard production deployment places a web server (Nginx or Apache) in front of the application server. The web server listens on ports 80/443, handles SSL termination, serves static files, and proxies dynamic requests to the application server via HTTP or Unix sockets.
This architecture separates concerns effectively. Nginx excels at handling thousands of concurrent connections, buffering requests from slow clients, and serving static files. Puma excels at executing Ruby code efficiently with thread-based concurrency. Each component operates within its strengths.
Request flow optimization occurs naturally. Static assets (/assets/*, /images/*) match location blocks in Nginx configuration and serve directly from disk. Dynamic requests forward to upstream application servers. This routing happens before application code runs, saving resources.
# config/puma.rb
# Application server configuration
workers 4
threads 5, 5

# Listen on Unix socket for Nginx communication
bind 'unix:///var/run/puma.sock'

# Enable worker timeout to handle stuck requests
worker_timeout 60

preload_app!

on_worker_boot do
  ActiveRecord::Base.establish_connection
end
# /etc/nginx/sites-available/myapp
# Web server configuration
upstream app_server {
  server unix:/var/run/puma.sock fail_timeout=0;
}

server {
  listen 80;
  server_name example.com;
  root /var/www/myapp/public;

  # Serve static files directly
  location ~ ^/(assets|images|javascripts|stylesheets)/ {
    expires 1y;
    add_header Cache-Control public;
    break;
  }

  # Proxy dynamic requests
  location / {
    proxy_pass http://app_server;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $host;
    proxy_redirect off;
  }
}
Multiple Application Server Instances
Horizontal scaling requires multiple application server instances behind the web server. The web server load balances requests across available application servers, providing redundancy and increased capacity.
Nginx supports multiple upstream servers with configurable load balancing algorithms. Round-robin distributes requests evenly. Least connections sends requests to the server with fewest active connections. IP hash maintains session affinity by routing the same client to the same server.
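These strategies can be modeled in a few lines of Ruby (a toy sketch, not Nginx's actual implementation):

```ruby
require 'zlib'

# Toy models of two load-balancing strategies.
SERVERS = %w[app1 app2 app3].freeze

# Round-robin: cycle through the upstream list in order
counter = -1
round_robin = -> { SERVERS[(counter += 1) % SERVERS.size] }

# IP hash: the same client address always maps to the same server
ip_hash = ->(ip) { SERVERS[Zlib.crc32(ip) % SERVERS.size] }
```

The ip_hash sketch shows why it preserves session affinity: the mapping is a pure function of the client address, so repeated requests land on the same backend.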
This architecture introduces state management complexity. Session data stored in memory of one application server remains unavailable to others. Applications must use shared session stores (Redis, Memcached, database) or cookie-based sessions. Database connection pools must account for multiple processes connecting.
# config/puma.rb for scaled deployment
# Run multiple Puma instances on different ports or sockets
# Instance 1: unix:///var/run/puma1.sock
# Instance 2: unix:///var/run/puma2.sock
# Instance 3: unix:///var/run/puma3.sock
workers 4
threads 5, 5
bind "unix:///var/run/puma#{ENV['INSTANCE']}.sock"
Container-Based Architectures
Containerization with Docker changes deployment patterns while maintaining the web server/application server separation. Containers encapsulate the application server and its dependencies, while the web server often runs on the host or in a separate container.
A typical containerized Ruby deployment runs Nginx on the host, routing to multiple Puma containers. Each container runs an isolated application instance with its own process space and file system. Kubernetes orchestrates these containers, managing scaling, health checks, and rolling deployments.
This approach simplifies application server deployment while complicating networking. Container networking requires DNS-based service discovery or environment variable injection for backend addresses. Health checks must work through the container network. Log aggregation becomes essential as logs scatter across ephemeral containers.
Service-Oriented Architectures
Microservices architectures extend the web server/application server pattern across multiple services. An API gateway (often Nginx with Lua or a dedicated gateway like Kong) routes requests to appropriate service instances. Each service runs its own application server instances.
This multiplies the operational complexity of the web server/application server relationship. Each service requires health checking, circuit breaking, retry logic, and timeout management. The gateway must handle service discovery as backend addresses change. Request tracing across services requires correlation IDs flowing through all layers.
Ruby Implementation
Ruby provides multiple options for both web servers and application servers, each with distinct characteristics and trade-offs. Understanding their implementation details guides appropriate selection for specific use cases.
Rack: The Foundation
Rack defines the interface between web servers and Ruby applications. It specifies a simple contract: the application must respond to call(env), receiving an environment hash and returning an array of [status, headers, body]. This abstraction allows any Rack-compatible server to run any Rack-compatible application.
# Minimal Rack application (config.ru)
class HelloWorld
  def call(env)
    [
      200,
      { 'Content-Type' => 'text/plain' },
      ['Hello, World!']
    ]
  end
end

run HelloWorld.new
# Run with: rackup config.ru
Rails, Sinatra, and other Ruby web frameworks build on Rack. They implement the Rack interface while providing higher-level abstractions for routing, controllers, and views. This allows the same application code to run under different servers without modification.
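Rack's composability is easiest to see with a small middleware sketch; the class and header names here are illustrative, not a standard API:

```ruby
# A Rack middleware wraps any Rack app: it receives the env hash, delegates
# to the inner app, and can modify the response on the way out.
class TimingMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, headers, body = @app.call(env)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    headers['X-Runtime'] = format('%.6f', elapsed)
    [status, headers, body]
  end
end

inner = ->(_env) { [200, { 'Content-Type' => 'text/plain' }, ['ok']] }
app = TimingMiddleware.new(inner)
status, headers, body = app.call({})
```

Because both sides speak the same call(env) contract, middleware stacks compose freely between server and application.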
Puma: Multi-Threaded Application Server
Puma uses a hybrid threading and process model. It spawns multiple worker processes, each running multiple threads. This architecture maximizes CPU utilization on multi-core systems while maintaining memory efficiency through copy-on-write forking.
# config/puma.rb
workers ENV.fetch("WEB_CONCURRENCY") { 4 }
threads_count = ENV.fetch("RAILS_MAX_THREADS") { 5 }
threads threads_count, threads_count

preload_app!

before_fork do
  ActiveRecord::Base.connection_pool.disconnect! if defined?(ActiveRecord)
end

on_worker_boot do
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
end

on_worker_shutdown do
  puts 'Worker shutting down gracefully'
end

# Handle SIGTERM gracefully for rolling deployments
on_restart do
  puts 'Puma restarting...'
end
Puma's threading model requires thread-safe application code. Rails 4.0+ defaults to thread-safe operation, but legacy code or certain gems may not be thread-safe. The GIL limits true parallel Ruby code execution but doesn't prevent threaded I/O concurrency—threads can make concurrent database queries or HTTP requests.
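A short sketch of the thread-safety requirement: shared mutable state touched by several threads needs synchronization, which a Mutex provides:

```ruby
# Shared mutable state accessed from multiple threads (as in a Puma worker)
# must be synchronized; the Mutex makes each read-modify-write atomic.
counter = 0
lock = Mutex.new

threads = 8.times.map do
  Thread.new do
    1_000.times { lock.synchronize { counter += 1 } }
  end
end
threads.each(&:join)
# counter is exactly 8_000 because every increment runs under the lock
```

Without the lock, interleaved increments could lose updates; code run under Puma must either avoid shared state or protect it this way.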
The worker killer pattern addresses memory growth. Puma workers gradually accumulate memory through fragmentation and retained objects. The puma_worker_killer gem monitors worker memory and restarts workers exceeding thresholds:
# Gemfile
gem 'puma_worker_killer'

# config/puma.rb
before_fork do
  require 'puma_worker_killer'
  PumaWorkerKiller.config do |config|
    config.ram = 1024 # MB
    config.frequency = 10 # seconds
    config.percent_usage = 0.90
  end
  PumaWorkerKiller.start
end
Unicorn: Process-Based Application Server
Unicorn uses a preforking model without threading. The master process loads the application, then forks worker processes. Each worker handles one request at a time. This model guarantees process isolation and simplifies application code—no thread safety concerns.
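The prefork model can be sketched with Ruby's own Process.fork (a toy illustration, not Unicorn's implementation): a master forks workers that handle work in complete process isolation and report back over pipes.

```ruby
# Minimal prefork sketch: the master forks workers; each worker handles a
# "request" in its own process and writes the result back through a pipe.
workers = 2.times.map do |i|
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write("worker #{i} handled request in pid #{Process.pid}")
    writer.close
  end
  writer.close
  [pid, reader]
end

results = workers.map do |pid, reader|
  Process.wait(pid) # master reaps each worker
  reader.read
end
```

Each worker gets a copy-on-write copy of the master's memory, which is why preload_app matters for memory efficiency.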
# config/unicorn.rb
worker_processes 4
timeout 30
preload_app true
listen "/var/run/unicorn.sock", backlog: 64

before_fork do |server, worker|
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
  end

  # During zero-downtime restarts, signal the old master to quit
  old_pid = "#{server.config[:pid]}.oldbin"
  if File.exist?(old_pid) && server.pid != old_pid
    begin
      Process.kill("QUIT", File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
    end
  end
end

after_fork do |server, worker|
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.establish_connection
  end
end
Unicorn excels at zero-downtime deployments. The master process responds to SIGUSR2 by spawning a new master with the updated code. The new master starts new workers while old workers finish their requests. Once complete, the old master terminates gracefully.
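This protocol rests on ordinary Unix signal handling, sketched here by a process trapping SIGUSR2 sent to itself:

```ruby
# A process installs a handler for SIGUSR2 and reacts when the signal
# arrives; in Unicorn's case the reaction is spawning a new master.
restart_requested = false
trap('USR2') { restart_requested = true }

Process.kill('USR2', Process.pid) # deliver the signal to ourselves
sleep 0.1 # give the handler a safe point to run
```

In a real deployment the signal comes from the deploy tooling (`kill -USR2 <master-pid>`), not from the process itself.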
Memory efficiency suffers compared to threaded servers. Four Unicorn workers consume roughly 4x the memory of one Puma worker with four threads. However, process isolation prevents thread-related issues and simplifies debugging.
Passenger: Integrated Web and Application Server
Passenger operates differently—it integrates directly with Nginx or Apache as a module. This eliminates the separate application server process, with Passenger spawning Ruby processes on demand.
# Nginx with Passenger
# /etc/nginx/nginx.conf
http {
  passenger_root /usr/lib/ruby/vendor_ruby/phusion_passenger/locations.ini;
  passenger_ruby /usr/bin/ruby;
  passenger_max_pool_size 8; # pool size is an http-level setting

  server {
    listen 80;
    server_name example.com;
    root /var/www/myapp/public;
    passenger_enabled on;
    passenger_min_instances 4;
  }
}
Passenger monitors application processes and restarts them based on memory limits or request counts. It spawns processes on demand when traffic increases and terminates idle processes to conserve resources. This dynamic process management works well for applications with variable traffic patterns.
WEBrick: Development Server
WEBrick ships with Ruby's standard library (bundled as a default gem since Ruby 3.0) and served as the default development server before Rails switched to Puma. It implements both web server and application server functionality in pure Ruby using a simple thread-per-request model. Slow and minimally featured, WEBrick suffices for development but never for production.
# Historically the default for: rails server (modern Rails uses Puma)
# Explicit usage with a minimal Rack app:
require 'webrick'
require 'rack'

app = proc { |_env| [200, { 'Content-Type' => 'text/plain' }, ['Hello']] }

server = WEBrick::HTTPServer.new(Port: 3000)
server.mount '/', Rack::Handler::WEBrick, app
trap('INT') { server.shutdown }
server.start
Falcon: Fiber-Based Application Server
Falcon uses Ruby fibers for concurrency, implementing an event-driven model similar to Node.js. Fibers provide cooperative multitasking without thread safety concerns. This approach works well for I/O-bound applications.
# config.ru with Falcon
# Run with: falcon serve
require_relative 'app'
run App
Falcon requires Ruby 3.0+ for full fiber scheduler support. Applications must use async-aware libraries (async-http, async-postgres) to benefit from fiber concurrency. Traditional blocking I/O negates the concurrency benefits.
Implementation Approaches
Deploying Ruby applications with web servers and application servers involves choosing architectures appropriate for application requirements, team capabilities, and infrastructure constraints.
Development Environment
Development prioritizes fast iteration over production concerns. The simplest approach runs the application server directly without a web server. Rails applications use bin/rails server, starting Puma on port 3000. All requests flow through the application, including static assets.
Asset handling differs between environments. In development, the Rails asset pipeline compiles Sass, CoffeeScript, and other assets on demand, so changes appear immediately without a build step. In production, assets are precompiled once at deploy time and served as static files, avoiding that per-request compilation cost.
# config/environments/development.rb
Rails.application.configure do
  config.assets.debug = true
  config.assets.compile = true
  config.assets.digest = false

  # Serve static files from public directory
  config.public_file_server.enabled = true
end
Single Server Deployment
Small applications with modest traffic requirements work well on a single server running both Nginx and Puma. This architecture provides production-quality request handling without multi-server complexity.
Systemd manages the application server process. The service file defines how to start, stop, and restart the application:
# /etc/systemd/system/myapp.service
[Unit]
Description=MyApp Puma Server
After=network.target
[Service]
Type=notify
User=deploy
Group=deploy
WorkingDirectory=/var/www/myapp
Environment=RAILS_ENV=production
Environment=PORT=3000
ExecStart=/usr/local/bin/bundle exec puma -C config/puma.rb
ExecReload=/bin/kill -SIGUSR1 $MAINPID
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Nginx configuration routes requests appropriately:
upstream puma {
  server unix:/var/run/puma.sock;
}

server {
  listen 80;
  server_name example.com;
  root /var/www/myapp/public;

  try_files $uri/index.html $uri @puma;

  location @puma {
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
    proxy_redirect off;
    proxy_pass http://puma;
  }

  location ^~ /assets/ {
    gzip_static on;
    expires max;
    add_header Cache-Control public;
  }
}
Multi-Server Deployment
Applications exceeding single-server capacity require horizontal scaling. Multiple application servers run behind a load balancer, distributing requests and providing redundancy.
The load balancer becomes a critical component. Nginx can serve this role for modest scale, but dedicated load balancers (HAProxy, AWS ELB) provide advanced features like health checking, SSL termination, and sophisticated routing.
Session management requires attention in multi-server deployments. Cookie-based sessions work without modification. Server-side sessions need shared storage:
# config/initializers/session_store.rb
Rails.application.config.session_store :redis_store,
  servers: ["redis://localhost:6379/0/session"],
  expire_after: 90.minutes,
  key: "_myapp_session",
  secure: Rails.env.production?,
  same_site: :lax
Database connection pooling requires careful configuration. Each application server process maintains its own connection pool. Total database connections equal (servers × workers × pool size). PostgreSQL connection limits must accommodate this:
# config/database.yml
production:
  adapter: postgresql
  encoding: unicode
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  timeout: 5000
  url: <%= ENV['DATABASE_URL'] %>
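The sizing rule above as plain arithmetic:

```ruby
# Total connections the database must accept:
# servers x workers per server x pool size per worker.
servers = 4
workers = 4
pool_size = 5
total_connections = servers * workers * pool_size
# 4 servers x 4 workers x pool 5 = 80 connections
```

With PostgreSQL's default limit of 100, this deployment leaves little headroom for consoles, background jobs, or migrations.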
Container-Based Deployment
Docker containers package the application and its dependencies into portable units. Container orchestration systems (Kubernetes, ECS) manage deployment, scaling, and health checking.
A typical Dockerfile builds a production-ready image:
FROM ruby:3.2-slim

RUN apt-get update && apt-get install -y \
      build-essential \
      libpq-dev \
      nodejs \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY Gemfile Gemfile.lock ./
RUN bundle config set --local deployment 'true' && \
    bundle config set --local without 'development test' && \
    bundle install -j4

COPY . .
RUN bundle exec rake assets:precompile

EXPOSE 3000
CMD ["bundle", "exec", "puma", "-C", "config/puma.rb"]
Kubernetes deployment manifests define how to run containers:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 4
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: myapp:latest
          ports:
            - containerPort: 3000
          env:
            - name: RAILS_ENV
              value: production
            - name: RAILS_MAX_THREADS
              value: "5"
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
Serverless Deployment
Serverless platforms (AWS Lambda, Google Cloud Functions) abstract away server management entirely. Ruby applications can deploy to these platforms with some architectural adjustments.
AWS Lambda with API Gateway provides an event-driven model. The application server runs only when handling requests, scaling automatically with load. Lambda cold starts introduce latency, making this approach better for sporadic workloads than consistent traffic.
Packaging Rails for Lambda requires specialized gems and adapters:
# Gemfile
gem 'lamby'

# app.rb (Lambda entrypoint, following the pattern from the Lamby docs)
require_relative 'config/environment'

$app = Rack::Builder.new { run Rails.application }.to_app

def handler(event:, context:)
  Lamby.handler($app, event, context)
end
The deployment model changes: the entire Rack application runs inside one Lambda function, with API Gateway forwarding HTTP events that Lamby converts into Rack requests, so Rails routing continues to work unchanged.
Performance Considerations
Performance characteristics differ significantly between web servers and application servers, with architectural choices creating cascading effects on throughput, latency, and resource utilization.
Concurrency Models and Throughput
Thread-based concurrency (Puma) allows more concurrent requests per process but requires thread-safe code. A Puma worker with 5 threads handles 5 concurrent requests in a single process. Process-based concurrency (Unicorn) requires separate processes for each concurrent request but guarantees isolation.
The choice depends on application characteristics. I/O-bound applications benefit from threaded concurrency—while one thread waits for database or API responses, other threads execute. CPU-bound applications gain little from threading due to the GIL, which prevents parallel Ruby code execution.
Benchmarking reveals these differences:
# Simulated I/O-bound endpoint
class ApiController < ApplicationController
def fetch_data
# Simulate external API call
sleep(0.1)
render json: { data: "response" }
end
end
Under load testing with 100 concurrent requests:
- Puma (4 workers, 5 threads): ~200 requests/second
- Unicorn (4 workers): ~40 requests/second
- Difference: Threaded concurrency handles I/O blocking without tying up workers
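These figures follow from capacity divided by latency, as quick arithmetic shows:

```ruby
# Each execution context (thread or process) completes one 0.1s request
# at a time, so throughput = contexts / latency.
latency = 0.1 # seconds per request

puma_contexts = 4 * 5   # workers x threads
unicorn_contexts = 4    # workers, one request each

puma_rps = puma_contexts / latency       # roughly 200 requests/second
unicorn_rps = unicorn_contexts / latency # roughly 40 requests/second
```

The model ignores queueing and CPU work, but it explains the 5x gap: Unicorn's workers spend most of each request blocked, while Puma's threads overlap the waits.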
Connection Handling
Web servers optimize connection handling through event-driven architectures. Nginx with its event loop handles 10,000+ concurrent connections with minimal memory. Each connection consumes a file descriptor and small buffer, not an entire thread or process.
Application servers face different constraints. Each concurrent request requires execution context—a thread or process. Memory requirements scale with concurrent requests. A single Puma thread under load might consume 50-100MB of memory depending on application complexity.
Buffering slow clients demonstrates the value of web server/application server separation. Without a web server, the application process blocks writing response data to slow clients. With Nginx buffering, the application server completes the request quickly, and Nginx handles slow client draining:
# Nginx buffering configuration
proxy_buffering on;
proxy_buffer_size 4k;
proxy_buffers 8 4k;
proxy_busy_buffers_size 8k;
Static Asset Serving
Web servers serve static files orders of magnitude faster than application servers. Nginx uses the sendfile system call for zero-copy transmission directly from the kernel page cache to the network socket. This avoids copying data through userspace, reducing CPU usage and latency.
Application servers serving static files must read files into memory, construct HTTP responses, and write through the Rack interface. This overhead makes static asset serving through application servers wasteful:
# Benchmarking static file serving
# Nginx: ~50,000 requests/second
# Puma: ~2,000 requests/second
# Overhead: 25x slower
Asset compilation and fingerprinting in production enable aggressive caching:
# config/environments/production.rb
config.assets.compile = false
config.assets.digest = true
config.public_file_server.enabled = false
# Nginx serves from public/assets with 1-year cache
location ~* ^/assets/ {
  expires 1y;
  add_header Cache-Control "public, immutable";
}
Database Connection Pooling
Connection pooling overhead scales with worker count. Each application server process maintains a connection pool, consuming database resources. A deployment with 4 servers, 4 Puma workers per server, and pool size 5 requires 80 database connections.
Database connection limits become constraints. PostgreSQL defaults to 100 connections, quickly exhausted in multi-server deployments. PgBouncer provides connection pooling at the database level, multiplexing application connections over a smaller pool of actual database connections:
# config/database.yml with PgBouncer
production:
  adapter: postgresql
  url: <%= ENV['DATABASE_URL'] %>
  pool: 5
  # PgBouncer multiplexes these connections
Memory Management
Ruby's memory management affects application server performance. Garbage collection pauses impact request latency. Tuning GC parameters reduces pause frequency:
# Ruby reads RUBY_GC_* variables at interpreter startup, so they must be
# exported before the process boots (for example in the systemd unit or a
# wrapper script), not assigned from inside the application:
# /etc/systemd/system/myapp.service (excerpt)
Environment=RUBY_GC_HEAP_INIT_SLOTS=600000
Environment=RUBY_GC_HEAP_GROWTH_FACTOR=1.1
Environment=RUBY_GC_HEAP_GROWTH_MAX_SLOTS=100000

# Measuring (not tuning) GC pauses from inside the app:
GC::Profiler.enable
Memory bloat over time requires worker restarts. Puma's phased restart replaces workers gradually:
# Trigger with: kill -USR1 <puma-pid>
# Workers restart one at a time, maintaining availability
Request Queuing
Request queues form at multiple layers. The web server queues requests waiting for available application server connections. The application server queues requests waiting for available threads or processes. Long queues indicate insufficient capacity.
Monitoring queue depth requires instrumentation:
# Monitor request queue time with rack-timeout
gem 'rack-timeout'
# Recent rack-timeout versions read their settings from environment
# variables at boot rather than class-level setters:
#   RACK_TIMEOUT_SERVICE_TIMEOUT=15
#   RACK_TIMEOUT_WAIT_TIMEOUT=30

# config/initializers/rack_timeout.rb (reduce log noise)
Rack::Timeout::Logger.level = Logger::ERROR
Queue time should remain under 100ms. Higher queue times indicate the need for additional application server instances or optimization of slow endpoints.
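Queue time can also be measured by hand: the web server stamps each request with an X-Request-Start header (the header name and t= epoch format follow a common Nginx/Heroku convention, but treat them as assumptions), and a small Rack middleware computes how long the request waited before a worker picked it up:

```ruby
# Middleware that derives queue time from an upstream timestamp header.
class QueueTimeMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    if (stamp = env['HTTP_X_REQUEST_START'])
      queued_at = stamp.delete('t=').to_f # e.g. "t=1700000000.123"
      env['queue.time_ms'] = ((Time.now.to_f - queued_at) * 1000).round
    end
    @app.call(env)
  end
end

app = QueueTimeMiddleware.new(->(env) { [200, {}, [env['queue.time_ms'].to_s]] })
stamp = "t=#{Time.now.to_f - 0.05}" # pretend the web server stamped this 50ms ago
status, _headers, body = app.call('HTTP_X_REQUEST_START' => stamp)
```

In production the computed value would be reported to a metrics backend rather than returned in the body.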
Tools & Ecosystem
The Ruby web server and application server ecosystem includes production-grade servers, development tools, monitoring solutions, and deployment utilities.
Production Web Servers
Nginx dominates Ruby production deployments. Its event-driven architecture, efficient static file serving, and flexible configuration make it the standard choice. HTTP/2 support arrived in version 1.9.5, with gRPC proxying and improved dynamic module support in later releases.
Apache with mod_passenger provides an alternative. Apache's prefork and worker MPMs handle concurrency differently than Nginx, sometimes better for specific workloads. Configuration uses .htaccess files for directory-level rules.
Caddy represents a modern alternative with automatic HTTPS through Let's Encrypt, HTTP/3 support, and simpler configuration:
example.com {
  root * /var/www/myapp/public
  encode gzip

  @notStatic {
    not path /assets/*
    not file {path}
  }
  reverse_proxy @notStatic localhost:3000
}
Application Server Options
Puma remains the default Rails application server. Active development, good performance, and threading support make it suitable for most applications. Configuration flexibility accommodates various deployment scenarios.
Unicorn continues serving applications prioritizing process isolation over threading. Battle-tested in production, it handles high-traffic applications reliably. Deployment simplicity and predictable behavior benefit operations teams.
Passenger (Enterprise) adds advanced features: rolling restarts, request/response buffering, detailed analytics, and resistance to memory bloat. Commercial support justifies the license cost for critical applications.
Falcon brings async/await patterns to Ruby with fiber-based concurrency. Applications must adopt async-compatible libraries, limiting adoption but offering performance benefits for I/O-intensive workloads.
Monitoring and Instrumentation
New Relic provides comprehensive application performance monitoring. It tracks request throughput, response times, database query performance, and error rates. Ruby agent installation requires minimal configuration:
# Gemfile
gem 'newrelic_rpm'

# config/newrelic.yml
production:
  license_key: <%= ENV['NEW_RELIC_LICENSE_KEY'] %>
  app_name: My Application
  monitor_mode: true
  developer_mode: false
Scout APM focuses on N+1 query detection and memory bloat. It identifies problematic endpoints and suggests optimizations. Integration mirrors New Relic with agent installation and configuration.
Skylight specializes in Rails applications with detailed timeline views showing time spent in views, database queries, and external services. The dashboard highlights endpoints consuming the most resources.
Process Management
Systemd manages application server processes on Linux systems. Service files define startup behavior, resource limits, and restart policies. Socket activation allows on-demand process spawning.
Foreman generates process management configurations from Procfiles. It creates systemd, upstart, or traditional init scripts:
# Procfile
web: bundle exec puma -C config/puma.rb
worker: bundle exec sidekiq -C config/sidekiq.yml
Capistrano automates deployment, handling code updates, asset compilation, and service restarts:
# Capfile
require 'capistrano/bundler'
require 'capistrano/rails'
require 'capistrano/puma'
# config/deploy.rb
set :application, 'myapp'
set :repo_url, 'git@github.com:user/myapp.git'
set :deploy_to, '/var/www/myapp'
set :puma_threads, [4, 16]
set :puma_workers, 4
Load Testing Tools
Apache Bench (ab) provides basic load testing. It measures throughput and latency under concurrent load but lacks advanced features:
ab -n 10000 -c 100 http://localhost:3000/
wrk offers scriptable load testing with Lua. It generates detailed latency distributions and supports complex request patterns:
wrk -t12 -c400 -d30s http://localhost:3000/
Siege simulates realistic user behavior with configurable think time between requests:
siege -c 50 -t 60s http://localhost:3000/
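The same idea — many concurrent clients hammering one endpoint — can be reproduced in plain Ruby for quick smoke tests. This is a toy sketch that stands up a throwaway in-process HTTP responder so it is self-contained; it is not a replacement for ab, wrk, or siege:

```ruby
require 'socket'
require 'net/http'
require 'uri'

# Throwaway HTTP responder standing in for the app under test.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]
Thread.new do
  loop do
    client = server.accept
    Thread.new(client) do |c|
      while (line = c.gets) && line != "\r\n"; end # drain request headers
      c.write "HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok"
      c.close
    end
  end
rescue IOError
  # server socket closed; stop accepting
end

# Toy load generator: 10 concurrent "users", 20 requests each.
successes = Queue.new
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
workers = 10.times.map do
  Thread.new do
    20.times do
      res = Net::HTTP.get_response(URI("http://127.0.0.1:#{port}/"))
      successes << 1 if res.code == '200'
    end
  end
end
workers.each(&:join)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
server.close
puts format('%d requests in %.2fs (%.0f req/s)', successes.size, elapsed, successes.size / elapsed)
```

The dedicated tools add what this sketch lacks: latency percentiles, connection reuse, warm-up handling, and realistic request mixes.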
SSL/TLS Management
Let's Encrypt provides free SSL certificates with automated renewal. Certbot handles certificate issuance and renewal:
certbot certonly --webroot -w /var/www/myapp/public -d example.com
Nginx configuration enables HTTPS with proper security settings:
server {
    listen 443 ssl http2;
    server_name example.com;
    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 10m;
}
Reference
Web Server Comparison
| Feature | Nginx | Apache | Caddy |
|---|---|---|---|
| Architecture | Event-driven | Prefork, Worker, or Event MPM | Event-driven |
| Configuration | nginx.conf with blocks | httpd.conf with .htaccess | Caddyfile |
| Static file serving | Excellent (sendfile) | Good | Good |
| Memory footprint | Very low | Moderate | Low |
| HTTP/2 support | Yes (1.9.5+) | Yes (2.4.17+) | Yes |
| HTTP/3 support | Experimental (1.25+) | No | Yes |
| Auto HTTPS | No | No | Yes (built-in) |
Application Server Comparison
| Server | Concurrency Model | Thread Safety Required | Memory Usage | Zero-Downtime Deploys |
|---|---|---|---|---|
| Puma | Multi-process + Multi-thread | Yes | Moderate | Yes (phased restart) |
| Unicorn | Multi-process | No | High | Yes (graceful restart) |
| Passenger | Multi-process or Multi-thread | Configurable | Moderate | Yes |
| Falcon | Fiber-based | No | Low | Partial |
| WEBrick | Thread-per-request | Yes | Low | No |
Puma Configuration Reference
| Setting | Purpose | Typical Value |
|---|---|---|
| workers | Number of processes | 4 |
| threads min, max | Thread pool size per worker | 5, 5 |
| preload_app | Load app before forking | true |
| worker_timeout | Request timeout in seconds | 60 |
| worker_boot_timeout | Worker startup timeout | 30 |
| bind | Socket or port binding | unix:///var/run/puma.sock |
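The table above maps onto a `config/puma.rb` like this. The values are the illustrative ones from the table, not universal recommendations:

```ruby
# config/puma.rb -- sketch using the typical values from the table above
workers 4                        # one process per CPU core is a common starting point
threads 5, 5                     # min, max threads per worker
preload_app!                     # load the app once, then fork workers (enables copy-on-write)
worker_timeout 60                # seconds before a stuck worker is restarted
worker_boot_timeout 30           # seconds allowed for a worker to finish booting
bind 'unix:///var/run/puma.sock' # Unix socket avoids TCP overhead when Nginx is local
```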
Unicorn Configuration Reference
| Setting | Purpose | Typical Value |
|---|---|---|
| worker_processes | Number of worker processes | 4 |
| timeout | Request timeout in seconds | 30 |
| listen | Socket or port binding | /var/run/unicorn.sock |
| preload_app | Load app before forking | true |
| stderr_path | Error log location | log/unicorn_stderr.log |
| stdout_path | Output log location | log/unicorn_stdout.log |
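The equivalent `config/unicorn.rb` for the table above, again with illustrative values:

```ruby
# config/unicorn.rb -- sketch using the typical values from the table above
worker_processes 4
timeout 30                       # master kills workers that exceed this
listen '/var/run/unicorn.sock'
preload_app true                 # load the app in the master, then fork
stderr_path 'log/unicorn_stderr.log'
stdout_path 'log/unicorn_stdout.log'
```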
Nginx Proxy Configuration
| Directive | Purpose | Example |
|---|---|---|
| proxy_pass | Upstream server URL | http://puma |
| proxy_set_header | Set request header | X-Forwarded-For |
| proxy_buffering | Enable response buffering | on |
| proxy_buffer_size | Initial buffer size | 4k |
| proxy_buffers | Number and size of buffers | 8 4k |
| proxy_connect_timeout | Backend connection timeout | 60s |
| proxy_read_timeout | Backend read timeout | 60s |
Common Signals
| Signal | Puma Effect | Unicorn Effect |
|---|---|---|
| USR1 | Phased restart (rolling) | Reopen log files |
| USR2 | Restart all workers | Start new master process |
| TERM | Graceful shutdown | Quick shutdown |
| QUIT | Not handled (process terminates) | Graceful shutdown |
| TTIN | Increase worker count | Increase worker count |
| TTOU | Decrease worker count | Decrease worker count |
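The underlying mechanism is ordinary Unix signal handling, which a toy master/worker pair makes concrete. This is a self-contained illustration of the plumbing, not actual Puma or Unicorn code (it relies on `fork`, so it is Unix-only):

```ruby
# Toy master/worker pair showing the signal plumbing the table describes.
reader, writer = IO.pipe
child = fork do
  reader.close
  Signal.trap('TTIN') { writer.puts 'TTIN: would add a worker' }
  Signal.trap('TERM') { writer.puts 'TERM: shutting down'; exit }
  loop { sleep 0.1 } # idle until signaled, like a worker waiting for requests
end
writer.close

sleep 0.2                   # give the child time to install its traps
Process.kill('TTIN', child)
sleep 0.2                   # let TTIN be handled before TERM arrives
Process.kill('TERM', child)
Process.wait(child)
output = reader.read
puts output
```

Real servers do the same thing at scale: the master traps these signals and translates them into forking, reaping, or draining its worker pool.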
Load Balancing Algorithms
| Algorithm | Nginx Directive | Behavior |
|---|---|---|
| Round Robin | (default) | Distributes requests sequentially |
| Least Connections | least_conn | Routes to server with fewest connections |
| IP Hash | ip_hash | Same client always routes to same server |
| Generic Hash | hash $variable | Custom hash-based routing |
| Random | random | Random server selection |
| Least Time | least_time (Nginx Plus) | Routes to server with lowest response time |
Performance Tuning Checklist
| Component | Optimization | Impact |
|---|---|---|
| Nginx | Enable gzip compression | Reduced bandwidth usage |
| Nginx | Increase worker_connections | Higher concurrent connections |
| Nginx | Enable sendfile | Faster static file serving |
| Puma | Tune thread count | Better concurrency for I/O-bound apps |
| Puma | Tune worker count | Better CPU utilization |
| Application | Enable caching (Redis/Memcached) | Reduced database load |
| Application | Optimize database queries | Lower response times |
| Application | Precompile assets | Faster load times |
| Database | Connection pooling | Better connection reuse |
| Database | Query optimization | Reduced database CPU |
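The first checklist item, for example, is only a few lines of Nginx configuration. A typical sketch (tune the MIME types and compression level for your content):

```nginx
# Inside the http {} block -- illustrative gzip settings
gzip on;
gzip_comp_level 5;
gzip_min_length 1024;
gzip_types text/css application/javascript application/json image/svg+xml;
gzip_vary on;
```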