Overview
Web servers and application servers serve distinct but complementary roles in handling HTTP requests. A web server accepts HTTP requests, serves static files, and forwards dynamic requests to appropriate handlers. An application server executes application code to generate dynamic responses. The terms often cause confusion because modern servers frequently combine both capabilities, and the boundary between them has blurred with evolving architectures.
Web servers originated to serve static HTML files over HTTP. NCSA HTTPd and its successor, Apache HTTP Server, pioneered this model in the 1990s. They excelled at handling many concurrent connections, serving files efficiently, and managing HTTP protocol details. As web applications grew more complex, the need for dynamic content generation led to CGI (Common Gateway Interface), which allowed web servers to execute external programs, spawning a new process per request. This model proved inefficient for high-traffic applications.
Application servers emerged to address these limitations. They maintain running application processes, manage application state, provide connection pooling, and handle application-level concerns like session management and transaction coordination. J2EE application servers in the enterprise Java world exemplified this approach, managing complex application lifecycles and providing extensive middleware services.
In Ruby deployments, this distinction manifests through the separation of concerns between HTTP handling and application logic execution. A production Ruby application typically runs behind Nginx (web server) which forwards requests to Puma (application server running Rack-based Ruby code). This architecture separates protocol handling from business logic execution.
Client Request Flow:
HTTP Request → Web Server (Nginx)
↓
Static content? → Yes → Serve file directly
↓ No
Forward to Application Server (Puma)
↓
Execute Ruby Application Code
↓
Generate Dynamic Response
↓
Return through Web Server → Client
Key Principles
The fundamental distinction between web servers and application servers rests on their primary responsibilities and operational models. Web servers focus on HTTP protocol handling, connection management, and serving static resources. Application servers focus on executing application code, managing application state, and generating dynamic content.
Static vs Dynamic Content Handling
Web servers efficiently serve static content through direct file system access and memory mapping. When a request arrives for /images/logo.png, the web server reads the file from disk (or cache), adds appropriate HTTP headers, and sends the response. This operation requires no application code execution.
Dynamic content requires code execution. A request for /users/123 cannot be satisfied by reading a file—the application must query a database, apply business logic, render a view, and construct an HTTP response. This requires an application runtime environment with access to libraries, database connections, and application state.
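As a toy illustration of the two paths, the following Ruby sketch (the handler names are hypothetical, not a real server API) contrasts a response read from disk with one computed per request:

```ruby
require 'tmpdir'

# Hypothetical sketch: a static response is just bytes read from disk,
# while a dynamic response must be computed for each request.
static_dir = Dir.mktmpdir
File.write(File.join(static_dir, 'logo.txt'), 'static bytes')

serve_static = lambda do |path|
  body = File.read(File.join(static_dir, path))
  [200, { 'Content-Type' => 'text/plain', 'Content-Length' => body.bytesize.to_s }, [body]]
end

serve_dynamic = lambda do |user_id|
  # Stand-in for a database query, business logic, and view rendering
  body = "user #{user_id}"
  [200, { 'Content-Type' => 'text/plain' }, [body]]
end
```

The static path touches no application logic at all, which is why web servers can optimize it so aggressively.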
Process and Threading Models
Web servers typically use event-driven architectures or worker pool models to handle many concurrent connections efficiently. Nginx uses an event-driven model with asynchronous I/O, allowing a single process to manage thousands of connections. Apache supports both prefork (process-per-connection) and worker (thread-per-connection) models.
Application servers must balance concurrency with thread safety. Ruby's Global Interpreter Lock (GIL) complicates threading, leading to different concurrency models. Puma uses threads within multiple worker processes. Unicorn uses forked processes without threading. Passenger supports both models. The application server must provide an execution environment where application code runs safely.
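The practical effect of the GIL on I/O can be sketched in a few lines of Ruby: sleeping threads stand in for blocked I/O, which releases the lock so waits overlap:

```ruby
require 'benchmark'

# Sketch: blocking I/O (simulated here with sleep) releases Ruby's GIL,
# so threads overlap their waits even though Ruby bytecode runs serially.
serial = Benchmark.realtime { 5.times { sleep 0.1 } }

threaded = Benchmark.realtime do
  5.times.map { Thread.new { sleep 0.1 } }.each(&:join)
end
# serial is roughly 0.5s; threaded is roughly 0.1s because the waits overlap
```

This is why thread-based servers like Puma pay off for I/O-bound request handling despite the GIL.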
Request Lifecycle Management
A web server manages the HTTP request lifecycle: connection establishment, request parsing, header processing, request routing, response buffering, and connection termination. It handles HTTP/1.1 keep-alive connections, HTTP/2 multiplexing, SSL/TLS termination, and protocol upgrades.
An application server manages the application request lifecycle: request deserialization into application objects, middleware chain execution, routing to controller actions, session management, database transaction handling, and response serialization. It provides the Rack interface in Ruby applications, converting HTTP primitives into Ruby objects.
Resource Management
Web servers optimize for connection handling and static file serving. They use sendfile system calls to avoid copying data through userspace, maintain in-memory caches for frequently accessed files, and implement sophisticated timeout mechanisms to prevent resource exhaustion.
Application servers optimize for code execution efficiency. They maintain persistent database connection pools, cache compiled code, manage memory allocation for long-running processes, and handle graceful worker restarts under memory pressure. In Ruby, application servers must also manage gem loading, autoloading, and class reloading in development.
Security Boundaries
Web servers implement security at the HTTP protocol level: request size limits, rate limiting, IP filtering, URL filtering, and SSL/TLS configuration. They act as the first line of defense against malformed requests and common attacks.
Application servers implement security at the application level: authentication, authorization, CSRF protection, SQL injection prevention through parameterized queries, and XSS prevention through output escaping. The application server provides the environment where these security measures execute.
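A minimal sketch of one such application-level measure, output escaping against XSS, using the standard library's CGI module:

```ruby
require 'cgi'

# Untrusted input is escaped before interpolation into HTML output,
# so markup in the input renders as text instead of executing.
untrusted = '<script>alert(1)</script>'
safe_html = "<p>Hello, #{CGI.escapeHTML(untrusted)}</p>"
```

Frameworks like Rails apply this escaping automatically in templates; the principle is the same.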
Separation of Concerns Benefits
Running a web server in front of an application server provides several architectural advantages. The web server buffers slow clients, preventing application server processes from blocking on network I/O. It serves static assets without touching application code, reducing application server load. It handles SSL termination, allowing application servers to focus on business logic. It provides a layer for load balancing, request routing, and graceful deployments.
Design Considerations
Choosing between different deployment architectures requires understanding the trade-offs between simplicity, performance, scalability, and operational complexity. The decision depends on application characteristics, traffic patterns, and operational requirements.
Single Application Server
Running only an application server (Puma listening directly on port 80/443) offers maximum simplicity. The application handles all requests directly without an intermediary. This approach works for low-traffic applications, internal tools, or development environments where simplicity outweighs other concerns.
This configuration exposes limitations quickly. The application server must handle slow clients, which ties up worker processes. Static assets flow through application code, consuming resources unnecessarily. SSL termination happens in the application process, complicating certificate management. No request buffering exists between clients and application processes.
Development environments commonly use this approach. Running rails server starts Puma listening on port 3000, handling all requests directly. This configuration prioritizes fast iteration over production concerns.
Web Server with Application Server
The standard production deployment places a web server (Nginx or Apache) in front of the application server. The web server listens on ports 80/443, handles SSL termination, serves static files, and proxies dynamic requests to the application server via HTTP or Unix sockets.
This architecture separates concerns effectively. Nginx excels at handling thousands of concurrent connections, buffering requests from slow clients, and serving static files. Puma excels at executing Ruby code efficiently with thread-based concurrency. Each component operates within its strengths.
Request flow optimization occurs naturally. Static assets (/assets/*, /images/*) match location blocks in Nginx configuration and serve directly from disk. Dynamic requests forward to upstream application servers. This routing happens before application code runs, saving resources.
# config/puma.rb
# Application server configuration
workers 4
threads 5, 5

# Listen on Unix socket for Nginx communication
bind 'unix:///var/run/puma.sock'

# Enable worker timeout to handle stuck requests
worker_timeout 60

preload_app!

on_worker_boot do
  ActiveRecord::Base.establish_connection
end
# /etc/nginx/sites-available/myapp
# Web server configuration
upstream app_server {
  server unix:/var/run/puma.sock fail_timeout=0;
}

server {
  listen 80;
  server_name example.com;
  root /var/www/myapp/public;

  # Serve static files directly
  location ~ ^/(assets|images|javascripts|stylesheets)/ {
    expires 1y;
    add_header Cache-Control public;
    break;
  }

  # Proxy dynamic requests
  location / {
    proxy_pass http://app_server;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $host;
    proxy_redirect off;
  }
}
Multiple Application Server Instances
Horizontal scaling requires multiple application server instances behind the web server. The web server load balances requests across available application servers, providing redundancy and increased capacity.
Nginx supports multiple upstream servers with configurable load balancing algorithms. Round-robin distributes requests evenly. Least connections sends requests to the server with fewest active connections. IP hash maintains session affinity by routing the same client to the same server.
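These strategies can be modeled in a few lines of Ruby (a toy sketch, not Nginx's actual implementation):

```ruby
require 'zlib'

# Toy models of two load-balancing strategies.
SERVERS = %w[app1 app2 app3].freeze

# Round-robin: cycle through the upstream list in order
counter = -1
round_robin = -> { SERVERS[(counter += 1) % SERVERS.size] }

# IP hash: the same client address always maps to the same server
ip_hash = ->(ip) { SERVERS[Zlib.crc32(ip) % SERVERS.size] }
```

The ip_hash sketch shows why it preserves session affinity: the mapping is a pure function of the client address, so repeated requests land on the same backend.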
This architecture introduces state management complexity. Session data stored in memory of one application server remains unavailable to others. Applications must use shared session stores (Redis, Memcached, database) or cookie-based sessions. Database connection pools must account for multiple processes connecting.
# config/puma.rb for scaled deployment
# Run multiple Puma instances on different ports or sockets
# Instance 1: unix:///var/run/puma1.sock
# Instance 2: unix:///var/run/puma2.sock
# Instance 3: unix:///var/run/puma3.sock
workers 4
threads 5, 5
bind "unix:///var/run/puma#{ENV['INSTANCE']}.sock"
Container-Based Architectures
Containerization with Docker changes deployment patterns while maintaining the web server/application server separation. Containers encapsulate the application server and its dependencies, while the web server often runs on the host or in a separate container.
A typical containerized Ruby deployment runs Nginx on the host, routing to multiple Puma containers. Each container runs an isolated application instance with its own process space and file system. Kubernetes orchestrates these containers, managing scaling, health checks, and rolling deployments.
This approach simplifies application server deployment while complicating networking. Container networking requires DNS-based service discovery or environment variable injection for backend addresses. Health checks must work through the container network. Log aggregation becomes essential as logs scatter across ephemeral containers.
Service-Oriented Architectures
Microservices architectures extend the web server/application server pattern across multiple services. An API gateway (often Nginx with Lua or a dedicated gateway like Kong) routes requests to appropriate service instances. Each service runs its own application server instances.
This multiplies the operational complexity of the web server/application server relationship. Each service requires health checking, circuit breaking, retry logic, and timeout management. The gateway must handle service discovery as backend addresses change. Request tracing across services requires correlation IDs flowing through all layers.
Ruby Implementation
Ruby provides multiple options for both web servers and application servers, each with distinct characteristics and trade-offs. Understanding their implementation details guides appropriate selection for specific use cases.
Rack: The Foundation
Rack defines the interface between web servers and Ruby applications. It specifies a simple contract: the application must respond to call(env), receiving an environment hash and returning an array of [status, headers, body]. This abstraction allows any Rack-compatible server to run any Rack-compatible application.
# Minimal Rack application (config.ru)
class HelloWorld
  def call(env)
    [
      200,
      { 'Content-Type' => 'text/plain' },
      ['Hello, World!']
    ]
  end
end

run HelloWorld.new
# Run with: rackup config.ru
Rails, Sinatra, and other Ruby web frameworks build on Rack. They implement the Rack interface while providing higher-level abstractions for routing, controllers, and views. This allows the same application code to run under different servers without modification.
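Rack's composability is easiest to see with a small middleware sketch; the class and header names here are illustrative, not a standard API:

```ruby
# A Rack middleware wraps any Rack app: it receives the env hash, delegates
# to the inner app, and can modify the response on the way out.
class TimingMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, headers, body = @app.call(env)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    headers['X-Runtime'] = format('%.6f', elapsed)
    [status, headers, body]
  end
end

inner = ->(_env) { [200, { 'Content-Type' => 'text/plain' }, ['ok']] }
app = TimingMiddleware.new(inner)
status, headers, body = app.call({})
```

Because both sides speak the same call(env) contract, middleware stacks compose freely between server and application.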
Puma: Multi-Threaded Application Server
Puma uses a hybrid threading and process model. It spawns multiple worker processes, each running multiple threads. This architecture maximizes CPU utilization on multi-core systems while maintaining memory efficiency through copy-on-write forking.
# config/puma.rb
workers ENV.fetch("WEB_CONCURRENCY") { 4 }
threads_count = ENV.fetch("RAILS_MAX_THREADS") { 5 }
threads threads_count, threads_count

preload_app!

before_fork do
  ActiveRecord::Base.connection_pool.disconnect! if defined?(ActiveRecord)
end

on_worker_boot do
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
end

on_worker_shutdown do
  puts 'Worker shutting down gracefully'
end

# Handle SIGTERM gracefully for rolling deployments
on_restart do
  puts 'Puma restarting...'
end
Puma's threading model requires thread-safe application code. Rails 4.0+ defaults to thread-safe operation, but legacy code or certain gems may not be thread-safe. The GIL limits true parallel Ruby code execution but doesn't prevent threaded I/O concurrency—threads can make concurrent database queries or HTTP requests.
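A short sketch of the thread-safety requirement: shared mutable state touched by several threads needs synchronization, which a Mutex provides:

```ruby
# Shared mutable state accessed from multiple threads (as in a Puma worker)
# must be synchronized; the Mutex makes each read-modify-write atomic.
counter = 0
lock = Mutex.new

threads = 8.times.map do
  Thread.new do
    1_000.times { lock.synchronize { counter += 1 } }
  end
end
threads.each(&:join)
# counter is exactly 8_000 because every increment runs under the lock
```

Without the lock, interleaved increments could lose updates; code run under Puma must either avoid shared state or protect it this way.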
The worker killer pattern addresses memory growth. Puma workers gradually accumulate memory through fragmentation and retained objects. The puma_worker_killer gem monitors worker memory and restarts workers exceeding thresholds:
# Gemfile
gem 'puma_worker_killer'

# config/puma.rb
before_fork do
  require 'puma_worker_killer'
  PumaWorkerKiller.config do |config|
    config.ram = 1024 # MB
    config.frequency = 10 # seconds
    config.percent_usage = 0.90
  end
  PumaWorkerKiller.start
end
Unicorn: Process-Based Application Server
Unicorn uses a preforking model without threading. The master process loads the application, then forks worker processes. Each worker handles one request at a time. This model guarantees process isolation and simplifies application code—no thread safety concerns.
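The prefork model can be sketched with Ruby's own Process.fork (a toy illustration, not Unicorn's implementation): a master forks workers that handle work in complete process isolation and report back over pipes.

```ruby
# Minimal prefork sketch: the master forks workers; each worker handles a
# "request" in its own process and writes the result back through a pipe.
workers = 2.times.map do |i|
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write("worker #{i} handled request in pid #{Process.pid}")
    writer.close
  end
  writer.close
  [pid, reader]
end

results = workers.map do |pid, reader|
  Process.wait(pid) # master reaps each worker
  reader.read
end
```

Each worker gets a copy-on-write copy of the master's memory, which is why preload_app matters for memory efficiency.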
# config/unicorn.rb
worker_processes 4
timeout 30
preload_app true
listen "/var/run/unicorn.sock", backlog: 64

before_fork do |server, worker|
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
  end

  # During zero-downtime restarts, signal the old master to quit
  old_pid = "#{server.config[:pid]}.oldbin"
  if File.exist?(old_pid) && server.pid != old_pid
    begin
      Process.kill("QUIT", File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
    end
  end
end

after_fork do |server, worker|
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.establish_connection
  end
end
Unicorn excels at zero-downtime deployments. The master process responds to SIGUSR2 by spawning a new master with the updated code. The new master starts new workers while old workers finish their requests. Once complete, the old master terminates gracefully.
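This protocol rests on ordinary Unix signal handling, sketched here by a process trapping SIGUSR2 sent to itself:

```ruby
# A process installs a handler for SIGUSR2 and reacts when the signal
# arrives; in Unicorn's case the reaction is spawning a new master.
restart_requested = false
trap('USR2') { restart_requested = true }

Process.kill('USR2', Process.pid) # deliver the signal to ourselves
sleep 0.1 # give the handler a safe point to run
```

In a real deployment the signal comes from the deploy tooling (`kill -USR2 <master-pid>`), not from the process itself.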
Memory efficiency suffers compared to threaded servers. Four Unicorn workers consume roughly 4x the memory of one Puma worker with four threads. However, process isolation prevents thread-related issues and simplifies debugging.
Passenger: Integrated Web and Application Server
Passenger operates differently—it integrates directly with Nginx or Apache as a module. This eliminates the separate application server process, with Passenger spawning Ruby processes on demand.
# Nginx with Passenger
# /etc/nginx/nginx.conf
http {
  passenger_root /usr/lib/ruby/vendor_ruby/phusion_passenger/locations.ini;
  passenger_ruby /usr/bin/ruby;
  passenger_max_pool_size 8; # pool size is an http-level setting

  server {
    listen 80;
    server_name example.com;
    root /var/www/myapp/public;
    passenger_enabled on;
    passenger_min_instances 4;
  }
}
Passenger monitors application processes and restarts them based on memory limits or request counts. It spawns processes on demand when traffic increases and terminates idle processes to conserve resources. This dynamic process management works well for applications with variable traffic patterns.
WEBrick: Development Server
WEBrick ships with Ruby's standard library (bundled as a default gem since Ruby 3.0) and served as the default development server before Rails switched to Puma. It implements both web server and application server functionality in pure Ruby using a simple thread-per-request model. Slow and minimally featured, WEBrick suffices for development but never for production.
# Historically the default for: rails server (modern Rails uses Puma)
# Explicit usage with a minimal Rack app:
require 'webrick'
require 'rack'

app = proc { |_env| [200, { 'Content-Type' => 'text/plain' }, ['Hello']] }

server = WEBrick::HTTPServer.new(Port: 3000)
server.mount '/', Rack::Handler::WEBrick, app
trap('INT') { server.shutdown }
server.start
Falcon: Fiber-Based Application Server
Falcon uses Ruby fibers for concurrency, implementing an event-driven model similar to Node.js. Fibers provide cooperative multitasking without thread safety concerns. This approach works well for I/O-bound applications.
# config.ru with Falcon
# Run with: falcon serve
require_relative 'app'
run App
Falcon requires Ruby 3.0+ for full fiber scheduler support. Applications must use async-aware libraries (async-http, async-postgres) to benefit from fiber concurrency. Traditional blocking I/O negates the concurrency benefits.
Implementation Approaches
Deploying Ruby applications with web servers and application servers involves choosing architectures appropriate for application requirements, team capabilities, and infrastructure constraints.
Development Environment
Development prioritizes fast iteration over production concerns. The simplest approach runs the application server directly without a web server. Rails applications use bin/rails server, starting Puma on port 3000. All requests flow through the application, including static assets.
Asset handling differs between environments. In development, the Rails asset pipeline compiles Sass, CoffeeScript, and other assets on demand, so changes appear immediately without a build step. In production, assets are precompiled once at deploy time and served as static files, avoiding that per-request compilation cost.
# config/environments/development.rb
Rails.application.configure do
  config.assets.debug = true
  config.assets.compile = true
  config.assets.digest = false

  # Serve static files from public directory
  config.public_file_server.enabled = true
end
Single Server Deployment
Small applications with modest traffic requirements work well on a single server running both Nginx and Puma. This architecture provides production-quality request handling without multi-server complexity.
Systemd manages the application server process. The service file defines how to start, stop, and restart the application:
# /etc/systemd/system/myapp.service
[Unit]
Description=MyApp Puma Server
After=network.target
[Service]
Type=notify
User=deploy
Group=deploy
WorkingDirectory=/var/www/myapp
Environment=RAILS_ENV=production
Environment=PORT=3000
ExecStart=/usr/local/bin/bundle exec puma -C config/puma.rb
ExecReload=/bin/kill -SIGUSR1 $MAINPID
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Nginx configuration routes requests appropriately:
upstream puma {
  server unix:/var/run/puma.sock;
}

server {
  listen 80;
  server_name example.com;
  root /var/www/myapp/public;

  try_files $uri/index.html $uri @puma;

  location @puma {
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
    proxy_redirect off;
    proxy_pass http://puma;
  }

  location ^~ /assets/ {
    gzip_static on;
    expires max;
    add_header Cache-Control public;
  }
}
Multi-Server Deployment
Applications exceeding single-server capacity require horizontal scaling. Multiple application servers run behind a load balancer, distributing requests and providing redundancy.
The load balancer becomes a critical component. Nginx can serve this role for modest scale, but dedicated load balancers (HAProxy, AWS ELB) provide advanced features like health checking, SSL termination, and sophisticated routing.
Session management requires attention in multi-server deployments. Cookie-based sessions work without modification. Server-side sessions need shared storage:
# config/initializers/session_store.rb
Rails.application.config.session_store :redis_store,
  servers: ["redis://localhost:6379/0/session"],
  expire_after: 90.minutes,
  key: "_myapp_session",
  secure: Rails.env.production?,
  same_site: :lax
Database connection pooling requires careful configuration. Each application server process maintains its own connection pool. Total database connections equal (servers × workers × pool size). PostgreSQL connection limits must accommodate this:
# config/database.yml
production:
  adapter: postgresql
  encoding: unicode
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  timeout: 5000
  url: <%= ENV['DATABASE_URL'] %>
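The sizing rule above as plain arithmetic:

```ruby
# Total connections the database must accept:
# servers x workers per server x pool size per worker.
servers = 4
workers = 4
pool_size = 5
total_connections = servers * workers * pool_size
# 4 servers x 4 workers x pool 5 = 80 connections
```

With PostgreSQL's default limit of 100, this deployment leaves little headroom for consoles, background jobs, or migrations.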
Container-Based Deployment
Docker containers package the application and its dependencies into portable units. Container orchestration systems (Kubernetes, ECS) manage deployment, scaling, and health checking.
A typical Dockerfile builds a production-ready image:
FROM ruby:3.2-slim

RUN apt-get update && apt-get install -y \
      build-essential \
      libpq-dev \
      nodejs \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY Gemfile Gemfile.lock ./
RUN bundle config set --local deployment 'true' && \
    bundle config set --local without 'development test' && \
    bundle install -j4

COPY . .
RUN bundle exec rake assets:precompile

EXPOSE 3000
CMD ["bundle", "exec", "puma", "-C", "config/puma.rb"]
Kubernetes deployment manifests define how to run containers:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 4
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: myapp:latest
          ports:
            - containerPort: 3000
          env:
            - name: RAILS_ENV
              value: production
            - name: RAILS_MAX_THREADS
              value: "5"
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
Serverless Deployment
Serverless platforms (AWS Lambda, Google Cloud Functions) abstract away server management entirely. Ruby applications can deploy to these platforms with some architectural adjustments.
AWS Lambda with API Gateway provides an event-driven model. The application server runs only when handling requests, scaling automatically with load. Lambda cold starts introduce latency, making this approach better for sporadic workloads than consistent traffic.
Packaging Rails for Lambda requires specialized gems and adapters:
# Gemfile
gem 'lamby'

# app.rb (Lambda entrypoint, following the pattern from the Lamby docs)
require_relative 'config/environment'

$app = Rack::Builder.new { run Rails.application }.to_app

def handler(event:, context:)
  Lamby.handler($app, event, context)
end
The deployment model changes: the entire Rack application runs inside one Lambda function, with API Gateway forwarding HTTP events that Lamby converts into Rack requests, so Rails routing continues to work unchanged.
Performance Considerations
Performance characteristics differ significantly between web servers and application servers, with architectural choices creating cascading effects on throughput, latency, and resource utilization.
Concurrency Models and Throughput
Thread-based concurrency (Puma) allows more concurrent requests per process but requires thread-safe code. A Puma worker with 5 threads handles 5 concurrent requests in a single process. Process-based concurrency (Unicorn) requires separate processes for each concurrent request but guarantees isolation.
The choice depends on application characteristics. I/O-bound applications benefit from threaded concurrency—while one thread waits for database or API responses, other threads execute. CPU-bound applications gain little from threading due to the GIL, which prevents parallel Ruby code execution.
Benchmarking reveals these differences:
# Simulated I/O-bound endpoint
class ApiController < ApplicationController
def fetch_data
# Simulate external API call
sleep(0.1)
render json: { data: "response" }
end
end
Under load testing with 100 concurrent requests:
- Puma (4 workers, 5 threads): ~200 requests/second
- Unicorn (4 workers): ~40 requests/second
- Difference: Threaded concurrency handles I/O blocking without tying up workers
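These figures follow from capacity divided by latency, as quick arithmetic shows:

```ruby
# Each execution context (thread or process) completes one 0.1s request
# at a time, so throughput = contexts / latency.
latency = 0.1 # seconds per request

puma_contexts = 4 * 5   # workers x threads
unicorn_contexts = 4    # workers, one request each

puma_rps = puma_contexts / latency       # roughly 200 requests/second
unicorn_rps = unicorn_contexts / latency # roughly 40 requests/second
```

The model ignores queueing and CPU work, but it explains the 5x gap: Unicorn's workers spend most of each request blocked, while Puma's threads overlap the waits.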
Connection Handling
Web servers optimize connection handling through event-driven architectures. Nginx with its event loop handles 10,000+ concurrent connections with minimal memory. Each connection consumes a file descriptor and small buffer, not an entire thread or process.
Application servers face different constraints. Each concurrent request requires execution context—a thread or process. Memory requirements scale with concurrent requests. A single Puma thread under load might consume 50-100MB of memory depending on application complexity.
Buffering slow clients demonstrates the value of web server/application server separation. Without a web server, the application process blocks writing response data to slow clients. With Nginx buffering, the application server completes the request quickly, and Nginx handles slow client draining:
# Nginx buffering configuration
proxy_buffering on;
proxy_buffer_size 4k;
proxy_buffers 8 4k;
proxy_busy_buffers_size 8k;
Static Asset Serving
Web servers serve static files orders of magnitude faster than application servers. Nginx uses the sendfile system call for zero-copy transmission directly from the kernel page cache to the network socket. This avoids copying data through userspace, reducing CPU usage and latency.
Application servers serving static files must read files into memory, construct HTTP responses, and write through the Rack interface. This overhead makes static asset serving through application servers wasteful:
# Benchmarking static file serving
# Nginx: ~50,000 requests/second
# Puma: ~2,000 requests/second
# Overhead: 25x slower
Asset compilation and fingerprinting in production enable aggressive caching:
# config/environments/production.rb
config.assets.compile = false
config.assets.digest = true
config.public_file_server.enabled = false
# Nginx serves from public/assets with 1-year cache
location ~* ^/assets/ {
  expires 1y;
  add_header Cache-Control "public, immutable";
}
Database Connection Pooling
Connection pooling overhead scales with worker count. Each application server process maintains a connection pool, consuming database resources. A deployment with 4 servers, 4 Puma workers per server, and pool size 5 requires 80 database connections.
Database connection limits become constraints. PostgreSQL defaults to 100 connections, quickly exhausted in multi-server deployments. PgBouncer provides connection pooling at the database level, multiplexing application connections over a smaller pool of actual database connections:
# config/database.yml with PgBouncer
production:
  adapter: postgresql
  url: <%= ENV['DATABASE_URL'] %>
  pool: 5
  # PgBouncer multiplexes these connections
Memory Management
Ruby's memory management affects application server performance. Garbage collection pauses impact request latency. Tuning GC parameters reduces pause frequency:
# Ruby reads RUBY_GC_* variables at interpreter startup, so they must be
# exported before the process boots (for example in the systemd unit or a
# wrapper script), not assigned from inside the application:
# /etc/systemd/system/myapp.service (excerpt)
Environment=RUBY_GC_HEAP_INIT_SLOTS=600000
Environment=RUBY_GC_HEAP_GROWTH_FACTOR=1.1
Environment=RUBY_GC_HEAP_GROWTH_MAX_SLOTS=100000

# Measuring (not tuning) GC pauses from inside the app:
GC::Profiler.enable
Memory bloat over time requires worker restarts. Puma's phased restart replaces workers gradually:
# Trigger with: kill -USR1 <puma-pid>
# Workers restart one at a time, maintaining availability
Request Queuing
Request queues form at multiple layers. The web server queues requests waiting for available application server connections. The application server queues requests waiting for available threads or processes. Long queues indicate insufficient capacity.
Monitoring queue depth requires instrumentation:
# Monitor request queue time with rack-timeout
gem 'rack-timeout'
# Recent rack-timeout versions read their settings from environment
# variables at boot rather than class-level setters:
#   RACK_TIMEOUT_SERVICE_TIMEOUT=15
#   RACK_TIMEOUT_WAIT_TIMEOUT=30

# config/initializers/rack_timeout.rb (reduce log noise)
Rack::Timeout::Logger.level = Logger::ERROR
Queue time should remain under 100ms. Higher queue times indicate the need for additional application server instances or optimization of slow endpoints.
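Queue time can also be measured by hand: the web server stamps each request with an X-Request-Start header (the header name and t= epoch format follow a common Nginx/Heroku convention, but treat them as assumptions), and a small Rack middleware computes how long the request waited before a worker picked it up:

```ruby
# Middleware that derives queue time from an upstream timestamp header.
class QueueTimeMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    if (stamp = env['HTTP_X_REQUEST_START'])
      queued_at = stamp.delete('t=').to_f # e.g. "t=1700000000.123"
      env['queue.time_ms'] = ((Time.now.to_f - queued_at) * 1000).round
    end
    @app.call(env)
  end
end

app = QueueTimeMiddleware.new(->(env) { [200, {}, [env['queue.time_ms'].to_s]] })
stamp = "t=#{Time.now.to_f - 0.05}" # pretend the web server stamped this 50ms ago
status, _headers, body = app.call('HTTP_X_REQUEST_START' => stamp)
```

In production the computed value would be reported to a metrics backend rather than returned in the body.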
Tools & Ecosystem
The Ruby web server and application server ecosystem includes production-grade servers, development tools, monitoring solutions, and deployment utilities.
Production Web Servers
Nginx dominates Ruby production deployments. Its event-driven architecture, efficient static file serving, and flexible configuration make it the standard choice. HTTP/2 support arrived in version 1.9.5, with gRPC proxying and improved dynamic module support in later releases.
Apache with mod_passenger provides an alternative. Apache's prefork and worker MPMs handle concurrency differently than Nginx, sometimes better for specific workloads. Configuration uses .htaccess files for directory-level rules.
Caddy represents a modern alternative with automatic HTTPS through Let's Encrypt, HTTP/3 support, and simpler configuration:
example.com {
  root * /var/www/myapp/public
  encode gzip

  @notStatic {
    not path /assets/*
    not file {path}
  }
  reverse_proxy @notStatic localhost:3000
}
Application Server Options
Puma remains the default Rails application server. Active development, good performance, and threading support make it suitable for most applications. Configuration flexibility accommodates various deployment scenarios.
Unicorn continues serving applications prioritizing process isolation over threading. Battle-tested in production, it handles high-traffic applications reliably. Deployment simplicity and predictable behavior benefit operations teams.
Passenger (Enterprise) adds advanced features: rolling restarts, request/response buffering, detailed analytics, and resistance to memory bloat. Commercial support justifies the license cost for critical applications.
Falcon brings async/await patterns to Ruby with fiber-based concurrency. Applications must adopt async-compatible libraries, limiting adoption but offering performance benefits for I/O-intensive workloads.
Monitoring and Instrumentation
New Relic provides comprehensive application performance monitoring. It tracks request throughput, response times, database query performance, and error rates. Ruby agent installation requires minimal configuration:
# Gemfile
gem 'newrelic_rpm'

# config/newrelic.yml
production:
  license_key: <%= ENV['NEW_RELIC_LICENSE_KEY'] %>
  app_name: My Application
  monitor_mode: true
  developer_mode: false
Scout APM focuses on N+1 query detection and memory bloat. It identifies problematic endpoints and suggests optimizations. Integration mirrors New Relic with agent installation and configuration.
Skylight specializes in Rails applications with detailed timeline views showing time spent in views, database queries, and external services. The dashboard highlights endpoints consuming the most resources.
Process Management
Systemd manages application server processes on Linux systems. Service files define startup behavior, resource limits, and restart policies. Socket activation allows on-demand process spawning.
Foreman generates process management configurations from Procfiles. It creates systemd, upstart, or traditional init scripts:
# Procfile
web: bundle exec puma -C config/puma.rb
worker: bundle exec sidekiq -C config/sidekiq.yml
Capistrano automates deployment, handling code updates, asset compilation, and service restarts:
# Capfile
require 'capistrano/bundler'
require 'capistrano/rails'
require 'capistrano/puma'
# config/deploy.rb
set :application, 'myapp'
set :repo_url, 'git@github.com:user/myapp.git'
set :deploy_to, '/var/www/myapp'
set :puma_threads, [4, 16]
set :puma_workers, 4
Load Testing Tools
Apache Bench (ab) provides basic load testing. It measures throughput and latency under concurrent load but lacks advanced features:
ab -n 10000 -c 100 http://localhost:3000/
wrk offers scriptable load testing with Lua. It generates detailed latency distributions and supports complex request patterns:
wrk -t12 -c400 -d30s http://localhost:3000/
Siege simulates realistic user behavior with configurable think time between requests:
siege -c 50 -t 60s http://localhost:3000/
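The same idea — many concurrent clients hammering one endpoint — can be reproduced in plain Ruby for quick smoke tests. This is a toy sketch that stands up a throwaway in-process HTTP responder so it is self-contained; it is not a replacement for ab, wrk, or siege:

```ruby
require 'socket'
require 'net/http'
require 'uri'

# Throwaway HTTP responder standing in for the app under test.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]
Thread.new do
  loop do
    client = server.accept
    Thread.new(client) do |c|
      while (line = c.gets) && line != "\r\n"; end # drain request headers
      c.write "HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok"
      c.close
    end
  end
rescue IOError
  # server socket closed; stop accepting
end

# Toy load generator: 10 concurrent "users", 20 requests each.
successes = Queue.new
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
workers = 10.times.map do
  Thread.new do
    20.times do
      res = Net::HTTP.get_response(URI("http://127.0.0.1:#{port}/"))
      successes << 1 if res.code == '200'
    end
  end
end
workers.each(&:join)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
server.close
puts format('%d requests in %.2fs (%.0f req/s)', successes.size, elapsed, successes.size / elapsed)
```

The dedicated tools add what this sketch lacks: latency percentiles, connection reuse, warm-up handling, and realistic request mixes.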
SSL/TLS Management
Let's Encrypt provides free SSL certificates with automated renewal. Certbot handles certificate issuance and renewal:
certbot certonly --webroot -w /var/www/myapp/public -d example.com
Nginx configuration enables HTTPS with proper security settings:
server {
    listen 443 ssl http2;
    server_name example.com;
    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 10m;
}
Reference
Web Server Comparison
| Feature | Nginx | Apache | Caddy |
|---|---|---|---|
| Architecture | Event-driven | Prefork, Worker, or Event MPM | Event-driven |
| Configuration | nginx.conf with blocks | httpd.conf with .htaccess | Caddyfile |
| Static file serving | Excellent (sendfile) | Good | Good |
| Memory footprint | Very low | Moderate | Low |
| HTTP/2 support | Yes (1.9.5+) | Yes (2.4.17+) | Yes |
| HTTP/3 support | Experimental (1.25+) | No | Yes |
| Auto HTTPS | No | No | Yes (built-in) |
Application Server Comparison
| Server | Concurrency Model | Thread Safety Required | Memory Usage | Zero-Downtime Deploys |
|---|---|---|---|---|
| Puma | Multi-process + Multi-thread | Yes | Moderate | Yes (phased restart) |
| Unicorn | Multi-process | No | High | Yes (graceful restart) |
| Passenger | Multi-process or Multi-thread | Configurable | Moderate | Yes |
| Falcon | Fiber-based | No | Low | Partial |
| WEBrick | Thread-per-request | Yes | Low | No |
Puma Configuration Reference
| Setting | Purpose | Typical Value |
|---|---|---|
| workers | Number of processes | 4 |
| threads min, max | Thread pool size per worker | 5, 5 |
| preload_app | Load app before forking | true |
| worker_timeout | Request timeout in seconds | 60 |
| worker_boot_timeout | Worker startup timeout | 30 |
| bind | Socket or port binding | unix:///var/run/puma.sock |
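The table above maps onto a `config/puma.rb` like this. The values are the illustrative ones from the table, not universal recommendations:

```ruby
# config/puma.rb -- sketch using the typical values from the table above
workers 4                        # one process per CPU core is a common starting point
threads 5, 5                     # min, max threads per worker
preload_app!                     # load the app once, then fork workers (enables copy-on-write)
worker_timeout 60                # seconds before a stuck worker is restarted
worker_boot_timeout 30           # seconds allowed for a worker to finish booting
bind 'unix:///var/run/puma.sock' # Unix socket avoids TCP overhead when Nginx is local
```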
Unicorn Configuration Reference
| Setting | Purpose | Typical Value |
|---|---|---|
| worker_processes | Number of worker processes | 4 |
| timeout | Request timeout in seconds | 30 |
| listen | Socket or port binding | /var/run/unicorn.sock |
| preload_app | Load app before forking | true |
| stderr_path | Error log location | log/unicorn_stderr.log |
| stdout_path | Output log location | log/unicorn_stdout.log |
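The equivalent `config/unicorn.rb` for the table above, again with illustrative values:

```ruby
# config/unicorn.rb -- sketch using the typical values from the table above
worker_processes 4
timeout 30                       # master kills workers that exceed this
listen '/var/run/unicorn.sock'
preload_app true                 # load the app in the master, then fork
stderr_path 'log/unicorn_stderr.log'
stdout_path 'log/unicorn_stdout.log'
```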
Nginx Proxy Configuration
| Directive | Purpose | Example |
|---|---|---|
| proxy_pass | Upstream server URL | http://puma |
| proxy_set_header | Set request header | X-Forwarded-For |
| proxy_buffering | Enable response buffering | on |
| proxy_buffer_size | Initial buffer size | 4k |
| proxy_buffers | Number and size of buffers | 8 4k |
| proxy_connect_timeout | Backend connection timeout | 60s |
| proxy_read_timeout | Backend read timeout | 60s |
Common Signals
| Signal | Puma Effect | Unicorn Effect |
|---|---|---|
| USR1 | Phased restart (rolling) | Reopen log files |
| USR2 | Restart all workers | Start new master process |
| TERM | Graceful shutdown | Quick shutdown |
| QUIT | Not handled (process terminates) | Graceful shutdown |
| TTIN | Increase worker count | Increase worker count |
| TTOU | Decrease worker count | Decrease worker count |
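The underlying mechanism is ordinary Unix signal handling, which a toy master/worker pair makes concrete. This is a self-contained illustration of the plumbing, not actual Puma or Unicorn code (it relies on `fork`, so it is Unix-only):

```ruby
# Toy master/worker pair showing the signal plumbing the table describes.
reader, writer = IO.pipe
child = fork do
  reader.close
  Signal.trap('TTIN') { writer.puts 'TTIN: would add a worker' }
  Signal.trap('TERM') { writer.puts 'TERM: shutting down'; exit }
  loop { sleep 0.1 } # idle until signaled, like a worker waiting for requests
end
writer.close

sleep 0.2                   # give the child time to install its traps
Process.kill('TTIN', child)
sleep 0.2                   # let TTIN be handled before TERM arrives
Process.kill('TERM', child)
Process.wait(child)
output = reader.read
puts output
```

Real servers do the same thing at scale: the master traps these signals and translates them into forking, reaping, or draining its worker pool.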
Load Balancing Algorithms
| Algorithm | Nginx Directive | Behavior |
|---|---|---|
| Round Robin | (default) | Distributes requests sequentially |
| Least Connections | least_conn | Routes to server with fewest connections |
| IP Hash | ip_hash | Same client always routes to same server |
| Generic Hash | hash $variable | Custom hash-based routing |
| Random | random | Random server selection |
| Least Time | least_time (Nginx Plus) | Routes to server with lowest response time |
Performance Tuning Checklist
| Component | Optimization | Impact |
|---|---|---|
| Nginx | Enable gzip compression | Reduced bandwidth usage |
| Nginx | Increase worker_connections | Higher concurrent connections |
| Nginx | Enable sendfile | Faster static file serving |
| Puma | Tune thread count | Better concurrency for I/O-bound apps |
| Puma | Tune worker count | Better CPU utilization |
| Application | Enable caching (Redis/Memcached) | Reduced database load |
| Application | Optimize database queries | Lower response times |
| Application | Precompile assets | Faster load times |
| Database | Connection pooling | Better connection reuse |
| Database | Query optimization | Reduced database CPU |
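The first checklist item, for example, is only a few lines of Nginx configuration. A typical sketch (tune the MIME types and compression level for your content):

```nginx
# Inside the http {} block -- illustrative gzip settings
gzip on;
gzip_comp_level 5;
gzip_min_length 1024;
gzip_types text/css application/javascript application/json image/svg+xml;
gzip_vary on;
```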