Overview
Rate limiting restricts the number of requests a client can make to a service within a specified time window. This mechanism prevents resource exhaustion, protects against abuse, and maintains service quality for all users. Without rate limiting, a single client could overwhelm a server with requests, degrading performance or causing complete service failure.
The concept originated from network traffic shaping and throttling in telecommunications, where bandwidth needed fair distribution among users. Modern web applications face similar challenges: API endpoints can receive thousands of requests per second, database queries can strain system resources, and malicious actors can attempt denial-of-service attacks. Rate limiting addresses these concerns by enforcing quotas on resource consumption.
Consider an API serving weather data. Without rate limiting, a single user could make millions of requests per hour, consuming bandwidth and compute resources while degrading service for others. With rate limiting in place, the API enforces a quota of 1,000 requests per hour per API key:
require "httparty"

# First 1,000 requests within the hour
response = HTTParty.get("https://api.weather.com/data", headers: { "X-API-Key" => key })
# => 200 OK
# Request 1,001 within the same hour
response = HTTParty.get("https://api.weather.com/data", headers: { "X-API-Key" => key })
# => 429 Too Many Requests
# => Headers: { "X-RateLimit-Limit" => "1000", "X-RateLimit-Remaining" => "0" }
Rate limiting operates at various application layers. Web servers implement rate limiting for incoming HTTP requests, databases throttle query execution, message queues control consumption rates, and API gateways enforce access policies. Each layer protects specific resources using algorithms tailored to its requirements.
The effectiveness of rate limiting depends on accurate request identification. Systems track requests by IP address, API key, user account, OAuth token, or combinations of these identifiers. The choice affects both security and user experience: IP-based limiting can block legitimate users sharing an address, while account-based limiting provides granular control at the cost of requiring authentication.
Key Principles
Rate limiting systems operate on a fundamental principle: track request counts against time-based quotas. When a request arrives, the system increments a counter associated with the client identifier. If the counter exceeds the allowed limit within the time window, the system rejects subsequent requests until the window resets or capacity becomes available.
The core components of a rate limiting system include the identifier mechanism, the counting algorithm, the time window definition, and the action taken when limits are exceeded. The identifier uniquely represents a client or resource. The counting algorithm determines how requests accumulate and decay over time. The time window establishes the period over which limits apply. The action defines system behavior when quotas are exceeded.
Time windows use either fixed or sliding calculations. Fixed windows divide time into discrete intervals—hourly, daily, or other durations—and reset counters at interval boundaries. A fixed hourly window starting at 3:00 PM resets at 4:00 PM regardless of when requests occurred. Sliding windows calculate limits based on the exact time elapsed from any given moment, providing smoother rate distribution but requiring more complex tracking.
# Fixed window example
class FixedWindowLimiter
def initialize(limit, window_seconds)
@limit = limit
@window_seconds = window_seconds
@counts = {}
end
def allow?(key)
current_window = Time.now.to_i / @window_seconds
@counts[key] ||= {}
@counts[key][current_window] ||= 0
if @counts[key][current_window] < @limit
@counts[key][current_window] += 1
true
else
false
end
end
end
limiter = FixedWindowLimiter.new(5, 60)
limiter.allow?("user_123") # => true (request 1)
limiter.allow?("user_123") # => true (request 2)
# ... 3 more requests
limiter.allow?("user_123") # => false (limit exceeded)
Token bucket and leaky bucket algorithms represent alternative approaches. The token bucket algorithm maintains a bucket of tokens that replenishes at a constant rate. Each request consumes a token; when the bucket empties, requests are rejected or queued. This algorithm allows request bursts up to the bucket capacity while maintaining an average rate over time.
The leaky bucket algorithm enforces a strict output rate regardless of input rate. Requests enter a queue that drains at a constant rate. When the queue fills, new requests are rejected. This approach provides predictable output rates but can introduce latency since requests wait in the queue.
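A minimal in-memory token bucket illustrates the mechanics; the class below is a single-process sketch (not thread-safe), while the Redis-backed versions later in this section suit distributed deployments. The class name and rates are illustrative.
# Minimal in-memory token bucket (illustrative sketch, not thread-safe)
class SimpleTokenBucket
  def initialize(capacity, refill_rate)
    @capacity = capacity        # maximum tokens the bucket can hold
    @refill_rate = refill_rate  # tokens added per second
    @tokens = capacity
    @last_refill = Time.now.to_f
  end

  def allow?(tokens = 1)
    refill
    return false if @tokens < tokens
    @tokens -= tokens
    true
  end

  private

  # Top the bucket up based on elapsed time, capped at capacity
  def refill
    now = Time.now.to_f
    @tokens = [@capacity, @tokens + (now - @last_refill) * @refill_rate].min
    @last_refill = now
  end
end

bucket = SimpleTokenBucket.new(10, 1.0) # 10-token burst, 1 token/second
bucket.allow?    # => true, 9 tokens remain
bucket.allow?(5) # => true, 4 tokens remain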
Distributed systems require coordination when implementing rate limiting. A single server can maintain counters in memory, but distributed architectures need shared state across instances. Centralized data stores like Redis provide atomic operations for counter management, enabling consistent rate limiting across server clusters. The trade-off involves latency introduced by network calls to the centralized store.
Rate limit responses must communicate limit details to clients. HTTP applications use status code 429 (Too Many Requests) along with headers indicating limit information:
X-RateLimit-Limit: 1000 # Total requests allowed
X-RateLimit-Remaining: 247 # Requests remaining in window
X-RateLimit-Reset: 1698364800 # Unix timestamp when limit resets
Retry-After: 3600 # Seconds until retry is allowed
These headers enable clients to implement intelligent retry logic and avoid unnecessary requests. Applications can display limit information to users or adjust request patterns based on remaining quota.
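A client can honor these headers with a short backoff loop. The sketch below uses Net::HTTP from the standard library and assumes the server sends Retry-After as in the example above; the URL is illustrative.
require "net/http"
require "uri"

# Hedged sketch: retry after the server-indicated wait, up to a few attempts
def fetch_with_backoff(uri, attempts: 3)
  attempts.times do
    response = Net::HTTP.get_response(uri)
    return response unless response.code == "429"
    wait = (response["Retry-After"] || "1").to_i
    sleep(wait)
  end
  nil # give up after the configured number of attempts
end

fetch_with_backoff(URI("https://api.weather.com/data"))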
Implementation Approaches
Fixed window counters provide the simplest rate limiting implementation. The algorithm divides time into fixed intervals and counts requests within each interval. When the interval ends, the counter resets to zero. This approach requires minimal memory—one counter per client per window—and simple logic.
class FixedWindow
def initialize(redis, limit, window_seconds)
@redis = redis
@limit = limit
@window = window_seconds
end
def allow?(identifier)
key = "rate_limit:#{identifier}:#{current_window}"
count = @redis.incr(key)
@redis.expire(key, @window * 2) if count == 1
count <= @limit
end
private
def current_window
Time.now.to_i / @window
end
end
Fixed windows suffer from boundary issues. A client can make the maximum number of requests at the end of one window and again at the start of the next, effectively doubling the rate for a brief period. Consider a limit of 100 requests per minute: a client making 100 requests at 12:00:59 and another 100 at 12:01:00 achieves 200 requests in roughly two seconds, despite the per-minute limit.
Sliding window counters address boundary issues by calculating limits based on the exact time elapsed from the current moment. Instead of fixed intervals, the algorithm examines the request count over the previous N seconds relative to each request. This provides smoother rate enforcement but requires tracking individual request timestamps.
class SlidingWindow
def initialize(redis, limit, window_seconds)
@redis = redis
@limit = limit
@window = window_seconds
end
def allow?(identifier)
key = "rate_limit:#{identifier}"
now = Time.now.to_f
window_start = now - @window
    results = @redis.multi do |transaction|
      transaction.zremrangebyscore(key, "-inf", window_start)
      transaction.zadd(key, now, "#{now}:#{rand}")
      transaction.zcount(key, window_start, "+inf")
      transaction.expire(key, @window * 2)
    end
    # multi returns each queued command's reply; the zcount is the third entry
    count = results[2]
    count <= @limit
end
end
The sliding window log approach stores timestamps of each request in a sorted set. For each new request, the algorithm removes expired timestamps, adds the current timestamp, and checks if the total count exceeds the limit. This provides precise rate limiting but consumes memory proportional to the limit and can be expensive for high-traffic scenarios.
Sliding window counters combine fixed window efficiency with sliding window precision. The algorithm maintains counters for the current and previous fixed windows, then calculates an approximation based on the percentage of the current window elapsed:
class SlidingWindowCounter
def initialize(redis, limit, window_seconds)
@redis = redis
@limit = limit
@window = window_seconds
end
def allow?(identifier)
now = Time.now.to_i
current_window = now / @window
previous_window = current_window - 1
elapsed_in_current = now % @window
weight = (@window - elapsed_in_current).to_f / @window
current_key = "rate_limit:#{identifier}:#{current_window}"
previous_key = "rate_limit:#{identifier}:#{previous_window}"
current_count = @redis.get(current_key).to_i
previous_count = @redis.get(previous_key).to_i
weighted_count = (previous_count * weight) + current_count
if weighted_count < @limit
@redis.multi do |transaction|
transaction.incr(current_key)
transaction.expire(current_key, @window * 2)
end
true
else
false
end
end
end
This hybrid approach reduces memory requirements while avoiding fixed window boundary problems. The weighted calculation approximates a true sliding window by considering how much of the previous window overlaps with the current observation period.
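A worked example with illustrative numbers: a 60-second window, a limit of 100, 15 seconds elapsed in the current window, 80 requests counted in the previous window, and 30 in the current one.
# Weight is the fraction of the previous window still inside the sliding window
weight = (60 - 15).to_f / 60        # => 0.75
weighted_count = (80 * weight) + 30 # => 90.0
weighted_count < 100                # => true, the request is allowed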
Token bucket algorithms model rate limiting as tokens in a bucket that refills at a constant rate. Each request consumes one or more tokens. The bucket has a maximum capacity, allowing request bursts up to that capacity. When tokens are exhausted, requests are rejected until the bucket refills.
class TokenBucket
def initialize(redis, capacity, refill_rate)
@redis = redis
@capacity = capacity
@refill_rate = refill_rate # tokens per second
end
def allow?(identifier, tokens = 1)
key = "token_bucket:#{identifier}"
now = Time.now.to_f
script = <<~LUA
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local tokens_requested = tonumber(ARGV[4])
local bucket = redis.call('hmget', KEYS[1], 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now
local elapsed = now - last_refill
local refilled = elapsed * refill_rate
tokens = math.min(capacity, tokens + refilled)
if tokens >= tokens_requested then
tokens = tokens - tokens_requested
redis.call('hmset', KEYS[1], 'tokens', tokens, 'last_refill', now)
redis.call('expire', KEYS[1], 3600)
return 1
else
return 0
end
LUA
result = @redis.eval(script, [key], [@capacity, @refill_rate, now, tokens])
result == 1
end
end
Token buckets handle variable request costs by consuming multiple tokens for expensive operations. A search query might consume 5 tokens while a simple read consumes 1 token. This provides granular control over resource consumption while maintaining the burst-handling benefits of token buckets.
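With the TokenBucket class above, the caller passes the operation's cost as the token count; the costs shown are illustrative.
bucket = TokenBucket.new(redis, 100, 10) # capacity 100, refills 10 tokens/second
bucket.allow?("user_123")                # simple read consumes 1 token
bucket.allow?("user_123", 5)             # search query consumes 5 tokens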
Leaky bucket algorithms enforce a constant output rate by queuing requests that arrive faster than the drain rate. The bucket has a fixed capacity; requests fill the bucket and drain at a constant rate. When the bucket overflows, requests are rejected.
class LeakyBucket
def initialize(redis, capacity, drain_rate)
@redis = redis
@capacity = capacity
@drain_rate = drain_rate # requests per second
end
def allow?(identifier)
key = "leaky_bucket:#{identifier}"
now = Time.now.to_f
script = <<~LUA
local capacity = tonumber(ARGV[1])
local drain_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local bucket = redis.call('hmget', KEYS[1], 'level', 'last_drain')
local level = tonumber(bucket[1]) or 0
local last_drain = tonumber(bucket[2]) or now
local elapsed = now - last_drain
local drained = elapsed * drain_rate
level = math.max(0, level - drained)
if level < capacity then
level = level + 1
redis.call('hmset', KEYS[1], 'level', level, 'last_drain', now)
redis.call('expire', KEYS[1], 3600)
return 1
else
return 0
end
LUA
result = @redis.eval(script, [key], [@capacity, @drain_rate, now])
result == 1
end
end
Leaky buckets provide predictable request rates, making them suitable for protecting downstream systems with strict throughput requirements. The trade-off involves potential request latency since the algorithm enforces a maximum processing rate regardless of available capacity.
Distributed rate limiting requires consensus across multiple server instances. Centralized stores like Redis or Memcached provide atomic operations for counter management. Each server queries the shared store to check and update counters. Race conditions are prevented through atomic increment operations or Lua scripts that execute multiple commands atomically.
Ruby Implementation
Ruby applications typically implement rate limiting through Rack middleware, making it applicable to any Rack-based framework including Rails, Sinatra, and Grape. The Rack::Attack gem provides a mature, flexible rate limiting solution with multiple strategies and storage backends.
# config/initializers/rack_attack.rb
class Rack::Attack
# Throttle general requests by IP address
throttle('req/ip', limit: 300, period: 5.minutes) do |req|
req.ip unless req.path.start_with?('/assets')
end
# Throttle API requests by API key
throttle('api/key', limit: 1000, period: 1.hour) do |req|
req.env['HTTP_X_API_KEY'] if req.path.start_with?('/api')
end
# Throttle login attempts by email
throttle('logins/email', limit: 5, period: 20.minutes) do |req|
if req.path == '/login' && req.post?
req.params['email'].to_s.downcase.presence
end
end
# Different limits for authenticated users
throttle('authenticated/user', limit: 10000, period: 1.hour) do |req|
if req.env['warden'].authenticate?
req.env['warden'].user.id
end
end
end
Rack::Attack integrates with Rails cache backends, using Rails.cache by default. For production systems, Redis provides the performance and atomicity required for accurate rate limiting:
# config/initializers/rack_attack.rb
Rack::Attack.cache.store = ActiveSupport::Cache::RedisCacheStore.new(
url: ENV['REDIS_URL'],
namespace: 'rack_attack'
)
Custom throttle responses provide clients with clear feedback about rate limits:
class Rack::Attack
self.throttled_responder = lambda do |request|
match_data = request.env['rack.attack.match_data']
now = match_data[:epoch_time]
headers = {
'X-RateLimit-Limit' => match_data[:limit].to_s,
'X-RateLimit-Remaining' => '0',
'X-RateLimit-Reset' => (now + (match_data[:period] - now % match_data[:period])).to_s,
'Content-Type' => 'application/json'
}
[429, headers, [{ error: 'Rate limit exceeded' }.to_json]]
end
end
Implementing rate limiting without external dependencies involves creating custom Rack middleware:
class RateLimitMiddleware
def initialize(app, options = {})
@app = app
@limit = options[:limit] || 100
@period = options[:period] || 3600
    # Plain Hash store: per-process only, not thread-safe, and old windows
    # are never purged; suitable for illustration, not production
    @store = options[:store] || {}
end
def call(env)
request = Rack::Request.new(env)
identifier = get_identifier(request)
if rate_limit_exceeded?(identifier)
return rate_limit_response
end
increment_counter(identifier)
@app.call(env)
end
private
def get_identifier(request)
request.env['HTTP_X_API_KEY'] || request.ip
end
def rate_limit_exceeded?(identifier)
current_window = Time.now.to_i / @period
key = "#{identifier}:#{current_window}"
(@store[key] || 0) >= @limit
end
def increment_counter(identifier)
current_window = Time.now.to_i / @period
key = "#{identifier}:#{current_window}"
@store[key] ||= 0
@store[key] += 1
end
def rate_limit_response
[
429,
{ 'Content-Type' => 'application/json' },
[{ error: 'Rate limit exceeded' }.to_json]
]
end
end
Rails applications can implement rate limiting at the controller level using concerns:
module RateLimited
extend ActiveSupport::Concern
included do
before_action :check_rate_limit
end
private
def check_rate_limit
limiter = RateLimiter.new(
key: rate_limit_key,
limit: rate_limit_count,
period: rate_limit_period
)
unless limiter.allow?
response.headers['X-RateLimit-Limit'] = rate_limit_count.to_s
response.headers['X-RateLimit-Remaining'] = '0'
response.headers['X-RateLimit-Reset'] = limiter.reset_time.to_s
render json: { error: 'Rate limit exceeded' }, status: :too_many_requests
end
end
def rate_limit_key
"rate_limit:#{controller_name}:#{action_name}:#{current_user&.id || request.ip}"
end
def rate_limit_count
100
end
def rate_limit_period
3600
end
end
class ApiController < ApplicationController
include RateLimited
def rate_limit_count
current_user&.premium? ? 10000 : 1000
end
end
Background job processing requires rate limiting to prevent overwhelming external APIs or databases:
class RateLimitedJob < ApplicationJob
queue_as :default
def perform(user_id, action)
limiter = TokenBucketLimiter.new(
key: "api_calls:#{user_id}",
capacity: 100,
refill_rate: 10 # 10 tokens per second
)
unless limiter.consume(tokens: 1)
# Reschedule job for later
self.class.set(wait: limiter.time_until_tokens(1)).perform_later(user_id, action)
return
end
# Perform API call
ExternalService.call(user_id, action)
end
end
Redis-backed rate limiting with Lua scripts ensures atomic operations and reduces network round trips:
class RedisRateLimiter
LUA_SCRIPT = <<~LUA
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local current_time = tonumber(ARGV[3])
local window_start = current_time - window
redis.call('zremrangebyscore', key, '-inf', window_start)
local current_count = redis.call('zcard', key)
if current_count < limit then
redis.call('zadd', key, current_time, current_time)
redis.call('expire', key, window * 2)
return {1, limit - current_count - 1}
else
return {0, 0}
end
LUA
def initialize(redis)
@redis = redis
@script_sha = @redis.script(:load, LUA_SCRIPT)
end
def allow?(key, limit, window)
current_time = Time.now.to_f
allowed, remaining = @redis.evalsha(
@script_sha,
[key],
[limit, window, current_time]
)
[allowed == 1, remaining]
end
end
Testing rate limiting requires simulating time progression and request sequences:
RSpec.describe RateLimiter do
let(:redis) { Redis.new }
let(:limiter) { described_class.new(redis, limit: 5, period: 60) }
before { redis.flushdb }
it 'allows requests within limit' do
5.times do
expect(limiter.allow?('user_1')).to be true
end
end
it 'blocks requests exceeding limit' do
5.times { limiter.allow?('user_1') }
expect(limiter.allow?('user_1')).to be false
end
it 'resets after time window' do
5.times { limiter.allow?('user_1') }
Timecop.travel(61.seconds.from_now) do
expect(limiter.allow?('user_1')).to be true
end
end
it 'tracks different users separately' do
5.times { limiter.allow?('user_1') }
expect(limiter.allow?('user_2')).to be true
end
end
Common Patterns
Per-user rate limiting provides individualized quotas based on user accounts. This approach prevents a single user from consuming excessive resources while allowing legitimate high-volume users appropriate access. The pattern requires authentication and associates limits with user identifiers rather than IP addresses.
class Rack::Attack
throttle('api/user', limit: 1000, period: 1.hour) do |req|
if req.path.start_with?('/api') && req.env['warden'].user
req.env['warden'].user.id
end
end
# Unauthenticated requests have lower limits
throttle('api/ip', limit: 100, period: 1.hour) do |req|
req.ip if req.path.start_with?('/api') && !req.env['warden'].user
end
end
Tiered rate limiting assigns different limits based on user subscription levels or account types. Premium users receive higher quotas than free users, aligning resource allocation with business models:
class TieredRateLimiter
LIMITS = {
free: { limit: 100, period: 3600 },
basic: { limit: 1000, period: 3600 },
premium: { limit: 10000, period: 3600 },
enterprise: { limit: 100000, period: 3600 }
}
def initialize(redis)
@redis = redis
end
def allow?(user)
tier_config = LIMITS[user.subscription_tier]
key = "rate_limit:#{user.id}"
RateLimiter.new(@redis, **tier_config).allow?(key)
end
end
Dynamic rate limiting adjusts limits based on system load or user behavior. During high traffic periods, the system reduces limits to maintain stability. For users with established good behavior, limits gradually increase:
class DynamicRateLimiter
def initialize(redis, base_limit:, base_period:)
@redis = redis
@base_limit = base_limit
@base_period = base_period
end
def allow?(identifier)
multiplier = calculate_multiplier(identifier)
effective_limit = (@base_limit * multiplier).to_i
RateLimiter.new(@redis, limit: effective_limit, period: @base_period)
.allow?(identifier)
end
private
def calculate_multiplier(identifier)
# Check system load
cpu_usage = SystemMetrics.cpu_usage
load_factor = case cpu_usage
when 0..50 then 1.5
when 51..75 then 1.0
when 76..90 then 0.5
else 0.25
end
# Check user reputation
reputation_key = "reputation:#{identifier}"
reputation_score = @redis.get(reputation_key).to_f
    reputation_factor = [0.5, reputation_score / 100.0, 2.0].sort[1] # median = clamp to [0.5, 2.0]
load_factor * reputation_factor
end
end
Endpoint-specific rate limiting applies different limits to various API endpoints based on resource cost. Expensive operations like search or report generation have stricter limits than simple read operations:
class Rack::Attack
# Strict limit for expensive search endpoint
throttle('api/search', limit: 10, period: 1.minute) do |req|
if req.path == '/api/search'
req.env['HTTP_X_API_KEY'] || req.ip
end
end
# Moderate limit for write operations
throttle('api/write', limit: 100, period: 1.hour) do |req|
if req.post? || req.put? || req.patch?
req.env['HTTP_X_API_KEY'] || req.ip
end
end
# Higher limit for read operations
throttle('api/read', limit: 1000, period: 1.hour) do |req|
if req.get?
req.env['HTTP_X_API_KEY'] || req.ip
end
end
end
Distributed rate limiting coordinates limits across multiple application servers using a shared data store. This prevents each server from applying limits independently, which would effectively multiply the total allowed requests:
class DistributedRateLimiter
def initialize
@redis = Redis.new(
url: ENV['REDIS_URL'],
timeout: 1,
reconnect_attempts: 3
)
end
def allow?(key, limit, period)
script = <<~LUA
local current = redis.call('incr', KEYS[1])
if current == 1 then
redis.call('expire', KEYS[1], ARGV[1])
end
return current
LUA
window_key = "#{key}:#{Time.now.to_i / period}"
current = @redis.eval(script, [window_key], [period])
current <= limit
rescue Redis::BaseError => e
# Fail open on Redis errors to maintain availability
Rails.logger.error("Rate limiter error: #{e.message}")
true
end
end
Graceful degradation handles rate limit failures by defaulting to permissive behavior when the rate limiting system becomes unavailable. This maintains application availability at the cost of temporarily unlimited access:
class ResilientRateLimiter
def initialize(redis, fallback: :allow)
@redis = redis
@fallback = fallback
@circuit_breaker = CircuitBreaker.new(threshold: 5, timeout: 30)
end
def allow?(key, limit, period)
return fallback_allow? unless @circuit_breaker.allow_request?
    window_key = "#{key}:#{Time.now.to_i / period}"
    result = @redis.incr(window_key)
    @redis.expire(window_key, period * 2) if result == 1
    @circuit_breaker.record_success
    result <= limit
rescue Redis::BaseError => e
@circuit_breaker.record_failure
Rails.logger.error("Rate limiter Redis error: #{e.message}")
fallback_allow?
end
private
def fallback_allow?
@fallback == :allow
end
end
Cost-based rate limiting assigns different costs to operations based on resource consumption. A single request might consume multiple quota units:
class CostBasedRateLimiter
OPERATION_COSTS = {
'GET /api/users/:id' => 1,
'GET /api/search' => 5,
'POST /api/reports' => 10,
'POST /api/batch_import' => 50
}
def initialize(token_bucket)
@token_bucket = token_bucket
end
def allow?(user, operation)
cost = calculate_cost(operation)
@token_bucket.consume(user.id, tokens: cost)
end
private
def calculate_cost(operation)
OPERATION_COSTS[operation] || 1
end
end
Security Implications
Rate limiting serves as a primary defense against denial-of-service attacks. Without rate limits, attackers can exhaust server resources, database connections, or network bandwidth by flooding the system with requests. Effective rate limiting blocks these attacks by restricting request volumes from any single source.
Distributed denial-of-service (DDoS) attacks pose a greater challenge since requests originate from many IP addresses simultaneously. Pure IP-based rate limiting becomes less effective as attackers distribute load across compromised machines. Defense requires multiple layers: network-level rate limiting at load balancers or CDNs, application-level rate limiting for authenticated endpoints, and behavioral analysis to identify coordinated attack patterns.
class Rack::Attack
# Block requests from known bad actors
blocklist('block_bad_actors') do |req|
BadActorRegistry.blocked?(req.ip)
end
# Aggressive rate limiting for suspicious patterns
throttle('suspicious/ip', limit: 10, period: 1.minute) do |req|
req.ip if suspicious_request?(req)
end
# Normal rate limiting for regular traffic
throttle('req/ip', limit: 300, period: 5.minutes) do |req|
req.ip
end
def self.suspicious_request?(req)
# Detect patterns like rapid endpoint scanning
ua = req.user_agent
ua.nil? || ua.empty? || ua.match?(/bot|crawler|spider/i)
end
end
Authentication bypass attacks attempt to circumvent rate limits by generating new identifiers. IP-based limiting can be evaded using proxy networks or VPNs. API key rotation evades key-based limiting. Defense requires tracking multiple identifier types and enforcing a limit on each, allowing a request only when every applicable limit permits it:
class CompositeRateLimiter
def initialize(redis)
@redis = redis
end
def allow?(request)
identifiers = [
request.ip,
request.env['HTTP_X_API_KEY'],
request.env['warden']&.user&.id
].compact
# Apply rate limit to each identifier
# Fail if ANY limit is exceeded
identifiers.all? do |identifier|
RateLimiter.new(@redis, limit: 1000, period: 3600).allow?(identifier)
end
end
end
Credential stuffing attacks use stolen username-password pairs to gain unauthorized access. Rate limiting login endpoints prevents attackers from testing large numbers of credentials:
class Rack::Attack
# Strict rate limit for login attempts per IP
throttle('logins/ip', limit: 5, period: 5.minutes) do |req|
if req.path == '/login' && req.post?
req.ip
end
end
# Strict rate limit for login attempts per email
throttle('logins/email', limit: 5, period: 15.minutes) do |req|
if req.path == '/login' && req.post?
req.params['email'].to_s.downcase.presence
end
end
# Progressive backoff after failed attempts
throttle('failed_logins/email', limit: 10, period: 1.hour) do |req|
if req.path == '/login' && req.post?
email = req.params['email'].to_s.downcase
key = "failed_logins:#{email}"
# Track in after_action callback
req.env['rack.attack.failed_login_email'] = email
email if Redis.current.get(key).to_i > 3
end
end
end
# Track failed login attempts
Rails.application.config.after_initialize do
ActiveSupport::Notifications.subscribe('process_action.action_controller') do |*args|
event = ActiveSupport::Notifications::Event.new(*args)
request = event.payload[:request]
if email = request.env['rack.attack.failed_login_email']
if event.payload[:status] == 401
key = "failed_logins:#{email}"
Redis.current.incr(key)
Redis.current.expire(key, 3600)
end
end
end
end
Enumeration attacks attempt to discover valid accounts, API endpoints, or resources by testing many possibilities. Rate limiting prevents rapid enumeration while allowing legitimate discovery:
class Rack::Attack
# Limit requests that might be enumeration attempts
throttle('enumeration/404s', limit: 50, period: 10.minutes) do |req|
key = "enumeration:#{req.ip}"
# Track in after_action callback
req.env['rack.attack.enumeration_key'] = key
req.ip
end
end
# Track 404 responses as potential enumeration
Rails.application.config.after_initialize do
ActiveSupport::Notifications.subscribe('process_action.action_controller') do |*args|
event = ActiveSupport::Notifications::Event.new(*args)
request = event.payload[:request]
if key = request.env['rack.attack.enumeration_key']
if event.payload[:status] == 404
count = Redis.current.incr(key)
Redis.current.expire(key, 600)
# Aggressive limiting after many 404s
if count > 20
Rack::Attack::Allow2Ban.filter(request.ip, maxretry: 5, findtime: 600, bantime: 3600) do
count > 20
end
end
end
end
end
end
API key leakage occurs when keys are exposed in public repositories, client-side code, or intercepted traffic. Rate limiting per key prevents catastrophic abuse of leaked keys:
class ApiKeyRateLimiter
def initialize(redis)
@redis = redis
end
def allow?(api_key)
# Normal rate limit
normal_limit = RateLimiter.new(@redis, limit: 10000, period: 3600)
return false unless normal_limit.allow?("api_key:#{api_key}")
# Additional check for suspicious activity
if suspicious_activity?(api_key)
alert_security_team(api_key)
return false
end
true
end
private
  def suspicious_activity?(api_key)
    key = "api_key_ips:#{api_key}"
    # Drop IPs not seen in the last hour, then count the remainder
    @redis.zremrangebyscore(key, '-inf', Time.now.to_f - 3600)
    ip_count = @redis.zcard(key)
    # More than 100 unique IPs in an hour is suspicious
    ip_count > 100
  end
end
def alert_security_team(api_key)
SecurityAlert.create!(
type: 'suspicious_api_key_usage',
api_key_id: api_key,
message: 'API key showing suspicious multi-IP usage'
)
end
end
Cache poisoning attacks attempt to pollute caches with malicious content by making requests that bypass origin rate limits but hit cached responses repeatedly. Rate limiting must occur before cache layers:
# Rate limit before serving from cache
class RateLimitBeforeCache
def initialize(app)
@app = app
@rate_limiter = RateLimiter.new
end
def call(env)
request = Rack::Request.new(env)
# Rate limit check happens first
unless @rate_limiter.allow?(request.ip)
return [429, {}, ['Rate limit exceeded']]
end
# Cache middleware runs after rate limiting
@app.call(env)
end
end
# Middleware order matters
Rails.application.config.middleware.insert_before(
Rack::Cache,
RateLimitBeforeCache
)
Tools & Ecosystem
Rack::Attack provides the most widely used rate limiting solution for Ruby web applications. The gem runs as Rack middleware, supporting Rails, Sinatra, Grape, and other Rack-based frameworks. It offers flexible configuration, multiple throttling strategies, and integration with various cache backends:
# Gemfile
gem 'rack-attack'
# config/application.rb
config.middleware.use Rack::Attack
# config/initializers/rack_attack.rb
Rack::Attack.cache.store = ActiveSupport::Cache::RedisCacheStore.new(
url: ENV['REDIS_URL']
)
class Rack::Attack
safelist('allow_localhost') do |req|
req.ip == '127.0.0.1' || req.ip == '::1'
end
blocklist('block_bad_ip') do |req|
BlockedIpList.include?(req.ip)
end
throttle('req/ip', limit: 300, period: 5.minutes) do |req|
req.ip
end
# Exponential backoff for repeated violations
Rack::Attack.blocklist('penalized_ips') do |req|
Rack::Attack::Allow2Ban.filter(req.ip, maxretry: 5, findtime: 10.minutes, bantime: 1.hour) do
Rack::Attack.cache.count("violations:#{req.ip}", 10.minutes) > 5
end
end
end
Redis provides the backend storage for distributed rate limiting systems. Its atomic operations and built-in expiration make it suitable for high-performance rate limiting:
# Gemfile
gem 'redis'
gem 'hiredis' # Optional C extension for better performance
# config/initializers/redis.rb
Redis.current = Redis.new(
url: ENV['REDIS_URL'],
driver: :hiredis,
timeout: 1,
reconnect_attempts: 3
)
# Connection pooling for threaded servers (requires the connection_pool gem).
# A pool is not a Redis client, so keep it in its own constant rather than
# assigning it to Redis.current.
REDIS_POOL = ConnectionPool.new(size: 25, timeout: 5) do
  Redis.new(url: ENV['REDIS_URL'], driver: :hiredis)
end
The redis-throttle gem provides a lightweight rate limiting implementation focused specifically on throttling:
# Gemfile
gem 'redis-throttle'
require 'redis/throttle'
redis = Redis.new
throttle = Redis::Throttle.new(key: 'user:123', limits: [[10, 60], [100, 3600]], redis: redis)
if throttle.allowed?
# Process request
else
# Rate limit exceeded
end
Sidekiq Enterprise includes rate limiting for background jobs, preventing them from overwhelming external APIs or databases:
# Gemfile
gem 'sidekiq-enterprise'
class ApiCallJob
include Sidekiq::Job
sidekiq_options queue: 'api_calls',
limiter: {
name: 'external_api',
limit: 100,
period: 60
}
def perform(user_id, action)
ExternalApi.call(user_id, action)
end
end
The rate_limiter gem provides a simple, flexible Ruby implementation without framework dependencies:
# Gemfile
gem 'rate_limiter'
require 'rate_limiter'
limiter = RateLimiter.new(
interval: 3600,
max: 1000,
store: Redis.new
)
if limiter.exceeded?('user:123')
# Rate limit exceeded
else
limiter.increment('user:123')
# Process request
end
Cloudflare and other CDN providers offer edge-based rate limiting that blocks requests before they reach application servers. This provides DDoS protection and reduces server load:
# Configure via Cloudflare API
require 'cloudflare'
cf = Cloudflare.new(
email: ENV['CLOUDFLARE_EMAIL'],
key: ENV['CLOUDFLARE_KEY']
)
zone = cf.zones.find_by_name('example.com')
# Create rate limiting rule
zone.firewall.rate_limits.create(
threshold: 1000,
period: 60,
action: {
mode: 'challenge', # or 'block', 'simulate'
timeout: 86400
},
match: {
request: {
methods: ['GET', 'POST'],
url: 'example.com/api/*'
}
}
)
Kong API Gateway provides centralized rate limiting for microservices architectures:
# Enable rate limiting plugin
curl -X POST http://localhost:8001/plugins \
--data "name=rate-limiting" \
--data "config.minute=100" \
--data "config.hour=1000" \
--data "config.policy=redis" \
--data "config.redis_host=redis.example.com"
The throttle gem offers a simple Ruby implementation for rate limiting with multiple backends:
# Gemfile
gem 'throttle'
throttle = Throttle.new(
key: "api:#{user.id}",
max: 100,
period: 3600,
store: Throttle::Store::Redis.new(Redis.current)
)
if throttle.allow?
# Process request
throttle.record!
else
# Rate limit exceeded
end
Reference
Rate Limiting Algorithms
| Algorithm | Time Complexity | Space Complexity | Use Case |
|---|---|---|---|
| Fixed Window | O(1) | O(1) per client | Simple quotas, low traffic |
| Sliding Window Log | O(log m) | O(m) per client, where m is the limit | Precise rate limiting, audit trails |
| Sliding Window Counter | O(1) | O(1) per client (two counters) | Balance of precision and efficiency |
| Token Bucket | O(1) | O(1) per client | Burst handling, variable costs |
| Leaky Bucket | O(1) | O(1) per client | Smooth output rates, queue-based |
HTTP Response Headers
| Header | Description | Example |
|---|---|---|
| X-RateLimit-Limit | Total requests allowed in window | 1000 |
| X-RateLimit-Remaining | Requests remaining in current window | 247 |
| X-RateLimit-Reset | Unix timestamp when limit resets | 1698364800 |
| Retry-After | Seconds until retry is allowed | 3600 |
| X-RateLimit-Used | Number of requests consumed | 753 |
HTTP Status Codes
| Code | Name | When Used |
|---|---|---|
| 429 | Too Many Requests | Rate limit exceeded |
| 503 | Service Unavailable | System overload, temporary blocking |
| 200 | OK | Request allowed, within limits |
| 403 | Forbidden | Permanent block, security violation |
Rack::Attack Throttle Options
| Option | Type | Description |
|---|---|---|
| limit | Integer | Maximum requests allowed in period |
| period | Integer | Time window in seconds |
| name | String | Unique identifier for throttle rule |
| discriminator | Proc | Block returning identifier for grouping requests |
Redis Commands for Rate Limiting
| Command | Purpose | Example |
|---|---|---|
| INCR | Increment counter atomically | INCR rate_limit:user:123 |
| EXPIRE | Set key expiration | EXPIRE rate_limit:user:123 3600 |
| TTL | Get remaining expiration time | TTL rate_limit:user:123 |
| ZADD | Add to sorted set with score | ZADD timestamps user123 1698364800 |
| ZREMRANGEBYSCORE | Remove entries by score range | ZREMRANGEBYSCORE timestamps -inf 1698360000 |
| ZCARD | Count sorted set members | ZCARD timestamps |
| EVAL | Execute Lua script atomically | EVAL script 1 key arg1 arg2 |
Common Rate Limit Configurations
| Scenario | Limit | Period | Algorithm |
|---|---|---|---|
| Public API (free tier) | 100 | 1 hour | Fixed window |
| Public API (paid tier) | 10000 | 1 hour | Token bucket |
| Login attempts per IP | 5 | 15 minutes | Sliding window |
| Login attempts per email | 5 | 1 hour | Sliding window |
| Search queries | 10 | 1 minute | Token bucket |
| File uploads | 10 | 1 hour | Leaky bucket |
| Admin actions | 100 | 1 hour | Fixed window |
| Webhook deliveries | 1000 | 1 hour | Token bucket |
Rate Limiter Configuration Checklist
| Item | Consideration |
|---|---|
| Identifier | IP address, API key, user ID, session ID |
| Storage | Redis, Memcached, in-memory, database |
| Algorithm | Fixed window, sliding window, token bucket, leaky bucket |
| Limits | Per-second, per-minute, per-hour, per-day |
| Tiers | Free, basic, premium, enterprise |
| Scope | Global, per-endpoint, per-action, per-resource |
| Response | Block, throttle, queue, error message |
| Headers | Include rate limit information for clients |
| Monitoring | Track limit hits, violations, trends |
| Failover | Behavior when storage unavailable |
Security Considerations Checklist
| Consideration | Implementation |
|---|---|
| DDoS protection | Multiple layer rate limiting |
| Credential stuffing | Strict login rate limits per IP and email |
| Enumeration attacks | Limit 404 responses, random delays |
| API key leakage | Per-key limits, multi-IP detection |
| Cache poisoning | Rate limit before cache layer |
| Bypass attempts | Composite identifiers (IP + key + user) |
| Distributed attacks | CDN/edge rate limiting |
| Failed attempts | Progressive backoff, temporary blocks |