Overview
Rate limiting restricts the number of requests a client can make to a service within a specified time window. This mechanism prevents resource exhaustion, protects against abuse, and maintains service quality for all users. Without rate limiting, a single client could overwhelm a server with requests, degrading performance or causing complete service failure.
The concept originated from network traffic shaping and throttling in telecommunications, where bandwidth needed fair distribution among users. Modern web applications face similar challenges: API endpoints can receive thousands of requests per second, database queries can strain system resources, and malicious actors can attempt denial-of-service attacks. Rate limiting addresses these concerns by enforcing quotas on resource consumption.
Consider an API serving weather data. Without rate limiting, a single user could make millions of requests per hour, consuming bandwidth and compute resources while degrading service for others. With rate limiting in place, the API enforces a quota of 1,000 requests per hour per API key:
require "httparty"

# First 1,000 requests within the hour
response = HTTParty.get("https://api.weather.com/data", headers: { "X-API-Key" => key })
# => 200 OK
# Request 1,001 within the same hour
response = HTTParty.get("https://api.weather.com/data", headers: { "X-API-Key" => key })
# => 429 Too Many Requests
# => Headers: { "X-RateLimit-Limit" => "1000", "X-RateLimit-Remaining" => "0" }
Rate limiting operates at various application layers. Web servers implement rate limiting for incoming HTTP requests, databases throttle query execution, message queues control consumption rates, and API gateways enforce access policies. Each layer protects specific resources using algorithms tailored to its requirements.
The effectiveness of rate limiting depends on accurate request identification. Systems track requests by IP address, API key, user account, OAuth token, or combinations of these identifiers. The choice affects both security and user experience: IP-based limiting can block legitimate users sharing an address, while account-based limiting provides granular control at the cost of requiring authentication.
Key Principles
Rate limiting systems operate on a fundamental principle: track request counts against time-based quotas. When a request arrives, the system increments a counter associated with the client identifier. If the counter exceeds the allowed limit within the time window, the system rejects subsequent requests until the window resets or capacity becomes available.
The core components of a rate limiting system include the identifier mechanism, the counting algorithm, the time window definition, and the action taken when limits are exceeded. The identifier uniquely represents a client or resource. The counting algorithm determines how requests accumulate and decay over time. The time window establishes the period over which limits apply. The action defines system behavior when quotas are exceeded.
Time windows use either fixed or sliding calculations. Fixed windows divide time into discrete intervals—hourly, daily, or other durations—and reset counters at interval boundaries. A fixed hourly window starting at 3:00 PM resets at 4:00 PM regardless of when requests occurred. Sliding windows calculate limits based on the exact time elapsed from any given moment, providing smoother rate distribution but requiring more complex tracking.
# Fixed window example
class FixedWindowLimiter
def initialize(limit, window_seconds)
@limit = limit
@window_seconds = window_seconds
@counts = {}
end
def allow?(key)
current_window = Time.now.to_i / @window_seconds
@counts[key] ||= {}
@counts[key][current_window] ||= 0
if @counts[key][current_window] < @limit
@counts[key][current_window] += 1
true
else
false
end
end
end
limiter = FixedWindowLimiter.new(5, 60)
limiter.allow?("user_123") # => true (request 1)
limiter.allow?("user_123") # => true (request 2)
# ... 3 more requests
limiter.allow?("user_123") # => false (limit exceeded)
Token bucket and leaky bucket algorithms represent alternative approaches. The token bucket algorithm maintains a bucket of tokens that replenishes at a constant rate. Each request consumes a token; when the bucket empties, requests are rejected or queued. This algorithm allows request bursts up to the bucket capacity while maintaining an average rate over time.
The leaky bucket algorithm enforces a strict output rate regardless of input rate. Requests enter a queue that drains at a constant rate. When the queue fills, new requests are rejected. This approach provides predictable output rates but can introduce latency since requests wait in the queue.
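A minimal in-memory token bucket illustrates the mechanics; the class below is a single-process sketch (not thread-safe), while the Redis-backed versions later in this section suit distributed deployments. The class name and rates are illustrative.
# Minimal in-memory token bucket (illustrative sketch, not thread-safe)
class SimpleTokenBucket
  def initialize(capacity, refill_rate)
    @capacity = capacity        # maximum tokens the bucket can hold
    @refill_rate = refill_rate  # tokens added per second
    @tokens = capacity
    @last_refill = Time.now.to_f
  end

  def allow?(tokens = 1)
    refill
    return false if @tokens < tokens
    @tokens -= tokens
    true
  end

  private

  # Top the bucket up based on elapsed time, capped at capacity
  def refill
    now = Time.now.to_f
    @tokens = [@capacity, @tokens + (now - @last_refill) * @refill_rate].min
    @last_refill = now
  end
end

bucket = SimpleTokenBucket.new(10, 1.0) # 10-token burst, 1 token/second
bucket.allow?    # => true, 9 tokens remain
bucket.allow?(5) # => true, 4 tokens remain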
Distributed systems require coordination when implementing rate limiting. A single server can maintain counters in memory, but distributed architectures need shared state across instances. Centralized data stores like Redis provide atomic operations for counter management, enabling consistent rate limiting across server clusters. The trade-off involves latency introduced by network calls to the centralized store.
Rate limit responses must communicate limit details to clients. HTTP applications use status code 429 (Too Many Requests) along with headers indicating limit information:
X-RateLimit-Limit: 1000 # Total requests allowed
X-RateLimit-Remaining: 247 # Requests remaining in window
X-RateLimit-Reset: 1698364800 # Unix timestamp when limit resets
Retry-After: 3600 # Seconds until retry is allowed
These headers enable clients to implement intelligent retry logic and avoid unnecessary requests. Applications can display limit information to users or adjust request patterns based on remaining quota.
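A client can honor these headers with a short backoff loop. The sketch below uses Net::HTTP from the standard library and assumes the server sends Retry-After as in the example above; the URL is illustrative.
require "net/http"
require "uri"

# Hedged sketch: retry after the server-indicated wait, up to a few attempts
def fetch_with_backoff(uri, attempts: 3)
  attempts.times do
    response = Net::HTTP.get_response(uri)
    return response unless response.code == "429"
    wait = (response["Retry-After"] || "1").to_i
    sleep(wait)
  end
  nil # give up after the configured number of attempts
end

fetch_with_backoff(URI("https://api.weather.com/data"))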
Implementation Approaches
Fixed window counters provide the simplest rate limiting implementation. The algorithm divides time into fixed intervals and counts requests within each interval. When the interval ends, the counter resets to zero. This approach requires minimal memory—one counter per client per window—and simple logic.
class FixedWindow
def initialize(redis, limit, window_seconds)
@redis = redis
@limit = limit
@window = window_seconds
end
def allow?(identifier)
key = "rate_limit:#{identifier}:#{current_window}"
count = @redis.incr(key)
@redis.expire(key, @window * 2) if count == 1
count <= @limit
end
private
def current_window
Time.now.to_i / @window
end
end
Fixed windows suffer from boundary issues. A client can make the maximum number of requests at the end of one window and again at the start of the next, effectively doubling the rate for a brief period. Consider a limit of 100 requests per minute: a client making 100 requests at 12:00:59 and another 100 at 12:01:00 achieves 200 requests in roughly two seconds, despite the per-minute limit.
Sliding window counters address boundary issues by calculating limits based on the exact time elapsed from the current moment. Instead of fixed intervals, the algorithm examines the request count over the previous N seconds relative to each request. This provides smoother rate enforcement but requires tracking individual request timestamps.
class SlidingWindow
def initialize(redis, limit, window_seconds)
@redis = redis
@limit = limit
@window = window_seconds
end
def allow?(identifier)
key = "rate_limit:#{identifier}"
now = Time.now.to_f
window_start = now - @window
    results = @redis.multi do |transaction|
      transaction.zremrangebyscore(key, "-inf", window_start)
      transaction.zadd(key, now, "#{now}:#{rand}")
      transaction.zcount(key, window_start, "+inf")
      transaction.expire(key, @window * 2)
    end
    # multi returns each queued command's reply; the zcount is the third entry
    count = results[2]
    count <= @limit
end
end
The sliding window log approach stores timestamps of each request in a sorted set. For each new request, the algorithm removes expired timestamps, adds the current timestamp, and checks if the total count exceeds the limit. This provides precise rate limiting but consumes memory proportional to the limit and can be expensive for high-traffic scenarios.
Sliding window counters combine fixed window efficiency with sliding window precision. The algorithm maintains counters for the current and previous fixed windows, then calculates an approximation based on the percentage of the current window elapsed:
class SlidingWindowCounter
def initialize(redis, limit, window_seconds)
@redis = redis
@limit = limit
@window = window_seconds
end
def allow?(identifier)
now = Time.now.to_i
current_window = now / @window
previous_window = current_window - 1
elapsed_in_current = now % @window
weight = (@window - elapsed_in_current).to_f / @window
current_key = "rate_limit:#{identifier}:#{current_window}"
previous_key = "rate_limit:#{identifier}:#{previous_window}"
current_count = @redis.get(current_key).to_i
previous_count = @redis.get(previous_key).to_i
weighted_count = (previous_count * weight) + current_count
if weighted_count < @limit
@redis.multi do |transaction|
transaction.incr(current_key)
transaction.expire(current_key, @window * 2)
end
true
else
false
end
end
end
This hybrid approach reduces memory requirements while avoiding fixed window boundary problems. The weighted calculation approximates a true sliding window by considering how much of the previous window overlaps with the current observation period.
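A worked example with illustrative numbers: a 60-second window, a limit of 100, 15 seconds elapsed in the current window, 80 requests counted in the previous window, and 30 in the current one.
# Weight is the fraction of the previous window still inside the sliding window
weight = (60 - 15).to_f / 60        # => 0.75
weighted_count = (80 * weight) + 30 # => 90.0
weighted_count < 100                # => true, the request is allowed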
Token bucket algorithms model rate limiting as tokens in a bucket that refills at a constant rate. Each request consumes one or more tokens. The bucket has a maximum capacity, allowing request bursts up to that capacity. When tokens are exhausted, requests are rejected until the bucket refills.
class TokenBucket
def initialize(redis, capacity, refill_rate)
@redis = redis
@capacity = capacity
@refill_rate = refill_rate # tokens per second
end
def allow?(identifier, tokens = 1)
key = "token_bucket:#{identifier}"
now = Time.now.to_f
script = <<~LUA
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local tokens_requested = tonumber(ARGV[4])
local bucket = redis.call('hmget', KEYS[1], 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now
local elapsed = now - last_refill
local refilled = elapsed * refill_rate
tokens = math.min(capacity, tokens + refilled)
if tokens >= tokens_requested then
tokens = tokens - tokens_requested
redis.call('hmset', KEYS[1], 'tokens', tokens, 'last_refill', now)
redis.call('expire', KEYS[1], 3600)
return 1
else
return 0
end
LUA
result = @redis.eval(script, [key], [@capacity, @refill_rate, now, tokens])
result == 1
end
end
Token buckets handle variable request costs by consuming multiple tokens for expensive operations. A search query might consume 5 tokens while a simple read consumes 1 token. This provides granular control over resource consumption while maintaining the burst-handling benefits of token buckets.
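With the TokenBucket class above, the caller passes the operation's cost as the token count; the costs shown are illustrative.
bucket = TokenBucket.new(redis, 100, 10) # capacity 100, refills 10 tokens/second
bucket.allow?("user_123")                # simple read consumes 1 token
bucket.allow?("user_123", 5)             # search query consumes 5 tokens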
Leaky bucket algorithms enforce a constant output rate by queuing requests that arrive faster than the drain rate. The bucket has a fixed capacity; requests fill the bucket and drain at a constant rate. When the bucket overflows, requests are rejected.
class LeakyBucket
def initialize(redis, capacity, drain_rate)
@redis = redis
@capacity = capacity
@drain_rate = drain_rate # requests per second
end
def allow?(identifier)
key = "leaky_bucket:#{identifier}"
now = Time.now.to_f
script = <<~LUA
local capacity = tonumber(ARGV[1])
local drain_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local bucket = redis.call('hmget', KEYS[1], 'level', 'last_drain')
local level = tonumber(bucket[1]) or 0
local last_drain = tonumber(bucket[2]) or now
local elapsed = now - last_drain
local drained = elapsed * drain_rate
level = math.max(0, level - drained)
if level < capacity then
level = level + 1
redis.call('hmset', KEYS[1], 'level', level, 'last_drain', now)
redis.call('expire', KEYS[1], 3600)
return 1
else
return 0
end
LUA
result = @redis.eval(script, [key], [@capacity, @drain_rate, now])
result == 1
end
end
Leaky buckets provide predictable request rates, making them suitable for protecting downstream systems with strict throughput requirements. The trade-off involves potential request latency since the algorithm enforces a maximum processing rate regardless of available capacity.
Distributed rate limiting requires consensus across multiple server instances. Centralized stores like Redis or Memcached provide atomic operations for counter management. Each server queries the shared store to check and update counters. Race conditions are prevented through atomic increment operations or Lua scripts that execute multiple commands atomically.
Ruby Implementation
Ruby applications typically implement rate limiting through Rack middleware, making it applicable to any Rack-based framework including Rails, Sinatra, and Grape. The Rack::Attack gem provides a mature, flexible rate limiting solution with multiple strategies and storage backends.
# config/initializers/rack_attack.rb
class Rack::Attack
# Throttle general requests by IP address
throttle('req/ip', limit: 300, period: 5.minutes) do |req|
req.ip unless req.path.start_with?('/assets')
end
# Throttle API requests by API key
throttle('api/key', limit: 1000, period: 1.hour) do |req|
req.env['HTTP_X_API_KEY'] if req.path.start_with?('/api')
end
# Throttle login attempts by email
throttle('logins/email', limit: 5, period: 20.minutes) do |req|
if req.path == '/login' && req.post?
req.params['email'].to_s.downcase.presence
end
end
# Different limits for authenticated users
throttle('authenticated/user', limit: 10000, period: 1.hour) do |req|
if req.env['warden'].authenticate?
req.env['warden'].user.id
end
end
end
Rack::Attack integrates with Rails cache backends, using Rails.cache by default. For production systems, Redis provides the performance and atomicity required for accurate rate limiting:
# config/initializers/rack_attack.rb
Rack::Attack.cache.store = ActiveSupport::Cache::RedisCacheStore.new(
url: ENV['REDIS_URL'],
namespace: 'rack_attack'
)
Custom throttle responses provide clients with clear feedback about rate limits:
class Rack::Attack
self.throttled_responder = lambda do |request|
match_data = request.env['rack.attack.match_data']
now = match_data[:epoch_time]
headers = {
'X-RateLimit-Limit' => match_data[:limit].to_s,
'X-RateLimit-Remaining' => '0',
'X-RateLimit-Reset' => (now + (match_data[:period] - now % match_data[:period])).to_s,
'Content-Type' => 'application/json'
}
[429, headers, [{ error: 'Rate limit exceeded' }.to_json]]
end
end
Implementing rate limiting without external dependencies involves creating custom Rack middleware:
class RateLimitMiddleware
def initialize(app, options = {})
@app = app
@limit = options[:limit] || 100
@period = options[:period] || 3600
    # Plain Hash store: per-process only, not thread-safe, and old windows
    # are never purged; suitable for illustration, not production
    @store = options[:store] || {}
end
def call(env)
request = Rack::Request.new(env)
identifier = get_identifier(request)
if rate_limit_exceeded?(identifier)
return rate_limit_response
end
increment_counter(identifier)
@app.call(env)
end
private
def get_identifier(request)
request.env['HTTP_X_API_KEY'] || request.ip
end
def rate_limit_exceeded?(identifier)
current_window = Time.now.to_i / @period
key = "#{identifier}:#{current_window}"
(@store[key] || 0) >= @limit
end
def increment_counter(identifier)
current_window = Time.now.to_i / @period
key = "#{identifier}:#{current_window}"
@store[key] ||= 0
@store[key] += 1
end
def rate_limit_response
[
429,
{ 'Content-Type' => 'application/json' },
[{ error: 'Rate limit exceeded' }.to_json]
]
end
end
Rails applications can implement rate limiting at the controller level using concerns:
module RateLimited
extend ActiveSupport::Concern
included do
before_action :check_rate_limit
end
private
def check_rate_limit
limiter = RateLimiter.new(
key: rate_limit_key,
limit: rate_limit_count,
period: rate_limit_period
)
unless limiter.allow?
response.headers['X-RateLimit-Limit'] = rate_limit_count.to_s
response.headers['X-RateLimit-Remaining'] = '0'
response.headers['X-RateLimit-Reset'] = limiter.reset_time.to_s
render json: { error: 'Rate limit exceeded' }, status: :too_many_requests
end
end
def rate_limit_key
"rate_limit:#{controller_name}:#{action_name}:#{current_user&.id || request.ip}"
end
def rate_limit_count
100
end
def rate_limit_period
3600
end
end
class ApiController < ApplicationController
include RateLimited
def rate_limit_count
current_user&.premium? ? 10000 : 1000
end
end
Background job processing requires rate limiting to prevent overwhelming external APIs or databases:
class RateLimitedJob < ApplicationJob
queue_as :default
def perform(user_id, action)
limiter = TokenBucketLimiter.new(
key: "api_calls:#{user_id}",
capacity: 100,
refill_rate: 10 # 10 tokens per second
)
unless limiter.consume(tokens: 1)
# Reschedule job for later
self.class.set(wait: limiter.time_until_tokens(1)).perform_later(user_id, action)
return
end
# Perform API call
ExternalService.call(user_id, action)
end
end
Redis-backed rate limiting with Lua scripts ensures atomic operations and reduces network round trips:
class RedisRateLimiter
LUA_SCRIPT = <<~LUA
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local current_time = tonumber(ARGV[3])
local window_start = current_time - window
redis.call('zremrangebyscore', key, '-inf', window_start)
local current_count = redis.call('zcard', key)
if current_count < limit then
redis.call('zadd', key, current_time, current_time)
redis.call('expire', key, window * 2)
return {1, limit - current_count - 1}
else
return {0, 0}
end
LUA
def initialize(redis)
@redis = redis
@script_sha = @redis.script(:load, LUA_SCRIPT)
end
def allow?(key, limit, window)
current_time = Time.now.to_f
allowed, remaining = @redis.evalsha(
@script_sha,
[key],
[limit, window, current_time]
)
[allowed == 1, remaining]
end
end
Testing rate limiting requires simulating time progression and request sequences:
RSpec.describe RateLimiter do
let(:redis) { Redis.new }
let(:limiter) { described_class.new(redis, limit: 5, period: 60) }
before { redis.flushdb }
it 'allows requests within limit' do
5.times do
expect(limiter.allow?('user_1')).to be true
end
end
it 'blocks requests exceeding limit' do
5.times { limiter.allow?('user_1') }
expect(limiter.allow?('user_1')).to be false
end
it 'resets after time window' do
5.times { limiter.allow?('user_1') }
Timecop.travel(61.seconds.from_now) do
expect(limiter.allow?('user_1')).to be true
end
end
it 'tracks different users separately' do
5.times { limiter.allow?('user_1') }
expect(limiter.allow?('user_2')).to be true
end
end
Common Patterns
Per-user rate limiting provides individualized quotas based on user accounts. This approach prevents a single user from consuming excessive resources while allowing legitimate high-volume users appropriate access. The pattern requires authentication and associates limits with user identifiers rather than IP addresses.
class Rack::Attack
throttle('api/user', limit: 1000, period: 1.hour) do |req|
if req.path.start_with?('/api') && req.env['warden'].user
req.env['warden'].user.id
end
end
# Unauthenticated requests have lower limits
throttle('api/ip', limit: 100, period: 1.hour) do |req|
req.ip if req.path.start_with?('/api') && !req.env['warden'].user
end
end
Tiered rate limiting assigns different limits based on user subscription levels or account types. Premium users receive higher quotas than free users, aligning resource allocation with business models:
class TieredRateLimiter
LIMITS = {
free: { limit: 100, period: 3600 },
basic: { limit: 1000, period: 3600 },
premium: { limit: 10000, period: 3600 },
enterprise: { limit: 100000, period: 3600 }
}
def initialize(redis)
@redis = redis
end
def allow?(user)
tier_config = LIMITS[user.subscription_tier]
key = "rate_limit:#{user.id}"
RateLimiter.new(@redis, **tier_config).allow?(key)
end
end
Dynamic rate limiting adjusts limits based on system load or user behavior. During high traffic periods, the system reduces limits to maintain stability. For users with established good behavior, limits gradually increase:
class DynamicRateLimiter
def initialize(redis, base_limit:, base_period:)
@redis = redis
@base_limit = base_limit
@base_period = base_period
end
def allow?(identifier)
multiplier = calculate_multiplier(identifier)
effective_limit = (@base_limit * multiplier).to_i
RateLimiter.new(@redis, limit: effective_limit, period: @base_period)
.allow?(identifier)
end
private
def calculate_multiplier(identifier)
# Check system load
cpu_usage = SystemMetrics.cpu_usage
load_factor = case cpu_usage
when 0..50 then 1.5
when 51..75 then 1.0
when 76..90 then 0.5
else 0.25
end
# Check user reputation
reputation_key = "reputation:#{identifier}"
reputation_score = @redis.get(reputation_key).to_f
    reputation_factor = [0.5, reputation_score / 100.0, 2.0].sort[1] # median = clamp to [0.5, 2.0]
load_factor * reputation_factor
end
end
Endpoint-specific rate limiting applies different limits to various API endpoints based on resource cost. Expensive operations like search or report generation have stricter limits than simple read operations:
class Rack::Attack
# Strict limit for expensive search endpoint
throttle('api/search', limit: 10, period: 1.minute) do |req|
if req.path == '/api/search'
req.env['HTTP_X_API_KEY'] || req.ip
end
end
# Moderate limit for write operations
throttle('api/write', limit: 100, period: 1.hour) do |req|
if req.post? || req.put? || req.patch?
req.env['HTTP_X_API_KEY'] || req.ip
end
end
# Higher limit for read operations
throttle('api/read', limit: 1000, period: 1.hour) do |req|
if req.get?
req.env['HTTP_X_API_KEY'] || req.ip
end
end
end
Distributed rate limiting coordinates limits across multiple application servers using a shared data store. This prevents each server from applying limits independently, which would effectively multiply the total allowed requests:
class DistributedRateLimiter
def initialize
@redis = Redis.new(
url: ENV['REDIS_URL'],
timeout: 1,
reconnect_attempts: 3
)
end
def allow?(key, limit, period)
script = <<~LUA
local current = redis.call('incr', KEYS[1])
if current == 1 then
redis.call('expire', KEYS[1], ARGV[1])
end
return current
LUA
window_key = "#{key}:#{Time.now.to_i / period}"
current = @redis.eval(script, [window_key], [period])
current <= limit
rescue Redis::BaseError => e
# Fail open on Redis errors to maintain availability
Rails.logger.error("Rate limiter error: #{e.message}")
true
end
end
Graceful degradation handles rate limit failures by defaulting to permissive behavior when the rate limiting system becomes unavailable. This maintains application availability at the cost of temporarily unlimited access:
class ResilientRateLimiter
def initialize(redis, fallback: :allow)
@redis = redis
@fallback = fallback
@circuit_breaker = CircuitBreaker.new(threshold: 5, timeout: 30)
end
def allow?(key, limit, period)
return fallback_allow? unless @circuit_breaker.allow_request?
    window_key = "#{key}:#{Time.now.to_i / period}"
    result = @redis.incr(window_key)
    @redis.expire(window_key, period * 2) if result == 1
    @circuit_breaker.record_success
    result <= limit
rescue Redis::BaseError => e
@circuit_breaker.record_failure
Rails.logger.error("Rate limiter Redis error: #{e.message}")
fallback_allow?
end
private
def fallback_allow?
@fallback == :allow
end
end
Cost-based rate limiting assigns different costs to operations based on resource consumption. A single request might consume multiple quota units:
class CostBasedRateLimiter
OPERATION_COSTS = {
'GET /api/users/:id' => 1,
'GET /api/search' => 5,
'POST /api/reports' => 10,
'POST /api/batch_import' => 50
}
def initialize(token_bucket)
@token_bucket = token_bucket
end
def allow?(user, operation)
cost = calculate_cost(operation)
@token_bucket.consume(user.id, tokens: cost)
end
private
def calculate_cost(operation)
OPERATION_COSTS[operation] || 1
end
end
Security Implications
Rate limiting serves as a primary defense against denial-of-service attacks. Without rate limits, attackers can exhaust server resources, database connections, or network bandwidth by flooding the system with requests. Effective rate limiting blocks these attacks by restricting request volumes from any single source.
Distributed denial-of-service (DDoS) attacks pose a greater challenge since requests originate from many IP addresses simultaneously. Pure IP-based rate limiting becomes less effective as attackers distribute load across compromised machines. Defense requires multiple layers: network-level rate limiting at load balancers or CDNs, application-level rate limiting for authenticated endpoints, and behavioral analysis to identify coordinated attack patterns.
class Rack::Attack
# Block requests from known bad actors
blocklist('block_bad_actors') do |req|
BadActorRegistry.blocked?(req.ip)
end
# Aggressive rate limiting for suspicious patterns
throttle('suspicious/ip', limit: 10, period: 1.minute) do |req|
req.ip if suspicious_request?(req)
end
# Normal rate limiting for regular traffic
throttle('req/ip', limit: 300, period: 5.minutes) do |req|
req.ip
end
def self.suspicious_request?(req)
# Detect patterns like rapid endpoint scanning
ua = req.user_agent
ua.nil? || ua.empty? || ua.match?(/bot|crawler|spider/i)
end
end
Authentication bypass attacks attempt to circumvent rate limits by generating new identifiers. IP-based limiting can be evaded using proxy networks or VPNs. API key rotation evades key-based limiting. Defense requires tracking multiple identifier types and enforcing a limit on each, allowing a request only when every applicable limit permits it:
class CompositeRateLimiter
def initialize(redis)
@redis = redis
end
def allow?(request)
identifiers = [
request.ip,
request.env['HTTP_X_API_KEY'],
request.env['warden']&.user&.id
].compact
# Apply rate limit to each identifier
# Fail if ANY limit is exceeded
identifiers.all? do |identifier|
RateLimiter.new(@redis, limit: 1000, period: 3600).allow?(identifier)
end
end
end
Credential stuffing attacks use stolen username-password pairs to gain unauthorized access. Rate limiting login endpoints prevents attackers from testing large numbers of credentials:
class Rack::Attack
# Strict rate limit for login attempts per IP
throttle('logins/ip', limit: 5, period: 5.minutes) do |req|
if req.path == '/login' && req.post?
req.ip
end
end
# Strict rate limit for login attempts per email
throttle('logins/email', limit: 5, period: 15.minutes) do |req|
if req.path == '/login' && req.post?
req.params['email'].to_s.downcase.presence
end
end
# Progressive backoff after failed attempts
throttle('failed_logins/email', limit: 10, period: 1.hour) do |req|
if req.path == '/login' && req.post?
email = req.params['email'].to_s.downcase
key = "failed_logins:#{email}"
# Track in after_action callback
req.env['rack.attack.failed_login_email'] = email
email if Redis.current.get(key).to_i > 3
end
end
end
# Track failed login attempts
Rails.application.config.after_initialize do
ActiveSupport::Notifications.subscribe('process_action.action_controller') do |*args|
event = ActiveSupport::Notifications::Event.new(*args)
request = event.payload[:request]
if email = request.env['rack.attack.failed_login_email']
if event.payload[:status] == 401
key = "failed_logins:#{email}"
Redis.current.incr(key)
Redis.current.expire(key, 3600)
end
end
end
end
Enumeration attacks attempt to discover valid accounts, API endpoints, or resources by testing many possibilities. Rate limiting prevents rapid enumeration while allowing legitimate discovery:
class Rack::Attack
# Limit requests that might be enumeration attempts
throttle('enumeration/404s', limit: 50, period: 10.minutes) do |req|
key = "enumeration:#{req.ip}"
# Track in after_action callback
req.env['rack.attack.enumeration_key'] = key
req.ip
end
end
# Track 404 responses as potential enumeration
Rails.application.config.after_initialize do
ActiveSupport::Notifications.subscribe('process_action.action_controller') do |*args|
event = ActiveSupport::Notifications::Event.new(*args)
request = event.payload[:request]
if key = request.env['rack.attack.enumeration_key']
if event.payload[:status] == 404
count = Redis.current.incr(key)
Redis.current.expire(key, 600)
# Aggressive limiting after many 404s
if count > 20
Rack::Attack::Allow2Ban.filter(request.ip, maxretry: 5, findtime: 600, bantime: 3600) do
count > 20
end
end
end
end
end
end
API key leakage occurs when keys are exposed in public repositories, client-side code, or intercepted traffic. Rate limiting per key prevents catastrophic abuse of leaked keys:
class ApiKeyRateLimiter
def initialize(redis)
@redis = redis
end
def allow?(api_key)
# Normal rate limit
normal_limit = RateLimiter.new(@redis, limit: 10000, period: 3600)
return false unless normal_limit.allow?("api_key:#{api_key}")
# Additional check for suspicious activity
if suspicious_activity?(api_key)
alert_security_team(api_key)
return false
end
true
end
private
  def suspicious_activity?(api_key)
    key = "api_key_ips:#{api_key}"
    # Drop IPs not seen in the last hour, then count the remainder
    @redis.zremrangebyscore(key, '-inf', Time.now.to_f - 3600)
    ip_count = @redis.zcard(key)
    # More than 100 unique IPs in an hour is suspicious
    ip_count > 100
  end
end
def alert_security_team(api_key)
SecurityAlert.create!(
type: 'suspicious_api_key_usage',
api_key_id: api_key,
message: 'API key showing suspicious multi-IP usage'
)
end
end
Cache poisoning attacks attempt to pollute caches with malicious content by making requests that bypass origin rate limits but hit cached responses repeatedly. Rate limiting must occur before cache layers:
# Rate limit before serving from cache
class RateLimitBeforeCache
def initialize(app)
@app = app
@rate_limiter = RateLimiter.new
end
def call(env)
request = Rack::Request.new(env)
# Rate limit check happens first
unless @rate_limiter.allow?(request.ip)
return [429, {}, ['Rate limit exceeded']]
end
# Cache middleware runs after rate limiting
@app.call(env)
end
end
# Middleware order matters
Rails.application.config.middleware.insert_before(
Rack::Cache,
RateLimitBeforeCache
)
Tools & Ecosystem
Rack::Attack provides the most widely used rate limiting solution for Ruby web applications. The gem runs as Rack middleware, supporting Rails, Sinatra, Grape, and other Rack-based frameworks. It offers flexible configuration, multiple throttling strategies, and integration with various cache backends:
# Gemfile
gem 'rack-attack'
# config/application.rb
config.middleware.use Rack::Attack
# config/initializers/rack_attack.rb
Rack::Attack.cache.store = ActiveSupport::Cache::RedisCacheStore.new(
url: ENV['REDIS_URL']
)
class Rack::Attack
safelist('allow_localhost') do |req|
req.ip == '127.0.0.1' || req.ip == '::1'
end
blocklist('block_bad_ip') do |req|
BlockedIpList.include?(req.ip)
end
throttle('req/ip', limit: 300, period: 5.minutes) do |req|
req.ip
end
# Exponential backoff for repeated violations
Rack::Attack.blocklist('penalized_ips') do |req|
Rack::Attack::Allow2Ban.filter(req.ip, maxretry: 5, findtime: 10.minutes, bantime: 1.hour) do
Rack::Attack.cache.count("violations:#{req.ip}", 10.minutes) > 5
end
end
end
Redis provides the backend storage for distributed rate limiting systems. Its atomic operations and built-in expiration make it suitable for high-performance rate limiting:
# Gemfile
gem 'redis'
gem 'hiredis' # Optional C extension for better performance
# config/initializers/redis.rb
Redis.current = Redis.new(
url: ENV['REDIS_URL'],
driver: :hiredis,
timeout: 1,
reconnect_attempts: 3
)
# Connection pooling for threaded servers (requires the connection_pool gem).
# A pool is not a Redis client, so keep it in its own constant rather than
# assigning it to Redis.current.
REDIS_POOL = ConnectionPool.new(size: 25, timeout: 5) do
  Redis.new(url: ENV['REDIS_URL'], driver: :hiredis)
end
The redis-throttle gem provides a lightweight rate limiting implementation focused specifically on throttling:
# Gemfile
gem 'redis-throttle'
require 'redis/throttle'
redis = Redis.new
throttle = Redis::Throttle.new(key: 'user:123', limits: [[10, 60], [100, 3600]], redis: redis)
if throttle.allowed?
# Process request
else
# Rate limit exceeded
end
Sidekiq Enterprise includes rate limiting for background jobs, preventing them from overwhelming external APIs or databases:
# Gemfile
gem 'sidekiq-enterprise'
class ApiCallJob
include Sidekiq::Job
sidekiq_options queue: 'api_calls',
limiter: {
name: 'external_api',
limit: 100,
period: 60
}
def perform(user_id, action)
ExternalApi.call(user_id, action)
end
end
The rate_limiter gem provides a simple, flexible Ruby implementation without framework dependencies:
# Gemfile
gem 'rate_limiter'
require 'rate_limiter'
limiter = RateLimiter.new(
interval: 3600,
max: 1000,
store: Redis.new
)
if limiter.exceeded?('user:123')
# Rate limit exceeded
else
limiter.increment('user:123')
# Process request
end
Cloudflare and other CDN providers offer edge-based rate limiting that blocks requests before they reach application servers. This provides DDoS protection and reduces server load:
# Configure via Cloudflare API
require 'cloudflare'
cf = Cloudflare.new(
email: ENV['CLOUDFLARE_EMAIL'],
key: ENV['CLOUDFLARE_KEY']
)
zone = cf.zones.find_by_name('example.com')
# Create rate limiting rule
zone.firewall.rate_limits.create(
threshold: 1000,
period: 60,
action: {
mode: 'challenge', # or 'block', 'simulate'
timeout: 86400
},
match: {
request: {
methods: ['GET', 'POST'],
url: 'example.com/api/*'
}
}
)
Kong API Gateway provides centralized rate limiting for microservices architectures:
# Enable rate limiting plugin
curl -X POST http://localhost:8001/plugins \
--data "name=rate-limiting" \
--data "config.minute=100" \
--data "config.hour=1000" \
--data "config.policy=redis" \
--data "config.redis_host=redis.example.com"
The throttle gem offers a simple Ruby implementation for rate limiting with multiple backends:
# Gemfile
gem 'throttle'
throttle = Throttle.new(
key: "api:#{user.id}",
max: 100,
period: 3600,
store: Throttle::Store::Redis.new(Redis.current)
)
if throttle.allow?
# Process request
throttle.record!
else
# Rate limit exceeded
end
Reference
Rate Limiting Algorithms
| Algorithm | Time Complexity | Space Complexity | Use Case |
|---|---|---|---|
| Fixed Window | O(1) | O(1) per client | Simple quotas, low traffic |
| Sliding Window Log | O(log m) | O(m) per client, where m is the limit | Precise rate limiting, audit trails |
| Sliding Window Counter | O(1) | O(1) per client (two counters) | Balance of precision and efficiency |
| Token Bucket | O(1) | O(1) per client | Burst handling, variable costs |
| Leaky Bucket | O(1) | O(1) per client | Smooth output rates, queue-based |
HTTP Response Headers
| Header | Description | Example |
|---|---|---|
| X-RateLimit-Limit | Total requests allowed in window | 1000 |
| X-RateLimit-Remaining | Requests remaining in current window | 247 |
| X-RateLimit-Reset | Unix timestamp when limit resets | 1698364800 |
| Retry-After | Seconds until retry is allowed | 3600 |
| X-RateLimit-Used | Number of requests consumed | 753 |
HTTP Status Codes
| Code | Name | When Used |
|---|---|---|
| 429 | Too Many Requests | Rate limit exceeded |
| 503 | Service Unavailable | System overload, temporary blocking |
| 200 | OK | Request allowed, within limits |
| 403 | Forbidden | Permanent block, security violation |
Rack::Attack Throttle Options
| Option | Type | Description |
|---|---|---|
| limit | Integer | Maximum requests allowed in period |
| period | Integer | Time window in seconds |
| name | String | Unique identifier for throttle rule |
| discriminator | Proc | Block returning identifier for grouping requests |
Redis Commands for Rate Limiting
| Command | Purpose | Example |
|---|---|---|
| INCR | Increment counter atomically | INCR rate_limit:user:123 |
| EXPIRE | Set key expiration | EXPIRE rate_limit:user:123 3600 |
| TTL | Get remaining expiration time | TTL rate_limit:user:123 |
| ZADD | Add to sorted set with score | ZADD timestamps user123 1698364800 |
| ZREMRANGEBYSCORE | Remove entries by score range | ZREMRANGEBYSCORE timestamps -inf 1698360000 |
| ZCARD | Count sorted set members | ZCARD timestamps |
| EVAL | Execute Lua script atomically | EVAL script 1 key arg1 arg2 |
Common Rate Limit Configurations
| Scenario | Limit | Period | Algorithm |
|---|---|---|---|
| Public API (free tier) | 100 | 1 hour | Fixed window |
| Public API (paid tier) | 10000 | 1 hour | Token bucket |
| Login attempts per IP | 5 | 15 minutes | Sliding window |
| Login attempts per email | 5 | 1 hour | Sliding window |
| Search queries | 10 | 1 minute | Token bucket |
| File uploads | 10 | 1 hour | Leaky bucket |
| Admin actions | 100 | 1 hour | Fixed window |
| Webhook deliveries | 1000 | 1 hour | Token bucket |
Rate Limiter Configuration Checklist
| Item | Consideration |
|---|---|
| Identifier | IP address, API key, user ID, session ID |
| Storage | Redis, Memcached, in-memory, database |
| Algorithm | Fixed window, sliding window, token bucket, leaky bucket |
| Limits | Per-second, per-minute, per-hour, per-day |
| Tiers | Free, basic, premium, enterprise |
| Scope | Global, per-endpoint, per-action, per-resource |
| Response | Block, throttle, queue, error message |
| Headers | Include rate limit information for clients |
| Monitoring | Track limit hits, violations, trends |
| Failover | Behavior when storage unavailable |
Security Considerations Checklist
| Consideration | Implementation |
|---|---|
| DDoS protection | Multiple layer rate limiting |
| Credential stuffing | Strict login rate limits per IP and email |
| Enumeration attacks | Limit 404 responses, random delays |
| API key leakage | Per-key limits, multi-IP detection |
| Cache poisoning | Rate limit before cache layer |
| Bypass attempts | Composite identifiers (IP + key + user) |
| Distributed attacks | CDN/edge rate limiting |
| Failed attempts | Progressive backoff, temporary blocks |