CrackedRuby

Overview

API rate limiting controls the number of requests a client can make to an API within a specified time period. The mechanism protects server resources, prevents abuse, ensures fair usage among clients, and maintains quality of service for all users. Rate limiting applies to REST APIs, GraphQL endpoints, WebSocket connections, and any network service exposed to multiple clients.

The concept originated from network traffic shaping and evolved into application-level controls as APIs became primary interfaces for web services. Modern rate limiting operates at multiple layers: infrastructure level through reverse proxies and load balancers, application level through middleware and frameworks, and distributed level through shared state stores like Redis or Memcached.

Rate limiting decisions depend on several factors: client identification (IP address, API key, user account), resource consumption patterns, business tier assignments, and geographic distribution. The system tracks request counts, timestamps, and quota consumption to determine whether to allow, throttle, or reject incoming requests.

# Basic rate limiting concept: a sliding window log kept in process
# memory. Suitable for a single process only; not thread-safe, and all
# state is lost on restart.
class RateLimiter
  def initialize(max_requests, time_window)
    @max_requests = max_requests
    @time_window = time_window # in seconds
    @requests = {}             # client_id => [timestamps]
  end

  def allow?(client_id)
    now = Time.now.to_i
    window_start = now - @time_window
    
    @requests[client_id] ||= []
    # Drop timestamps that have aged out of the window
    @requests[client_id].reject! { |timestamp| timestamp < window_start }
    
    if @requests[client_id].size < @max_requests
      @requests[client_id] << now
      true
    else
      false
    end
  end
end

limiter = RateLimiter.new(100, 3600) # 100 requests per hour
limiter.allow?("client_123")
# => true

Rate limiting responds to requests with specific HTTP status codes and headers. A 429 Too Many Requests status indicates quota exhaustion. Response headers communicate limit details: X-RateLimit-Limit shows the maximum requests allowed, X-RateLimit-Remaining indicates remaining quota, and X-RateLimit-Reset provides the timestamp when the quota resets.

Key Principles

Rate limiting operates on the concept of quotas and time windows. A quota defines the maximum number of requests permitted, while a time window specifies the duration over which the quota applies. The combination creates a rate: 1000 requests per hour, 10 requests per second, or 5000 requests per day.

Client identification forms the foundation of rate limiting. The system must reliably identify who makes each request to track usage accurately. IP addresses provide the simplest identification but fail with shared networks, NAT, and proxy servers. API keys offer better attribution but require key management infrastructure. OAuth tokens combine authentication with rate limiting, tying quotas to specific users or applications. Authenticated user IDs provide the most precise tracking but require authentication on all endpoints.

Time windows operate in two primary modes: fixed and sliding. Fixed windows reset at predetermined intervals—every hour at :00, every day at midnight. This approach simplifies implementation but creates thundering herd problems where clients rush to consume quota immediately after resets. Sliding windows track requests over a rolling time period from the current moment, distributing load more evenly but requiring more computational overhead.

# Fixed window implementation; stale window counters are evicted as
# each client rolls into a new window
class FixedWindowLimiter
  def initialize(limit, window_seconds)
    @limit = limit
    @window_seconds = window_seconds
    @windows = {}
  end

  def allow?(client_id)
    now = Time.now.to_i
    window_key = now / @window_seconds
    
    @windows[client_id] ||= {}
    @windows[client_id].delete_if { |key, _| key < window_key } # drop old windows
    @windows[client_id][window_key] ||= 0
    
    if @windows[client_id][window_key] < @limit
      @windows[client_id][window_key] += 1
      true
    else
      false
    end
  end
end

Quota enforcement strategies differ in strictness and behavior. Hard limits reject all requests exceeding the quota with immediate effect. Soft limits allow burst traffic above the quota with reduced priority or throttled response times. Graduated limits apply different thresholds based on client tier, subscription level, or historical behavior patterns.

Distributed systems require coordination between multiple servers to enforce global rate limits. Without coordination, each server applies limits independently, multiplying the effective quota by the number of servers. Shared state stores provide centralized counters, but introduce network latency and single points of failure. Approximate algorithms trade perfect accuracy for reduced coordination overhead.

Rate limiting granularity affects both protection effectiveness and implementation complexity. Per-IP limits protect against individual malicious actors but may block legitimate users behind shared NAT. Per-endpoint limits prevent resource-intensive operations from overwhelming specific handlers but require separate tracking for each route. Per-resource limits protect individual database records or external API quotas but multiply tracking overhead.

The system handles quota exhaustion through multiple strategies. Immediate rejection returns errors instantly but provides no queueing for burst traffic. Request queuing buffers excess requests for delayed processing but increases memory consumption and response latency. Token bucket algorithms allow controlled bursts above the base rate by accumulating unused capacity.

Implementation Approaches

Token bucket algorithms maintain a bucket that fills with tokens at a constant rate up to a maximum capacity. Each request consumes one or more tokens from the bucket. If sufficient tokens exist, the request proceeds and tokens are removed. If insufficient tokens remain, the request is rejected or delayed. The bucket capacity allows burst traffic up to the maximum, while the refill rate enforces the sustained request rate.

class TokenBucket
  def initialize(capacity, refill_rate)
    @capacity = capacity
    @tokens = capacity
    @refill_rate = refill_rate
    @last_refill = Time.now
  end

  def allow?(cost = 1)
    refill
    
    if @tokens >= cost
      @tokens -= cost
      true
    else
      false
    end
  end

  private

  def refill
    now = Time.now
    elapsed = now - @last_refill
    @tokens = [@tokens + (elapsed * @refill_rate), @capacity].min
    @last_refill = now
  end
end

Leaky bucket algorithms process requests at a constant rate regardless of input rate. Incoming requests enter a queue with fixed capacity. A background processor removes requests from the queue at the configured rate and forwards them for handling. If the queue fills, additional requests are rejected. This approach smooths traffic spikes but adds latency to all requests due to queueing.
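A minimal in-process sketch of the leaky bucket idea, using the "meter" variant: rather than a real queue and background processor, the bucket level drains lazily from elapsed time on each call (class and parameter names are illustrative):

```ruby
# Leaky bucket as a meter: requests raise the bucket level, and the
# level "leaks" away at a constant rate. Drain is computed lazily from
# elapsed time instead of running a background thread.
class LeakyBucket
  def initialize(capacity, leak_rate)
    @capacity = capacity    # maximum pending requests the bucket holds
    @leak_rate = leak_rate  # requests drained per second
    @level = 0.0
    @last_leak = Time.now
  end

  def allow?
    leak
    if @level < @capacity
      @level += 1
      true
    else
      false
    end
  end

  private

  def leak
    elapsed = Time.now - @last_leak
    @level = [@level - (elapsed * @leak_rate), 0.0].max
    @last_leak = Time.now
  end
end

bucket = LeakyBucket.new(5, 1.0) # holds 5 requests, drains 1 per second
```

A full queue-based variant would additionally buffer admitted requests and forward them downstream at the drain rate, which is what adds latency to every request.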

Fixed window counters divide time into discrete windows and count requests within each window. The implementation stores a counter for each window period and increments it for each request. At window boundaries, a fresh counter takes over. This approach minimizes memory usage and computational overhead but creates uneven traffic distribution at window edges. A client can make the maximum number of requests at 11:59:59 and again at 12:00:01, briefly doubling the effective rate across the window boundary.

Sliding window logs improve on fixed windows by tracking individual request timestamps. The system maintains a list of timestamps for recent requests and removes timestamps outside the current window before checking the limit. This distributes traffic evenly but requires storing every timestamp within the window, so memory consumption grows in proportion to the rate limit.

class SlidingWindowLog
  def initialize(limit, window_seconds)
    @limit = limit
    @window_seconds = window_seconds
    @requests = Hash.new { |h, k| h[k] = [] }
  end

  def allow?(client_id)
    now = Time.now.to_f
    cutoff = now - @window_seconds
    
    # Evict timestamps that have aged out of the window
    @requests[client_id].reject! { |timestamp| timestamp < cutoff }
    
    if @requests[client_id].size < @limit
      @requests[client_id] << now
      true
    else
      false
    end
  end

  def remaining(client_id)
    now = Time.now.to_f
    cutoff = now - @window_seconds
    @requests[client_id].reject! { |timestamp| timestamp < cutoff }
    [@limit - @requests[client_id].size, 0].max
  end
end

Sliding window counters combine fixed-window efficiency with near-sliding-window accuracy through approximation. The algorithm maintains counters for the current and previous fixed windows. To estimate the count in the sliding window, it weights the previous window's counter by its percentage overlap with the sliding window and adds the current window's count. This keeps per-client memory constant while maintaining reasonable accuracy.
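A sketch of this two-window approximation, with an injectable clock for determinism (the class name and structure are illustrative):

```ruby
# Two-window approximation of a sliding window: weight the previous
# fixed window's count by its overlap with the rolling window.
class SlidingWindowApprox
  def initialize(limit, window_seconds)
    @limit = limit
    @window = window_seconds
    @counts = Hash.new { |h, k| h[k] = Hash.new(0) } # client => { window_key => count }
  end

  def allow?(client_id, now = Time.now.to_f)
    current_key = (now / @window).floor
    previous_key = current_key - 1
    # Fraction of the previous fixed window still inside the rolling window
    overlap = 1.0 - ((now % @window) / @window)

    windows = @counts[client_id]
    estimated = windows[previous_key] * overlap + windows[current_key]

    if estimated < @limit
      windows[current_key] += 1
      windows.delete_if { |k, _| k < previous_key } # drop stale windows
      true
    else
      false
    end
  end
end
```

At 25% into the current window, the previous window's count is weighted by 0.75, so the estimate decays gradually instead of resetting at the boundary.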

Generic cell rate algorithms (GCRA) provide precise rate enforcement with minimal state. The implementation tracks a theoretical arrival time (TAT) representing when the next request can arrive. Each request updates TAT by the inverse of the rate. If the current time exceeds TAT, the request is allowed immediately. If TAT is in the future, the request either waits or is rejected. This approach requires storing only a single timestamp per client.
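A compact sketch of GCRA along these lines, extended with a burst allowance of a few extra intervals (the names and the burst parameter are illustrative):

```ruby
# GCRA: per client, store only the theoretical arrival time (TAT).
# Emission interval = 1 / rate; a request conforms when TAT has not run
# more than `burst` intervals ahead of the wall clock.
class GCRALimiter
  def initialize(rate_per_second, burst)
    @interval = 1.0 / rate_per_second # seconds between conforming requests
    @burst = burst                    # extra early arrivals tolerated
    @tat = Hash.new(0.0)              # client_id => theoretical arrival time
  end

  def allow?(client_id, now = Time.now.to_f)
    tat = [@tat[client_id], now].max
    if tat - now <= @burst * @interval
      @tat[client_id] = tat + @interval # push TAT forward one interval
      true
    else
      false
    end
  end
end
```

With burst 0 this admits exactly one request per interval; each unit of burst lets one additional request arrive early.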

Distributed rate limiting requires coordination mechanisms to maintain global limits across multiple servers. Centralized counters in Redis or Memcached provide accurate tracking but create network bottlenecks and single points of failure. Local counters with periodic synchronization reduce coordination overhead but introduce temporary inaccuracies. Consistent hashing distributes clients to specific servers, making each server authoritative for a subset of clients.

Ruby Implementation

The Rack::Attack middleware provides rate limiting for Rack-based applications including Rails and Sinatra. It integrates into the middleware stack to intercept requests before they reach application code. Rack::Attack supports throttling by arbitrary attributes, custom response handling, and multiple storage backends.

# config/initializers/rack_attack.rb
class Rack::Attack
  # Throttle general requests by IP
  throttle('req/ip', limit: 300, period: 5.minutes) do |req|
    req.ip
  end

  # Throttle login attempts by email
  throttle('logins/email', limit: 5, period: 20.seconds) do |req|
    if req.path == '/login' && req.post?
      req.params['email'].to_s.downcase.presence
    end
  end

  # Throttle API requests by API key
  throttle('api/key', limit: 1000, period: 1.hour) do |req|
    if req.path.start_with?('/api/')
      req.env['HTTP_X_API_KEY']
    end
  end

  # Custom response for throttled requests (Rack::Attack 6+; earlier
  # versions used throttled_response with the raw env)
  self.throttled_responder = lambda do |request|
    match_data = request.env['rack.attack.match_data']
    [
      429,
      {
        'Content-Type' => 'application/json',
        'Retry-After' => match_data[:period].to_s,
        'X-RateLimit-Limit' => match_data[:limit].to_s,
        'X-RateLimit-Remaining' => '0'
      },
      [{ error: 'Rate limit exceeded' }.to_json]
    ]
  end
end

Redis provides distributed state storage for rate limiting across multiple application servers. The redis-rb gem offers atomic operations for incrementing counters and setting expiration times. Issuing INCR and then setting EXPIRE on first increment covers the common case with automatic cleanup; wrapping both in a Lua script or MULTI/EXEC transaction makes the pair fully atomic.

require 'redis'
require 'connection_pool'

class RedisRateLimiter
  def initialize(redis_pool, limit, window)
    @redis_pool = redis_pool
    @limit = limit
    @window = window
  end

  def allow?(key)
    @redis_pool.with do |redis|
      current = redis.incr(key)
      # Set the TTL when the counter is first created. If the process
      # dies between INCR and EXPIRE the key never expires; wrap both in
      # a Lua script or MULTI/EXEC where that matters.
      redis.expire(key, @window) if current == 1
      current <= @limit
    end
  end

  def remaining(key)
    @redis_pool.with do |redis|
      current = redis.get(key).to_i
      [@limit - current, 0].max
    end
  end

  def reset_at(key)
    @redis_pool.with do |redis|
      ttl = redis.ttl(key)
      ttl > 0 ? Time.now.to_i + ttl : nil
    end
  end
end

redis_pool = ConnectionPool.new(size: 5, timeout: 5) { Redis.new }
limiter = RedisRateLimiter.new(redis_pool, 100, 3600)

if limiter.allow?("user:123")
  # Process request
else
  # Return 429 error
end

The ratelimit gem implements sliding-window request counting backed by Redis. It provides a clean API for recording events and checking counts against thresholds over arbitrary trailing intervals, and expires old buckets automatically. The gem requires a Redis backend; it has no in-memory mode.

require 'ratelimit'
require 'redis'

redis = Redis.new
limiter = Ratelimit.new("api_requests", redis: redis, bucket_span: 3600, bucket_interval: 5)

# Record a request for this subject
limiter.add("user_#{user_id}")

# Check usage against a threshold over the trailing 60 seconds
if limiter.exceeded?("user_#{user_id}", threshold: 100, interval: 60)
  # Quota exceeded
else
  # Process request
end

# Number of requests recorded in the last 60 seconds
count = limiter.count("user_#{user_id}", 60)

# Block until the subject is back under the threshold, then run
limiter.exec_within_threshold("user_#{user_id}", threshold: 100, interval: 60) do
  # Process request
end

Rails controllers implement rate limiting through before_action filters that check quotas before executing controller actions. The filter architecture allows applying limits selectively to specific actions or controller subclasses.

class ApiController < ApplicationController
  before_action :check_rate_limit

  private

  def check_rate_limit
    limiter = RedisRateLimiter.new($redis_pool, rate_limit, 3600)
    key = "api:#{current_user.id}:#{Time.now.hour}"

    unless limiter.allow?(key)
      response.set_header('X-RateLimit-Limit', rate_limit.to_s)
      response.set_header('X-RateLimit-Remaining', '0')
      response.set_header('X-RateLimit-Reset', limiter.reset_at(key).to_s)
      
      render json: { 
        error: 'Rate limit exceeded',
        retry_after: limiter.reset_at(key)
      }, status: :too_many_requests
      return
    end

    response.set_header('X-RateLimit-Limit', rate_limit.to_s)
    response.set_header('X-RateLimit-Remaining', limiter.remaining(key).to_s)
  end

  def rate_limit
    case current_user.subscription_tier
    when 'premium' then 10000
    when 'basic' then 1000
    else 100
    end
  end
end

The dalli gem provides a Memcached client for distributed rate limiting with Memcached as the storage backend. Memcached can offer lower latency than Redis for simple counter operations but lacks Redis's richer data structures and server-side scripting.

require 'dalli'

class MemcachedRateLimiter
  def initialize(memcached_client, limit, window)
    @cache = memcached_client
    @limit = limit
    @window = window
  end

  def allow?(key)
    # incr(key, amount, ttl, default) creates the counter at the default
    # value with a TTL when the key does not yet exist
    count = @cache.incr(key, 1, @window, 1)
    count.nil? || count <= @limit # fail open if the value is non-numeric
  rescue Dalli::DalliError
    true # Fail open on cache errors
  end
end

cache = Dalli::Client.new('localhost:11211')
limiter = MemcachedRateLimiter.new(cache, 1000, 3600)

Design Considerations

Algorithm selection depends on traffic patterns and business requirements. Token bucket algorithms suit APIs with bursty traffic where occasional spikes above the base rate are acceptable. The bucket capacity determines burst size while the refill rate controls sustained throughput. Applications requiring strict rate enforcement without bursts should use leaky bucket or sliding window approaches.

Fixed window counters minimize computational overhead and memory usage, making them appropriate for high-throughput systems where precision is less critical. The edge effect creates temporary rate doubling at window boundaries, which may be acceptable for APIs with generous limits. Applications requiring even traffic distribution must use sliding window algorithms despite their higher resource consumption.

Distributed versus local rate limiting involves trade-offs between accuracy and performance. Centralized Redis counters provide accurate global limits across all servers but introduce network latency and create dependencies on external systems. Local counters eliminate network overhead but multiply effective limits by server count. Hybrid approaches use local counters with periodic synchronization, accepting temporary inaccuracies for better performance.
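A sketch of the hybrid approach under these assumptions: each server counts locally, flushes deltas to a shared store on a timer, and enforces against its last-known global total. The `HybridRateLimiter` name is illustrative, and the store only needs to respond to `incrby(key, n)` and `get(key)`, so a Redis client or a test stub fits the interface:

```ruby
# Hybrid limiter: count locally, flush deltas to a shared store
# periodically, and enforce against the last-known global total.
# Between flushes the view of other servers is stale, so enforcement
# is approximate.
class HybridRateLimiter
  def initialize(store, limit, window, sync_interval: 1.0)
    @store = store              # responds to incrby(key, n) and get(key)
    @limit = limit
    @window = window            # seconds per counting window
    @sync_interval = sync_interval
    @local = Hash.new(0)        # deltas not yet flushed
    @global = Hash.new(0)       # last-known totals from the store
    @last_sync = Time.now
  end

  def allow?(client_id)
    sync if Time.now - @last_sync >= @sync_interval
    key = "#{client_id}:#{Time.now.to_i / @window}"
    if @global[key] + @local[key] < @limit
      @local[key] += 1
      true
    else
      false
    end
  end

  def sync
    @local.each { |key, delta| @store.incrby(key, delta) }
    (@local.keys + @global.keys).uniq.each do |key|
      @global[key] = @store.get(key).to_i
    end
    @local.clear
    @last_sync = Time.now
  end
end
```

Between sync intervals, each server can overshoot its share of the limit by its local count, so the global limit is only approximately enforced.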

Client identification strategy affects both security and user experience. IP-based limiting is simple but blocks legitimate users behind corporate NAT or VPNs. API key limiting provides better attribution but requires key management infrastructure and may complicate public APIs. Combined approaches using IP limits for unauthenticated requests and key limits for authenticated access balance security and usability.

Quota exhaustion handling determines system behavior under load. Hard rejection with immediate 429 responses protects server resources but provides poor user experience during traffic spikes. Request queuing improves user experience by processing requests eventually but increases memory consumption and latency. Priority systems can queue high-value requests while rejecting low-priority traffic.

class PriorityRateLimiter
  def initialize(limits)
    @limits = limits # { 'premium' => 10000, 'basic' => 1000, 'free' => 100 }
    @counters = Hash.new { |h, k| h[k] = Hash.new(0) } # client_id => { window => count }
  end

  def allow?(client_id, tier)
    window = current_window
    counts = @counters[client_id]
    counts.delete_if { |w, _| w < window } # drop stale windows
    counts[window] += 1
    counts[window] <= @limits[tier]
  end

  def current_window
    Time.now.to_i / 3600
  end
end

Monitoring and observability requirements influence implementation choices. Simple counters provide basic metrics but lack insight into traffic patterns. Detailed logging of rejected requests enables analysis of abuse patterns and limit tuning. Distributed tracing helps debug rate limiting behavior across multiple services.

Cost considerations vary by storage backend and algorithm complexity. In-memory rate limiting costs nothing for storage but loses state on server restarts. Redis requires infrastructure costs and operational overhead but provides persistence and cross-server coordination. Memcached offers lower latency than Redis but lacks data persistence.

Security Implications

Rate limiting serves as a primary defense against denial of service attacks. Without limits, attackers can exhaust server resources through request floods. Effective rate limiting requires multiple layers: aggressive limits for unauthenticated requests, moderate limits for authenticated users, and special allowances for trusted partners or premium tiers.

Distributed denial of service attacks using many IP addresses bypass simple IP-based rate limiting. Defense requires rate limiting at multiple granularities: per-IP for individual attackers, global limits to protect overall capacity, and per-endpoint limits to prevent resource-intensive operations from overwhelming specific handlers.

class MultiLayerRateLimiter
  def initialize(redis_pool)
    @redis_pool = redis_pool
  end

  def allow?(ip:, user_id:, endpoint:)
    checks = [
      ["ip:#{ip}", 100, 60],              # 100 req/min per IP
      ["user:#{user_id}", 1000, 3600],    # 1000 req/hour per user
      ["endpoint:#{endpoint}", 10000, 60], # 10000 req/min per endpoint
      ["global", 100000, 60]               # 100k req/min globally
    ]

    # all? short-circuits, so counters later in the list are not
    # incremented once an earlier check fails
    checks.all? do |key, limit, window|
      @redis_pool.with do |redis|
        current = redis.incr(key)
        redis.expire(key, window) if current == 1
        current <= limit
      end
    end
  end
end

Credential stuffing attacks attempt to validate stolen username/password pairs through login attempts. Rate limiting login endpoints prevents automated credential testing. Stricter limits on failed login attempts specifically target this attack vector. Combining rate limiting with exponential backoff increases delay between attempts.
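A sketch of pairing a failure counter with exponential backoff, keyed by account identifier (the delays are illustrative, and a production version would keep this state in Redis so it survives restarts and spans servers):

```ruby
# Failed-login throttle with exponential backoff: each failure doubles
# the lockout delay up to a cap, and a successful login clears the
# history for that identifier.
class LoginBackoff
  BASE_DELAY = 2    # seconds of lockout after the first failure
  MAX_DELAY  = 900  # cap the lockout at 15 minutes

  def initialize
    @failures = Hash.new { |h, k| h[k] = { count: 0, last_at: 0.0 } }
  end

  def attempt_allowed?(identifier, now = Time.now.to_f)
    entry = @failures[identifier]
    return true if entry[:count].zero?
    delay = [BASE_DELAY * (2 ** (entry[:count] - 1)), MAX_DELAY].min
    now - entry[:last_at] >= delay
  end

  def record_failure(identifier, now = Time.now.to_f)
    entry = @failures[identifier]
    entry[:count] += 1
    entry[:last_at] = now
  end

  def record_success(identifier)
    @failures.delete(identifier)
  end
end
```

After one failure the caller must wait 2 seconds, after two failures 4 seconds, and so on, which makes automated credential testing progressively more expensive.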

API key enumeration attacks probe for valid keys through systematic guessing. Rate limiting API authentication endpoints prevents rapid key testing. Limiting failed authentication attempts per IP and globally protects against distributed enumeration. Logging failed authentication attempts enables detection of enumeration patterns.

Rate limit bypass attempts exploit multiple identities or distributed infrastructure. Attackers rotate IP addresses using VPNs or botnets. Defense requires fingerprinting requests beyond IP addresses: user agents, TLS fingerprints, behavioral patterns. Rate limiting by multiple attributes simultaneously increases bypass difficulty.

Token theft and replay protection requires associating rate limits with authenticated identities rather than just API keys. Stolen API keys could be used up to rate limits before detection. Combining rate limiting with token expiration, refresh requirements, and anomaly detection improves security.

Side channel attacks infer information from rate limit responses. Response timing differences between rate-limited and allowed requests leak information about system state. Consistent response times for both allowed and rejected requests prevent timing attacks. Error messages should not reveal limit details that aid attackers.

Practical Examples

Basic API rate limiting for a Rails API restricts requests by API key with standard limits and header communication:

class Api::V1::BaseController < ApplicationController
  before_action :authenticate_api_key
  before_action :check_rate_limit

  private

  def authenticate_api_key
    @api_key = ApiKey.find_by(key: request.headers['X-API-Key'])
    render_unauthorized unless @api_key
  end

  def check_rate_limit
    limiter = RedisRateLimiter.new($redis_pool, @api_key.hourly_limit, 3600)
    key = "api_key:#{@api_key.id}:#{Time.now.to_i / 3600}"

    # Consume quota first so the headers reflect this request
    allowed = limiter.allow?(key)
    remaining = limiter.remaining(key)
    reset_at = limiter.reset_at(key)

    response.set_header('X-RateLimit-Limit', @api_key.hourly_limit.to_s)
    response.set_header('X-RateLimit-Remaining', remaining.to_s)
    response.set_header('X-RateLimit-Reset', reset_at.to_s) if reset_at

    unless allowed
      render json: {
        error: 'Rate limit exceeded',
        limit: @api_key.hourly_limit,
        reset_at: reset_at
      }, status: :too_many_requests
    end
  end

  def render_unauthorized
    render json: { error: 'Invalid API key' }, status: :unauthorized
  end
end

GraphQL APIs call for complexity-based rate limiting because a single endpoint serves queries of widely varying cost, depending on which fields the client requests. Instead of counting raw requests, the limiter assigns each field a complexity score and rejects queries once a client's accumulated complexity within the window exceeds the quota:

class GraphqlRateLimiter
  def initialize(redis_pool, max_complexity, window)
    @redis_pool = redis_pool
    @max_complexity = max_complexity
    @window = window
  end

  def allow?(user_id, query_complexity)
    key = "graphql:#{user_id}:#{Time.now.to_i / @window}"
    
    @redis_pool.with do |redis|
      # GET followed by SETEX is not atomic: concurrent requests can
      # both pass the check. Use a Lua script for strict enforcement.
      current = redis.get(key).to_i
      new_total = current + query_complexity
      
      if new_total <= @max_complexity
        redis.setex(key, @window, new_total)
        true
      else
        false
      end
    end
  end
end

class GraphqlController < ApplicationController
  def execute
    limiter = GraphqlRateLimiter.new($redis_pool, 10000, 3600)
    query_complexity = calculate_complexity(params[:query])

    unless limiter.allow?(current_user.id, query_complexity)
      render json: { 
        errors: [{ message: 'Query complexity exceeds rate limit' }] 
      }, status: :too_many_requests
      return
    end

    result = MySchema.execute(params[:query], context: { current_user: current_user })
    render json: result
  end

  private

  def calculate_complexity(query_string)
    # Parse the query and sum per-field complexity scores.
    # ComplexityAnalyzer stands in for an application-defined analyzer;
    # the graphql gem also ships built-in max_complexity analysis.
    query = GraphQL.parse(query_string)
    ComplexityAnalyzer.new.calculate(query)
  end
end

Tiered rate limiting provides different quotas based on subscription levels with automatic tier detection:

class TieredRateLimiter
  TIERS = {
    'free' => { hourly: 100, daily: 1000 },
    'starter' => { hourly: 1000, daily: 20000 },
    'professional' => { hourly: 10000, daily: 200000 },
    'enterprise' => { hourly: 100000, daily: 2000000 }
  }.freeze

  def initialize(redis_pool)
    @redis_pool = redis_pool
  end

  def allow?(user)
    tier = TIERS.fetch(user.subscription_tier, TIERS['free']) # unknown tiers fall back to free
    
    hourly_key = "user:#{user.id}:hour:#{Time.now.to_i / 3600}"
    daily_key = "user:#{user.id}:day:#{Time.now.to_i / 86400}"

    @redis_pool.with do |redis|
      hourly_count = redis.incr(hourly_key)
      daily_count = redis.incr(daily_key)
      
      redis.expire(hourly_key, 3600) if hourly_count == 1
      redis.expire(daily_key, 86400) if daily_count == 1

      hourly_count <= tier[:hourly] && daily_count <= tier[:daily]
    end
  end

  def quota_info(user)
    tier = TIERS.fetch(user.subscription_tier, TIERS['free']) # unknown tiers fall back to free
    
    hourly_key = "user:#{user.id}:hour:#{Time.now.to_i / 3600}"
    daily_key = "user:#{user.id}:day:#{Time.now.to_i / 86400}"

    @redis_pool.with do |redis|
      hourly_used = redis.get(hourly_key).to_i
      daily_used = redis.get(daily_key).to_i

      {
        tier: user.subscription_tier,
        hourly: {
          limit: tier[:hourly],
          used: hourly_used,
          remaining: tier[:hourly] - hourly_used
        },
        daily: {
          limit: tier[:daily],
          used: daily_used,
          remaining: tier[:daily] - daily_used
        }
      }
    end
  end
end

WebSocket rate limiting requires different strategies than HTTP request limiting. Connections persist for extended periods, making per-connection message rate limiting necessary:

class RateLimitedWebSocket
  def initialize(user, redis_pool)
    @user = user
    @redis_pool = redis_pool
    @limiter = RedisRateLimiter.new(redis_pool, 60, 60) # 60 messages per minute
  end

  def on_message(message)
    key = "ws:#{@user.id}:#{Time.now.to_i / 60}"

    unless @limiter.allow?(key)
      send_error('Rate limit exceeded. Maximum 60 messages per minute.')
      return
    end

    process_message(message)
  end

  def send_error(message)
    send_frame({ 
      type: 'error', 
      message: message,
      rate_limit: {
        limit: 60,
        window: 60,
        retry_after: 60 - (Time.now.to_i % 60) # seconds until the window resets
      }
    }.to_json)
  end

  def process_message(message)
    # Handle valid message
  end
end

Reference

Rate Limiting Algorithms Comparison

Algorithm | Memory Usage | Accuracy | Burst Handling | Implementation Complexity
Fixed Window | Low | Low | Poor | Simple
Sliding Window Log | High | High | Good | Moderate
Sliding Window Counter | Low | High | Good | Moderate
Token Bucket | Low | High | Excellent | Moderate
Leaky Bucket | Medium | High | None | Complex
GCRA | Very Low | High | Good | Simple

HTTP Response Headers

Header | Description | Example
X-RateLimit-Limit | Maximum requests allowed in window | 1000
X-RateLimit-Remaining | Requests remaining in current window | 247
X-RateLimit-Reset | Unix timestamp when quota resets | 1678901234
Retry-After | Seconds until retry allowed | 3600
X-RateLimit-Used | Requests consumed in current window | 753

Common HTTP Status Codes

Code | Meaning | Usage
429 | Too Many Requests | Client exceeded rate limit
503 | Service Unavailable | Server overloaded, apply backoff
509 | Bandwidth Limit Exceeded (unofficial) | Data transfer quota exceeded

Redis Commands for Rate Limiting

Command | Purpose | Example
INCR | Increment counter atomically | INCR user:123:requests
EXPIRE | Set key expiration | EXPIRE user:123:requests 3600
TTL | Get remaining time to live | TTL user:123:requests
GET | Retrieve current count | GET user:123:requests
SETEX | Set with expiration atomically | SETEX user:123:requests 3600 1
INCRBY | Increment by specific amount | INCRBY user:123:cost 5

Configuration Parameters

Parameter | Description | Typical Values
Window Size | Duration for rate calculation | 60s, 3600s, 86400s
Request Limit | Maximum requests per window | 100-10000
Burst Capacity | Additional requests allowed in burst | 10-100
Client Identifier | Attribute for tracking clients | IP, API key, user ID
Failure Mode | Behavior when storage unavailable | Fail open, fail closed
Cleanup Interval | Frequency of expired data removal | 300s-3600s

Ruby Gems for Rate Limiting

Gem | Algorithm Support | Storage Backend | Best For
rack-attack | Fixed window throttles | Redis, Memcached, memory | Rails/Rack apps
ratelimit | Sliding window counters | Redis | General Ruby apps
redis-throttle | Token bucket | Redis | Redis-based systems
turnstile | Custom | Redis | High-performance APIs
prorate | Leaky bucket | Redis | Sustained rate limiting

Time Window Calculations

Window Type | Calculation | Key Format
Per Second | timestamp / 1 | prefix:client:second:N
Per Minute | timestamp / 60 | prefix:client:minute:N
Per Hour | timestamp / 3600 | prefix:client:hour:N
Per Day | timestamp / 86400 | prefix:client:day:N
Rolling Hour | current_time - 3600 | prefix:client:rolling

Cost Calculation Strategies

Strategy | Description | Use Case
Uniform | All requests cost 1 | Simple APIs
Endpoint-based | Different costs per endpoint | Mixed resource usage
Payload-based | Cost proportional to data size | Upload/download APIs
Complexity-based | Cost based on computation required | GraphQL, search APIs
Resource-based | Cost based on resources consumed | Database queries, CPU time
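To illustrate the endpoint-based strategy, a sketch that charges different unit costs per route against one per-client budget (the routes, costs, and budget are hypothetical):

```ruby
# Endpoint-based request costing: each route consumes a different
# number of units from a shared per-client budget per window.
class WeightedQuota
  COSTS = {
    '/api/status' => 1,  # cheap read
    '/api/search' => 5,  # expensive query
    '/api/export' => 25  # heavy batch operation
  }.freeze

  def initialize(budget, window_seconds)
    @budget = budget
    @window = window_seconds
    @used = Hash.new(0) # "client:window" => units consumed
  end

  def allow?(client_id, path, now = Time.now.to_i)
    key = "#{client_id}:#{now / @window}"
    cost = COSTS.fetch(path, 1) # unlisted routes cost 1 unit
    return false if @used[key] + cost > @budget
    @used[key] += cost
    true
  end
end

quota = WeightedQuota.new(30, 3600) # 30 units per client per hour
```

With a 30-unit budget, a client can make one export and one search per hour, or thirty status checks, which lets a single limit cover operations of very different weight.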