CrackedRuby

Overview

API rate limiting controls the number of requests a client can make to an API within a specified time period. The mechanism protects server resources, prevents abuse, ensures fair usage among clients, and maintains quality of service for all users. Rate limiting applies to REST APIs, GraphQL endpoints, WebSocket connections, and any network service exposed to multiple clients.

The concept originated from network traffic shaping and evolved into application-level controls as APIs became primary interfaces for web services. Modern rate limiting operates at multiple layers: infrastructure level through reverse proxies and load balancers, application level through middleware and frameworks, and distributed level through shared state stores like Redis or Memcached.

Rate limiting decisions depend on several factors: client identification (IP address, API key, user account), resource consumption patterns, business tier assignments, and geographic distribution. The system tracks request counts, timestamps, and quota consumption to determine whether to allow, throttle, or reject incoming requests.

# Basic rate limiting concept: a sliding window log kept in process
# memory. Suitable for a single process only; not thread-safe, and all
# state is lost on restart.
class RateLimiter
  def initialize(max_requests, time_window)
    @max_requests = max_requests
    @time_window = time_window # in seconds
    @requests = {}             # client_id => [timestamps]
  end

  def allow?(client_id)
    now = Time.now.to_i
    window_start = now - @time_window
    
    @requests[client_id] ||= []
    # Drop timestamps that have aged out of the window
    @requests[client_id].reject! { |timestamp| timestamp < window_start }
    
    if @requests[client_id].size < @max_requests
      @requests[client_id] << now
      true
    else
      false
    end
  end
end

limiter = RateLimiter.new(100, 3600) # 100 requests per hour
limiter.allow?("client_123")
# => true

Rate limiting responds to requests with specific HTTP status codes and headers. A 429 Too Many Requests status indicates quota exhaustion. Response headers communicate limit details: X-RateLimit-Limit shows the maximum requests allowed, X-RateLimit-Remaining indicates remaining quota, and X-RateLimit-Reset provides the timestamp when the quota resets.

Key Principles

Rate limiting operates on the concept of quotas and time windows. A quota defines the maximum number of requests permitted, while a time window specifies the duration over which the quota applies. The combination creates a rate: 1000 requests per hour, 10 requests per second, or 5000 requests per day.

Client identification forms the foundation of rate limiting. The system must reliably identify who makes each request to track usage accurately. IP addresses provide the simplest identification but fail with shared networks, NAT, and proxy servers. API keys offer better attribution but require key management infrastructure. OAuth tokens combine authentication with rate limiting, tying quotas to specific users or applications. Authenticated user IDs provide the most precise tracking but require authentication on all endpoints.

Time windows operate in two primary modes: fixed and sliding. Fixed windows reset at predetermined intervals—every hour at :00, every day at midnight. This approach simplifies implementation but creates thundering herd problems where clients rush to consume quota immediately after resets. Sliding windows track requests over a rolling time period from the current moment, distributing load more evenly but requiring more computational overhead.

# Fixed window implementation; stale window counters are evicted as
# each client rolls into a new window
class FixedWindowLimiter
  def initialize(limit, window_seconds)
    @limit = limit
    @window_seconds = window_seconds
    @windows = {}
  end

  def allow?(client_id)
    now = Time.now.to_i
    window_key = now / @window_seconds
    
    @windows[client_id] ||= {}
    @windows[client_id].delete_if { |key, _| key < window_key } # drop old windows
    @windows[client_id][window_key] ||= 0
    
    if @windows[client_id][window_key] < @limit
      @windows[client_id][window_key] += 1
      true
    else
      false
    end
  end
end

Quota enforcement strategies differ in strictness and behavior. Hard limits reject all requests exceeding the quota with immediate effect. Soft limits allow burst traffic above the quota with reduced priority or throttled response times. Graduated limits apply different thresholds based on client tier, subscription level, or historical behavior patterns.

Distributed systems require coordination between multiple servers to enforce global rate limits. Without coordination, each server applies limits independently, multiplying the effective quota by the number of servers. Shared state stores provide centralized counters, but introduce network latency and single points of failure. Approximate algorithms trade perfect accuracy for reduced coordination overhead.

Rate limiting granularity affects both protection effectiveness and implementation complexity. Per-IP limits protect against individual malicious actors but may block legitimate users behind shared NAT. Per-endpoint limits prevent resource-intensive operations from overwhelming specific handlers but require separate tracking for each route. Per-resource limits protect individual database records or external API quotas but multiply tracking overhead.

The system handles quota exhaustion through multiple strategies. Immediate rejection returns errors instantly but provides no queueing for burst traffic. Request queuing buffers excess requests for delayed processing but increases memory consumption and response latency. Token bucket algorithms allow controlled bursts above the base rate by accumulating unused capacity.

Implementation Approaches

Token bucket algorithms maintain a bucket that fills with tokens at a constant rate up to a maximum capacity. Each request consumes one or more tokens from the bucket. If sufficient tokens exist, the request proceeds and tokens are removed. If insufficient tokens remain, the request is rejected or delayed. The bucket capacity allows burst traffic up to the maximum, while the refill rate enforces the sustained request rate.

class TokenBucket
  def initialize(capacity, refill_rate)
    @capacity = capacity
    @tokens = capacity
    @refill_rate = refill_rate
    @last_refill = Time.now
  end

  def allow?(cost = 1)
    refill
    
    if @tokens >= cost
      @tokens -= cost
      true
    else
      false
    end
  end

  private

  def refill
    now = Time.now
    elapsed = now - @last_refill
    @tokens = [@tokens + (elapsed * @refill_rate), @capacity].min
    @last_refill = now
  end
end

Leaky bucket algorithms process requests at a constant rate regardless of input rate. Incoming requests enter a queue with fixed capacity. A background processor removes requests from the queue at the configured rate and forwards them for handling. If the queue fills, additional requests are rejected. This approach smooths traffic spikes but adds latency to all requests due to queueing.
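A minimal in-process sketch of the leaky bucket idea, using the "meter" variant: rather than a real queue and background processor, the bucket level drains lazily from elapsed time on each call (class and parameter names are illustrative):

```ruby
# Leaky bucket as a meter: requests raise the bucket level, and the
# level "leaks" away at a constant rate. Drain is computed lazily from
# elapsed time instead of running a background thread.
class LeakyBucket
  def initialize(capacity, leak_rate)
    @capacity = capacity    # maximum pending requests the bucket holds
    @leak_rate = leak_rate  # requests drained per second
    @level = 0.0
    @last_leak = Time.now
  end

  def allow?
    leak
    if @level < @capacity
      @level += 1
      true
    else
      false
    end
  end

  private

  def leak
    elapsed = Time.now - @last_leak
    @level = [@level - (elapsed * @leak_rate), 0.0].max
    @last_leak = Time.now
  end
end

bucket = LeakyBucket.new(5, 1.0) # holds 5 requests, drains 1 per second
```

A full queue-based variant would additionally buffer admitted requests and forward them downstream at the drain rate, which is what adds latency to every request.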

Fixed window counters divide time into discrete windows and count requests within each window. The implementation stores a counter for each window period and increments it for each request. At window boundaries, a fresh counter takes over. This approach minimizes memory usage and computational overhead but creates uneven traffic distribution at window edges. A client can make the maximum number of requests at 11:59:59 and again at 12:00:01, briefly doubling the effective rate across the window boundary.

Sliding window logs improve on fixed windows by tracking individual request timestamps. The system maintains a list of timestamps for recent requests and removes timestamps outside the current window before checking the limit. This distributes traffic evenly but requires storing every timestamp within the window, so memory consumption grows in proportion to the rate limit.

class SlidingWindowLog
  def initialize(limit, window_seconds)
    @limit = limit
    @window_seconds = window_seconds
    @requests = Hash.new { |h, k| h[k] = [] }
  end

  def allow?(client_id)
    now = Time.now.to_f
    cutoff = now - @window_seconds
    
    # Evict timestamps that have aged out of the window
    @requests[client_id].reject! { |timestamp| timestamp < cutoff }
    
    if @requests[client_id].size < @limit
      @requests[client_id] << now
      true
    else
      false
    end
  end

  def remaining(client_id)
    now = Time.now.to_f
    cutoff = now - @window_seconds
    @requests[client_id].reject! { |timestamp| timestamp < cutoff }
    [@limit - @requests[client_id].size, 0].max
  end
end

Sliding window counters combine fixed-window efficiency with near-sliding-window accuracy through approximation. The algorithm maintains counters for the current and previous fixed windows. To estimate the count in the sliding window, it weights the previous window's counter by its percentage overlap with the sliding window and adds the current window's count. This keeps per-client memory constant while maintaining reasonable accuracy.
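A sketch of this two-window approximation, with an injectable clock for determinism (the class name and structure are illustrative):

```ruby
# Two-window approximation of a sliding window: weight the previous
# fixed window's count by its overlap with the rolling window.
class SlidingWindowApprox
  def initialize(limit, window_seconds)
    @limit = limit
    @window = window_seconds
    @counts = Hash.new { |h, k| h[k] = Hash.new(0) } # client => { window_key => count }
  end

  def allow?(client_id, now = Time.now.to_f)
    current_key = (now / @window).floor
    previous_key = current_key - 1
    # Fraction of the previous fixed window still inside the rolling window
    overlap = 1.0 - ((now % @window) / @window)

    windows = @counts[client_id]
    estimated = windows[previous_key] * overlap + windows[current_key]

    if estimated < @limit
      windows[current_key] += 1
      windows.delete_if { |k, _| k < previous_key } # drop stale windows
      true
    else
      false
    end
  end
end
```

At 25% into the current window, the previous window's count is weighted by 0.75, so the estimate decays gradually instead of resetting at the boundary.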

Generic cell rate algorithms (GCRA) provide precise rate enforcement with minimal state. The implementation tracks a theoretical arrival time (TAT) representing when the next request can arrive. Each request updates TAT by the inverse of the rate. If the current time exceeds TAT, the request is allowed immediately. If TAT is in the future, the request either waits or is rejected. This approach requires storing only a single timestamp per client.
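A compact sketch of GCRA along these lines, extended with a burst allowance of a few extra intervals (the names and the burst parameter are illustrative):

```ruby
# GCRA: per client, store only the theoretical arrival time (TAT).
# Emission interval = 1 / rate; a request conforms when TAT has not run
# more than `burst` intervals ahead of the wall clock.
class GCRALimiter
  def initialize(rate_per_second, burst)
    @interval = 1.0 / rate_per_second # seconds between conforming requests
    @burst = burst                    # extra early arrivals tolerated
    @tat = Hash.new(0.0)              # client_id => theoretical arrival time
  end

  def allow?(client_id, now = Time.now.to_f)
    tat = [@tat[client_id], now].max
    if tat - now <= @burst * @interval
      @tat[client_id] = tat + @interval # push TAT forward one interval
      true
    else
      false
    end
  end
end
```

With burst 0 this admits exactly one request per interval; each unit of burst lets one additional request arrive early.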

Distributed rate limiting requires coordination mechanisms to maintain global limits across multiple servers. Centralized counters in Redis or Memcached provide accurate tracking but create network bottlenecks and single points of failure. Local counters with periodic synchronization reduce coordination overhead but introduce temporary inaccuracies. Consistent hashing distributes clients to specific servers, making each server authoritative for a subset of clients.

Ruby Implementation

The Rack::Attack middleware provides rate limiting for Rack-based applications including Rails and Sinatra. It integrates into the middleware stack to intercept requests before they reach application code. Rack::Attack supports throttling by arbitrary attributes, custom response handling, and multiple storage backends.

# config/initializers/rack_attack.rb
class Rack::Attack
  # Throttle general requests by IP
  throttle('req/ip', limit: 300, period: 5.minutes) do |req|
    req.ip
  end

  # Throttle login attempts by email
  throttle('logins/email', limit: 5, period: 20.seconds) do |req|
    if req.path == '/login' && req.post?
      req.params['email'].to_s.downcase.presence
    end
  end

  # Throttle API requests by API key
  throttle('api/key', limit: 1000, period: 1.hour) do |req|
    if req.path.start_with?('/api/')
      req.env['HTTP_X_API_KEY']
    end
  end

  # Custom response for throttled requests (Rack::Attack 6+; earlier
  # versions used throttled_response with the raw env)
  self.throttled_responder = lambda do |request|
    match_data = request.env['rack.attack.match_data']
    [
      429,
      {
        'Content-Type' => 'application/json',
        'Retry-After' => match_data[:period].to_s,
        'X-RateLimit-Limit' => match_data[:limit].to_s,
        'X-RateLimit-Remaining' => '0'
      },
      [{ error: 'Rate limit exceeded' }.to_json]
    ]
  end
end

Redis provides distributed state storage for rate limiting across multiple application servers. The redis-rb gem offers atomic operations for incrementing counters and setting expiration times. Issuing INCR and then setting EXPIRE on first increment covers the common case with automatic cleanup; wrapping both in a Lua script or MULTI/EXEC transaction makes the pair fully atomic.

require 'redis'
require 'connection_pool'

class RedisRateLimiter
  def initialize(redis_pool, limit, window)
    @redis_pool = redis_pool
    @limit = limit
    @window = window
  end

  def allow?(key)
    @redis_pool.with do |redis|
      current = redis.incr(key)
      # Set the TTL when the counter is first created. If the process
      # dies between INCR and EXPIRE the key never expires; wrap both in
      # a Lua script or MULTI/EXEC where that matters.
      redis.expire(key, @window) if current == 1
      current <= @limit
    end
  end

  def remaining(key)
    @redis_pool.with do |redis|
      current = redis.get(key).to_i
      [@limit - current, 0].max
    end
  end

  def reset_at(key)
    @redis_pool.with do |redis|
      ttl = redis.ttl(key)
      ttl > 0 ? Time.now.to_i + ttl : nil
    end
  end
end

redis_pool = ConnectionPool.new(size: 5, timeout: 5) { Redis.new }
limiter = RedisRateLimiter.new(redis_pool, 100, 3600)

if limiter.allow?("user:123")
  # Process request
else
  # Return 429 error
end

The ratelimit gem implements sliding-window request counting backed by Redis. It provides a clean API for recording events and checking counts against thresholds over arbitrary trailing intervals, and expires old buckets automatically. The gem requires a Redis backend; it has no in-memory mode.

require 'ratelimit'
require 'redis'

redis = Redis.new
limiter = Ratelimit.new("api_requests", redis: redis, bucket_span: 3600, bucket_interval: 5)

# Record a request for this subject
limiter.add("user_#{user_id}")

# Check usage against a threshold over the trailing 60 seconds
if limiter.exceeded?("user_#{user_id}", threshold: 100, interval: 60)
  # Quota exceeded
else
  # Process request
end

# Number of requests recorded in the last 60 seconds
count = limiter.count("user_#{user_id}", 60)

# Block until the subject is back under the threshold, then run
limiter.exec_within_threshold("user_#{user_id}", threshold: 100, interval: 60) do
  # Process request
end

Rails controllers implement rate limiting through before_action filters that check quotas before executing controller actions. The filter architecture allows applying limits selectively to specific actions or controller subclasses.

class ApiController < ApplicationController
  before_action :check_rate_limit

  private

  def check_rate_limit
    limiter = RedisRateLimiter.new($redis_pool, rate_limit, 3600)
    key = "api:#{current_user.id}:#{Time.now.hour}"

    unless limiter.allow?(key)
      response.set_header('X-RateLimit-Limit', rate_limit.to_s)
      response.set_header('X-RateLimit-Remaining', '0')
      response.set_header('X-RateLimit-Reset', limiter.reset_at(key).to_s)
      
      render json: { 
        error: 'Rate limit exceeded',
        retry_after: limiter.reset_at(key)
      }, status: :too_many_requests
      return
    end

    response.set_header('X-RateLimit-Limit', rate_limit.to_s)
    response.set_header('X-RateLimit-Remaining', limiter.remaining(key).to_s)
  end

  def rate_limit
    case current_user.subscription_tier
    when 'premium' then 10000
    when 'basic' then 1000
    else 100
    end
  end
end

The dalli gem provides a Memcached client for distributed rate limiting with Memcached as the storage backend. Memcached can offer lower latency than Redis for simple counter operations but lacks Redis's richer data structures and server-side scripting.

require 'dalli'

class MemcachedRateLimiter
  def initialize(memcached_client, limit, window)
    @cache = memcached_client
    @limit = limit
    @window = window
  end

  def allow?(key)
    # incr(key, amount, ttl, default) creates the counter at the default
    # value with a TTL when the key does not yet exist
    count = @cache.incr(key, 1, @window, 1)
    count.nil? || count <= @limit # fail open if the value is non-numeric
  rescue Dalli::DalliError
    true # Fail open on cache errors
  end
end

cache = Dalli::Client.new('localhost:11211')
limiter = MemcachedRateLimiter.new(cache, 1000, 3600)

Design Considerations

Algorithm selection depends on traffic patterns and business requirements. Token bucket algorithms suit APIs with bursty traffic where occasional spikes above the base rate are acceptable. The bucket capacity determines burst size while the refill rate controls sustained throughput. Applications requiring strict rate enforcement without bursts should use leaky bucket or sliding window approaches.

Fixed window counters minimize computational overhead and memory usage, making them appropriate for high-throughput systems where precision is less critical. The edge effect creates temporary rate doubling at window boundaries, which may be acceptable for APIs with generous limits. Applications requiring even traffic distribution must use sliding window algorithms despite their higher resource consumption.

Distributed versus local rate limiting involves trade-offs between accuracy and performance. Centralized Redis counters provide accurate global limits across all servers but introduce network latency and create dependencies on external systems. Local counters eliminate network overhead but multiply effective limits by server count. Hybrid approaches use local counters with periodic synchronization, accepting temporary inaccuracies for better performance.
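A sketch of the hybrid approach under these assumptions: each server counts locally, flushes deltas to a shared store on a timer, and enforces against its last-known global total. The `HybridRateLimiter` name is illustrative, and the store only needs to respond to `incrby(key, n)` and `get(key)`, so a Redis client or a test stub fits the interface:

```ruby
# Hybrid limiter: count locally, flush deltas to a shared store
# periodically, and enforce against the last-known global total.
# Between flushes the view of other servers is stale, so enforcement
# is approximate.
class HybridRateLimiter
  def initialize(store, limit, window, sync_interval: 1.0)
    @store = store              # responds to incrby(key, n) and get(key)
    @limit = limit
    @window = window            # seconds per counting window
    @sync_interval = sync_interval
    @local = Hash.new(0)        # deltas not yet flushed
    @global = Hash.new(0)       # last-known totals from the store
    @last_sync = Time.now
  end

  def allow?(client_id)
    sync if Time.now - @last_sync >= @sync_interval
    key = "#{client_id}:#{Time.now.to_i / @window}"
    if @global[key] + @local[key] < @limit
      @local[key] += 1
      true
    else
      false
    end
  end

  def sync
    @local.each { |key, delta| @store.incrby(key, delta) }
    (@local.keys + @global.keys).uniq.each do |key|
      @global[key] = @store.get(key).to_i
    end
    @local.clear
    @last_sync = Time.now
  end
end
```

Between sync intervals, each server can overshoot its share of the limit by its local count, so the global limit is only approximately enforced.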

Client identification strategy affects both security and user experience. IP-based limiting is simple but blocks legitimate users behind corporate NAT or VPNs. API key limiting provides better attribution but requires key management infrastructure and may complicate public APIs. Combined approaches using IP limits for unauthenticated requests and key limits for authenticated access balance security and usability.

Quota exhaustion handling determines system behavior under load. Hard rejection with immediate 429 responses protects server resources but provides poor user experience during traffic spikes. Request queuing improves user experience by processing requests eventually but increases memory consumption and latency. Priority systems can queue high-value requests while rejecting low-priority traffic.

class PriorityRateLimiter
  def initialize(limits)
    @limits = limits # { 'premium' => 10000, 'basic' => 1000, 'free' => 100 }
    @counters = Hash.new { |h, k| h[k] = Hash.new(0) } # client_id => { window => count }
  end

  def allow?(client_id, tier)
    window = current_window
    counts = @counters[client_id]
    counts.delete_if { |w, _| w < window } # drop stale windows
    counts[window] += 1
    counts[window] <= @limits[tier]
  end

  def current_window
    Time.now.to_i / 3600
  end
end

Monitoring and observability requirements influence implementation choices. Simple counters provide basic metrics but lack insight into traffic patterns. Detailed logging of rejected requests enables analysis of abuse patterns and limit tuning. Distributed tracing helps debug rate limiting behavior across multiple services.

Cost considerations vary by storage backend and algorithm complexity. In-memory rate limiting costs nothing for storage but loses state on server restarts. Redis requires infrastructure costs and operational overhead but provides persistence and cross-server coordination. Memcached offers lower latency than Redis but lacks data persistence.

Security Implications

Rate limiting serves as a primary defense against denial of service attacks. Without limits, attackers can exhaust server resources through request floods. Effective rate limiting requires multiple layers: aggressive limits for unauthenticated requests, moderate limits for authenticated users, and special allowances for trusted partners or premium tiers.

Distributed denial of service attacks using many IP addresses bypass simple IP-based rate limiting. Defense requires rate limiting at multiple granularities: per-IP for individual attackers, global limits to protect overall capacity, and per-endpoint limits to prevent resource-intensive operations from overwhelming specific handlers.

class MultiLayerRateLimiter
  def initialize(redis_pool)
    @redis_pool = redis_pool
  end

  def allow?(ip:, user_id:, endpoint:)
    checks = [
      ["ip:#{ip}", 100, 60],              # 100 req/min per IP
      ["user:#{user_id}", 1000, 3600],    # 1000 req/hour per user
      ["endpoint:#{endpoint}", 10000, 60], # 10000 req/min per endpoint
      ["global", 100000, 60]               # 100k req/min globally
    ]

    # all? short-circuits, so counters later in the list are not
    # incremented once an earlier check fails
    checks.all? do |key, limit, window|
      @redis_pool.with do |redis|
        current = redis.incr(key)
        redis.expire(key, window) if current == 1
        current <= limit
      end
    end
  end
end

Credential stuffing attacks attempt to validate stolen username/password pairs through login attempts. Rate limiting login endpoints prevents automated credential testing. Stricter limits on failed login attempts specifically target this attack vector. Combining rate limiting with exponential backoff increases delay between attempts.
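A sketch of pairing a failure counter with exponential backoff, keyed by account identifier (the delays are illustrative, and a production version would keep this state in Redis so it survives restarts and spans servers):

```ruby
# Failed-login throttle with exponential backoff: each failure doubles
# the lockout delay up to a cap, and a successful login clears the
# history for that identifier.
class LoginBackoff
  BASE_DELAY = 2    # seconds of lockout after the first failure
  MAX_DELAY  = 900  # cap the lockout at 15 minutes

  def initialize
    @failures = Hash.new { |h, k| h[k] = { count: 0, last_at: 0.0 } }
  end

  def attempt_allowed?(identifier, now = Time.now.to_f)
    entry = @failures[identifier]
    return true if entry[:count].zero?
    delay = [BASE_DELAY * (2 ** (entry[:count] - 1)), MAX_DELAY].min
    now - entry[:last_at] >= delay
  end

  def record_failure(identifier, now = Time.now.to_f)
    entry = @failures[identifier]
    entry[:count] += 1
    entry[:last_at] = now
  end

  def record_success(identifier)
    @failures.delete(identifier)
  end
end
```

After one failure the caller must wait 2 seconds, after two failures 4 seconds, and so on, which makes automated credential testing progressively more expensive.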

API key enumeration attacks probe for valid keys through systematic guessing. Rate limiting API authentication endpoints prevents rapid key testing. Limiting failed authentication attempts per IP and globally protects against distributed enumeration. Logging failed authentication attempts enables detection of enumeration patterns.

Rate limit bypass attempts exploit multiple identities or distributed infrastructure. Attackers rotate IP addresses using VPNs or botnets. Defense requires fingerprinting requests beyond IP addresses: user agents, TLS fingerprints, behavioral patterns. Rate limiting by multiple attributes simultaneously increases bypass difficulty.

Token theft and replay protection requires associating rate limits with authenticated identities rather than just API keys. Stolen API keys could be used up to rate limits before detection. Combining rate limiting with token expiration, refresh requirements, and anomaly detection improves security.

Side channel attacks infer information from rate limit responses. Response timing differences between rate-limited and allowed requests leak information about system state. Consistent response times for both allowed and rejected requests prevent timing attacks. Error messages should not reveal limit details that aid attackers.

Practical Examples

Basic API rate limiting for a Rails API restricts requests by API key with standard limits and header communication:

class Api::V1::BaseController < ApplicationController
  before_action :authenticate_api_key
  before_action :check_rate_limit

  private

  def authenticate_api_key
    @api_key = ApiKey.find_by(key: request.headers['X-API-Key'])
    render_unauthorized unless @api_key
  end

  def check_rate_limit
    limiter = RedisRateLimiter.new($redis_pool, @api_key.hourly_limit, 3600)
    key = "api_key:#{@api_key.id}:#{Time.now.to_i / 3600}"

    # Consume quota first so the headers reflect this request
    allowed = limiter.allow?(key)
    remaining = limiter.remaining(key)
    reset_at = limiter.reset_at(key)

    response.set_header('X-RateLimit-Limit', @api_key.hourly_limit.to_s)
    response.set_header('X-RateLimit-Remaining', remaining.to_s)
    response.set_header('X-RateLimit-Reset', reset_at.to_s) if reset_at

    unless allowed
      render json: {
        error: 'Rate limit exceeded',
        limit: @api_key.hourly_limit,
        reset_at: reset_at
      }, status: :too_many_requests
    end
  end

  def render_unauthorized
    render json: { error: 'Invalid API key' }, status: :unauthorized
  end
end

GraphQL APIs call for complexity-based rate limiting because a single endpoint serves queries of widely varying cost, depending on which fields the client requests. Instead of counting raw requests, the limiter assigns each field a complexity score and rejects queries once a client's accumulated complexity within the window exceeds the quota:

class GraphqlRateLimiter
  def initialize(redis_pool, max_complexity, window)
    @redis_pool = redis_pool
    @max_complexity = max_complexity
    @window = window
  end

  def allow?(user_id, query_complexity)
    key = "graphql:#{user_id}:#{Time.now.to_i / @window}"
    
    @redis_pool.with do |redis|
      # GET followed by SETEX is not atomic: concurrent requests can
      # both pass the check. Use a Lua script for strict enforcement.
      current = redis.get(key).to_i
      new_total = current + query_complexity
      
      if new_total <= @max_complexity
        redis.setex(key, @window, new_total)
        true
      else
        false
      end
    end
  end
end

class GraphqlController < ApplicationController
  def execute
    limiter = GraphqlRateLimiter.new($redis_pool, 10000, 3600)
    query_complexity = calculate_complexity(params[:query])

    unless limiter.allow?(current_user.id, query_complexity)
      render json: { 
        errors: [{ message: 'Query complexity exceeds rate limit' }] 
      }, status: :too_many_requests
      return
    end

    result = MySchema.execute(params[:query], context: { current_user: current_user })
    render json: result
  end

  private

  def calculate_complexity(query_string)
    # Parse the query and sum per-field complexity scores.
    # ComplexityAnalyzer stands in for an application-defined analyzer;
    # the graphql gem also ships built-in max_complexity analysis.
    query = GraphQL.parse(query_string)
    ComplexityAnalyzer.new.calculate(query)
  end
end

Tiered rate limiting provides different quotas based on subscription levels with automatic tier detection:

class TieredRateLimiter
  TIERS = {
    'free' => { hourly: 100, daily: 1000 },
    'starter' => { hourly: 1000, daily: 20000 },
    'professional' => { hourly: 10000, daily: 200000 },
    'enterprise' => { hourly: 100000, daily: 2000000 }
  }.freeze

  def initialize(redis_pool)
    @redis_pool = redis_pool
  end

  def allow?(user)
    tier = TIERS.fetch(user.subscription_tier, TIERS['free']) # unknown tiers fall back to free
    
    hourly_key = "user:#{user.id}:hour:#{Time.now.to_i / 3600}"
    daily_key = "user:#{user.id}:day:#{Time.now.to_i / 86400}"

    @redis_pool.with do |redis|
      hourly_count = redis.incr(hourly_key)
      daily_count = redis.incr(daily_key)
      
      redis.expire(hourly_key, 3600) if hourly_count == 1
      redis.expire(daily_key, 86400) if daily_count == 1

      hourly_count <= tier[:hourly] && daily_count <= tier[:daily]
    end
  end

  def quota_info(user)
    tier = TIERS.fetch(user.subscription_tier, TIERS['free']) # unknown tiers fall back to free
    
    hourly_key = "user:#{user.id}:hour:#{Time.now.to_i / 3600}"
    daily_key = "user:#{user.id}:day:#{Time.now.to_i / 86400}"

    @redis_pool.with do |redis|
      hourly_used = redis.get(hourly_key).to_i
      daily_used = redis.get(daily_key).to_i

      {
        tier: user.subscription_tier,
        hourly: {
          limit: tier[:hourly],
          used: hourly_used,
          remaining: tier[:hourly] - hourly_used
        },
        daily: {
          limit: tier[:daily],
          used: daily_used,
          remaining: tier[:daily] - daily_used
        }
      }
    end
  end
end

WebSocket rate limiting requires different strategies than HTTP request limiting. Connections persist for extended periods, making per-connection message rate limiting necessary:

class RateLimitedWebSocket
  def initialize(user, redis_pool)
    @user = user
    @redis_pool = redis_pool
    @limiter = RedisRateLimiter.new(redis_pool, 60, 60) # 60 messages per minute
  end

  def on_message(message)
    key = "ws:#{@user.id}:#{Time.now.to_i / 60}"

    unless @limiter.allow?(key)
      send_error('Rate limit exceeded. Maximum 60 messages per minute.')
      return
    end

    process_message(message)
  end

  def send_error(message)
    send_frame({ 
      type: 'error', 
      message: message,
      rate_limit: {
        limit: 60,
        window: 60,
        retry_after: 60 - (Time.now.to_i % 60) # seconds until the window resets
      }
    }.to_json)
  end

  def process_message(message)
    # Handle valid message
  end
end

Reference

Rate Limiting Algorithms Comparison

Algorithm | Memory Usage | Accuracy | Burst Handling | Implementation Complexity
Fixed Window | Low | Low | Poor | Simple
Sliding Window Log | High | High | Good | Moderate
Sliding Window Counter | Low | High | Good | Moderate
Token Bucket | Low | High | Excellent | Moderate
Leaky Bucket | Medium | High | None | Complex
GCRA | Very Low | High | Good | Simple

HTTP Response Headers

Header | Description | Example
X-RateLimit-Limit | Maximum requests allowed in window | 1000
X-RateLimit-Remaining | Requests remaining in current window | 247
X-RateLimit-Reset | Unix timestamp when quota resets | 1678901234
Retry-After | Seconds until retry allowed | 3600
X-RateLimit-Used | Requests consumed in current window | 753

Common HTTP Status Codes

Code | Meaning | Usage
429 | Too Many Requests | Client exceeded rate limit
503 | Service Unavailable | Server overloaded, apply backoff
509 | Bandwidth Limit Exceeded (unofficial) | Data transfer quota exceeded

Redis Commands for Rate Limiting

Command | Purpose | Example
INCR | Increment counter atomically | INCR user:123:requests
EXPIRE | Set key expiration | EXPIRE user:123:requests 3600
TTL | Get remaining time to live | TTL user:123:requests
GET | Retrieve current count | GET user:123:requests
SETEX | Set with expiration atomically | SETEX user:123:requests 3600 1
INCRBY | Increment by specific amount | INCRBY user:123:cost 5

Configuration Parameters

Parameter | Description | Typical Values
Window Size | Duration for rate calculation | 60s, 3600s, 86400s
Request Limit | Maximum requests per window | 100-10000
Burst Capacity | Additional requests allowed in burst | 10-100
Client Identifier | Attribute for tracking clients | IP, API key, user ID
Failure Mode | Behavior when storage unavailable | Fail open, fail closed
Cleanup Interval | Frequency of expired data removal | 300s-3600s

Ruby Gems for Rate Limiting

Gem | Algorithm Support | Storage Backend | Best For
rack-attack | Fixed window throttles | Redis, Memcached, memory | Rails/Rack apps
ratelimit | Sliding window counters | Redis | General Ruby apps
redis-throttle | Token bucket | Redis | Redis-based systems
turnstile | Custom | Redis | High-performance APIs
prorate | Leaky bucket | Redis | Sustained rate limiting

Time Window Calculations

Window Type | Calculation | Key Format
Per Second | timestamp / 1 | prefix:client:second:N
Per Minute | timestamp / 60 | prefix:client:minute:N
Per Hour | timestamp / 3600 | prefix:client:hour:N
Per Day | timestamp / 86400 | prefix:client:day:N
Rolling Hour | current_time - 3600 | prefix:client:rolling

Cost Calculation Strategies

Strategy | Description | Use Case
Uniform | All requests cost 1 | Simple APIs
Endpoint-based | Different costs per endpoint | Mixed resource usage
Payload-based | Cost proportional to data size | Upload/download APIs
Complexity-based | Cost based on computation required | GraphQL, search APIs
Resource-based | Cost based on resources consumed | Database queries, CPU time
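To illustrate the endpoint-based strategy, a sketch that charges different unit costs per route against one per-client budget (the routes, costs, and budget are hypothetical):

```ruby
# Endpoint-based request costing: each route consumes a different
# number of units from a shared per-client budget per window.
class WeightedQuota
  COSTS = {
    '/api/status' => 1,  # cheap read
    '/api/search' => 5,  # expensive query
    '/api/export' => 25  # heavy batch operation
  }.freeze

  def initialize(budget, window_seconds)
    @budget = budget
    @window = window_seconds
    @used = Hash.new(0) # "client:window" => units consumed
  end

  def allow?(client_id, path, now = Time.now.to_i)
    key = "#{client_id}:#{now / @window}"
    cost = COSTS.fetch(path, 1) # unlisted routes cost 1 unit
    return false if @used[key] + cost > @budget
    @used[key] += cost
    true
  end
end

quota = WeightedQuota.new(30, 3600) # 30 units per client per hour
```

With a 30-unit budget, a client can make one export and one search per hour, or thirty status checks, which lets a single limit cover operations of very different weight.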