Overview
API rate limiting controls the number of requests a client can make to an API within a specified time period. The mechanism protects server resources, prevents abuse, ensures fair usage among clients, and maintains quality of service for all users. Rate limiting applies to REST APIs, GraphQL endpoints, WebSocket connections, and any network service exposed to multiple clients.
The concept originated from network traffic shaping and evolved into application-level controls as APIs became primary interfaces for web services. Modern rate limiting operates at multiple layers: infrastructure level through reverse proxies and load balancers, application level through middleware and frameworks, and distributed level through shared state stores like Redis or Memcached.
Rate limiting decisions depend on several factors: client identification (IP address, API key, user account), resource consumption patterns, business tier assignments, and geographic distribution. The system tracks request counts, timestamps, and quota consumption to determine whether to allow, throttle, or reject incoming requests.
# Basic rate limiting concept
class RateLimiter
  def initialize(max_requests, time_window)
    @max_requests = max_requests
    @time_window = time_window
    @requests = {}
  end

  def allow?(client_id)
    now = Time.now.to_i
    window_start = now - @time_window
    @requests[client_id] ||= []
    @requests[client_id].reject! { |timestamp| timestamp < window_start }
    if @requests[client_id].size < @max_requests
      @requests[client_id] << now
      true
    else
      false
    end
  end
end

limiter = RateLimiter.new(100, 3600) # 100 requests per hour
limiter.allow?("client_123")
# => true
Rate limiting responds to requests with specific HTTP status codes and headers. A 429 Too Many Requests status indicates quota exhaustion. Response headers communicate limit details: X-RateLimit-Limit shows the maximum requests allowed, X-RateLimit-Remaining indicates remaining quota, and X-RateLimit-Reset provides the timestamp when the quota resets.
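That response contract can be sketched as a Rack-style status/headers/body triple; the helper name and JSON body shape here are illustrative, not a fixed convention:

```ruby
require 'json'

# Build a Rack-style 429 response carrying the standard rate limit headers.
def rate_limited_response(limit:, reset_epoch:, retry_after:)
  [
    429,
    {
      'Content-Type' => 'application/json',
      'Retry-After' => retry_after.to_s,
      'X-RateLimit-Limit' => limit.to_s,
      'X-RateLimit-Remaining' => '0',
      'X-RateLimit-Reset' => reset_epoch.to_s
    },
    [{ error: 'Rate limit exceeded' }.to_json]
  ]
end

status, headers, _body = rate_limited_response(limit: 1000, reset_epoch: 1678901234, retry_after: 3600)
status                       # => 429
headers['X-RateLimit-Limit'] # => "1000"
```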
Key Principles
Rate limiting operates on the concept of quotas and time windows. A quota defines the maximum number of requests permitted, while a time window specifies the duration over which the quota applies. The combination creates a rate: 1000 requests per hour, 10 requests per second, or 5000 requests per day.
Client identification forms the foundation of rate limiting. The system must reliably identify who makes each request to track usage accurately. IP addresses provide the simplest identification but fail with shared networks, NAT, and proxy servers. API keys offer better attribution but require key management infrastructure. OAuth tokens combine authentication with rate limiting, tying quotas to specific users or applications. Authenticated user IDs provide the most precise tracking but require authentication on all endpoints.
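A sketch of that fallback order, preferring the most precise identifier available; the method name and key prefixes are illustrative:

```ruby
# Pick the most precise identifier available for quota tracking:
# authenticated user first, then API key, then source IP as a last resort.
def client_identifier(api_key:, user_id:, ip:)
  if user_id
    "user:#{user_id}"
  elsif api_key
    "key:#{api_key}"
  else
    "ip:#{ip}"
  end
end

client_identifier(api_key: nil, user_id: 42, ip: "203.0.113.7")  # => "user:42"
client_identifier(api_key: nil, user_id: nil, ip: "203.0.113.7") # => "ip:203.0.113.7"
```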
Time windows operate in two primary modes: fixed and sliding. Fixed windows reset at predetermined intervals—every hour at :00, every day at midnight. This approach simplifies implementation but creates thundering herd problems where clients rush to consume quota immediately after resets. Sliding windows track requests over a rolling time period from the current moment, distributing load more evenly but requiring more computational overhead.
# Fixed window implementation
class FixedWindowLimiter
  def initialize(limit, window_seconds)
    @limit = limit
    @window_seconds = window_seconds
    @windows = {}
  end

  def allow?(client_id)
    now = Time.now.to_i
    window_key = now / @window_seconds
    @windows[client_id] ||= {}
    @windows[client_id][window_key] ||= 0
    if @windows[client_id][window_key] < @limit
      @windows[client_id][window_key] += 1
      true
    else
      false
    end
  end
end
Quota enforcement strategies differ in strictness and behavior. Hard limits reject all requests exceeding the quota with immediate effect. Soft limits allow burst traffic above the quota with reduced priority or throttled response times. Graduated limits apply different thresholds based on client tier, subscription level, or historical behavior patterns.
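The hard/soft distinction can be sketched as a three-way decision; the class name and thresholds below are illustrative:

```ruby
# Soft-limit enforcement sketch: requests between the soft and hard limits
# are not rejected outright but flagged for degraded handling.
class SoftLimiter
  def initialize(soft_limit, hard_limit)
    @soft_limit = soft_limit
    @hard_limit = hard_limit
  end

  # Returns :allow under the soft limit, :throttle between the limits
  # (e.g. serve with reduced priority), :reject above the hard limit.
  def check(request_count)
    if request_count <= @soft_limit
      :allow
    elsif request_count <= @hard_limit
      :throttle
    else
      :reject
    end
  end
end

limiter = SoftLimiter.new(100, 150)
limiter.check(90)  # => :allow
limiter.check(120) # => :throttle
limiter.check(200) # => :reject
```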
Distributed systems require coordination between multiple servers to enforce global rate limits. Without coordination, each server applies limits independently, multiplying the effective quota by the number of servers. Shared state stores provide centralized counters, but introduce network latency and single points of failure. Approximate algorithms trade perfect accuracy for reduced coordination overhead.
Rate limiting granularity affects both protection effectiveness and implementation complexity. Per-IP limits protect against individual malicious actors but may block legitimate users behind shared NAT. Per-endpoint limits prevent resource-intensive operations from overwhelming specific handlers but require separate tracking for each route. Per-resource limits protect individual database records or external API quotas but multiply tracking overhead.
The system handles quota exhaustion through multiple strategies. Immediate rejection returns errors instantly but provides no queueing for burst traffic. Request queuing buffers excess requests for delayed processing but increases memory consumption and response latency. Token bucket algorithms allow controlled bursts above the base rate by accumulating unused capacity.
Implementation Approaches
Token bucket algorithms maintain a bucket that fills with tokens at a constant rate up to a maximum capacity. Each request consumes one or more tokens from the bucket. If sufficient tokens exist, the request proceeds and tokens are removed. If insufficient tokens remain, the request is rejected or delayed. The bucket capacity allows burst traffic up to the maximum, while the refill rate enforces the sustained request rate.
class TokenBucket
  def initialize(capacity, refill_rate)
    @capacity = capacity
    @tokens = capacity
    @refill_rate = refill_rate
    @last_refill = Time.now
  end

  def allow?(cost = 1)
    refill
    if @tokens >= cost
      @tokens -= cost
      true
    else
      false
    end
  end

  private

  def refill
    now = Time.now
    elapsed = now - @last_refill
    @tokens = [@tokens + (elapsed * @refill_rate), @capacity].min
    @last_refill = now
  end
end
Leaky bucket algorithms process requests at a constant rate regardless of input rate. Incoming requests enter a queue with fixed capacity. A background processor removes requests from the queue at the configured rate and forwards them for handling. If the queue fills, additional requests are rejected. This approach smooths traffic spikes but adds latency to all requests due to queueing.
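The queue-and-drain behavior can be approximated without a background thread by draining lazily on each check. This meter-style sketch admits or rejects rather than queuing for delayed processing; the class and parameter names are illustrative:

```ruby
# Leaky bucket as a meter: the virtual queue drains at leak_rate requests
# per second, computed lazily on each call instead of by a background processor.
class LeakyBucket
  def initialize(capacity, leak_rate)
    @capacity = capacity   # maximum queued requests
    @leak_rate = leak_rate # requests drained per second
    @queue_size = 0.0
    @last_leak = Time.now
  end

  def allow?
    leak
    if @queue_size < @capacity
      @queue_size += 1
      true
    else
      false
    end
  end

  private

  # Drain the queue in proportion to elapsed time.
  def leak
    now = Time.now
    elapsed = now - @last_leak
    @queue_size = [@queue_size - (elapsed * @leak_rate), 0].max
    @last_leak = now
  end
end

bucket = LeakyBucket.new(5, 1.0) # room for 5 queued requests, drains 1/second
bucket.allow? # => true
```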
Fixed window counters divide time into discrete windows and count requests within each window. The implementation stores a counter for each window period and increments it for each request. At window boundaries, counters reset to zero. This approach minimizes memory usage and computational overhead but creates uneven traffic distribution at window edges. A client can make maximum requests at 11:59:59 and again at 12:00:01, effectively doubling the rate for two seconds.
Sliding window logs improve on fixed windows by tracking individual request timestamps. The system maintains a list of timestamps for recent requests and removes timestamps outside the current window before checking the limit. This distributes traffic evenly but requires storing every timestamp within the window period, increasing memory consumption proportionally to the rate limit.
class SlidingWindowLog
  def initialize(limit, window_seconds)
    @limit = limit
    @window_seconds = window_seconds
    @requests = Hash.new { |h, k| h[k] = [] }
  end

  def allow?(client_id)
    now = Time.now.to_f
    cutoff = now - @window_seconds
    @requests[client_id].reject! { |timestamp| timestamp < cutoff }
    if @requests[client_id].size < @limit
      @requests[client_id] << now
      true
    else
      false
    end
  end

  def remaining(client_id)
    now = Time.now.to_f
    cutoff = now - @window_seconds
    @requests[client_id].reject! { |timestamp| timestamp < cutoff }
    [@limit - @requests[client_id].size, 0].max
  end
end
Sliding window counters combine fixed window efficiency with sliding window log accuracy through approximation. The algorithm maintains counters for the current and previous window periods. To estimate the current sliding window count, it weights the previous window's counter by its percentage overlap with the sliding window and adds the current window's count. This reduces memory requirements while maintaining reasonable accuracy.
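The two-counter weighted estimate can be sketched in plain Ruby; the class name is illustrative, and the timestamp parameter exists only to make the example deterministic:

```ruby
# Approximate sliding window: one counter per fixed window, with the
# previous window weighted by its overlap with the rolling window.
class ApproxSlidingWindow
  def initialize(limit, window_seconds)
    @limit = limit
    @window = window_seconds
    @counts = Hash.new(0) # window index => request count
  end

  def allow?(now = Time.now.to_f)
    current = (now / @window).floor
    elapsed_fraction = (now % @window) / @window
    # Weight the previous window's count by the portion of it that still
    # falls inside the rolling window, then add the current window's count.
    estimate = @counts[current] + @counts[current - 1] * (1 - elapsed_fraction)
    if estimate < @limit
      @counts[current] += 1
      true
    else
      false
    end
  end
end

limiter = ApproxSlidingWindow.new(100, 60) # roughly 100 requests per rolling minute
```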
Generic cell rate algorithms (GCRA) provide precise rate enforcement with minimal state. The implementation tracks a theoretical arrival time (TAT) representing when the next request can arrive. Each request updates TAT by the inverse of the rate. If the current time exceeds TAT, the request is allowed immediately. If TAT is in the future, the request either waits or is rejected. This approach requires storing only a single timestamp per client.
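A minimal in-memory GCRA under those definitions might look like the following; the class name, the burst parameter, and the injectable clock are illustrative choices:

```ruby
# GCRA sketch: stores one theoretical arrival time (TAT) per client.
class GCRA
  def initialize(rate_per_second, burst)
    @emission_interval = 1.0 / rate_per_second          # seconds between requests at the sustained rate
    @burst_tolerance = @emission_interval * (burst - 1) # how far TAT may run ahead of now
    @tat = {} # client_id => theoretical arrival time
  end

  def allow?(client_id, now = Time.now.to_f)
    tat = [@tat[client_id] || now, now].max
    if tat - now <= @burst_tolerance
      # Request conforms: advance TAT by one emission interval.
      @tat[client_id] = tat + @emission_interval
      true
    else
      false
    end
  end
end

limiter = GCRA.new(10.0, 5) # 10 req/s sustained, bursts of up to 5
```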
Distributed rate limiting requires coordination mechanisms to maintain global limits across multiple servers. Centralized counters in Redis or Memcached provide accurate tracking but create network bottlenecks and single points of failure. Local counters with periodic synchronization reduce coordination overhead but introduce temporary inaccuracies. Consistent hashing distributes clients to specific servers, making each server authoritative for a subset of clients.
Ruby Implementation
The Rack::Attack middleware provides rate limiting for Rack-based applications including Rails and Sinatra. It integrates into the middleware stack to intercept requests before they reach application code. Rack::Attack supports throttling by arbitrary attributes, custom response handling, and multiple storage backends.
# config/initializers/rack_attack.rb
class Rack::Attack
  # Throttle general requests by IP
  throttle('req/ip', limit: 300, period: 5.minutes) do |req|
    req.ip
  end

  # Throttle login attempts by email
  throttle('logins/email', limit: 5, period: 20.seconds) do |req|
    if req.path == '/login' && req.post?
      req.params['email'].to_s.downcase.presence
    end
  end

  # Throttle API requests by API key
  throttle('api/key', limit: 1000, period: 1.hour) do |req|
    if req.path.start_with?('/api/')
      req.env['HTTP_X_API_KEY']
    end
  end

  # Custom response for throttled requests
  self.throttled_response = lambda do |env|
    retry_after = env['rack.attack.match_data'][:period]
    [
      429,
      {
        'Content-Type' => 'application/json',
        'Retry-After' => retry_after.to_s,
        'X-RateLimit-Limit' => env['rack.attack.match_data'][:limit].to_s,
        'X-RateLimit-Remaining' => '0'
      },
      [{ error: 'Rate limit exceeded' }.to_json]
    ]
  end
end
Redis provides distributed state storage for rate limiting across multiple application servers. The redis-rb gem offers atomic operations for incrementing counters and setting expiration times. Pairing INCR with an EXPIRE on the first increment gives each counter automatic cleanup, though a crash between the two commands can leave a counter without an expiry; a Lua script or MULTI/EXEC transaction closes that gap.
require 'redis'
require 'connection_pool'

class RedisRateLimiter
  def initialize(redis_pool, limit, window)
    @redis_pool = redis_pool
    @limit = limit
    @window = window
  end

  def allow?(key)
    @redis_pool.with do |redis|
      current = redis.incr(key)
      redis.expire(key, @window) if current == 1
      current <= @limit
    end
  end

  def remaining(key)
    @redis_pool.with do |redis|
      current = redis.get(key).to_i
      [@limit - current, 0].max
    end
  end

  def reset_at(key)
    @redis_pool.with do |redis|
      ttl = redis.ttl(key)
      ttl > 0 ? Time.now.to_i + ttl : nil
    end
  end
end

redis_pool = ConnectionPool.new(size: 5, timeout: 5) { Redis.new }
limiter = RedisRateLimiter.new(redis_pool, 100, 3600)

if limiter.allow?("user:123")
  # Process request
else
  # Return 429 error
end
The ratelimit gem implements sliding window counting over Redis-backed time buckets. It records events per subject and checks whether a subject has exceeded a threshold within an interval, handling bucket expiry automatically.
require 'ratelimit'
require 'redis'

limiter = Ratelimit.new("api_requests", bucket_span: 600, bucket_interval: 5, redis: Redis.new)
subject = "user_#{user_id}"

# Check and consume quota
if limiter.within_bounds?(subject, threshold: 100, interval: 60)
  limiter.add(subject)
  # Process request
else
  # Quota exceeded; inspect recent usage
  recent = limiter.count(subject, 60)
end

# Or run a block only while under the threshold
limiter.exec_within_threshold(subject, threshold: 100, interval: 60) do
  limiter.add(subject)
  # Process request
end
Rails controllers implement rate limiting through before_action filters that check quotas before executing controller actions. The filter architecture allows applying limits selectively to specific actions or controller subclasses.
class ApiController < ApplicationController
  before_action :check_rate_limit

  private

  def check_rate_limit
    limiter = RedisRateLimiter.new($redis_pool, rate_limit, 3600)
    key = "api:#{current_user.id}:#{Time.now.hour}"
    unless limiter.allow?(key)
      response.set_header('X-RateLimit-Limit', rate_limit.to_s)
      response.set_header('X-RateLimit-Remaining', '0')
      response.set_header('X-RateLimit-Reset', limiter.reset_at(key).to_s)
      render json: {
        error: 'Rate limit exceeded',
        retry_after: limiter.reset_at(key)
      }, status: :too_many_requests
      return
    end
    response.set_header('X-RateLimit-Limit', rate_limit.to_s)
    response.set_header('X-RateLimit-Remaining', limiter.remaining(key).to_s)
  end

  def rate_limit
    case current_user.subscription_tier
    when 'premium' then 10000
    when 'basic' then 1000
    else 100
    end
  end
end
The dalli gem provides Memcached client functionality for distributed rate limiting with Memcached as the storage backend. Memcached offers lower latency than Redis for simple counter operations but lacks advanced data structures.
require 'dalli'

class MemcachedRateLimiter
  def initialize(memcached_client, limit, window)
    @cache = memcached_client
    @limit = limit
    @window = window
  end

  def allow?(key)
    count = @cache.incr(key, 1, @window, 1)
    count ? count <= @limit : false
  rescue Dalli::DalliError
    true # Fail open on cache errors
  end
end

cache = Dalli::Client.new('localhost:11211')
limiter = MemcachedRateLimiter.new(cache, 1000, 3600)
Design Considerations
Algorithm selection depends on traffic patterns and business requirements. Token bucket algorithms suit APIs with bursty traffic where occasional spikes above the base rate are acceptable. The bucket capacity determines burst size while the refill rate controls sustained throughput. Applications requiring strict rate enforcement without bursts should use leaky bucket or sliding window approaches.
Fixed window counters minimize computational overhead and memory usage, making them appropriate for high-throughput systems where precision is less critical. The edge effect creates temporary rate doubling at window boundaries, which may be acceptable for APIs with generous limits. Applications requiring even traffic distribution must use sliding window algorithms despite their higher resource consumption.
Distributed versus local rate limiting involves trade-offs between accuracy and performance. Centralized Redis counters provide accurate global limits across all servers but introduce network latency and create dependencies on external systems. Local counters eliminate network overhead but multiply effective limits by server count. Hybrid approaches use local counters with periodic synchronization, accepting temporary inaccuracies for better performance.
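One zero-coordination variant divides the global quota evenly across servers, so local enforcement approximates the global limit with no network overhead; it stays accurate only when load spreads evenly across servers. A sketch, with illustrative names:

```ruby
# Enforce a global limit locally by giving each server an equal share
# of the quota. Requires no coordination, but skewed load means some
# servers reject while others sit under their share.
class PartitionedLimiter
  def initialize(global_limit, server_count)
    @local_limit = global_limit / server_count
    @count = 0
  end

  def allow?
    return false if @count >= @local_limit
    @count += 1
    true
  end
end

limiter = PartitionedLimiter.new(1000, 4) # each of 4 servers enforces 250
```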
Client identification strategy affects both security and user experience. IP-based limiting is simple but blocks legitimate users behind corporate NAT or VPNs. API key limiting provides better attribution but requires key management infrastructure and may complicate public APIs. Combined approaches using IP limits for unauthenticated requests and key limits for authenticated access balance security and usability.
Quota exhaustion handling determines system behavior under load. Hard rejection with immediate 429 responses protects server resources but provides poor user experience during traffic spikes. Request queuing improves user experience by processing requests eventually but increases memory consumption and latency. Priority systems can queue high-value requests while rejecting low-priority traffic.
class PriorityRateLimiter
  def initialize(limits)
    @limits = limits # { 'premium' => 10000, 'basic' => 1000, 'free' => 100 }
    @counters = Hash.new { |h, k| h[k] = Hash.new(0) }
  end

  def allow?(client_id, tier)
    window = current_window
    key = [client_id, tier]
    @counters[key][window] += 1
    @counters[key][window] <= @limits[tier]
  end

  def current_window
    Time.now.to_i / 3600
  end
end
Monitoring and observability requirements influence implementation choices. Simple counters provide basic metrics but lack insight into traffic patterns. Detailed logging of rejected requests enables analysis of abuse patterns and limit tuning. Distributed tracing helps debug rate limiting behavior across multiple services.
Cost considerations vary by storage backend and algorithm complexity. In-memory rate limiting costs nothing for storage but loses state on server restarts. Redis requires infrastructure costs and operational overhead but provides persistence and cross-server coordination. Memcached offers lower latency than Redis but lacks data persistence.
Security Implications
Rate limiting serves as a primary defense against denial of service attacks. Without limits, attackers can exhaust server resources through request floods. Effective rate limiting requires multiple layers: aggressive limits for unauthenticated requests, moderate limits for authenticated users, and special allowances for trusted partners or premium tiers.
Distributed denial of service attacks using many IP addresses bypass simple IP-based rate limiting. Defense requires rate limiting at multiple granularities: per-IP for individual attackers, global limits to protect overall capacity, and per-endpoint limits to prevent resource-intensive operations from overwhelming specific handlers.
class MultiLayerRateLimiter
  def initialize(redis_pool)
    @redis_pool = redis_pool
  end

  def allow?(ip:, user_id:, endpoint:)
    checks = [
      ["ip:#{ip}", 100, 60],              # 100 req/min per IP
      ["user:#{user_id}", 1000, 3600],    # 1000 req/hour per user
      ["endpoint:#{endpoint}", 10000, 60], # 10000 req/min per endpoint
      ["global", 100000, 60]              # 100k req/min globally
    ]
    checks.all? do |key, limit, window|
      @redis_pool.with do |redis|
        current = redis.incr(key)
        redis.expire(key, window) if current == 1
        current <= limit
      end
    end
  end
end
Credential stuffing attacks attempt to validate stolen username/password pairs through login attempts. Rate limiting login endpoints prevents automated credential testing. Stricter limits on failed login attempts specifically target this attack vector. Combining rate limiting with exponential backoff increases delay between attempts.
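An in-memory sketch of that combination, locking an account key out for exponentially longer after each consecutive failure; the class name, delays, and keying by email are all illustrative:

```ruby
# Failed-login limiter: each consecutive failure doubles the lockout delay,
# capped at max_delay. A successful login clears the history.
class LoginBackoff
  def initialize(base_delay: 1, max_delay: 300)
    @base_delay = base_delay
    @max_delay = max_delay
    @failures = Hash.new(0)
    @locked_until = {}
  end

  def attempt_allowed?(email, now = Time.now.to_f)
    (@locked_until[email] || 0) <= now
  end

  def record_failure(email, now = Time.now.to_f)
    @failures[email] += 1
    delay = [@base_delay * (2**(@failures[email] - 1)), @max_delay].min
    @locked_until[email] = now + delay
  end

  def record_success(email)
    @failures.delete(email)
    @locked_until.delete(email)
  end
end

backoff = LoginBackoff.new(base_delay: 2, max_delay: 300)
```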
API key enumeration attacks probe for valid keys through systematic guessing. Rate limiting API authentication endpoints prevents rapid key testing. Limiting failed authentication attempts per IP and globally protects against distributed enumeration. Logging failed authentication attempts enables detection of enumeration patterns.
Rate limit bypass attempts exploit multiple identities or distributed infrastructure. Attackers rotate IP addresses using VPNs or botnets. Defense requires fingerprinting requests beyond IP addresses: user agents, TLS fingerprints, behavioral patterns. Rate limiting by multiple attributes simultaneously increases bypass difficulty.
Token theft and replay protection requires associating rate limits with authenticated identities rather than just API keys. Stolen API keys could be used up to rate limits before detection. Combining rate limiting with token expiration, refresh requirements, and anomaly detection improves security.
Side channel attacks infer information from rate limit responses. Response timing differences between rate-limited and allowed requests leak information about system state. Consistent response times for both allowed and rejected requests prevent timing attacks. Error messages should not reveal limit details that aid attackers.
Practical Examples
Basic API rate limiting for a Rails API restricts requests by API key with standard limits and header communication:
class Api::V1::BaseController < ApplicationController
  before_action :authenticate_api_key
  before_action :check_rate_limit

  private

  def authenticate_api_key
    @api_key = ApiKey.find_by(key: request.headers['X-API-Key'])
    render_unauthorized unless @api_key
  end

  def check_rate_limit
    limiter = RedisRateLimiter.new($redis_pool, @api_key.hourly_limit, 3600)
    key = "api_key:#{@api_key.id}:#{Time.now.to_i / 3600}"
    allowed = limiter.allow?(key) # consume quota before reading remaining, so headers reflect this request
    reset_at = limiter.reset_at(key)
    response.set_header('X-RateLimit-Limit', @api_key.hourly_limit.to_s)
    response.set_header('X-RateLimit-Remaining', limiter.remaining(key).to_s)
    response.set_header('X-RateLimit-Reset', reset_at.to_s) if reset_at
    unless allowed
      render json: {
        error: 'Rate limit exceeded',
        limit: @api_key.hourly_limit,
        reset_at: reset_at
      }, status: :too_many_requests
    end
  end

  def render_unauthorized
    render json: { error: 'Invalid API key' }, status: :unauthorized
  end
end
GraphQL APIs require field-level rate limiting because clients specify which fields to query. Schema-level rate limiting counts query complexity rather than raw requests. Each field receives a complexity score, and queries exceeding total complexity limits are rejected:
class GraphqlRateLimiter
  def initialize(redis_pool, max_complexity, window)
    @redis_pool = redis_pool
    @max_complexity = max_complexity
    @window = window
  end

  def allow?(user_id, query_complexity)
    key = "graphql:#{user_id}:#{Time.now.to_i / @window}"
    @redis_pool.with do |redis|
      current = redis.get(key).to_i
      new_total = current + query_complexity
      if new_total <= @max_complexity
        # GET followed by SETEX is not atomic; concurrent requests can
        # overshoot the limit. A Lua script makes the check-and-set atomic.
        redis.setex(key, @window, new_total)
        true
      else
        false
      end
    end
  end
end
class GraphqlController < ApplicationController
  def execute
    limiter = GraphqlRateLimiter.new($redis_pool, 10000, 3600)
    query_complexity = calculate_complexity(params[:query])
    unless limiter.allow?(current_user.id, query_complexity)
      render json: {
        errors: [{ message: 'Query complexity exceeds rate limit' }]
      }, status: :too_many_requests
      return
    end
    result = MySchema.execute(params[:query], context: { current_user: current_user })
    render json: result
  end

  private

  def calculate_complexity(query_string)
    # Parse query and sum field complexities
    query = GraphQL.parse(query_string)
    ComplexityAnalyzer.new.calculate(query)
  end
end
Tiered rate limiting provides different quotas based on subscription levels with automatic tier detection:
class TieredRateLimiter
  TIERS = {
    'free' => { hourly: 100, daily: 1000 },
    'starter' => { hourly: 1000, daily: 20000 },
    'professional' => { hourly: 10000, daily: 200000 },
    'enterprise' => { hourly: 100000, daily: 2000000 }
  }.freeze

  def initialize(redis_pool)
    @redis_pool = redis_pool
  end

  def allow?(user)
    tier = TIERS[user.subscription_tier]
    hourly_key = "user:#{user.id}:hour:#{Time.now.to_i / 3600}"
    daily_key = "user:#{user.id}:day:#{Time.now.to_i / 86400}"
    @redis_pool.with do |redis|
      hourly_count = redis.incr(hourly_key)
      daily_count = redis.incr(daily_key)
      redis.expire(hourly_key, 3600) if hourly_count == 1
      redis.expire(daily_key, 86400) if daily_count == 1
      hourly_count <= tier[:hourly] && daily_count <= tier[:daily]
    end
  end

  def quota_info(user)
    tier = TIERS[user.subscription_tier]
    hourly_key = "user:#{user.id}:hour:#{Time.now.to_i / 3600}"
    daily_key = "user:#{user.id}:day:#{Time.now.to_i / 86400}"
    @redis_pool.with do |redis|
      hourly_used = redis.get(hourly_key).to_i
      daily_used = redis.get(daily_key).to_i
      {
        tier: user.subscription_tier,
        hourly: {
          limit: tier[:hourly],
          used: hourly_used,
          remaining: tier[:hourly] - hourly_used
        },
        daily: {
          limit: tier[:daily],
          used: daily_used,
          remaining: tier[:daily] - daily_used
        }
      }
    end
  end
end
WebSocket rate limiting requires different strategies than HTTP request limiting. Connections persist for extended periods, making per-connection message rate limiting necessary:
class RateLimitedWebSocket
  def initialize(user, redis_pool)
    @user = user
    @redis_pool = redis_pool
    @limiter = RedisRateLimiter.new(redis_pool, 60, 60) # 60 messages per minute
  end

  def on_message(message)
    key = "ws:#{@user.id}:#{Time.now.to_i / 60}"
    unless @limiter.allow?(key)
      send_error('Rate limit exceeded. Maximum 60 messages per minute.')
      return
    end
    process_message(message)
  end

  def send_error(message)
    send_frame({
      type: 'error',
      message: message,
      rate_limit: {
        limit: 60,
        window: 60,
        retry_after: 60 - (Time.now.to_i % 60) # seconds until the minute window rolls over
      }
    }.to_json)
  end

  def process_message(message)
    # Handle valid message
  end
end
Reference
Rate Limiting Algorithms Comparison
| Algorithm | Memory Usage | Accuracy | Burst Handling | Implementation Complexity |
|---|---|---|---|---|
| Fixed Window | Low | Low | Poor | Simple |
| Sliding Window Log | High | High | Good | Moderate |
| Sliding Window Counter | Low | Medium | Good | Moderate |
| Token Bucket | Low | High | Excellent | Moderate |
| Leaky Bucket | Medium | High | None | Complex |
| GCRA | Very Low | High | Good | Simple |
HTTP Response Headers
| Header | Description | Example |
|---|---|---|
| X-RateLimit-Limit | Maximum requests allowed in window | 1000 |
| X-RateLimit-Remaining | Requests remaining in current window | 247 |
| X-RateLimit-Reset | Unix timestamp when quota resets | 1678901234 |
| Retry-After | Seconds until retry allowed | 3600 |
| X-RateLimit-Used | Requests consumed in current window | 753 |
Common HTTP Status Codes
| Code | Meaning | Usage |
|---|---|---|
| 429 | Too Many Requests | Client exceeded rate limit |
| 503 | Service Unavailable | Server overloaded, apply backoff |
| 509 | Bandwidth Limit Exceeded (non-standard) | Data transfer quota exceeded |
Redis Commands for Rate Limiting
| Command | Purpose | Example |
|---|---|---|
| INCR | Increment counter atomically | INCR user:123:requests |
| EXPIRE | Set key expiration | EXPIRE user:123:requests 3600 |
| TTL | Get remaining time to live | TTL user:123:requests |
| GET | Retrieve current count | GET user:123:requests |
| SETEX | Set with expiration atomically | SETEX user:123:requests 3600 1 |
| INCRBY | Increment by specific amount | INCRBY user:123:cost 5 |
Configuration Parameters
| Parameter | Description | Typical Values |
|---|---|---|
| Window Size | Duration for rate calculation | 60s, 3600s, 86400s |
| Request Limit | Maximum requests per window | 100-10000 |
| Burst Capacity | Additional requests allowed in burst | 10-100 |
| Client Identifier | Attribute for tracking clients | IP, API key, user ID |
| Failure Mode | Behavior when storage unavailable | Fail open, fail closed |
| Cleanup Interval | Frequency of expired data removal | 300s-3600s |
Ruby Gems for Rate Limiting
| Gem | Algorithm Support | Storage Backend | Best For |
|---|---|---|---|
| rack-attack | Fixed window, throttle | Redis, Memcached, memory | Rails/Rack apps |
| ratelimit | Sliding window counter | Redis | General Ruby apps |
| redis-throttle | Token bucket | Redis | Redis-based systems |
| turnstile | Custom | Redis | High-performance APIs |
| prorate | Leaky bucket | Redis | Sustained rate limiting |
Time Window Calculations
| Window Type | Calculation | Key Format |
|---|---|---|
| Per Second | timestamp / 1 | prefix:client:second:N |
| Per Minute | timestamp / 60 | prefix:client:minute:N |
| Per Hour | timestamp / 3600 | prefix:client:hour:N |
| Per Day | timestamp / 86400 | prefix:client:day:N |
| Rolling Hour | current_time - 3600 | prefix:client:rolling |
Cost Calculation Strategies
| Strategy | Description | Use Case |
|---|---|---|
| Uniform | All requests cost 1 | Simple APIs |
| Endpoint-based | Different costs per endpoint | Mixed resource usage |
| Payload-based | Cost proportional to data size | Upload/download APIs |
| Complexity-based | Cost based on computation required | GraphQL, search APIs |
| Resource-based | Cost based on resources consumed | Database queries, CPU time |