CrackedRuby - Scalability Concepts

Overview

Scalability represents a system's capacity to handle increased workload by adding resources without requiring fundamental architectural changes. A scalable system maintains acceptable performance levels, response times, and reliability when user count, data volume, or transaction rate increases by orders of magnitude.

Two primary approaches exist: vertical scaling adds resources to existing machines, while horizontal scaling distributes workload across multiple machines. The choice between these approaches affects system architecture, cost structure, and operational complexity.

Scalability intersects with performance, but differs in focus. Performance optimization improves speed for a given workload, while scalability ensures the system handles growth. A high-performance system processes requests quickly; a scalable system continues functioning effectively as demand multiplies.

Modern web applications face unpredictable scaling demands. A successful marketing campaign might increase traffic 10x overnight. A viral social media post could generate 100x normal load within hours. Systems must scale both up and down to manage costs during quiet periods.

Consider a basic web application serving 100 requests per second:

# Single-server architecture
class ApplicationServer
  def initialize
    @database = Database.connect('localhost')
    @cache = {}
  end
  
  def handle_request(request)
    user = fetch_user(request.user_id)
    data = process_business_logic(user, request.params)
    render_response(data)
  end
  
  def fetch_user(user_id)
    @cache[user_id] ||= @database.query("SELECT * FROM users WHERE id = ?", user_id)
  end
end

This architecture works until traffic increases to 1,000 requests per second. The single database connection becomes a bottleneck. The in-memory cache doesn't persist across restarts. The server CPU maxes out. Addressing these limitations requires architectural evolution, not just faster hardware.

Key Principles

Load Distribution divides work across multiple resources to prevent any single component from becoming a bottleneck. Distribution operates at multiple layers: network requests across web servers, database queries across read replicas, background jobs across worker processes, and file storage across distributed systems.

Statelessness eliminates server-side session storage, allowing any server to handle any request. User session data moves to client-side tokens or shared caching layers. Stateless services simplify horizontal scaling because adding servers requires no data migration or session transfer.

Caching stores computed results or frequently accessed data in faster storage layers. Multi-level caching includes browser cache, CDN cache, application cache, and database query cache. Cache invalidation strategies determine when cached data refreshes, balancing staleness against database load.

Asynchronous Processing moves time-consuming operations outside the request-response cycle. Users receive immediate feedback while the system processes work in background jobs. This pattern prevents slow operations from blocking request handlers and degrading user experience.

Database Partitioning splits data across multiple database instances based on partition keys. Horizontal partitioning (sharding) divides rows across databases. Vertical partitioning separates tables by access patterns. Partitioning increases capacity but introduces complexity in cross-partition queries and transactions.

Resource Pooling reuses expensive resources like database connections, thread pools, and network sockets. Connection pools maintain ready-to-use database connections, avoiding repeated connection overhead. Pool size tuning balances resource consumption against request handling capacity.

Graceful Degradation maintains partial functionality when components fail or become overloaded. The system disables non-essential features under load, displaying cached content instead of real-time data, or queuing requests instead of rejecting them.

Monitoring and Metrics track system health through quantitative measurements. Key metrics include request latency percentiles (p50, p95, p99), error rates, resource utilization, and throughput. Metrics inform scaling decisions and reveal bottlenecks before they cause outages.

The CAP theorem constrains distributed system design: systems can provide at most two of Consistency, Availability, and Partition tolerance. Most web applications choose availability and partition tolerance, accepting eventual consistency for certain data.

Amdahl's Law quantifies speedup limits in parallel systems. If 5% of work must execute serially, maximum speedup caps at 20x regardless of parallel resources added. This principle applies to databases with global locks, sequential batch processes, and coordination overhead in distributed systems.

Design Considerations

Choosing Vertical vs Horizontal Scaling

Vertical scaling costs less initially, requires no code changes, and maintains simpler operations. A single database server avoids distributed transaction complexity. One web server eliminates load balancer configuration. However, vertical scaling hits hard limits. Physical servers cap at certain CPU, memory, and I/O capacity. Downtime occurs during hardware upgrades. A single machine creates a single point of failure.

Horizontal scaling handles arbitrary growth by adding commodity hardware. Multiple servers provide redundancy. Individual server failures don't cause outages. Costs scale linearly or sublinearly through bulk purchasing. However, horizontal scaling demands architectural changes. Applications must handle distributed state, network failures, and eventual consistency. Development complexity increases through load balancer configuration, session management, and distributed debugging.

Start with vertical scaling for prototypes and low-traffic applications. Transition to horizontal scaling when vertical costs exceed horizontal complexity costs, typically around 10,000-100,000 daily active users.

Stateful vs Stateless Architecture

Stateful servers store session data in memory or local disk. Each user must route to the same server instance, implemented through sticky sessions on load balancers. This approach simplifies development for small applications but complicates scaling. Server restarts lose session data. Auto-scaling requires session migration. Load distribution becomes uneven when long-lived sessions accumulate on specific servers.

Stateless servers store no user-specific data between requests. Session information lives in signed cookies, JWT tokens, or shared caching services like Redis. Any server handles any request, enabling simple load balancing algorithms. Auto-scaling adds or removes servers without data migration concerns. However, stateless design requires shared storage for session data, adding network latency and cost.

# Stateful approach - session in memory
class StatefulController
  def login
    session[:user_id] = authenticate(params[:username], params[:password])
    # Session stored on this server instance
  end
end

# Stateless approach - JWT token
class StatelessController
  def login
    user = authenticate(params[:username], params[:password])
    token = JWT.encode({user_id: user.id, exp: 24.hours.from_now}, secret_key)
    # Client stores token, sends with each request
    render json: {token: token}
  end
end

Consistency vs Availability Trade-offs

Strong consistency guarantees all clients see the same data simultaneously. Database writes block until replicas acknowledge updates. This approach suits financial transactions, inventory management, and booking systems where stale data causes business problems. The cost includes higher latency, reduced throughput, and vulnerability to partition failures.

Eventual consistency allows temporary inconsistencies, with all replicas converging to the same state eventually. Reads return potentially stale data. Writes complete quickly without waiting for replica acknowledgment. This model suits social media feeds, analytics dashboards, and content management systems where slight delays don't impact functionality.

Choose strong consistency when correctness matters more than performance. Choose eventual consistency when availability and low latency outweigh consistency requirements. Many applications use different consistency levels for different data types: strong consistency for payment processing, eventual consistency for user profiles.

Synchronous vs Asynchronous Operations

Synchronous processing completes work within the request lifecycle. The client waits for results. This pattern suits operations completing in milliseconds: database lookups, API calls with short timeouts, simple calculations. Users expect immediate feedback for these operations.

Asynchronous processing queues work for background execution. The client receives an acknowledgment immediately. Workers process jobs outside the request thread. This pattern handles email sending, report generation, image processing, and third-party API calls with variable latency.

# Synchronous - user waits for email
def create_account
  user = User.create!(params)
  WelcomeMailer.deliver_now(user) # Blocks for 1-3 seconds
  render json: {success: true}
end

# Asynchronous - immediate response
def create_account
  user = User.create!(params)
  WelcomeEmailJob.perform_later(user.id) # Returns immediately
  render json: {success: true}
end

Move operations to background processing when they exceed 500ms, depend on external services, or can tolerate delays. Keep operations synchronous when users need immediate confirmation or subsequent actions depend on results.

Implementation Approaches

Read Replication Strategy

Read replicas distribute query load across multiple database instances. The primary database handles all writes and replicates changes to read replicas. Applications route SELECT queries to replicas, reducing primary database load. This approach scales read-heavy applications where writes represent 10-20% of database traffic.

Implementation requires connection management logic that routes queries appropriately:

class ApplicationRecord < ActiveRecord::Base
  connects_to database: {writing: :primary, reading: :replica}
  
  def self.recent_posts
    # Automatically routes to replica
    Post.where("created_at > ?", 1.week.ago).limit(100)
  end
end

Replication lag introduces eventual consistency. Changes written to primary take milliseconds to seconds to appear on replicas. Applications must handle stale data by routing critical reads to primary, implementing read-your-writes consistency for user-generated content, or displaying staleness indicators in the UI.

Monitor replication lag through database metrics. Lag exceeding 5 seconds indicates insufficient replica resources or network issues. Add more replicas to distribute load further, or scale existing replicas vertically.

Caching Layers Strategy

Multi-level caching places faster storage in front of slower storage. Each layer reduces load on downstream systems:

class PostRepository
  def initialize
    @redis = Redis.new
    @database = Database.connection
  end
  
  def find_post(post_id)
    # Level 1: Memory cache (fastest)
    return @memory_cache[post_id] if @memory_cache[post_id]
    
    # Level 2: Redis (fast)
    cached = @redis.get("post:#{post_id}")
    return deserialize(cached) if cached
    
    # Level 3: Database (slow)
    post = @database.query("SELECT * FROM posts WHERE id = ?", post_id)
    
    # Populate caches for next request
    @redis.setex("post:#{post_id}", 3600, serialize(post))
    @memory_cache[post_id] = post
    
    post
  end
end

Cache invalidation determines when cached data refreshes. Time-based expiration (TTL) sets maximum staleness. Event-based invalidation updates cache when data changes. Cache-aside pattern loads cache on miss. Write-through pattern updates cache on every write.

Cache hit rate measures caching effectiveness. Calculate as cache_hits / (cache_hits + cache_misses). Target 80-95% hit rate for frequently accessed data. Low hit rates indicate wrong data caching, TTL too short, or insufficient cache memory.

Database Sharding Strategy

Sharding partitions data across multiple database instances, each storing a subset of rows. The application determines which shard contains requested data using a partition key. Common partition keys include user_id, tenant_id, or geographic region.

Hash-based sharding applies a hash function to the partition key:

class ShardedUserRepository
  SHARD_COUNT = 4
  
  def initialize
    @shards = SHARD_COUNT.times.map { |i| Database.connect("shard_#{i}") }
  end
  
  def find_user(user_id)
    shard = determine_shard(user_id)
    shard.query("SELECT * FROM users WHERE id = ?", user_id)
  end
  
  def determine_shard(user_id)
    shard_index = user_id.hash % SHARD_COUNT
    @shards[shard_index]
  end
end

Range-based sharding assigns ranges of partition key values to shards: users 1-1M on shard 0, 1M-2M on shard 1. This approach simplifies shard rebalancing but risks uneven distribution if ranges have different access patterns.

Cross-shard queries require application-level joins. Query each shard separately and merge results in application code. Aggregate queries execute on each shard and combine results. Transactions spanning shards require distributed transaction protocols like two-phase commit, adding significant complexity.

Shard rebalancing moves data between shards when distribution becomes uneven. Online rebalancing maintains availability during migration. Plan for rebalancing complexity when designing sharded architectures.

Auto-Scaling Strategy

Auto-scaling adjusts resource count based on metrics. Horizontal auto-scaling adds or removes server instances. Vertical auto-scaling changes instance sizes. Scaling decisions balance response time against cost.

Metric-based scaling monitors CPU utilization, request queue length, or custom application metrics. Set threshold rules: scale up when CPU exceeds 70% for 5 minutes, scale down when CPU drops below 30% for 15 minutes.

Scheduled scaling anticipates predictable traffic patterns. E-commerce sites scale up before daily traffic peaks. B2B applications scale down overnight when usage drops. Schedule scaling prevents reactive lag where high load precedes scaling actions.

Predictive scaling uses machine learning to forecast demand based on historical patterns. Systems scale proactively before traffic increases. This approach suits applications with regular weekly or seasonal patterns.

class AutoScaler
  def initialize
    @cloud_api = CloudProvider.new
    @metrics = MetricsCollector.new
  end
  
  def evaluate_scaling
    current_cpu = @metrics.average_cpu_utilization(minutes: 5)
    instance_count = @cloud_api.get_instance_count
    
    if current_cpu > 70 && instance_count < 20
      @cloud_api.add_instances(2)
      log_scaling_event('scale_up', current_cpu, instance_count)
    elsif current_cpu < 30 && instance_count > 2
      @cloud_api.remove_instances(1)
      log_scaling_event('scale_down', current_cpu, instance_count)
    end
  end
end

Implement cooldown periods between scaling actions to prevent oscillation. After scaling up, wait 5-10 minutes before scaling down. Monitor scaling costs relative to infrastructure budget.

Real-World Applications

High-Traffic E-Commerce Platform

E-commerce platforms face variable load based on marketing campaigns, seasonal shopping, and viral products. Black Friday traffic might exceed average load by 50x. An architecture handling this scaling uses:

CDN serves product images, stylesheets, and JavaScript files, reducing origin server load by 80-90%
Read replicas handle product catalog queries, separating read traffic from inventory updates
Cache layer stores product details, category listings, and user session data
Async workers process order confirmation emails, inventory updates to data warehouse, and recommendation engine updates
Database sharding partitions users by user_id, keeping checkout transactions within single shards
Auto-scaling web servers based on request queue length, maintaining response times during traffic spikes

Checkout flow requires strong consistency for inventory management. The system locks inventory records during purchase, preventing overselling. Product browsing accepts eventual consistency, showing inventory counts that might be several seconds stale.

class ScalableCheckoutService
  def process_order(user_id, items)
    shard = determine_shard(user_id)
    
    shard.transaction do
      # Strong consistency for inventory
      items.each do |item|
        inventory = shard.lock("SELECT * FROM inventory WHERE product_id = ? FOR UPDATE", item[:product_id])
        raise InsufficientInventory if inventory.quantity < item[:quantity]
        
        inventory.quantity -= item[:quantity]
        inventory.save
      end
      
      order = Order.create(user_id: user_id, items: items)
      
      # Async processing for non-critical operations
      OrderConfirmationJob.perform_later(order.id)
      InventoryWarehouseUpdateJob.perform_later(items)
    end
  end
end

The platform monitors checkout latency at p95 and p99 percentiles. Latency spikes trigger investigation into database slow queries, cache hit rate degradation, or insufficient worker capacity.

Social Media Feed Service

Social media feeds serve personalized content to millions of concurrent users. Each user follows hundreds or thousands of other users, generating billions of posts daily. Scaling this workload requires:

Fanout-on-write distributes new posts to follower feeds during posting, not during feed reads
Timeline cache stores pre-computed feeds in Redis, serving reads without database queries
Feed ranking happens asynchronously, updating cached feeds periodically
Image CDN distributes uploaded images geographically, reducing latency for global users
Sharded database stores posts by author_id, colocating each user's posts on one shard
Message queue handles millions of fanout tasks per second during viral posts

The system accepts eventual consistency throughout. New posts might take seconds to appear in follower feeds. Like counts and comments update asynchronously. Users tolerate brief delays for social interactions.

class FeedService
  def publish_post(user_id, content)
    post = Post.create(author_id: user_id, content: content)
    
    # Fanout to followers asynchronously
    FanoutPostJob.perform_later(post.id)
    
    {post_id: post.id, status: 'publishing'}
  end
end

class FanoutPostJob
  def perform(post_id)
    post = Post.find(post_id)
    follower_ids = Follower.where(following_id: post.author_id).pluck(:follower_id)
    
    # Batch updates to timeline cache
    follower_ids.each_slice(1000) do |batch|
      redis.pipelined do
        batch.each do |follower_id|
          redis.lpush("timeline:#{follower_id}", post.id)
          redis.ltrim("timeline:#{follower_id}", 0, 999) # Keep 1000 most recent
        end
      end
    end
  end
end

Celebrity accounts with millions of followers use different fanout strategies. Instead of fanout-on-write, the system uses fanout-on-read or hybrid approaches, computing celebrity post visibility during timeline generation rather than pushing to all followers.

Real-Time Analytics Dashboard

Analytics dashboards aggregate data from millions of events per second, displaying metrics with second-level latency. Scaling requires specialized data stores and processing patterns:

Stream processing ingests events into Kafka topics, buffering spikes and enabling replay
Time-series database stores aggregated metrics optimized for temporal queries
Pre-aggregation computes common metrics during ingestion, avoiding query-time aggregation
Materialized views cache complex aggregations, refreshing periodically
Partitioning by time enables dropping old partitions without expensive deletions
Query result cache serves repeated dashboard queries without recomputation

The architecture separates hot and cold data paths. Recent data (last 24 hours) remains in memory-optimized storage for fast queries. Historical data (older than 7 days) moves to compressed columnar storage, accepting slower query times for reduced costs.

class EventProcessor
  def process_event(event)
    # Write to stream for replay capability
    @kafka.produce(event.to_json, topic: 'events')
    
    # Aggregate in memory, flush periodically
    @buffer.increment(event.metric_name, event.value, event.timestamp)
    
    flush_buffer if @buffer.size > 10_000
  end
  
  def flush_buffer
    @buffer.each do |metric_name, value, timestamp|
      # Write pre-aggregated data
      @timeseries_db.write(
        metric: metric_name,
        value: value,
        timestamp: timestamp,
        tags: {environment: 'production'}
      )
    end
    @buffer.clear
  end
end

Query optimization uses time-range indexes and metric tag indexes. Dashboard queries specify time ranges explicitly: "last 1 hour", "yesterday", "last 30 days". The database prunes search space based on time partitions before scanning detailed metrics.

Performance Considerations

Latency vs Throughput Trade-offs

Latency measures time to complete a single request. Throughput measures requests completed per second. Optimizing for one often degrades the other. Batching operations increases throughput by amortizing overhead but increases latency for requests waiting in batches.

Database connection pooling illustrates this trade-off. Small pools reduce resource consumption but increase queue wait time during traffic spikes. Large pools handle spikes better but waste memory and database connections during quiet periods.

class ConnectionPooling
  # Small pool: lower latency at low load, higher latency at high load
  SMALL_POOL = ConnectionPool.new(size: 5, timeout: 5)
  
  # Large pool: consistent latency, higher resource usage
  LARGE_POOL = ConnectionPool.new(size: 25, timeout: 5)
  
  def query_with_small_pool
    SMALL_POOL.with do |connection|
      connection.query("SELECT * FROM users LIMIT 100")
    end
  end
end

Measure both average latency and latency percentiles. The p50 (median) latency represents typical performance. The p95 and p99 latencies reveal tail performance impacting worst-case user experience. Optimize for p99 latency when consistency matters more than average performance.

Database Query Optimization

Query performance degrades non-linearly with data volume. A query scanning 1,000 rows might complete in 10ms. The same query scanning 1,000,000 rows might take 10 seconds, not 10,000ms. Indexes, query planning, and database statistics determine actual scaling behavior.

Index design balances read and write performance. Each index speeds up queries using indexed columns but slows down inserts and updates that maintain index structures. Applications with 90% reads benefit from aggressive indexing. Write-heavy applications minimize indexes.

class QueryOptimization
  # Unoptimized: scans full table
  def find_active_users_slow
    User.where("last_login_at > ? AND status = 'active'", 30.days.ago).to_a
  end
  
  # Optimized: uses composite index
  def find_active_users_fast
    # Requires index on (status, last_login_at)
    User.where(status: 'active')
        .where("last_login_at > ?", 30.days.ago)
        .to_a
  end
  
  # Further optimized: selective columns
  def find_active_user_ids
    # Avoids loading full objects
    User.where(status: 'active')
        .where("last_login_at > ?", 30.days.ago)
        .pluck(:id)
  end
end

Query analysis tools reveal execution plans, showing which indexes the database uses and where full table scans occur. PostgreSQL's EXPLAIN ANALYZE provides timing for each query stage. MySQL's EXPLAIN shows index usage and row counts examined.

N+1 query problems multiply database round trips. Loading 100 posts and then loading authors individually generates 101 queries. Eager loading fetches associations in 2 queries through joins or separate batched queries.

Caching Strategy Performance

Cache hit rate determines caching effectiveness. A 90% hit rate means 90% of requests avoid slower backing stores. A 50% hit rate provides minimal benefit because half of requests still incur full latency.

Cache size affects hit rate non-linearly. Increasing cache size from 1GB to 2GB might increase hit rate from 80% to 85%. Increasing from 2GB to 4GB might only improve hit rate to 87%. Analyze working set size to determine optimal cache sizing.

TTL configuration balances staleness against database load. Short TTL (seconds) keeps data fresh but requires frequent cache refreshes. Long TTL (hours) reduces database load but increases staleness. Vary TTL by data type: user profiles cache for hours, flash sale inventory caches for seconds.

class CacheStrategy
  def initialize
    @cache = Redis.new
  end
  
  # High-value data: long TTL
  def get_user_profile(user_id)
    @cache.fetch("user:profile:#{user_id}", expires_in: 1.hour) do
      database.query("SELECT * FROM users WHERE id = ?", user_id)
    end
  end
  
  # Volatile data: short TTL
  def get_inventory_count(product_id)
    @cache.fetch("inventory:#{product_id}", expires_in: 10.seconds) do
      database.query("SELECT quantity FROM inventory WHERE product_id = ?", product_id)
    end
  end
  
  # Critical data: no caching
  def get_account_balance(user_id)
    database.query("SELECT balance FROM accounts WHERE user_id = ?", user_id)
  end
end

Cache warming prevents cold start problems where empty caches cause database thundering herds. Pre-populate caches with frequently accessed data during deployment or startup. Background jobs refresh popular cache entries before expiration.

Asynchronous Processing Performance

Background job systems introduce latency between action and execution. Users might wait seconds or minutes for async operations to complete. Queue depth and worker count determine this latency.

Queue depth measures jobs waiting for processing. Growing queue depth indicates insufficient workers or jobs taking longer than expected. Monitor queue depth per job type, as different jobs have different resource requirements.

Worker count scales with job concurrency requirements. CPU-bound jobs benefit from one worker per CPU core. I/O-bound jobs handling network requests benefit from more workers than cores, utilizing I/O wait time. Database-heavy jobs require fewer workers to avoid overwhelming database connections.

class BackgroundJobPerformance
  # CPU-bound job: image processing
  class ImageProcessingJob
    def perform(image_id)
      image = Image.find(image_id)
      processed = resize_and_optimize(image.data) # CPU intensive
      image.update(processed_data: processed)
    end
  end
  
  # I/O-bound job: API calls
  class WebhookNotificationJob
    def perform(webhook_url, payload)
      # Mostly waiting for network
      HTTP.post(webhook_url, json: payload)
    end
  end
  
  # Configure workers based on job type
  # Sidekiq.configure do |config|
  #   config.concurrency = 10 # For I/O-bound jobs
  # end
end

Job retry logic handles transient failures but can amplify problems. Exponential backoff spaces retry attempts: retry after 1 minute, 5 minutes, 25 minutes. Dead letter queues capture jobs failing after maximum retries for manual investigation.

Monitor job processing time percentiles. Jobs consistently exceeding expected duration indicate code performance problems or resource constraints. Slow jobs block workers from processing other queued jobs.

Reference

Scaling Dimensions

Dimension	Description	Example
Vertical	Add resources to existing machines	Increase server RAM from 16GB to 64GB
Horizontal	Add more machines	Scale from 2 to 10 web servers
Read Scaling	Add read replicas	1 primary database, 5 read replicas
Write Scaling	Partition write operations	Shard database by user_id
Geographic	Distribute across regions	Servers in US, EU, Asia data centers

Common Bottlenecks

Component	Symptoms	Solutions
Database	Slow queries, high CPU	Add indexes, read replicas, caching
Memory	High swap usage, OOM errors	Increase RAM, optimize memory usage
Network	High latency, timeouts	CDN, load balancer, geographic distribution
Disk I/O	Slow writes, queue buildup	SSD storage, write batching, async writes
CPU	High utilization, slow processing	Optimize algorithms, horizontal scaling
Connection Pool	Connection wait timeouts	Increase pool size, optimize query speed

Caching Strategies

Strategy	Write Behavior	Read Behavior	Use Case
Cache-Aside	Application updates cache	Application checks cache first	General purpose, read-heavy
Write-Through	Update cache with database	Read from cache	Strong consistency needs
Write-Behind	Update cache, async database	Read from cache	High write throughput
Refresh-Ahead	Async background refresh	Read from cache	Predictable access patterns
Time-Based	Independent of writes	Expires after TTL	Acceptable staleness

Database Replication Patterns

Pattern	Consistency	Latency	Complexity
Synchronous	Strong	High write latency	Low
Asynchronous	Eventual	Low write latency	Medium
Semi-Synchronous	Configurable	Medium write latency	Medium
Multi-Primary	Conflict resolution required	Low latency	High

Load Balancing Algorithms

Algorithm	Distribution	Use Case
Round Robin	Equal distribution	Homogeneous servers
Least Connections	Based on active connections	Variable request duration
Weighted Round Robin	Proportional to weights	Heterogeneous server capacity
IP Hash	Same client to same server	Sticky sessions needed
Least Response Time	Based on health checks	Optimizing latency

Monitoring Metrics

Metric	Target	Action Threshold
CPU Utilization	50-70% average	>80% sustained
Memory Usage	60-80%	>90%
Request Latency p95	<200ms	>500ms
Request Latency p99	<500ms	>1000ms
Error Rate	<0.1%	>1%
Cache Hit Rate	>80%	<70%
Database Connection Pool	60-80% used	>90% used
Queue Depth	<100 jobs	>1000 jobs

Database Sharding Approaches

Approach	Key Distribution	Query Complexity	Rebalancing
Hash-Based	Hash function	Simple single-shard	Difficult
Range-Based	Key ranges	Simple single-shard	Moderate
Geographic	Location	Simple regional	Moderate
Directory-Based	Lookup table	Complex	Easy
Composite	Multiple keys	Complex	Difficult

Consistency Models

Model	Guarantee	Performance	Use Case
Strong	All reads see latest write	Lower throughput	Financial transactions
Eventual	Reads converge to latest	High throughput	Social media feeds
Causal	Causally related operations ordered	Medium throughput	Chat applications
Read-Your-Writes	Client sees own writes	Medium throughput	User profiles
Monotonic Reads	No going backwards	High throughput	Analytics dashboards

Scalability Concepts