Overview
Scalability represents a system's capacity to handle increased workload by adding resources without requiring fundamental architectural changes. A scalable system maintains acceptable performance levels, response times, and reliability when user count, data volume, or transaction rate increases by orders of magnitude.
Two primary approaches exist: vertical scaling adds resources to existing machines, while horizontal scaling distributes workload across multiple machines. The choice between these approaches affects system architecture, cost structure, and operational complexity.
Scalability intersects with performance, but differs in focus. Performance optimization improves speed for a given workload, while scalability ensures the system handles growth. A high-performance system processes requests quickly; a scalable system continues functioning effectively as demand multiplies.
Modern web applications face unpredictable scaling demands. A successful marketing campaign might increase traffic 10x overnight. A viral social media post could generate 100x normal load within hours. Systems must scale both up and down to manage costs during quiet periods.
Consider a basic web application serving 100 requests per second:
# Single-server architecture
class ApplicationServer
  def initialize
    @database = Database.connect('localhost')
    @cache = {}
  end

  def handle_request(request)
    user = fetch_user(request.user_id)
    data = process_business_logic(user, request.params)
    render_response(data)
  end

  def fetch_user(user_id)
    @cache[user_id] ||= @database.query("SELECT * FROM users WHERE id = ?", user_id)
  end
end
This architecture works until traffic increases to 1,000 requests per second. The single database connection becomes a bottleneck. The in-memory cache doesn't persist across restarts. The server CPU maxes out. Addressing these limitations requires architectural evolution, not just faster hardware.
Key Principles
Load Distribution divides work across multiple resources to prevent any single component from becoming a bottleneck. Distribution operates at multiple layers: network requests across web servers, database queries across read replicas, background jobs across worker processes, and file storage across distributed systems.
Statelessness eliminates server-side session storage, allowing any server to handle any request. User session data moves to client-side tokens or shared caching layers. Stateless services simplify horizontal scaling because adding servers requires no data migration or session transfer.
Caching stores computed results or frequently accessed data in faster storage layers. Multi-level caching includes browser cache, CDN cache, application cache, and database query cache. Cache invalidation strategies determine when cached data refreshes, balancing staleness against database load.
Asynchronous Processing moves time-consuming operations outside the request-response cycle. Users receive immediate feedback while the system processes work in background jobs. This pattern prevents slow operations from blocking request handlers and degrading user experience.
Database Partitioning splits data across multiple database instances based on partition keys. Horizontal partitioning (sharding) divides rows across databases. Vertical partitioning separates tables by access patterns. Partitioning increases capacity but introduces complexity in cross-partition queries and transactions.
Resource Pooling reuses expensive resources like database connections, thread pools, and network sockets. Connection pools maintain ready-to-use database connections, avoiding repeated connection overhead. Pool size tuning balances resource consumption against request handling capacity.
Graceful Degradation maintains partial functionality when components fail or become overloaded. Under load, the system disables non-essential features, displays cached content instead of real-time data, or queues requests instead of rejecting them.
Monitoring and Metrics track system health through quantitative measurements. Key metrics include request latency percentiles (p50, p95, p99), error rates, resource utilization, and throughput. Metrics inform scaling decisions and reveal bottlenecks before they cause outages.
The CAP theorem constrains distributed system design: when a network partition occurs, a system must give up either Consistency or Availability. Partitions cannot be ruled out in a distributed system, so the practical choice is which of the two to sacrifice while a partition lasts. Most web applications favor availability, accepting eventual consistency for certain data.
Amdahl's Law quantifies speedup limits in parallel systems. If 5% of work must execute serially, maximum speedup caps at 20x regardless of parallel resources added. This principle applies to databases with global locks, sequential batch processes, and coordination overhead in distributed systems.
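The 20x ceiling follows directly from the formula speedup = 1 / (s + (1 - s) / n), where s is the serial fraction and n is the number of parallel workers. A minimal sketch, with the helper name `amdahl_speedup` chosen here for illustration:

```ruby
# Amdahl's Law: speedup = 1 / (serial + (1 - serial) / workers)
def amdahl_speedup(serial_fraction, workers)
  1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)
end

# With 5% serial work, the speedup approaches but never reaches 1 / 0.05 = 20x
[1, 10, 100, 1_000, 1_000_000].each do |workers|
  puts format("%9d workers -> %.2fx", workers, amdahl_speedup(0.05, workers))
end
```

Most of the achievable speedup arrives early: going from 100 to 1,000 workers adds far less than going from 1 to 10.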
Design Considerations
Choosing Vertical vs Horizontal Scaling
Vertical scaling costs less initially, requires no code changes, and maintains simpler operations. A single database server avoids distributed transaction complexity. One web server eliminates load balancer configuration. However, vertical scaling hits hard limits. Physical servers cap at certain CPU, memory, and I/O capacity. Downtime occurs during hardware upgrades. A single machine creates a single point of failure.
Horizontal scaling handles arbitrary growth by adding commodity hardware. Multiple servers provide redundancy. Individual server failures don't cause outages. Costs scale linearly or sublinearly through bulk purchasing. However, horizontal scaling demands architectural changes. Applications must handle distributed state, network failures, and eventual consistency. Development complexity increases through load balancer configuration, session management, and distributed debugging.
Start with vertical scaling for prototypes and low-traffic applications. Transition to horizontal scaling when vertical costs exceed horizontal complexity costs. As a rough guide, this often happens around 10,000-100,000 daily active users, though the threshold depends heavily on the workload.
Stateful vs Stateless Architecture
Stateful servers store session data in memory or local disk. Each user must route to the same server instance, implemented through sticky sessions on load balancers. This approach simplifies development for small applications but complicates scaling. Server restarts lose session data. Auto-scaling requires session migration. Load distribution becomes uneven when long-lived sessions accumulate on specific servers.
Stateless servers store no user-specific data between requests. Session information lives in signed cookies, JWT tokens, or shared caching services like Redis. Any server handles any request, enabling simple load balancing algorithms. Auto-scaling adds or removes servers without data migration concerns. However, when session data lives in a shared store rather than in the token itself, every request pays extra network latency and the store adds cost.
# Stateful approach - session in memory
class StatefulController
  def login
    session[:user_id] = authenticate(params[:username], params[:password])
    # Session stored on this server instance
  end
end

# Stateless approach - JWT token
class StatelessController
  def login
    user = authenticate(params[:username], params[:password])
    # The exp claim must be a numeric Unix timestamp, hence to_i
    token = JWT.encode({user_id: user.id, exp: 24.hours.from_now.to_i}, secret_key, 'HS256')
    # Client stores token, sends with each request
    render json: {token: token}
  end
end
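To make the stateless idea concrete without any gem, here is a minimal sketch of a signed token built from the standard library alone. It is not a JWT implementation. The helper names `issue_token` and `verify_token` and the token layout (hex-encoded JSON, a dot, an HMAC-SHA256 signature) are assumptions made for illustration.

```ruby
require "json"
require "openssl"

# Token layout (an assumption of this sketch): "<hex JSON payload>.<HMAC signature>"
def issue_token(payload, secret)
  body = JSON.generate(payload).unpack1("H*")
  signature = OpenSSL::HMAC.hexdigest("SHA256", secret, body)
  "#{body}.#{signature}"
end

# Returns the payload hash when the signature is valid and the token has not
# expired, otherwise nil. No server-side session lookup is involved.
def verify_token(token, secret, now: Time.now.to_i)
  body, signature = token.to_s.split(".", 2)
  return nil if body.nil? || signature.nil?

  expected = OpenSSL::HMAC.hexdigest("SHA256", secret, body)
  return nil unless OpenSSL.secure_compare(expected, signature)

  payload = JSON.parse([body].pack("H*"))
  payload["exp"].to_i > now ? payload : nil
end

token = issue_token({ user_id: 42, exp: Time.now.to_i + 86_400 }, "shared-secret")
puts verify_token(token, "shared-secret").inspect
```

Any server holding the shared secret can validate the token, which is what lets a load balancer send each request to any instance.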
Consistency vs Availability Trade-offs
Strong consistency guarantees all clients see the same data simultaneously. Database writes block until replicas acknowledge updates. This approach suits financial transactions, inventory management, and booking systems where stale data causes business problems. The cost includes higher latency, reduced throughput, and vulnerability to partition failures.
Eventual consistency allows temporary inconsistencies, with all replicas converging to the same state eventually. Reads return potentially stale data. Writes complete quickly without waiting for replica acknowledgment. This model suits social media feeds, analytics dashboards, and content management systems where slight delays don't impact functionality.
Choose strong consistency when correctness matters more than performance. Choose eventual consistency when availability and low latency outweigh consistency requirements. Many applications use different consistency levels for different data types: strong consistency for payment processing, eventual consistency for user profiles.
Synchronous vs Asynchronous Operations
Synchronous processing completes work within the request lifecycle. The client waits for results. This pattern suits operations completing in milliseconds: database lookups, API calls with short timeouts, simple calculations. Users expect immediate feedback for these operations.
Asynchronous processing queues work for background execution. The client receives an acknowledgment immediately. Workers process jobs outside the request thread. This pattern handles email sending, report generation, image processing, and third-party API calls with variable latency.
# Synchronous - user waits for email
def create_account
  user = User.create!(params)
  # welcome_email is an assumed mailer action name
  WelcomeMailer.welcome_email(user).deliver_now # Blocks for 1-3 seconds
  render json: {success: true}
end

# Asynchronous - immediate response
def create_account
  user = User.create!(params)
  WelcomeEmailJob.perform_later(user.id) # Returns immediately
  render json: {success: true}
end
Move operations to background processing when they exceed 500ms, depend on external services, or can tolerate delays. Keep operations synchronous when users need immediate confirmation or subsequent actions depend on results.
Implementation Approaches
Read Replication Strategy
Read replicas distribute query load across multiple database instances. The primary database handles all writes and replicates changes to read replicas. Applications route SELECT queries to replicas, reducing primary database load. This approach scales read-heavy applications where writes represent 10-20% of database traffic.
Implementation requires connection management logic that routes queries appropriately:
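class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true
  connects_to database: {writing: :primary, reading: :replica}

  def self.recent_posts
    # Reads use the primary unless a role is selected explicitly
    # (or automatic role switching is configured)
    ActiveRecord::Base.connected_to(role: :reading) do
      # Load inside the block; a lazy relation would run on the outer role
      Post.where("created_at > ?", 1.week.ago).limit(100).to_a
    end
  end
end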
Replication lag introduces eventual consistency. Changes written to primary take milliseconds to seconds to appear on replicas. Applications must handle stale data by routing critical reads to primary, implementing read-your-writes consistency for user-generated content, or displaying staleness indicators in the UI.
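One way to implement read-your-writes is to pin a user's reads to the primary for a short window after that user writes. The sketch below assumes a hypothetical `ReadRouter` class and a 5-second window, and it keeps its bookkeeping in process memory. With several application servers, the last-write timestamp would need to travel in a cookie or live in a shared store.

```ruby
# Decides which database role should serve a read for a given user.
class ReadRouter
  def initialize(lag_window_seconds: 5,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @lag_window = lag_window_seconds
    @clock = clock
    @last_write_at = {} # user_id => time of that user's most recent write
  end

  def record_write(user_id)
    @last_write_at[user_id] = @clock.call
  end

  # :primary while a replica might still be missing this user's write,
  # :replica otherwise
  def role_for_read(user_id)
    last_write = @last_write_at[user_id]
    return :replica if last_write.nil?

    (@clock.call - last_write) < @lag_window ? :primary : :replica
  end
end

router = ReadRouter.new
router.record_write(7)
puts router.role_for_read(7) # user 7 just wrote, so this read goes to the primary
puts router.role_for_read(8) # user 8 has no recent write
```

The window should be at least as long as the replication lag the system is prepared to tolerate.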
Monitor replication lag through database metrics. Lag exceeding 5 seconds indicates insufficient replica resources or network issues. Add more replicas to distribute load further, or scale existing replicas vertically.
Caching Layers Strategy
Multi-level caching places faster storage in front of slower storage. Each layer reduces load on downstream systems:
class PostRepository
  def initialize
    @memory_cache = {} # Per-process cache; unbounded in this sketch
    @redis = Redis.new
    @database = Database.connection
  end

  def find_post(post_id)
    # Level 1: Memory cache (fastest)
    return @memory_cache[post_id] if @memory_cache[post_id]

    # Level 2: Redis (fast)
    cached = @redis.get("post:#{post_id}")
    return deserialize(cached) if cached

    # Level 3: Database (slow)
    post = @database.query("SELECT * FROM posts WHERE id = ?", post_id)

    # Populate caches for next request
    @redis.setex("post:#{post_id}", 3600, serialize(post))
    @memory_cache[post_id] = post
    post
  end
end
Cache invalidation determines when cached data refreshes. Time-based expiration (TTL) sets maximum staleness. Event-based invalidation updates cache when data changes. Cache-aside pattern loads cache on miss. Write-through pattern updates cache on every write.
Cache hit rate measures caching effectiveness. Calculate it as cache_hits / (cache_hits + cache_misses). Target an 80-95% hit rate for frequently accessed data. A low hit rate indicates that the wrong data is cached, the TTL is too short, or the cache has too little memory.
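The cache-aside pattern, TTL expiration, event-based invalidation, and the hit-rate formula fit together in a few lines. The following is an in-memory sketch under stated assumptions: `TtlCache` is a hypothetical class, not a library API, and a production cache would also bound its size.

```ruby
class TtlCache
  attr_reader :hits, :misses

  def initialize(ttl_seconds:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {} # key => [value, expires_at]
    @hits = 0
    @misses = 0
  end

  # Cache-aside: serve a fresh cached value, otherwise run the block
  # (the slow lookup) and remember its result until the TTL elapses
  def fetch(key)
    value, expires_at = @entries[key]
    if expires_at && @clock.call < expires_at
      @hits += 1
      return value
    end

    @misses += 1
    value = yield
    @entries[key] = [value, @clock.call + @ttl]
    value
  end

  # Event-based invalidation: call this when the underlying data changes
  def invalidate(key)
    @entries.delete(key)
  end

  def hit_rate
    total = @hits + @misses
    total.zero? ? 0.0 : @hits.to_f / total
  end
end

cache = TtlCache.new(ttl_seconds: 60)
4.times { cache.fetch("post:1") { "loaded from database" } }
puts cache.hit_rate # one miss followed by three hits
```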
Database Sharding Strategy
Sharding partitions data across multiple database instances, each storing a subset of rows. The application determines which shard contains requested data using a partition key. Common partition keys include user_id, tenant_id, or geographic region.
Hash-based sharding applies a hash function to the partition key:
require 'zlib'

class ShardedUserRepository
  SHARD_COUNT = 4

  def initialize
    @shards = SHARD_COUNT.times.map { |i| Database.connect("shard_#{i}") }
  end

  def find_user(user_id)
    shard = determine_shard(user_id)
    shard.query("SELECT * FROM users WHERE id = ?", user_id)
  end

  def determine_shard(user_id)
    # Object#hash is not guaranteed to be stable across Ruby processes, so it
    # is unsafe for routing; CRC32 returns the same value in every process
    shard_index = Zlib.crc32(user_id.to_s) % SHARD_COUNT
    @shards[shard_index]
  end
end
Range-based sharding assigns ranges of partition key values to shards: users 1 to 1,000,000 on shard 0, users 1,000,001 to 2,000,000 on shard 1. This approach simplifies shard rebalancing, because a range can be split or moved as a unit, but risks uneven distribution if ranges have different access patterns.
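A range-based lookup can be sketched as a sorted list of upper bounds searched with binary search. `RangeShardMap` and the shard names are hypothetical; a real deployment would load the boundaries from configuration or a directory service.

```ruby
class RangeShardMap
  # bounds: array of [inclusive upper bound, shard name] pairs
  def initialize(bounds)
    @bounds = bounds.sort_by(&:first)
  end

  # Binary search for the first range whose upper bound covers the key
  def shard_for(key)
    entry = @bounds.bsearch { |upper, _name| upper >= key }
    raise ArgumentError, "no shard covers key #{key}" if entry.nil?

    entry.last
  end
end

shard_map = RangeShardMap.new([[1_000_000, "shard_0"], [2_000_000, "shard_1"]])
puts shard_map.shard_for(250_000)
puts shard_map.shard_for(1_500_000)
```

Splitting a hot range means inserting one more boundary, which is why rebalancing is simpler here than with hash-based sharding.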
Cross-shard queries require application-level joins. Query each shard separately and merge results in application code. Aggregate queries execute on each shard and combine results. Transactions spanning shards require distributed transaction protocols like two-phase commit, adding significant complexity.
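The query-each-shard-and-merge approach is often called scatter-gather. In the sketch below, each shard is an in-memory array standing in for a database, and the function names are illustrative assumptions.

```ruby
# Top-N across shards: each shard returns its own top N, then the
# application merges the partial results and re-applies the ordering.
def top_posts_across_shards(shards, limit)
  partials = shards.map do |rows|
    rows.sort_by { |row| -row[:likes] }.first(limit)
  end
  partials.flatten(1).sort_by { |row| -row[:likes] }.first(limit)
end

# Aggregate across shards: compute a partial sum per shard, then combine
def total_likes(shards)
  shards.sum { |rows| rows.sum { |row| row[:likes] } }
end

shards = [
  [{ id: 1, likes: 5 }, { id: 2, likes: 50 }],
  [{ id: 3, likes: 30 }],
  [{ id: 4, likes: 40 }, { id: 5, likes: 1 }]
]
puts top_posts_across_shards(shards, 2).inspect
puts total_likes(shards)
```

Sums and counts combine this easily; an average needs the per-shard sum and count, not the per-shard average.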
Shard rebalancing moves data between shards when distribution becomes uneven. Online rebalancing maintains availability during migration. Plan for rebalancing complexity when designing sharded architectures.
Auto-Scaling Strategy
Auto-scaling adjusts resource count based on metrics. Horizontal auto-scaling adds or removes server instances. Vertical auto-scaling changes instance sizes. Scaling decisions balance response time against cost.
Metric-based scaling monitors CPU utilization, request queue length, or custom application metrics. Set threshold rules: scale up when CPU exceeds 70% for 5 minutes, scale down when CPU drops below 30% for 15 minutes.
Scheduled scaling anticipates predictable traffic patterns. E-commerce sites scale up before daily traffic peaks. B2B applications scale down overnight when usage drops. Scheduled scaling prevents reactive lag, where high load arrives before the scaling action takes effect.
Predictive scaling uses machine learning to forecast demand based on historical patterns. Systems scale proactively before traffic increases. This approach suits applications with regular weekly or seasonal patterns.
class AutoScaler
  def initialize
    @cloud_api = CloudProvider.new
    @metrics = MetricsCollector.new
  end

  def evaluate_scaling
    current_cpu = @metrics.average_cpu_utilization(minutes: 5)
    instance_count = @cloud_api.get_instance_count

    if current_cpu > 70 && instance_count < 20
      @cloud_api.add_instances(2)
      log_scaling_event('scale_up', current_cpu, instance_count)
    elsif current_cpu < 30 && instance_count > 2
      @cloud_api.remove_instances(1)
      log_scaling_event('scale_down', current_cpu, instance_count)
    end
  end
end
Implement cooldown periods between scaling actions to prevent oscillation. After scaling up, wait 5-10 minutes before scaling down. Monitor scaling costs relative to infrastructure budget.
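A cooldown can be sketched as a guard in front of the threshold rules. `CooldownScaler` is a hypothetical class that only returns a decision; it reuses the 70%/30% thresholds and the 2-20 instance bounds from the text, and the 300-second default is an assumption.

```ruby
class CooldownScaler
  def initialize(cooldown_seconds: 300,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @cooldown = cooldown_seconds
    @clock = clock
    @last_action_at = nil
  end

  # Returns :scale_up, :scale_down, or :hold
  def decide(cpu_percent, instance_count)
    return :hold if cooling_down?

    action =
      if cpu_percent > 70 && instance_count < 20
        :scale_up
      elsif cpu_percent < 30 && instance_count > 2
        :scale_down
      else
        :hold
      end
    # Only real scaling actions restart the cooldown timer
    @last_action_at = @clock.call unless action == :hold
    action
  end

  private

  def cooling_down?
    !@last_action_at.nil? && (@clock.call - @last_action_at) < @cooldown
  end
end

scaler = CooldownScaler.new
puts scaler.decide(85, 4) # acts on high CPU
puts scaler.decide(20, 6) # ignored: still inside the cooldown window
```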
Real-World Applications
High-Traffic E-Commerce Platform
E-commerce platforms face variable load based on marketing campaigns, seasonal shopping, and viral products. Black Friday traffic might exceed average load by 50x. An architecture handling this scaling uses:
- CDN serves product images, stylesheets, and JavaScript files, reducing origin server load by 80-90%
- Read replicas handle product catalog queries, separating read traffic from inventory updates
- Cache layer stores product details, category listings, and user session data
- Async workers process order confirmation emails, inventory updates to data warehouse, and recommendation engine updates
- Database sharding partitions users by user_id, keeping checkout transactions within single shards
- Auto-scaling web servers based on request queue length, maintaining response times during traffic spikes
Checkout flow requires strong consistency for inventory management. The system locks inventory records during purchase, preventing overselling. Product browsing accepts eventual consistency, showing inventory counts that might be several seconds stale.
class ScalableCheckoutService
  def process_order(user_id, items)
    shard = determine_shard(user_id)

    # Assumes transaction returns the value of its block
    order = shard.transaction do
      # Strong consistency for inventory
      items.each do |item|
        inventory = shard.lock("SELECT * FROM inventory WHERE product_id = ? FOR UPDATE", item[:product_id])
        raise InsufficientInventory if inventory.quantity < item[:quantity]
        inventory.quantity -= item[:quantity]
        inventory.save
      end
      Order.create(user_id: user_id, items: items)
    end

    # Enqueue non-critical work only after the transaction commits, so jobs
    # never run against an order that was rolled back
    OrderConfirmationJob.perform_later(order.id)
    InventoryWarehouseUpdateJob.perform_later(items)
  end
end
The platform monitors checkout latency at p95 and p99 percentiles. Latency spikes trigger investigation into database slow queries, cache hit rate degradation, or insufficient worker capacity.
Social Media Feed Service
Social media feeds serve personalized content to millions of concurrent users. Each user follows hundreds or thousands of other users, generating billions of posts daily. Scaling this workload requires:
- Fanout-on-write distributes new posts to follower feeds during posting, not during feed reads
- Timeline cache stores pre-computed feeds in Redis, serving reads without database queries
- Feed ranking happens asynchronously, updating cached feeds periodically
- Image CDN distributes uploaded images geographically, reducing latency for global users
- Sharded database stores posts by author_id, colocating each user's posts on one shard
- Message queue handles millions of fanout tasks per second during viral posts
The system accepts eventual consistency throughout. New posts might take seconds to appear in follower feeds. Like counts and comments update asynchronously. Users tolerate brief delays for social interactions.
class FeedService
  def publish_post(user_id, content)
    post = Post.create(author_id: user_id, content: content)
    # Fanout to followers asynchronously
    FanoutPostJob.perform_later(post.id)
    {post_id: post.id, status: 'publishing'}
  end
end

class FanoutPostJob < ApplicationJob
  def perform(post_id)
    post = Post.find(post_id)
    follower_ids = Follower.where(following_id: post.author_id).pluck(:follower_id)
    redis = Redis.new

    # Batch updates to timeline cache
    follower_ids.each_slice(1000) do |batch|
      # Queue commands on the pipeline object so each batch is one round trip
      redis.pipelined do |pipeline|
        batch.each do |follower_id|
          pipeline.lpush("timeline:#{follower_id}", post.id)
          pipeline.ltrim("timeline:#{follower_id}", 0, 999) # Keep 1000 most recent
        end
      end
    end
  end
end
Celebrity accounts with millions of followers use different fanout strategies. Instead of fanout-on-write, the system uses fanout-on-read or hybrid approaches, computing celebrity post visibility during timeline generation rather than pushing to all followers.
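A hybrid strategy can be sketched in memory: posts from ordinary accounts are pushed into follower timelines at write time, while posts from accounts above a follower threshold are stored once and merged in at read time. `HybridFeed` and the default threshold of 10,000 are illustrative assumptions, not figures from any particular platform.

```ruby
class HybridFeed
  def initialize(followers:, celebrity_threshold: 10_000)
    @followers = followers # author_id => [follower_id, ...]
    @celebrity_threshold = celebrity_threshold
    @timelines = Hash.new { |hash, key| hash[key] = [] }       # pushed at write time
    @celebrity_posts = Hash.new { |hash, key| hash[key] = [] } # pulled at read time
  end

  def publish(author_id, post_id, timestamp)
    post = { id: post_id, author_id: author_id, at: timestamp }
    audience = @followers.fetch(author_id, [])
    if audience.size >= @celebrity_threshold
      @celebrity_posts[author_id] << post # one write, regardless of audience size
    else
      audience.each { |follower_id| @timelines[follower_id] << post }
    end
    post
  end

  # Merge the pre-computed timeline with posts from followed celebrities
  def timeline(user_id, following:, limit: 10)
    pushed = @timelines.fetch(user_id, [])
    pulled = following.flat_map { |author_id| @celebrity_posts.fetch(author_id, []) }
    (pushed + pulled).sort_by { |post| -post[:at] }.first(limit)
  end
end

feed = HybridFeed.new(followers: { "friend" => ["u1"] })
feed.publish("friend", 1, 100)
puts feed.timeline("u1", following: ["friend"]).inspect
```

The trade-off moves work from the write path to the read path only for the small set of accounts where fanout would be most expensive.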
Real-Time Analytics Dashboard
Analytics dashboards aggregate data from millions of events per second, displaying metrics with second-level latency. Scaling requires specialized data stores and processing patterns:
- Stream processing ingests events into Kafka topics, buffering spikes and enabling replay
- Time-series database stores aggregated metrics optimized for temporal queries
- Pre-aggregation computes common metrics during ingestion, avoiding query-time aggregation
- Materialized views cache complex aggregations, refreshing periodically
- Partitioning by time enables dropping old partitions without expensive deletions
- Query result cache serves repeated dashboard queries without recomputation
The architecture separates hot and cold data paths. Recent data (last 24 hours) remains in memory-optimized storage for fast queries. Historical data (older than 7 days) moves to compressed columnar storage, accepting slower query times for reduced costs.
class EventProcessor
  def process_event(event)
    # Write to stream for replay capability
    @kafka.produce(event.to_json, topic: 'events')

    # Aggregate in memory, flush periodically
    @buffer.increment(event.metric_name, event.value, event.timestamp)
    flush_buffer if @buffer.size > 10_000
  end

  def flush_buffer
    @buffer.each do |metric_name, value, timestamp|
      # Write pre-aggregated data
      @timeseries_db.write(
        metric: metric_name,
        value: value,
        timestamp: timestamp,
        tags: {environment: 'production'}
      )
    end
    @buffer.clear
  end
end
Query optimization uses time-range indexes and metric tag indexes. Dashboard queries specify time ranges explicitly: "last 1 hour", "yesterday", "last 30 days". The database prunes search space based on time partitions before scanning detailed metrics.
Performance Considerations
Latency vs Throughput Trade-offs
Latency measures time to complete a single request. Throughput measures requests completed per second. Optimizing for one often degrades the other. Batching operations increases throughput by amortizing overhead but increases latency for requests waiting in batches.
Database connection pooling illustrates this trade-off. Small pools reduce resource consumption but increase queue wait time during traffic spikes. Large pools handle spikes better but waste memory and database connections during quiet periods.
class ConnectionPooling
  # The block tells the pool how to create each connection
  # Small pool: fewer idle connections, but requests queue for a connection at high load
  SMALL_POOL = ConnectionPool.new(size: 5, timeout: 5) { Database.connect('localhost') }
  # Large pool: consistent latency, higher resource usage
  LARGE_POOL = ConnectionPool.new(size: 25, timeout: 5) { Database.connect('localhost') }

  def query_with_small_pool
    SMALL_POOL.with do |connection|
      connection.query("SELECT * FROM users LIMIT 100")
    end
  end
end
Measure both average latency and latency percentiles. The p50 (median) latency represents typical performance. The p95 and p99 latencies reveal tail performance, which determines the worst-case user experience. Optimize for p99 latency when predictable response times matter more than average performance.
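Percentiles are straightforward to compute from raw samples. The sketch below uses the nearest-rank method, one of several common definitions; the `percentile` helper and the sample data are assumptions for illustration.

```ruby
# Nearest-rank percentile: the smallest sample such that at least pct percent
# of all samples are less than or equal to it
def percentile(samples, pct)
  raise ArgumentError, "samples must not be empty" if samples.empty?

  sorted = samples.sort
  rank = (pct * sorted.size / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# 98 fast requests and 2 slow ones, in milliseconds
latencies_ms = Array.new(98, 20) + [900, 900]
mean = latencies_ms.sum.fdiv(latencies_ms.size)

puts "mean: #{mean}"
puts "p50:  #{percentile(latencies_ms, 50)}"
puts "p95:  #{percentile(latencies_ms, 95)}"
puts "p99:  #{percentile(latencies_ms, 99)}"
```

In this data set the mean and median both look healthy, and only the p99 exposes the slow requests.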
Database Query Optimization
Query performance degrades non-linearly with data volume. A query scanning 1,000 rows might complete in 10ms. The same query scanning 1,000,000 rows might take far longer than the 10 seconds that linear scaling predicts, for example once the working set no longer fits in memory. Indexes, query planning, and database statistics determine actual scaling behavior.
Index design balances read and write performance. Each index speeds up queries using indexed columns but slows down inserts and updates that maintain index structures. Applications with 90% reads benefit from aggressive indexing. Write-heavy applications minimize indexes.
class QueryOptimization
  # Without a suitable index, this scans the full table
  def find_active_users_slow
    User.where("last_login_at > ? AND status = 'active'", 30.days.ago).to_a
  end

  # Same conditions; the speedup comes from the composite index,
  # not from how the conditions are written
  def find_active_users_fast
    # Requires index on (status, last_login_at)
    User.where(status: 'active')
        .where("last_login_at > ?", 30.days.ago)
        .to_a
  end

  # Further optimized: selective columns
  def find_active_user_ids
    # Avoids loading full objects
    User.where(status: 'active')
        .where("last_login_at > ?", 30.days.ago)
        .pluck(:id)
  end
end
Query analysis tools reveal execution plans, showing which indexes the database uses and where full table scans occur. PostgreSQL's EXPLAIN ANALYZE provides timing for each query stage. MySQL's EXPLAIN shows index usage and row counts examined.
N+1 query problems multiply database round trips. Loading 100 posts and then loading each author individually generates 101 queries. Eager loading fetches the associations in one query with a join, or in two queries with a separate batched lookup.
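The difference in round trips can be shown without a database. In this sketch, `CountingAuthorStore` is a hypothetical stand-in that counts each call as one query; the initial query that loads the posts is left out of the counts.

```ruby
class CountingAuthorStore
  attr_reader :query_count

  def initialize(authors)
    @authors = authors # id => name
    @query_count = 0
  end

  # One round trip per author
  def find(id)
    @query_count += 1
    @authors.fetch(id)
  end

  # One round trip for a whole batch of ids
  def find_many(ids)
    @query_count += 1
    @authors.slice(*ids)
  end
end

posts = (1..100).map { |n| { id: n, author_id: n % 10 } }
authors = (0..9).to_h { |id| [id, "author-#{id}"] }

# N+1 pattern: look up the author separately for every post
per_post = CountingAuthorStore.new(authors)
posts.each { |post| per_post.find(post[:author_id]) }

# Batched pattern: collect the distinct ids, fetch once, join in memory
batched = CountingAuthorStore.new(authors)
by_id = batched.find_many(posts.map { |post| post[:author_id] }.uniq)
posts.each { |post| by_id.fetch(post[:author_id]) }

puts "per-post lookups: #{per_post.query_count}"
puts "batched lookups:  #{batched.query_count}"
```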
Caching Strategy Performance
Cache hit rate determines caching effectiveness. A 90% hit rate means 90% of requests avoid slower backing stores. A 50% hit rate halves load on the backing store, but half of requests still incur full latency.
Cache size affects hit rate non-linearly. Increasing cache size from 1GB to 2GB might increase hit rate from 80% to 85%. Increasing from 2GB to 4GB might only improve hit rate to 87%. Analyze working set size to determine optimal cache sizing.
TTL configuration balances staleness against database load. Short TTL (seconds) keeps data fresh but requires frequent cache refreshes. Long TTL (hours) reduces database load but increases staleness. Vary TTL by data type: user profiles cache for hours, flash sale inventory caches for seconds.
class CacheStrategy
  def initialize
    # fetch with expires_in is the ActiveSupport cache store API,
    # not the raw Redis client API
    @cache = ActiveSupport::Cache::RedisCacheStore.new
  end

  # High-value data: long TTL
  def get_user_profile(user_id)
    @cache.fetch("user:profile:#{user_id}", expires_in: 1.hour) do
      database.query("SELECT * FROM users WHERE id = ?", user_id)
    end
  end

  # Volatile data: short TTL
  def get_inventory_count(product_id)
    @cache.fetch("inventory:#{product_id}", expires_in: 10.seconds) do
      database.query("SELECT quantity FROM inventory WHERE product_id = ?", product_id)
    end
  end

  # Critical data: no caching
  def get_account_balance(user_id)
    database.query("SELECT balance FROM accounts WHERE user_id = ?", user_id)
  end
end
Cache warming prevents cold start problems where empty caches cause database thundering herds. Pre-populate caches with frequently accessed data during deployment or startup. Background jobs refresh popular cache entries before expiration.
Asynchronous Processing Performance
Background job systems introduce latency between action and execution. Users might wait seconds or minutes for async operations to complete. Queue depth and worker count determine this latency.
Queue depth measures jobs waiting for processing. Growing queue depth indicates insufficient workers or jobs taking longer than expected. Monitor queue depth per job type, as different jobs have different resource requirements.
Worker count scales with job concurrency requirements. CPU-bound jobs benefit from one worker per CPU core. I/O-bound jobs handling network requests benefit from more workers than cores, utilizing I/O wait time. Database-heavy jobs require fewer workers to avoid overwhelming database connections.
class BackgroundJobPerformance
  # CPU-bound job: image processing
  class ImageProcessingJob
    def perform(image_id)
      image = Image.find(image_id)
      processed = resize_and_optimize(image.data) # CPU intensive
      image.update(processed_data: processed)
    end
  end

  # I/O-bound job: API calls
  class WebhookNotificationJob
    def perform(webhook_url, payload)
      # Mostly waiting for network
      HTTP.post(webhook_url, json: payload)
    end
  end

  # Configure workers based on job type (Sidekiq 7 syntax)
  # Sidekiq.configure_server do |config|
  #   config.concurrency = 10 # For I/O-bound jobs
  # end
end
Job retry logic handles transient failures but can amplify problems. Exponential backoff spaces retry attempts: retry after 1 minute, 5 minutes, 25 minutes. Dead letter queues capture jobs failing after maximum retries for manual investigation.
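The 1, 5, 25 minute schedule corresponds to a base delay of 60 seconds multiplied by 5 on each attempt. The sketch below pairs that schedule with a small retry loop and an in-memory dead letter list. The helper names are assumptions, and job frameworks ship their own retry mechanisms that would normally be used instead.

```ruby
# Delay before retry number `attempt` (0-based): base * factor**attempt
def retry_delay_seconds(attempt, base: 60, factor: 5)
  base * factor**attempt
end

# Runs the block, retrying failures with exponential backoff. After
# max_retries failed retries, the error goes to dead_letters and nil is returned.
def run_with_retries(max_retries:, dead_letters:, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    yield
  rescue StandardError => error
    if attempt >= max_retries
      dead_letters << error
      return nil
    end
    sleeper.call(retry_delay_seconds(attempt))
    attempt += 1
    retry
  end
end

# Print the schedule for the first three retries, in seconds
puts (0..2).map { |attempt| retry_delay_seconds(attempt) }.inspect
```

Passing the sleep function as a parameter keeps the loop testable without real waiting.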
Monitor job processing time percentiles. Jobs consistently exceeding expected duration indicate code performance problems or resource constraints. Slow jobs block workers from processing other queued jobs.
Reference
Scaling Dimensions
| Dimension | Description | Example |
|---|---|---|
| Vertical | Add resources to existing machines | Increase server RAM from 16GB to 64GB |
| Horizontal | Add more machines | Scale from 2 to 10 web servers |
| Read Scaling | Add read replicas | 1 primary database, 5 read replicas |
| Write Scaling | Partition write operations | Shard database by user_id |
| Geographic | Distribute across regions | Servers in US, EU, Asia data centers |
Common Bottlenecks
| Component | Symptoms | Solutions |
|---|---|---|
| Database | Slow queries, high CPU | Add indexes, read replicas, caching |
| Memory | High swap usage, OOM errors | Increase RAM, optimize memory usage |
| Network | High latency, timeouts | CDN, load balancer, geographic distribution |
| Disk I/O | Slow writes, queue buildup | SSD storage, write batching, async writes |
| CPU | High utilization, slow processing | Optimize algorithms, horizontal scaling |
| Connection Pool | Connection wait timeouts | Increase pool size, optimize query speed |
Caching Strategies
| Strategy | Write Behavior | Read Behavior | Use Case |
|---|---|---|---|
| Cache-Aside | Application updates cache | Application checks cache first | General purpose, read-heavy |
| Write-Through | Update cache with database | Read from cache | Strong consistency needs |
| Write-Behind | Update cache, async database | Read from cache | High write throughput |
| Refresh-Ahead | Async background refresh | Read from cache | Predictable access patterns |
| Time-Based | Independent of writes | Expires after TTL | Acceptable staleness |
Database Replication Patterns
| Pattern | Consistency | Latency | Complexity |
|---|---|---|---|
| Synchronous | Strong | High write latency | Low |
| Asynchronous | Eventual | Low write latency | Medium |
| Semi-Synchronous | Configurable | Medium write latency | Medium |
| Multi-Primary | Conflict resolution required | Low latency | High |
Load Balancing Algorithms
| Algorithm | Distribution | Use Case |
|---|---|---|
| Round Robin | Equal distribution | Homogeneous servers |
| Least Connections | Based on active connections | Variable request duration |
| Weighted Round Robin | Proportional to weights | Heterogeneous server capacity |
| IP Hash | Same client to same server | Sticky sessions needed |
| Least Response Time | Based on measured response times | Optimizing latency |
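Two of the algorithms in the table can be sketched in a few lines each. The class names are hypothetical and the sketch is single-threaded; a real load balancer would guard these counters against concurrent access.

```ruby
# Round Robin: hand out servers in a fixed rotation
class RoundRobinBalancer
  def initialize(servers)
    @servers = servers
    @next_index = 0
  end

  def pick
    server = @servers[@next_index % @servers.size]
    @next_index += 1
    server
  end
end

# Least Connections: pick the server with the fewest in-flight requests
class LeastConnectionsBalancer
  def initialize(servers)
    @active = servers.to_h { |server| [server, 0] }
  end

  def acquire
    server, _count = @active.min_by { |_server, count| count }
    @active[server] += 1
    server
  end

  # Call when the request finishes so the server's count drops again
  def release(server)
    @active[server] -= 1
  end
end

round_robin = RoundRobinBalancer.new(%w[web1 web2 web3])
puts 4.times.map { round_robin.pick }.inspect
```

Least Connections adapts when some requests run much longer than others, which Round Robin cannot see.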
Monitoring Metrics
| Metric | Target | Action Threshold |
|---|---|---|
| CPU Utilization | 50-70% average | >80% sustained |
| Memory Usage | 60-80% | >90% |
| Request Latency p95 | <200ms | >500ms |
| Request Latency p99 | <500ms | >1000ms |
| Error Rate | <0.1% | >1% |
| Cache Hit Rate | >80% | <70% |
| Database Connection Pool | 60-80% used | >90% used |
| Queue Depth | <100 jobs | >1000 jobs |
Database Sharding Approaches
| Approach | Key Distribution | Query Complexity | Rebalancing |
|---|---|---|---|
| Hash-Based | Hash function | Simple single-shard | Difficult |
| Range-Based | Key ranges | Simple single-shard | Moderate |
| Geographic | Location | Simple regional | Moderate |
| Directory-Based | Lookup table | Complex | Easy |
| Composite | Multiple keys | Complex | Difficult |
Consistency Models
| Model | Guarantee | Performance | Use Case |
|---|---|---|---|
| Strong | All reads see latest write | Lower throughput | Financial transactions |
| Eventual | Reads converge to latest | High throughput | Social media feeds |
| Causal | Causally related operations ordered | Medium throughput | Chat applications |
| Read-Your-Writes | Client sees own writes | Medium throughput | User profiles |
| Monotonic Reads | No going backwards | High throughput | Analytics dashboards |