A distributed systems consistency model where data replicas converge to the same state after a period of time, allowing temporary inconsistencies to enable high availability and partition tolerance.
Overview
Eventual consistency represents a consistency model in distributed systems where updates to data do not immediately propagate to all replicas, but all replicas converge to the same value after a finite period without new updates. This model trades immediate consistency for improved availability, lower latency, and partition tolerance.
The concept emerged from research on distributed databases and replicated systems in the 1990s, gaining prominence with the CAP theorem formalized by Eric Brewer. The CAP theorem states that distributed systems can guarantee at most two of three properties: Consistency, Availability, and Partition tolerance. Eventual consistency chooses availability and partition tolerance over immediate consistency.
In eventually consistent systems, reads may return stale data, and different replicas may temporarily hold different values. However, the system guarantees that if no new updates occur, all replicas will eventually converge to the same state. This convergence happens through background synchronization, conflict resolution mechanisms, and replication protocols.
# Conceptual example: distributed cache scenario
class DistributedCache
def write(key, value)
# Write to local replica immediately
local_replica.set(key, value)
# Asynchronously replicate to other nodes
async_replicate_to_peers(key, value)
# Returns immediately without waiting for replication
true
end
def read(key)
# May return stale data if replication hasn't completed
local_replica.get(key)
end
end
Systems like Amazon DynamoDB, Apache Cassandra, and Riak use eventual consistency to achieve high availability and fault tolerance. Web applications commonly encounter eventual consistency in caching layers, content delivery networks, DNS propagation, and distributed session stores. Understanding eventual consistency becomes critical when designing systems that must remain available during network partitions or node failures.
Key Principles
The BASE properties characterize eventually consistent systems: Basically Available, Soft state, and Eventually consistent. These properties contrast with ACID properties that guarantee strong consistency in traditional databases.
Basically Available means the system appears operational most of the time, even during partial failures. The system responds to requests despite node failures or network issues, though responses may contain stale or incomplete data. Availability takes precedence over consistency guarantees.
Soft state indicates that system state may change over time without new input due to eventual consistency propagation. Data values stored in one replica may differ from values in other replicas temporarily. The system state remains soft until all replicas converge.
Eventually consistent guarantees that all replicas converge to the same value after a period without new updates. The convergence time depends on network latency, replication protocols, and system load. No upper bound typically exists for convergence time in practical systems.
Conflict detection and resolution form the foundation of eventual consistency. When concurrent updates occur on different replicas, conflicts arise that the system must resolve. Resolution strategies include last-write-wins, version vectors, conflict-free replicated data types, and application-level merge functions.
# Version vector for conflict detection
class VersionVector
def initialize
@versions = Hash.new(0)
end
def increment(node_id)
@versions[node_id] += 1
end
def compare(other)
# Returns :before, :after, :concurrent, or :equal
less_than = false
greater_than = false
all_nodes = (@versions.keys + other.versions.keys).uniq
all_nodes.each do |node|
mine = @versions[node]
theirs = other.versions[node]
less_than = true if mine < theirs
greater_than = true if mine > theirs
end
return :concurrent if less_than && greater_than
return :before if less_than
return :after if greater_than
:equal
end
end
Anti-entropy mechanisms ensure replicas eventually converge. Read repair updates stale replicas when reads detect inconsistencies. Hinted handoff stores updates for temporarily unavailable nodes. Merkle trees efficiently detect differences between replicas. Active anti-entropy runs background processes that compare and synchronize replicas.
Quorum-based replication balances consistency and availability. A quorum requires that read and write operations contact a minimum number of replicas. For N replicas with write quorum W and read quorum R, setting W + R > N guarantees that reads see at least one up-to-date value. Setting W = 1 and R = N favors write performance, while W = N and R = 1 favors read performance.
# Quorum-based write with async replication
class QuorumStore
def initialize(replicas:, write_quorum:, read_quorum:)
@replicas = replicas
@write_quorum = write_quorum
@read_quorum = read_quorum
end
def write(key, value)
results = []
# Write to all replicas asynchronously
threads = @replicas.map do |replica|
Thread.new { replica.write(key, value) }
end
# Wait for write quorum
threads.first(@write_quorum).each do |thread|
results << thread.value
end
# Allow remaining writes to complete in background
results.count(&:success?) >= @write_quorum
end
end
Causal consistency, a stronger form of eventual consistency, preserves cause-and-effect relationships. Operations that causally depend on each other appear in the same order at all replicas. Causal consistency prevents anomalies where effects appear before their causes while maintaining eventual consistency properties.
Design Considerations
Choosing eventual consistency involves understanding application requirements and acceptable trade-offs. Applications tolerating temporary inconsistency while requiring high availability benefit most from eventual consistency. Applications requiring immediate consistency, such as financial transactions or inventory management, typically require stronger consistency guarantees.
The nature of conflicts determines the feasibility of eventual consistency. Applications with commutative operations or naturally convergent updates handle eventual consistency well. Shopping cart additions, social media likes, and analytics counters resolve conflicts easily. Applications where operation order matters significantly or conflicts require human intervention face challenges with eventual consistency.
User experience requirements influence consistency model selection. Users may notice and become frustrated by inconsistencies in critical workflows. A user posting a comment that immediately disappears creates confusion. A user updating profile information that reverts to old values damages trust. Designing user interfaces that communicate eventual consistency reduces confusion.
# UI feedback pattern for eventual consistency
class CommentService
def create_comment(post_id, content)
comment = Comment.new(post_id: post_id, content: content)
# Optimistically add to local cache
cache.add(comment)
# Return immediately with pending status
comment.status = :pending
# Asynchronously persist and replicate
async_persist(comment) do |result|
if result.success?
comment.status = :published
broadcast_update(comment)
else
comment.status = :failed
cache.remove(comment)
end
end
comment
end
end
Read-after-write consistency determines whether users see their own writes immediately. Session stickiness routes user requests to the same replica, ensuring users see their own writes. Version tracking passes version information with requests, allowing reads to wait for specific versions. Client-side caching stores recent writes locally for immediate reads.
Conflict resolution strategies vary in complexity and correctness guarantees. Last-write-wins uses timestamps to determine winning updates, but requires synchronized clocks and may lose concurrent updates. Version vectors track causality accurately but add storage overhead. Application-specific merge functions preserve all updates but require complex business logic. Conflict-free replicated data types mathematically guarantee convergence but limit available operations.
Monitoring and operational complexity increase with eventual consistency. Applications must track replication lag, detect split-brain scenarios, and alert on convergence delays. Debugging becomes harder when different replicas show different states. Testing must verify behavior under various consistency scenarios and network partition conditions.
Business requirements determine acceptable staleness. Analytics dashboards tolerate minutes or hours of staleness. User profiles may tolerate seconds of staleness. Inventory counts require near-immediate consistency. Defining staleness SLAs guides architectural decisions and helps set monitoring thresholds.
Implementation Approaches
Several architectural patterns implement eventual consistency with different trade-offs and complexity levels. The choice depends on scale requirements, consistency needs, and operational capabilities.
Event Sourcing with Message Queues captures state changes as ordered events rather than storing current state. Events propagate through message queues to update replicas. This approach provides natural audit logs, enables temporal queries, and simplifies debugging. Event replay rebuilds state from event history. Event sourcing works well for domain-driven design and CQRS patterns.
// Event sourcing flow
1. Command updates primary aggregate
2. Aggregate emits domain event
3. Event persisted to event store
4. Event published to message queue
5. Subscribers update their views
6. Views eventually consistent with events
Active-Active Replication allows writes to any replica with conflict resolution handling concurrent updates. This pattern maximizes write availability and reduces write latency by allowing local writes. Conflict detection uses version vectors or vector clocks. Conflict resolution applies merge functions, last-write-wins, or custom business logic. Active-active replication suits globally distributed systems but increases operational complexity.
Primary-Backup Replication with Async Updates designates one primary replica handling writes and multiple backup replicas receiving async updates. Reads can occur on any replica, accepting potential staleness. Failover promotes a backup to primary during primary failure. This pattern simplifies conflict resolution but creates a single point of failure for writes and limits write scalability.
Eventual Consistency through Caching maintains a strongly consistent source of truth with eventually consistent caches. Applications write to the database and asynchronously invalidate or update caches. Cache expiration policies bound staleness. This pattern provides familiar architecture and limits eventual consistency to specific layers.
// Cache-aside pattern with async invalidation
1. Write to database (strongly consistent)
2. Return success to client
3. Async job invalidates cache entries
4. Subsequent reads miss cache
5. Cache repopulated from database
Gossip Protocols propagate updates through peer-to-peer communication where nodes randomly exchange state with other nodes. Updates spread exponentially through the cluster. Gossip protocols provide high availability and fault tolerance without central coordination. They suit membership management, configuration distribution, and metrics aggregation but may have slower convergence than directed replication.
Operational Transform enables collaborative editing by transforming concurrent operations to maintain consistency. Operations include insertion, deletion, and updates at specific positions. The transformation function ensures operations remain commutative regardless of application order. Operational transform powers real-time collaborative editors but requires complex transformation logic.
Conflict-Free Replicated Data Types (CRDTs) define data structures that mathematically guarantee convergence without coordination. CRDTs include counters, registers, sets, and maps with specific merge semantics. State-based CRDTs merge entire states, while operation-based CRDTs merge operations. CRDTs eliminate conflict resolution complexity but restrict available operations and may consume more memory.
Ruby Implementation
Ruby provides several libraries and patterns for implementing eventual consistency in applications. The Ruby ecosystem includes message queue clients, event sourcing frameworks, and distributed data structure libraries.
Sidekiq for Async Replication implements eventual consistency through background job processing. Sidekiq enqueues replication jobs that execute asynchronously, allowing primary operations to complete quickly while updates propagate in the background.
class UserUpdateReplicator
include Sidekiq::Worker
sidekiq_options retry: 5
def perform(user_id, changes)
user = User.find(user_id)
# Replicate to external services
replicate_to_search_index(user)
replicate_to_cache(user)
replicate_to_analytics(user)
rescue ServiceUnavailable => e
# Retry will handle temporary failures
raise e
end
end
class User < ApplicationRecord
after_commit :replicate_changes
def replicate_changes
UserUpdateReplicator.perform_async(id, changes)
end
end
Redis for Distributed State provides data structures supporting eventual consistency patterns. Redis Cluster distributes data across nodes with async replication. Redis pub/sub broadcasts updates to subscribers. Redis Streams implements event logs for event sourcing.
require 'redis'
class EventStore
def initialize
@redis = Redis.new
end
def append_event(stream, event)
@redis.xadd(
stream,
{ type: event.type, data: event.data.to_json },
id: '*'
)
end
def read_events(stream, start_id = '0')
@redis.xread(stream, start_id, block: 0)
end
def subscribe_to_events(stream, &block)
last_id = '0'
loop do
events = @redis.xread(stream, last_id, block: 1000)
events.each do |event|
last_id = event['id']
yield event
end
end
end
end
ActiveRecord with Multiple Databases implements primary-replica patterns. Rails 6+ supports multiple database connections with automatic role switching. Applications write to primary and read from replicas, accepting replication lag.
class ApplicationRecord < ActiveRecord::Base
connects_to database: { writing: :primary, reading: :replica }
end
class User < ApplicationRecord
# Writes go to primary
def update_profile(attributes)
update(attributes)
end
# Reads can use replica
def self.search(query)
connected_to(role: :reading) do
where('name LIKE ?', "%#{query}%")
end
end
# Ensure read-after-write consistency when needed
def fresh_reload
connected_to(role: :writing) do
reload
end
end
end
RabbitMQ or Kafka Integration implements event-driven eventual consistency. Applications publish domain events that consumers process to update their views. The ruby-kafka and bunny gems provide Ruby clients.
require 'bunny'
class EventPublisher
def initialize
@connection = Bunny.new
@connection.start
@channel = @connection.create_channel
@exchange = @channel.topic('domain.events')
end
def publish(event_type, payload)
@exchange.publish(
payload.to_json,
routing_key: event_type,
persistent: true,
timestamp: Time.now.to_i
)
end
end
class EventSubscriber
def initialize(queue_name, routing_keys)
@connection = Bunny.new
@connection.start
@channel = @connection.create_channel
@queue = @channel.queue(queue_name, durable: true)
routing_keys.each do |key|
@queue.bind('domain.events', routing_key: key)
end
end
def subscribe(&handler)
@queue.subscribe(block: true, manual_ack: true) do |delivery_info, properties, body|
event = JSON.parse(body)
handler.call(event)
@channel.ack(delivery_info.delivery_tag)
rescue => e
@channel.nack(delivery_info.delivery_tag, false, true)
end
end
end
Custom CRDT Implementations define convergent data structures. The meangirls gem provides CRDT implementations, or applications implement custom CRDTs for specific needs.
# G-Counter: Grow-only counter CRDT
class GCounter
def initialize(node_id)
@node_id = node_id
@counts = Hash.new(0)
end
def increment(amount = 1)
@counts[@node_id] += amount
end
def value
@counts.values.sum
end
def merge(other)
other.counts.each do |node_id, count|
@counts[node_id] = [@counts[node_id], count].max
end
end
protected
attr_reader :counts
end
# PN-Counter: Increment/decrement counter CRDT
class PNCounter
def initialize(node_id)
@positive = GCounter.new(node_id)
@negative = GCounter.new(node_id)
end
def increment(amount = 1)
@positive.increment(amount)
end
def decrement(amount = 1)
@negative.increment(amount)
end
def value
@positive.value - @negative.value
end
def merge(other)
@positive.merge(other.positive)
@negative.merge(other.negative)
end
end
Rails ActionCable for Real-Time Sync broadcasts updates to connected clients, implementing eventual consistency at the client level. Clients receive update notifications and refresh their local state.
class CacheInvalidationChannel < ApplicationCable::Channel
def subscribed
stream_from "cache_invalidation:#{current_user.id}"
end
end
class User < ApplicationRecord
after_commit :broadcast_changes
def broadcast_changes
ActionCable.server.broadcast(
"cache_invalidation:#{id}",
{
type: 'user_updated',
user_id: id,
version: updated_at.to_i
}
)
end
end
Practical Examples
Real-world scenarios demonstrate eventual consistency patterns and their implementation challenges.
Social Media Like Counter tolerates temporary inconsistencies because exact like counts matter less than showing approximate popularity. Users expect to see likes appear quickly without waiting for global consensus.
class LikeCounter
def initialize(post_id)
@post_id = post_id
@redis = Redis.new
end
def increment(user_id)
# Add to local counter immediately
@redis.hincrby("post:#{@post_id}:likes", "count", 1)
@redis.sadd("post:#{@post_id}:liked_by", user_id)
# Async job persists to database
PersistLikeJob.perform_async(@post_id, user_id)
end
def count
# Read from fast cache
@redis.hget("post:#{@post_id}:likes", "count").to_i
end
def rebuild_from_database
# Periodic reconciliation job
actual_count = Like.where(post_id: @post_id).count
@redis.hset("post:#{@post_id}:likes", "count", actual_count)
end
end
class PersistLikeJob
include Sidekiq::Worker
sidekiq_options retry: 3
def perform(post_id, user_id)
Like.find_or_create_by!(post_id: post_id, user_id: user_id)
rescue ActiveRecord::RecordNotUnique
# Duplicate like, ignore
end
end
Distributed Shopping Cart uses last-write-wins with session stickiness for simplicity. Users experience consistent cart state during their session while allowing concurrent updates from different sessions to merge eventually.
class ShoppingCart
def initialize(user_id)
@user_id = user_id
@redis = Redis.new
@key = "cart:#{user_id}"
end
def add_item(product_id, quantity)
timestamp = Time.now.to_f
item_data = {
product_id: product_id,
quantity: quantity,
timestamp: timestamp
}
# Store with timestamp for LWW conflict resolution
@redis.hset(@key, product_id, item_data.to_json)
# Replicate to persistent storage
CartReplicationJob.perform_async(@user_id, product_id, quantity, timestamp)
end
def merge_from_replica(replica_data)
replica_data.each do |product_id, data|
current = @redis.hget(@key, product_id)
if current.nil?
@redis.hset(@key, product_id, data)
else
current_data = JSON.parse(current)
replica_data = JSON.parse(data)
# Last-write-wins based on timestamp
if replica_data['timestamp'] > current_data['timestamp']
@redis.hset(@key, product_id, data)
end
end
end
end
def items
@redis.hgetall(@key).map do |product_id, data|
JSON.parse(data)
end
end
end
Multi-Region User Profile Service replicates profiles across geographic regions. Each region serves reads locally while propagating writes globally. Users see their own writes immediately through session affinity but other users may see stale profiles briefly.
class UserProfileService
def initialize(region:)
@region = region
@local_db = Database.connect(region: region)
@replication_queue = ReplicationQueue.new
end
def update_profile(user_id, attributes)
version = generate_version
# Write to local region immediately
profile = @local_db.update(
user_id,
attributes.merge(version: version, updated_at: Time.now)
)
# Enqueue replication to other regions
OTHER_REGIONS.each do |target_region|
@replication_queue.enqueue(
target_region: target_region,
user_id: user_id,
attributes: attributes,
version: version
)
end
profile
end
def get_profile(user_id, consistency: :eventual)
if consistency == :strong
# Read from all regions and return latest version
profiles = fetch_from_all_regions(user_id)
profiles.max_by { |p| p.version }
else
# Read from local region only
@local_db.find(user_id)
end
end
def apply_replication(user_id, attributes, version)
current = @local_db.find(user_id)
# Apply only if version is newer
if current.nil? || version > current.version
@local_db.update(user_id, attributes.merge(version: version))
end
end
end
Inventory Management with Reserved Quantities shows eventual consistency limits. Overselling occurs when multiple replicas process concurrent purchases before replication completes. Solutions include reserved inventory, compensation transactions, or requiring strong consistency for inventory operations.
class InventoryManager
def initialize
@redis = Redis.new
@db = Database.connect
end
def reserve_inventory(product_id, quantity)
# Use Lua script for atomic local reservation
script = <<~LUA
local available = redis.call('GET', KEYS[1]) or 0
if tonumber(available) >= tonumber(ARGV[1]) then
redis.call('DECRBY', KEYS[1], ARGV[1])
return 1
else
return 0
end
LUA
success = @redis.eval(
script,
keys: ["inventory:#{product_id}"],
argv: [quantity]
)
if success == 1
# Background job handles persistent storage and cross-region sync
PersistReservationJob.perform_async(product_id, quantity)
true
else
false
end
end
def reconcile_inventory(product_id)
# Periodic job resolves inconsistencies
cached = @redis.get("inventory:#{product_id}").to_i
actual = @db.get_inventory(product_id)
if cached != actual
# Database is source of truth
@redis.set("inventory:#{product_id}", actual)
# Alert if significant discrepancy
if (cached - actual).abs > THRESHOLD
alert_inventory_mismatch(product_id, cached, actual)
end
end
end
end
Common Pitfalls
Eventual consistency introduces subtle issues that developers encounter frequently. Understanding these pitfalls prevents bugs and improves system reliability.
Reading Your Own Writes Inconsistency occurs when users cannot see updates they just made. This happens when read requests route to replicas that have not yet received the update. Users perceive the system as broken when their changes disappear.
# Problematic: User may not see their update
def update_and_show
user.update(name: 'New Name')
redirect_to user_path(user) # May read from stale replica
end
# Solution: Force read from primary after write
def update_and_show
user.update(name: 'New Name')
# Read from primary to see own write
User.connected_to(role: :writing) do
@user = User.find(user.id)
end
render :show
end
# Alternative: Use version tracking
def update_and_show
result = user.update(name: 'New Name')
session[:expected_version] = user.reload.version
redirect_to user_path(user)
end
def show
@user = User.find(params[:id])
if session[:expected_version] && @user.version < session[:expected_version]
# Retry from primary if replica is behind
User.connected_to(role: :writing) do
@user = User.find(params[:id])
end
end
end
Lost Updates from Concurrent Modifications happen when two replicas accept concurrent writes that conflict. Without proper conflict detection, one update may silently disappear. Last-write-wins loses data from earlier writes.
# Vulnerable to lost updates
class Counter
def increment
current = read_value
new_value = current + 1
write_value(new_value) # Concurrent increments may be lost
end
end
# Solution: Use version vectors or compare-and-swap
class SafeCounter
def increment
loop do
current_value = read_value
current_version = read_version
new_value = current_value + 1
success = compare_and_swap(
expected_version: current_version,
new_value: new_value
)
break if success
# Retry on conflict
end
end
end
# Better: Use CRDT
class CRDTCounter
def increment
# Always converges without coordination
@g_counter.increment
replicate_async(@g_counter.state)
end
end
Violated Business Invariants occur when eventual consistency allows states that violate business rules. Account balances going negative, double-booking resources, or overselling inventory happen when replicas process operations independently.
# Dangerous: May violate non-negative balance invariant
class Account
def withdraw(amount)
balance = read_balance
if balance >= amount
write_balance(balance - amount) # Race condition with concurrent withdrawals
return true
end
false
end
end
# Solution: Require strong consistency for invariants
class Account
def withdraw(amount)
ActiveRecord::Base.transaction do
account = Account.lock.find(id) # Lock for strong consistency
if account.balance >= amount
account.balance -= amount
account.save!
true
else
false
end
end
end
end
Unbounded Replication Lag causes indefinite inconsistency when replicas fall too far behind. Slow replicas, network issues, or system failures create growing backlogs. Operations depend on timely convergence, but lag bounds are hard to guarantee.
# Monitor and alert on replication lag
class ReplicationMonitor
def check_lag
replicas.each do |replica|
lag = primary.current_version - replica.current_version
if lag > MAX_ACCEPTABLE_LAG
alert_replication_lag(replica, lag)
end
if lag > CRITICAL_LAG
# Remove replica from read pool
remove_from_load_balancer(replica)
end
end
end
def measure_lag
primary_timestamp = @primary_db.query("SELECT NOW()")
replica_timestamp = @replica_db.query("SELECT NOW()")
# Also check replication position
primary_position = @primary_db.query("SELECT pg_current_wal_lsn()")
replica_position = @replica_db.query("SELECT pg_last_wal_receive_lsn()")
{
time_lag: primary_timestamp - replica_timestamp,
byte_lag: primary_position - replica_position
}
end
end
Causal Anomalies appear when effects precede their causes. A comment reply appears before the original comment, or a delete operation appears before the creation. Users find these scenarios confusing and potentially damaging.
# Problematic: Reply may appear before original post
class Comment
def create_reply(content)
reply = Reply.create(comment_id: id, content: content)
# Both async replications may arrive out of order
replicate_async(self)
replicate_async(reply)
end
end
# Solution: Include causal dependencies
class Comment
def create_reply(content)
reply = Reply.create(
comment_id: id,
content: content,
depends_on_version: self.version # Track dependency
)
replicate_with_dependency(reply, depends_on: self)
end
end
def replicate_with_dependency(record, depends_on:)
ReplicationJob.perform_async(
record_type: record.class.name,
record_id: record.id,
dependency_type: depends_on.class.name,
dependency_id: depends_on.id,
dependency_version: depends_on.version
)
end
class ReplicationJob
def perform(record_type:, record_id:, dependency_type:, dependency_id:, dependency_version:)
# Wait until dependency is replicated
wait_for_version(dependency_type, dependency_id, dependency_version)
# Then apply this replication
apply_replication(record_type, record_id)
end
end
Failure to Handle Conflicts occurs when applications assume updates never conflict. Concurrent edits to the same resource create conflicts that applications must detect and resolve. Ignoring conflicts leads to data loss or corruption.
# Missing conflict handling
class DocumentEditor
def save_document(content)
Document.update(id, content: content) # Silently overwrites concurrent edits
end
end
# Solution: Detect and resolve conflicts
class DocumentEditor
def save_document(content, base_version)
current = Document.find(id)
if current.version == base_version
# No conflict, save normally
current.update(content: content, version: base_version + 1)
else
# Conflict detected
conflict = {
local_changes: content,
remote_changes: current.content,
base_version: base_version,
current_version: current.version
}
raise ConflictError, conflict
end
end
def merge_conflict(local_content, remote_content)
# Three-way merge or manual resolution
merged = three_way_merge(
base: load_version(base_version),
local: local_content,
remote: remote_content
)
save_document(merged, current_version)
end
end
Testing Only Happy Paths misses eventual consistency edge cases. Tests that assume immediate consistency pass but production systems experience races, conflicts, and ordering issues. Comprehensive testing requires simulating replication delays, network partitions, and concurrent operations.
# Inadequate test: Assumes immediate consistency
def test_user_update
user.update(name: 'Alice')
assert_equal 'Alice', User.find(user.id).name # May fail if reading from replica
end
# Better: Test eventual consistency explicitly
def test_eventual_consistency
user.update(name: 'Alice')
# Test reads from replica eventually consistent
assert_eventually(timeout: 5.seconds) do
User.connected_to(role: :reading) do
User.find(user.id).name == 'Alice'
end
end
end
def test_concurrent_updates
# Simulate concurrent updates from different replicas
threads = 10.times.map do |i|
Thread.new { user.increment_counter }
end
threads.each(&:join)
# Test conflict resolution converges correctly
assert_eventually do
replicas.map { |r| r.get_counter(user.id) }.uniq.size == 1
end
end
Reference
Consistency Models Comparison
| Model | Guarantees | Read Latency | Write Latency | Availability | Use Cases |
|---|---|---|---|---|---|
| Strong | All reads see latest write | High | High | Lower | Financial transactions, inventory |
| Eventual | Reads eventually consistent | Low | Low | Higher | Social media, analytics, caching |
| Causal | Preserves cause-effect order | Medium | Medium | High | Collaborative editing, messaging |
| Read-Your-Writes | Users see own updates | Medium | Low | High | User profiles, settings |
| Monotonic Reads | Reads never go backward | Low | Low | High | Session data, user feeds |
Conflict Resolution Strategies
| Strategy | How It Works | Data Loss Risk | Complexity | Best For |
|---|---|---|---|---|
| Last-Write-Wins | Timestamp determines winner | High | Low | Non-critical updates |
| Version Vectors | Track causality per node | Low | Medium | General distributed systems |
| CRDTs | Mathematically convergent | None | High | Counters, sets, collaborative editing |
| Application Merge | Custom business logic | Low | High | Complex domain objects |
| Manual Resolution | User resolves conflicts | None | Very High | Critical business data |
CAP Theorem Trade-offs
| System Type | Consistency | Availability | Partition Tolerance | Examples |
|---|---|---|---|---|
| CA | Strong | High | None | Single-node databases, RDBMS |
| CP | Strong | Lower | Yes | MongoDB, HBase, Redis Cluster |
| AP | Eventual | High | Yes | Cassandra, DynamoDB, Riak |
Replication Patterns
| Pattern | Write Target | Read Source | Conflict Handling | Failover |
|---|---|---|---|---|
| Primary-Backup Async | Primary only | Any replica | None (primary decides) | Promote backup |
| Primary-Backup Sync | Primary only | Any replica | None (primary decides) | Promote backup |
| Active-Active | Any replica | Any replica | Conflict resolution required | Already distributed |
| Quorum-Based | W replicas | R replicas | Depends on W+R>N | Automatic |
Ruby Gems for Eventual Consistency
| Gem | Purpose | Features | Use Case |
|---|---|---|---|
| sidekiq | Background jobs | Async processing, retries, scheduling | Async replication, eventual updates |
| redis | In-memory store | Fast reads/writes, pub/sub, streams | Caching, event streaming, counters |
| bunny | RabbitMQ client | Message queuing, routing, delivery guarantees | Event-driven architecture |
| ruby-kafka | Kafka client | Event streaming, partitioning, replay | Event sourcing, log aggregation |
| eventide | Event sourcing | Event store, projections, handlers | CQRS, event-driven systems |
| dynamo-autoscale | DynamoDB | Eventually consistent NoSQL | Highly available key-value store |
Anti-Entropy Mechanisms
| Mechanism | How It Works | Overhead | Convergence Speed | Best For |
|---|---|---|---|---|
| Read Repair | Fixes staleness during reads | Low | Slow | Read-heavy workloads |
| Hinted Handoff | Stores updates for offline nodes | Medium | Fast | Temporary failures |
| Merkle Trees | Compares replica state efficiently | Medium | Medium | Full synchronization |
| Active Anti-Entropy | Background sync processes | High | Fast | Critical consistency |
| Gossip Protocol | Peer-to-peer state exchange | Medium | Medium | Membership, metadata |
Monitoring Metrics
| Metric | What It Measures | Alert Threshold | Impact |
|---|---|---|---|
| Replication Lag | Time or position behind primary | Greater than 5 seconds | Stale reads |
| Conflict Rate | Conflicts per second | Sudden increases | Data loss risk |
| Convergence Time | Time to reach consistency | Greater than 1 minute | User confusion |
| Failed Replications | Replication errors | More than 1% | Data divergence |
| Split Brain Detection | Independent primaries | Any occurrence | Severe inconsistency |
Implementation Checklist
| Phase | Task | Consideration |
|---|---|---|
| Design | Define consistency requirements | What staleness is acceptable |
| Design | Choose conflict resolution strategy | How to handle concurrent updates |
| Design | Plan failure scenarios | What happens during partitions |
| Implementation | Add version tracking | Enable conflict detection |
| Implementation | Implement async replication | Queue, retry logic, monitoring |
| Implementation | Build conflict resolution | Merge functions or CRDTs |
| Testing | Test replication lag scenarios | Verify behavior with delays |
| Testing | Test concurrent update conflicts | Multiple replicas updating simultaneously |
| Testing | Test network partition handling | System behavior when split |
| Operations | Monitor replication lag | Alert on excessive lag |
| Operations | Track conflict rates | Detect resolution issues |
| Operations | Plan failover procedures | Handle primary failures |