CrackedRuby CrackedRuby

A distributed systems consistency model where data replicas converge to the same state after a period of time, allowing temporary inconsistencies to enable high availability and partition tolerance.

Overview

Eventual consistency represents a consistency model in distributed systems where updates to data do not immediately propagate to all replicas, but all replicas converge to the same value after a finite period without new updates. This model trades immediate consistency for improved availability, lower latency, and partition tolerance.

The concept emerged from research on distributed databases and replicated systems in the 1990s, gaining prominence with the CAP theorem formalized by Eric Brewer. The CAP theorem states that distributed systems can guarantee at most two of three properties: Consistency, Availability, and Partition tolerance. Eventual consistency chooses availability and partition tolerance over immediate consistency.

In eventually consistent systems, reads may return stale data, and different replicas may temporarily hold different values. However, the system guarantees that if no new updates occur, all replicas will eventually converge to the same state. This convergence happens through background synchronization, conflict resolution mechanisms, and replication protocols.

# Conceptual example: distributed cache scenario
class DistributedCache
  def write(key, value)
    # Write to local replica immediately
    local_replica.set(key, value)
    
    # Asynchronously replicate to other nodes
    async_replicate_to_peers(key, value)
    
    # Returns immediately without waiting for replication
    true
  end
  
  def read(key)
    # May return stale data if replication hasn't completed
    local_replica.get(key)
  end
end

Systems like Amazon DynamoDB, Apache Cassandra, and Riak use eventual consistency to achieve high availability and fault tolerance. Web applications commonly encounter eventual consistency in caching layers, content delivery networks, DNS propagation, and distributed session stores. Understanding eventual consistency becomes critical when designing systems that must remain available during network partitions or node failures.

Key Principles

The BASE properties characterize eventually consistent systems: Basically Available, Soft state, and Eventually consistent. These properties contrast with ACID properties that guarantee strong consistency in traditional databases.

Basically Available means the system appears operational most of the time, even during partial failures. The system responds to requests despite node failures or network issues, though responses may contain stale or incomplete data. Availability takes precedence over consistency guarantees.

Soft state indicates that system state may change over time without new input due to eventual consistency propagation. Data values stored in one replica may differ from values in other replicas temporarily. The system state remains soft until all replicas converge.

Eventually consistent guarantees that all replicas converge to the same value after a period without new updates. The convergence time depends on network latency, replication protocols, and system load. No upper bound typically exists for convergence time in practical systems.

Conflict detection and resolution form the foundation of eventual consistency. When concurrent updates occur on different replicas, conflicts arise that the system must resolve. Resolution strategies include last-write-wins, version vectors, conflict-free replicated data types, and application-level merge functions.

# Version vector for conflict detection
class VersionVector
  def initialize
    @versions = Hash.new(0)
  end
  
  def increment(node_id)
    @versions[node_id] += 1
  end
  
  def compare(other)
    # Returns :before, :after, :concurrent, or :equal
    less_than = false
    greater_than = false
    
    all_nodes = (@versions.keys + other.versions.keys).uniq
    
    all_nodes.each do |node|
      mine = @versions[node]
      theirs = other.versions[node]
      
      less_than = true if mine < theirs
      greater_than = true if mine > theirs
    end
    
    return :concurrent if less_than && greater_than
    return :before if less_than
    return :after if greater_than
    :equal
  end
end

Anti-entropy mechanisms ensure replicas eventually converge. Read repair updates stale replicas when reads detect inconsistencies. Hinted handoff stores updates for temporarily unavailable nodes. Merkle trees efficiently detect differences between replicas. Active anti-entropy runs background processes that compare and synchronize replicas.

Quorum-based replication balances consistency and availability. A quorum requires that read and write operations contact a minimum number of replicas. For N replicas with write quorum W and read quorum R, setting W + R > N guarantees that reads see at least one up-to-date value. Setting W = 1 and R = N favors write performance, while W = N and R = 1 favors read performance.

# Quorum-based write with async replication
class QuorumStore
  def initialize(replicas:, write_quorum:, read_quorum:)
    @replicas = replicas
    @write_quorum = write_quorum
    @read_quorum = read_quorum
  end
  
  def write(key, value)
    results = []
    
    # Write to all replicas asynchronously
    threads = @replicas.map do |replica|
      Thread.new { replica.write(key, value) }
    end
    
    # Wait for write quorum
    threads.first(@write_quorum).each do |thread|
      results << thread.value
    end
    
    # Allow remaining writes to complete in background
    results.count(&:success?) >= @write_quorum
  end
end

Causal consistency, a stronger form of eventual consistency, preserves cause-and-effect relationships. Operations that causally depend on each other appear in the same order at all replicas. Causal consistency prevents anomalies where effects appear before their causes while maintaining eventual consistency properties.

Design Considerations

Choosing eventual consistency involves understanding application requirements and acceptable trade-offs. Applications tolerating temporary inconsistency while requiring high availability benefit most from eventual consistency. Applications requiring immediate consistency, such as financial transactions or inventory management, typically require stronger consistency guarantees.

The nature of conflicts determines the feasibility of eventual consistency. Applications with commutative operations or naturally convergent updates handle eventual consistency well. Shopping cart additions, social media likes, and analytics counters resolve conflicts easily. Applications where operation order matters significantly or conflicts require human intervention face challenges with eventual consistency.

User experience requirements influence consistency model selection. Users may notice and become frustrated by inconsistencies in critical workflows. A user posting a comment that immediately disappears creates confusion. A user updating profile information that reverts to old values damages trust. Designing user interfaces that communicate eventual consistency reduces confusion.

# UI feedback pattern for eventual consistency
class CommentService
  def create_comment(post_id, content)
    comment = Comment.new(post_id: post_id, content: content)
    
    # Optimistically add to local cache
    cache.add(comment)
    
    # Return immediately with pending status
    comment.status = :pending
    
    # Asynchronously persist and replicate
    async_persist(comment) do |result|
      if result.success?
        comment.status = :published
        broadcast_update(comment)
      else
        comment.status = :failed
        cache.remove(comment)
      end
    end
    
    comment
  end
end

Read-after-write consistency determines whether users see their own writes immediately. Session stickiness routes user requests to the same replica, ensuring users see their own writes. Version tracking passes version information with requests, allowing reads to wait for specific versions. Client-side caching stores recent writes locally for immediate reads.

Conflict resolution strategies vary in complexity and correctness guarantees. Last-write-wins uses timestamps to determine winning updates, but requires synchronized clocks and may lose concurrent updates. Version vectors track causality accurately but add storage overhead. Application-specific merge functions preserve all updates but require complex business logic. Conflict-free replicated data types mathematically guarantee convergence but limit available operations.

Monitoring and operational complexity increase with eventual consistency. Applications must track replication lag, detect split-brain scenarios, and alert on convergence delays. Debugging becomes harder when different replicas show different states. Testing must verify behavior under various consistency scenarios and network partition conditions.

Business requirements determine acceptable staleness. Analytics dashboards tolerate minutes or hours of staleness. User profiles may tolerate seconds of staleness. Inventory counts require near-immediate consistency. Defining staleness SLAs guides architectural decisions and helps set monitoring thresholds.

Implementation Approaches

Several architectural patterns implement eventual consistency with different trade-offs and complexity levels. The choice depends on scale requirements, consistency needs, and operational capabilities.

Event Sourcing with Message Queues captures state changes as ordered events rather than storing current state. Events propagate through message queues to update replicas. This approach provides natural audit logs, enables temporal queries, and simplifies debugging. Event replay rebuilds state from event history. Event sourcing works well for domain-driven design and CQRS patterns.

// Event sourcing flow
1. Command updates primary aggregate
2. Aggregate emits domain event
3. Event persisted to event store
4. Event published to message queue
5. Subscribers update their views
6. Views eventually consistent with events

Active-Active Replication allows writes to any replica with conflict resolution handling concurrent updates. This pattern maximizes write availability and reduces write latency by allowing local writes. Conflict detection uses version vectors or vector clocks. Conflict resolution applies merge functions, last-write-wins, or custom business logic. Active-active replication suits globally distributed systems but increases operational complexity.

Primary-Backup Replication with Async Updates designates one primary replica handling writes and multiple backup replicas receiving async updates. Reads can occur on any replica, accepting potential staleness. Failover promotes a backup to primary during primary failure. This pattern simplifies conflict resolution but creates a single point of failure for writes and limits write scalability.

Eventual Consistency through Caching maintains a strongly consistent source of truth with eventually consistent caches. Applications write to the database and asynchronously invalidate or update caches. Cache expiration policies bound staleness. This pattern provides familiar architecture and limits eventual consistency to specific layers.

// Cache-aside pattern with async invalidation
1. Write to database (strongly consistent)
2. Return success to client
3. Async job invalidates cache entries
4. Subsequent reads miss cache
5. Cache repopulated from database

Gossip Protocols propagate updates through peer-to-peer communication where nodes randomly exchange state with other nodes. Updates spread exponentially through the cluster. Gossip protocols provide high availability and fault tolerance without central coordination. They suit membership management, configuration distribution, and metrics aggregation but may have slower convergence than directed replication.

Operational Transform enables collaborative editing by transforming concurrent operations to maintain consistency. Operations include insertion, deletion, and updates at specific positions. The transformation function ensures operations remain commutative regardless of application order. Operational transform powers real-time collaborative editors but requires complex transformation logic.

Conflict-Free Replicated Data Types (CRDTs) define data structures that mathematically guarantee convergence without coordination. CRDTs include counters, registers, sets, and maps with specific merge semantics. State-based CRDTs merge entire states, while operation-based CRDTs merge operations. CRDTs eliminate conflict resolution complexity but restrict available operations and may consume more memory.

Ruby Implementation

Ruby provides several libraries and patterns for implementing eventual consistency in applications. The Ruby ecosystem includes message queue clients, event sourcing frameworks, and distributed data structure libraries.

Sidekiq for Async Replication implements eventual consistency through background job processing. Sidekiq enqueues replication jobs that execute asynchronously, allowing primary operations to complete quickly while updates propagate in the background.

class UserUpdateReplicator
  include Sidekiq::Worker
  sidekiq_options retry: 5
  
  def perform(user_id, changes)
    user = User.find(user_id)
    
    # Replicate to external services
    replicate_to_search_index(user)
    replicate_to_cache(user)
    replicate_to_analytics(user)
  rescue ServiceUnavailable => e
    # Retry will handle temporary failures
    raise e
  end
end

class User < ApplicationRecord
  after_commit :replicate_changes
  
  def replicate_changes
    UserUpdateReplicator.perform_async(id, changes)
  end
end

Redis for Distributed State provides data structures supporting eventual consistency patterns. Redis Cluster distributes data across nodes with async replication. Redis pub/sub broadcasts updates to subscribers. Redis Streams implements event logs for event sourcing.

require 'redis'

class EventStore
  def initialize
    @redis = Redis.new
  end
  
  def append_event(stream, event)
    @redis.xadd(
      stream,
      { type: event.type, data: event.data.to_json },
      id: '*'
    )
  end
  
  def read_events(stream, start_id = '0')
    @redis.xread(stream, start_id, block: 0)
  end
  
  def subscribe_to_events(stream, &block)
    last_id = '0'
    
    loop do
      events = @redis.xread(stream, last_id, block: 1000)
      
      events.each do |event|
        last_id = event['id']
        yield event
      end
    end
  end
end

ActiveRecord with Multiple Databases implements primary-replica patterns. Rails 6+ supports multiple database connections with automatic role switching. Applications write to primary and read from replicas, accepting replication lag.

class ApplicationRecord < ActiveRecord::Base
  connects_to database: { writing: :primary, reading: :replica }
end

class User < ApplicationRecord
  # Writes go to primary
  def update_profile(attributes)
    update(attributes)
  end
  
  # Reads can use replica
  def self.search(query)
    connected_to(role: :reading) do
      where('name LIKE ?', "%#{query}%")
    end
  end
  
  # Ensure read-after-write consistency when needed
  def fresh_reload
    connected_to(role: :writing) do
      reload
    end
  end
end

RabbitMQ or Kafka Integration implements event-driven eventual consistency. Applications publish domain events that consumers process to update their views. The ruby-kafka and bunny gems provide Ruby clients.

require 'bunny'

class EventPublisher
  def initialize
    @connection = Bunny.new
    @connection.start
    @channel = @connection.create_channel
    @exchange = @channel.topic('domain.events')
  end
  
  def publish(event_type, payload)
    @exchange.publish(
      payload.to_json,
      routing_key: event_type,
      persistent: true,
      timestamp: Time.now.to_i
    )
  end
end

class EventSubscriber
  def initialize(queue_name, routing_keys)
    @connection = Bunny.new
    @connection.start
    @channel = @connection.create_channel
    @queue = @channel.queue(queue_name, durable: true)
    
    routing_keys.each do |key|
      @queue.bind('domain.events', routing_key: key)
    end
  end
  
  def subscribe(&handler)
    @queue.subscribe(block: true, manual_ack: true) do |delivery_info, properties, body|
      event = JSON.parse(body)
      handler.call(event)
      @channel.ack(delivery_info.delivery_tag)
    rescue => e
      @channel.nack(delivery_info.delivery_tag, false, true)
    end
  end
end

Custom CRDT Implementations define convergent data structures. The meangirls gem provides CRDT implementations, or applications implement custom CRDTs for specific needs.

# G-Counter: Grow-only counter CRDT
class GCounter
  def initialize(node_id)
    @node_id = node_id
    @counts = Hash.new(0)
  end
  
  def increment(amount = 1)
    @counts[@node_id] += amount
  end
  
  def value
    @counts.values.sum
  end
  
  def merge(other)
    other.counts.each do |node_id, count|
      @counts[node_id] = [@counts[node_id], count].max
    end
  end
  
  protected
  
  attr_reader :counts
end

# PN-Counter: Increment/decrement counter CRDT
class PNCounter
  def initialize(node_id)
    @positive = GCounter.new(node_id)
    @negative = GCounter.new(node_id)
  end
  
  def increment(amount = 1)
    @positive.increment(amount)
  end
  
  def decrement(amount = 1)
    @negative.increment(amount)
  end
  
  def value
    @positive.value - @negative.value
  end
  
  def merge(other)
    @positive.merge(other.positive)
    @negative.merge(other.negative)
  end
end

Rails ActionCable for Real-Time Sync broadcasts updates to connected clients, implementing eventual consistency at the client level. Clients receive update notifications and refresh their local state.

class CacheInvalidationChannel < ApplicationCable::Channel
  def subscribed
    stream_from "cache_invalidation:#{current_user.id}"
  end
end

class User < ApplicationRecord
  after_commit :broadcast_changes
  
  def broadcast_changes
    ActionCable.server.broadcast(
      "cache_invalidation:#{id}",
      {
        type: 'user_updated',
        user_id: id,
        version: updated_at.to_i
      }
    )
  end
end

Practical Examples

Real-world scenarios demonstrate eventual consistency patterns and their implementation challenges.

Social Media Like Counter tolerates temporary inconsistencies because exact like counts matter less than showing approximate popularity. Users expect to see likes appear quickly without waiting for global consensus.

class LikeCounter
  def initialize(post_id)
    @post_id = post_id
    @redis = Redis.new
  end
  
  def increment(user_id)
    # Add to local counter immediately
    @redis.hincrby("post:#{@post_id}:likes", "count", 1)
    @redis.sadd("post:#{@post_id}:liked_by", user_id)
    
    # Async job persists to database
    PersistLikeJob.perform_async(@post_id, user_id)
  end
  
  def count
    # Read from fast cache
    @redis.hget("post:#{@post_id}:likes", "count").to_i
  end
  
  def rebuild_from_database
    # Periodic reconciliation job
    actual_count = Like.where(post_id: @post_id).count
    @redis.hset("post:#{@post_id}:likes", "count", actual_count)
  end
end

class PersistLikeJob
  include Sidekiq::Worker
  sidekiq_options retry: 3
  
  def perform(post_id, user_id)
    Like.find_or_create_by!(post_id: post_id, user_id: user_id)
  rescue ActiveRecord::RecordNotUnique
    # Duplicate like, ignore
  end
end

Distributed Shopping Cart uses last-write-wins with session stickiness for simplicity. Users experience consistent cart state during their session while allowing concurrent updates from different sessions to merge eventually.

class ShoppingCart
  def initialize(user_id)
    @user_id = user_id
    @redis = Redis.new
    @key = "cart:#{user_id}"
  end
  
  def add_item(product_id, quantity)
    timestamp = Time.now.to_f
    
    item_data = {
      product_id: product_id,
      quantity: quantity,
      timestamp: timestamp
    }
    
    # Store with timestamp for LWW conflict resolution
    @redis.hset(@key, product_id, item_data.to_json)
    
    # Replicate to persistent storage
    CartReplicationJob.perform_async(@user_id, product_id, quantity, timestamp)
  end
  
  def merge_from_replica(replica_data)
    replica_data.each do |product_id, data|
      current = @redis.hget(@key, product_id)
      
      if current.nil?
        @redis.hset(@key, product_id, data)
      else
        current_data = JSON.parse(current)
        replica_data = JSON.parse(data)
        
        # Last-write-wins based on timestamp
        if replica_data['timestamp'] > current_data['timestamp']
          @redis.hset(@key, product_id, data)
        end
      end
    end
  end
  
  def items
    @redis.hgetall(@key).map do |product_id, data|
      JSON.parse(data)
    end
  end
end

Multi-Region User Profile Service replicates profiles across geographic regions. Each region serves reads locally while propagating writes globally. Users see their own writes immediately through session affinity but other users may see stale profiles briefly.

class UserProfileService
  def initialize(region:)
    @region = region
    @local_db = Database.connect(region: region)
    @replication_queue = ReplicationQueue.new
  end
  
  def update_profile(user_id, attributes)
    version = generate_version
    
    # Write to local region immediately
    profile = @local_db.update(
      user_id,
      attributes.merge(version: version, updated_at: Time.now)
    )
    
    # Enqueue replication to other regions
    OTHER_REGIONS.each do |target_region|
      @replication_queue.enqueue(
        target_region: target_region,
        user_id: user_id,
        attributes: attributes,
        version: version
      )
    end
    
    profile
  end
  
  def get_profile(user_id, consistency: :eventual)
    if consistency == :strong
      # Read from all regions and return latest version
      profiles = fetch_from_all_regions(user_id)
      profiles.max_by { |p| p.version }
    else
      # Read from local region only
      @local_db.find(user_id)
    end
  end
  
  def apply_replication(user_id, attributes, version)
    current = @local_db.find(user_id)
    
    # Apply only if version is newer
    if current.nil? || version > current.version
      @local_db.update(user_id, attributes.merge(version: version))
    end
  end
end

Inventory Management with Reserved Quantities shows eventual consistency limits. Overselling occurs when multiple replicas process concurrent purchases before replication completes. Solutions include reserved inventory, compensation transactions, or requiring strong consistency for inventory operations.

class InventoryManager
  def initialize
    @redis = Redis.new
    @db = Database.connect
  end
  
  def reserve_inventory(product_id, quantity)
    # Use Lua script for atomic local reservation
    script = <<~LUA
      local available = redis.call('GET', KEYS[1]) or 0
      if tonumber(available) >= tonumber(ARGV[1]) then
        redis.call('DECRBY', KEYS[1], ARGV[1])
        return 1
      else
        return 0
      end
    LUA
    
    success = @redis.eval(
      script,
      keys: ["inventory:#{product_id}"],
      argv: [quantity]
    )
    
    if success == 1
      # Background job handles persistent storage and cross-region sync
      PersistReservationJob.perform_async(product_id, quantity)
      true
    else
      false
    end
  end
  
  def reconcile_inventory(product_id)
    # Periodic job resolves inconsistencies
    cached = @redis.get("inventory:#{product_id}").to_i
    actual = @db.get_inventory(product_id)
    
    if cached != actual
      # Database is source of truth
      @redis.set("inventory:#{product_id}", actual)
      
      # Alert if significant discrepancy
      if (cached - actual).abs > THRESHOLD
        alert_inventory_mismatch(product_id, cached, actual)
      end
    end
  end
end

Common Pitfalls

Eventual consistency introduces subtle issues that developers encounter frequently. Understanding these pitfalls prevents bugs and improves system reliability.

Reading Your Own Writes Inconsistency occurs when users cannot see updates they just made. This happens when read requests route to replicas that have not yet received the update. Users perceive the system as broken when their changes disappear.

# Problematic: User may not see their update
def update_and_show
  user.update(name: 'New Name')
  redirect_to user_path(user)  # May read from stale replica
end

# Solution: Force read from primary after write
def update_and_show
  user.update(name: 'New Name')
  
  # Read from primary to see own write
  User.connected_to(role: :writing) do
    @user = User.find(user.id)
  end
  
  render :show
end

# Alternative: Use version tracking
def update_and_show
  result = user.update(name: 'New Name')
  session[:expected_version] = user.reload.version
  redirect_to user_path(user)
end

def show
  @user = User.find(params[:id])
  
  if session[:expected_version] && @user.version < session[:expected_version]
    # Retry from primary if replica is behind
    User.connected_to(role: :writing) do
      @user = User.find(params[:id])
    end
  end
end

Lost Updates from Concurrent Modifications happen when two replicas accept concurrent writes that conflict. Without proper conflict detection, one update may silently disappear. Last-write-wins loses data from earlier writes.

# Vulnerable to lost updates
class Counter
  def increment
    current = read_value
    new_value = current + 1
    write_value(new_value)  # Concurrent increments may be lost
  end
end

# Solution: Use version vectors or compare-and-swap
class SafeCounter
  def increment
    loop do
      current_value = read_value
      current_version = read_version
      new_value = current_value + 1
      
      success = compare_and_swap(
        expected_version: current_version,
        new_value: new_value
      )
      
      break if success
      # Retry on conflict
    end
  end
end

# Better: Use CRDT
class CRDTCounter
  def increment
    # Always converges without coordination
    @g_counter.increment
    replicate_async(@g_counter.state)
  end
end

Violated Business Invariants occur when eventual consistency allows states that violate business rules. Account balances going negative, double-booking resources, or overselling inventory happen when replicas process operations independently.

# Dangerous: May violate non-negative balance invariant
class Account
  def withdraw(amount)
    balance = read_balance
    
    if balance >= amount
      write_balance(balance - amount)  # Race condition with concurrent withdrawals
      return true
    end
    
    false
  end
end

# Solution: Require strong consistency for invariants
class Account
  def withdraw(amount)
    ActiveRecord::Base.transaction do
      account = Account.lock.find(id)  # Lock for strong consistency
      
      if account.balance >= amount
        account.balance -= amount
        account.save!
        true
      else
        false
      end
    end
  end
end

Unbounded Replication Lag causes indefinite inconsistency when replicas fall too far behind. Slow replicas, network issues, or system failures create growing backlogs. Operations depend on timely convergence, but lag bounds are hard to guarantee.

# Monitor and alert on replication lag
class ReplicationMonitor
  def check_lag
    replicas.each do |replica|
      lag = primary.current_version - replica.current_version
      
      if lag > MAX_ACCEPTABLE_LAG
        alert_replication_lag(replica, lag)
      end
      
      if lag > CRITICAL_LAG
        # Remove replica from read pool
        remove_from_load_balancer(replica)
      end
    end
  end
  
  def measure_lag
    primary_timestamp = @primary_db.query("SELECT NOW()")
    replica_timestamp = @replica_db.query("SELECT NOW()")
    
    # Also check replication position
    primary_position = @primary_db.query("SELECT pg_current_wal_lsn()")
    replica_position = @replica_db.query("SELECT pg_last_wal_receive_lsn()")
    
    {
      time_lag: primary_timestamp - replica_timestamp,
      byte_lag: primary_position - replica_position
    }
  end
end

Causal Anomalies appear when effects precede their causes. A comment reply appears before the original comment, or a delete operation appears before the creation. Users find these scenarios confusing and potentially damaging.

# Problematic: Reply may appear before original post
class Comment
  def create_reply(content)
    reply = Reply.create(comment_id: id, content: content)
    
    # Both async replications may arrive out of order
    replicate_async(self)
    replicate_async(reply)
  end
end

# Solution: Include causal dependencies
class Comment
  def create_reply(content)
    reply = Reply.create(
      comment_id: id,
      content: content,
      depends_on_version: self.version  # Track dependency
    )
    
    replicate_with_dependency(reply, depends_on: self)
  end
end

def replicate_with_dependency(record, depends_on:)
  ReplicationJob.perform_async(
    record_type: record.class.name,
    record_id: record.id,
    dependency_type: depends_on.class.name,
    dependency_id: depends_on.id,
    dependency_version: depends_on.version
  )
end

class ReplicationJob
  def perform(record_type:, record_id:, dependency_type:, dependency_id:, dependency_version:)
    # Wait until dependency is replicated
    wait_for_version(dependency_type, dependency_id, dependency_version)
    
    # Then apply this replication
    apply_replication(record_type, record_id)
  end
end

Failure to Handle Conflicts occurs when applications assume updates never conflict. Concurrent edits to the same resource create conflicts that applications must detect and resolve. Ignoring conflicts leads to data loss or corruption.

# Missing conflict handling
class DocumentEditor
  def save_document(content)
    Document.update(id, content: content)  # Silently overwrites concurrent edits
  end
end

# Solution: Detect and resolve conflicts
class DocumentEditor
  def save_document(content, base_version)
    current = Document.find(id)
    
    if current.version == base_version
      # No conflict, save normally
      current.update(content: content, version: base_version + 1)
    else
      # Conflict detected
      conflict = {
        local_changes: content,
        remote_changes: current.content,
        base_version: base_version,
        current_version: current.version
      }
      
      raise ConflictError, conflict
    end
  end
  
  def merge_conflict(local_content, remote_content)
    # Three-way merge or manual resolution
    merged = three_way_merge(
      base: load_version(base_version),
      local: local_content,
      remote: remote_content
    )
    
    save_document(merged, current_version)
  end
end

Testing Only Happy Paths misses eventual consistency edge cases. Tests that assume immediate consistency pass but production systems experience races, conflicts, and ordering issues. Comprehensive testing requires simulating replication delays, network partitions, and concurrent operations.

# Inadequate test: Assumes immediate consistency
def test_user_update
  user.update(name: 'Alice')
  assert_equal 'Alice', User.find(user.id).name  # May fail if reading from replica
end

# Better: Test eventual consistency explicitly
def test_eventual_consistency
  user.update(name: 'Alice')
  
  # Test reads from replica eventually consistent
  assert_eventually(timeout: 5.seconds) do
    User.connected_to(role: :reading) do
      User.find(user.id).name == 'Alice'
    end
  end
end

def test_concurrent_updates
  # Simulate concurrent updates from different replicas
  threads = 10.times.map do |i|
    Thread.new { user.increment_counter }
  end
  
  threads.each(&:join)
  
  # Test conflict resolution converges correctly
  assert_eventually do
    replicas.map { |r| r.get_counter(user.id) }.uniq.size == 1
  end
end

Reference

Consistency Models Comparison

Model Guarantees Read Latency Write Latency Availability Use Cases
Strong All reads see latest write High High Lower Financial transactions, inventory
Eventual Reads eventually consistent Low Low Higher Social media, analytics, caching
Causal Preserves cause-effect order Medium Medium High Collaborative editing, messaging
Read-Your-Writes Users see own updates Medium Low High User profiles, settings
Monotonic Reads Reads never go backward Low Low High Session data, user feeds

Conflict Resolution Strategies

Strategy How It Works Data Loss Risk Complexity Best For
Last-Write-Wins Timestamp determines winner High Low Non-critical updates
Version Vectors Track causality per node Low Medium General distributed systems
CRDTs Mathematically convergent None High Counters, sets, collaborative editing
Application Merge Custom business logic Low High Complex domain objects
Manual Resolution User resolves conflicts None Very High Critical business data

CAP Theorem Trade-offs

System Type Consistency Availability Partition Tolerance Examples
CA Strong High None Single-node databases, RDBMS
CP Strong Lower Yes MongoDB, HBase, Redis Cluster
AP Eventual High Yes Cassandra, DynamoDB, Riak

Replication Patterns

Pattern Write Target Read Source Conflict Handling Failover
Primary-Backup Async Primary only Any replica None (primary decides) Promote backup
Primary-Backup Sync Primary only Any replica None (primary decides) Promote backup
Active-Active Any replica Any replica Conflict resolution required Already distributed
Quorum-Based W replicas R replicas Depends on W+R>N Automatic

Ruby Gems for Eventual Consistency

Gem Purpose Features Use Case
sidekiq Background jobs Async processing, retries, scheduling Async replication, eventual updates
redis In-memory store Fast reads/writes, pub/sub, streams Caching, event streaming, counters
bunny RabbitMQ client Message queuing, routing, delivery guarantees Event-driven architecture
ruby-kafka Kafka client Event streaming, partitioning, replay Event sourcing, log aggregation
eventide Event sourcing Event store, projections, handlers CQRS, event-driven systems
dynamo-autoscale DynamoDB Eventually consistent NoSQL Highly available key-value store

Anti-Entropy Mechanisms

Mechanism How It Works Overhead Convergence Speed Best For
Read Repair Fixes staleness during reads Low Slow Read-heavy workloads
Hinted Handoff Stores updates for offline nodes Medium Fast Temporary failures
Merkle Trees Compares replica state efficiently Medium Medium Full synchronization
Active Anti-Entropy Background sync processes High Fast Critical consistency
Gossip Protocol Peer-to-peer state exchange Medium Medium Membership, metadata

Monitoring Metrics

Metric What It Measures Alert Threshold Impact
Replication Lag Time or position behind primary Greater than 5 seconds Stale reads
Conflict Rate Conflicts per second Sudden increases Data loss risk
Convergence Time Time to reach consistency Greater than 1 minute User confusion
Failed Replications Replication errors More than 1% Data divergence
Split Brain Detection Independent primaries Any occurrence Severe inconsistency

Implementation Checklist

Phase Task Consideration
Design Define consistency requirements What staleness is acceptable
Design Choose conflict resolution strategy How to handle concurrent updates
Design Plan failure scenarios What happens during partitions
Implementation Add version tracking Enable conflict detection
Implementation Implement async replication Queue, retry logic, monitoring
Implementation Build conflict resolution Merge functions or CRDTs
Testing Test replication lag scenarios Verify behavior with delays
Testing Test concurrent update conflicts Multiple replicas updating simultaneously
Testing Test network partition handling System behavior when split
Operations Monitor replication lag Alert on excessive lag
Operations Track conflict rates Detect resolution issues
Operations Plan failover procedures Handle primary failures