CrackedRuby

Overview

Replication strategies define how data copies are created, maintained, and synchronized across multiple nodes in distributed systems. These strategies determine the consistency guarantees, availability characteristics, and performance trade-offs of a system. The choice of replication strategy directly impacts application behavior, scalability limits, and operational complexity.

Database replication addresses several critical requirements: fault tolerance through redundancy, improved read performance through load distribution, geographic data locality, and disaster recovery capabilities. Each replication strategy represents a different position on the consistency-availability-partition tolerance spectrum defined by the CAP theorem.

The fundamental challenge in replication involves maintaining data consistency across nodes while managing network latency, potential failures, and concurrent updates. Different strategies make different trade-offs between these concerns. A master-slave setup prioritizes consistency by channeling all writes through a single node, while multi-master configurations prioritize availability by accepting writes at any node.

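# Basic replication setup showing different strategies
class ReplicationManager
  attr_reader :strategy, :nodes

  def initialize(strategy:, nodes:)
    @strategy = strategy
    @nodes = nodes
    @primary = nodes.first
  end

  # node: the node that accepts the write in multi-master mode
  def write(key, value, node: @primary)
    case strategy
    when :master_slave
      # All writes go through the primary, then fan out to the replicas
      @primary.write(key, value)
      replicate_async(key, value, except: @primary)
    when :multi_master
      # Any node may accept the write, then shares it with its peers
      node.write(key, value)
      replicate_async(key, value, except: node)
    else
      raise ArgumentError, "unknown strategy: #{strategy}"
    end
  end

  private

  # Copy the write to every other node in background threads
  def replicate_async(key, value, except:)
    (nodes - [except]).map do |peer|
      Thread.new { peer.write(key, value) }
    end
  end
end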

Modern distributed systems often combine multiple replication strategies. A system might use synchronous replication within a data center for strong consistency while employing asynchronous replication across geographic regions for disaster recovery. Understanding the implications of each strategy enables informed architectural decisions.

Key Principles

Replication strategies operate on several fundamental principles that govern their behavior and characteristics. These principles establish the theoretical foundation for understanding how different approaches handle data consistency, failure scenarios, and performance requirements.

Consistency Models define the guarantees about what values readers observe after writes complete. Strong consistency ensures all readers see the same data at any point in time, requiring coordination between replicas. Eventual consistency allows temporary divergence between replicas, guaranteeing only that replicas will converge to the same state given enough time without new updates. Causal consistency preserves the order of causally related operations while allowing concurrent operations to be seen in different orders.

Replication Lag represents the time delay between when data is written to one replica and when it appears on other replicas. Synchronous replication minimizes lag by blocking writes until replicas acknowledge receipt, while asynchronous replication accepts lag in exchange for better write performance. The acceptable lag duration depends on application requirements and user expectations.

Quorum-Based Replication uses voting mechanisms to determine successful operations. A system with N replicas requires W acknowledgments for writes and R replicas consulted for reads. When W + R > N, the system guarantees strong consistency because read and write sets must overlap. Setting W = N and R = 1 optimizes for read performance with slower writes, while W = 1 and R = N optimizes for write performance with slower reads.

class QuorumReplication
  def initialize(replicas, write_quorum, read_quorum)
    @replicas = replicas
    @w = write_quorum
    @r = read_quorum
    validate_quorum
  end

  def write(key, value, version)
    responses = @replicas.map do |replica|
      Thread.new { replica.write(key, value, version) }
    end.map(&:value)

    successful = responses.count { |r| r[:status] == :success }
    successful >= @w
  end

  def read(key)
    responses = @replicas.sample(@r).map do |replica|
      Thread.new { replica.read(key) }
    end.map(&:value)

    # Return value with highest version number
    responses.max_by { |r| r[:version] }[:value]
  end

  private

  def validate_quorum
    n = @replicas.size
    raise "Invalid quorum" unless @w + @r > n
    raise "Write quorum too large" unless @w <= n
    raise "Read quorum too large" unless @r <= n
  end
end

Conflict Resolution becomes necessary when multiple replicas accept concurrent writes to the same data. Last-write-wins uses timestamps to resolve conflicts, but requires synchronized clocks across nodes. Vector clocks track causal relationships between updates, enabling detection of concurrent writes that require application-level resolution. CRDTs (Conflict-free Replicated Data Types) provide data structures with commutative operations that guarantee convergence without explicit conflict resolution.
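
To make the vector clock idea concrete, here is a minimal sketch of how two versions can be classified as ordered or concurrent. The `VectorClock` class and its method names are illustrative assumptions, not part of any library; replicas are identified by plain strings.

```ruby
# Illustrative vector clock; class and method names are hypothetical.
class VectorClock
  attr_reader :counters

  def initialize(counters = {})
    @counters = counters.dup.freeze
  end

  # Returns a new clock with this replica's counter advanced by one.
  def tick(replica_id)
    VectorClock.new(@counters.merge(replica_id => @counters.fetch(replica_id, 0) + 1))
  end

  # Element-wise maximum, used after receiving another replica's update.
  def merge(other)
    VectorClock.new(@counters.merge(other.counters) { |_id, a, b| [a, b].max })
  end

  # Returns :before, :after, :equal, or :concurrent relative to other.
  def compare(other)
    ids = @counters.keys | other.counters.keys
    behind = ids.any? { |id| @counters.fetch(id, 0) < other.counters.fetch(id, 0) }
    ahead = ids.any? { |id| @counters.fetch(id, 0) > other.counters.fetch(id, 0) }

    if ahead && behind
      :concurrent # neither update saw the other: needs conflict resolution
    elsif ahead
      :after
    elsif behind
      :before
    else
      :equal
    end
  end
end

base = VectorClock.new.tick("a") # first write, made on replica "a"
on_a = base.tick("a")            # follow-up write on replica "a"
on_b = base.tick("b")            # independent write on replica "b"
on_a.compare(on_b)               # => :concurrent
```

A `:concurrent` result is the signal that the two writes cannot be ordered and must be handed to an application-level merge or another resolution policy.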

Topology and Flow Control determine how updates propagate through the replication network. Chain replication passes writes through a sequence of nodes before acknowledging, providing strong consistency with good read throughput. Ring topologies distribute data and replication responsibility across nodes in a circle. Tree topologies create hierarchical replication with primary-secondary relationships at each level.

Failure Detection and Recovery mechanisms identify when replicas become unavailable and coordinate recovery processes. Heartbeat protocols exchange periodic messages to detect node failures. Consensus algorithms like Raft ensure the system maintains a consistent view of which nodes are active. When a failed node recovers, it must reconcile its state with current replicas through catch-up replication or snapshot-based recovery.
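
A heartbeat-based detector can be sketched in a few lines. The `HeartbeatMonitor` class, its method names, and the timeout value are assumptions chosen for illustration; real systems usually add jitter tolerance and require several missed heartbeats before declaring a node failed.

```ruby
# Illustrative heartbeat tracker; names and the timeout are assumptions.
class HeartbeatMonitor
  def initialize(timeout:)
    @timeout = timeout # seconds of silence before a node is suspected
    @last_seen = {}
  end

  # Record a heartbeat. `now` is injectable so the logic is easy to test.
  def heartbeat(node_id, now: Time.now)
    @last_seen[node_id] = now
  end

  # A node is alive if it has been heard from within the timeout window.
  def alive?(node_id, now: Time.now)
    seen = @last_seen[node_id]
    !seen.nil? && (now - seen) <= @timeout
  end

  # Known nodes whose last heartbeat is older than the timeout.
  def suspected_failures(now: Time.now)
    @last_seen.keys.reject { |id| alive?(id, now: now) }
  end
end

monitor = HeartbeatMonitor.new(timeout: 5)
monitor.heartbeat("replica-1")
monitor.alive?("replica-1") # => true
```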

The principle of Durability Guarantees specifies when writes are considered permanent. Synchronous replication to persistent storage provides immediate durability. Write-ahead logs record updates before applying them, enabling recovery after crashes. Periodic checkpoints reduce recovery time by creating consistent snapshots.
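
The write-ahead log idea can be illustrated with a small file-backed store. This is a sketch under stated assumptions: the `WalStore` class and its one-JSON-object-per-line log format are invented for the example, and it does not handle a torn final record or log truncation after checkpoints.

```ruby
require 'json'

# Illustrative write-ahead log; the class and file format are assumptions.
class WalStore
  attr_reader :data

  def initialize(path)
    @path = path
    @data = {}
    replay # recover state from any existing log before accepting writes
    @log = File.open(@path, 'a')
  end

  def write(key, value)
    @log.puts(JSON.generate('key' => key, 'value' => value))
    @log.flush
    @log.fsync # force the record to stable storage before applying it
    @data[key] = value
  end

  def close
    @log.close
  end

  private

  # Rebuild in-memory state by re-applying every logged record in order.
  def replay
    return unless File.exist?(@path)

    File.foreach(@path) do |line|
      record = JSON.parse(line)
      @data[record['key']] = record['value']
    end
  end
end
```

Because each record reaches disk before the in-memory update, a process that crashes after `write` returns can rebuild the same state by opening the log again.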

Implementation Approaches

Different replication strategies serve distinct operational requirements and make different trade-offs between consistency, availability, and performance. Selecting an approach requires analyzing application needs, failure tolerance requirements, and acceptable complexity levels.

Master-Slave Replication designates one node as the master that handles all write operations, with one or more slave nodes receiving copies of the data. The master serializes all writes, providing a single consistent view. Slaves serve read requests, distributing query load across multiple nodes. This approach gives every write a single, consistent order because all updates flow through one node, but reads from asynchronously updated slaves can be stale, and the master is a single point of failure for write availability.

class MasterSlaveReplication
  def initialize(master, slaves)
    @master = master
    @slaves = slaves
    @replication_lag = {}
  end

  def write(key, value)
    result = @master.write(key, value)
    
    # Asynchronous replication to slaves
    @slaves.each do |slave|
      Thread.new do
        begin
          slave.write(key, value)
          @replication_lag[slave.id] = Time.now - result[:timestamp]
        rescue => e
          handle_slave_failure(slave, e)
        end
      end
    end

    result
  end

  def read(key, consistency: :eventual)
    case consistency
    when :strong
      @master.read(key)
    when :eventual
      select_slave.read(key)
    end
  end

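  private

  def select_slave
    # Least-lag selection; slaves with no lag measurement yet sort last
    @slaves.min_by { |s| @replication_lag[s.id] || Float::INFINITY }
  end

  # Minimal failure hook: report the error so a failing slave is visible
  def handle_slave_failure(slave, error)
    warn "replication to #{slave.id} failed: #{error.message}"
  end
end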

Multi-Master Replication allows writes at any node, increasing write availability and reducing latency for geographically distributed users. Each master propagates its writes to other masters asynchronously. This strategy requires conflict resolution when concurrent writes modify the same data. Systems handle conflicts through last-write-wins policies, application callbacks, or CRDT data structures that guarantee convergence.
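
The following sketch shows two masters accepting writes independently and converging through a last-write-wins rule. Everything here is hypothetical: the `MasterNode` class, the explicit `timestamp:` argument (used instead of wall-clock time so the example is deterministic), and the single outbox, which is only adequate for a two-node setup.

```ruby
# Illustrative multi-master node with last-write-wins merging.
# All names are hypothetical; timestamps are supplied by the caller.
class MasterNode
  Entry = Struct.new(:value, :timestamp, :origin)

  def initialize(id)
    @id = id
    @store = {}
    @outbox = [] # local writes not yet shipped to the peer
  end

  # Accept a write locally and queue it for propagation.
  def write(key, value, timestamp:)
    entry = Entry.new(value, timestamp, @id)
    apply(key, entry)
    @outbox << [key, entry]
  end

  def read(key)
    entry = @store[key]
    entry && entry.value
  end

  # Ship pending writes to a peer (the asynchronous propagation step).
  def sync_to(peer)
    @outbox.each { |key, entry| peer.apply(key, entry) }
    @outbox.clear
  end

  # Keep whichever entry has the larger (timestamp, origin) pair, so every
  # node picks the same winner regardless of delivery order.
  def apply(key, entry)
    current = @store[key]
    newer = current.nil? ||
            ([entry.timestamp, entry.origin] <=> [current.timestamp, current.origin]) == 1
    @store[key] = entry if newer
  end
end

east = MasterNode.new("east")
west = MasterNode.new("west")
east.write("cart", ["book"], timestamp: 1)
west.write("cart", ["pen"], timestamp: 2)
east.sync_to(west)
west.sync_to(east)
east.read("cart") # => ["pen"]
```

Note that the earlier write is silently discarded, which is the usual cost of last-write-wins; a CRDT or an application callback would be needed to keep both items.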

Synchronous vs Asynchronous Replication represents a fundamental trade-off. Synchronous replication blocks write operations until replicas acknowledge receipt, guaranteeing consistency but increasing latency and reducing availability when replicas are slow or unreachable. Asynchronous replication acknowledges writes immediately and replicates in the background, providing better performance and availability but risking data loss if the primary fails before replication completes.

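class SynchronousReplication
  def initialize(primary, replicas)
    @primary = primary
    @replicas = replicas
  end

  # transaction, commit, and rollback are assumed to come from the storage layer
  def write(key, value)
    transaction do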
      primary_result = @primary.write(key, value)
      
      replica_results = @replicas.map do |replica|
        Thread.new { replica.write(key, value) }
      end.map(&:value)

      # All must succeed or rollback
      if replica_results.all? { |r| r[:status] == :success }
        commit
        primary_result
      else
        rollback
        raise ReplicationError, "Failed to replicate to all nodes"
      end
    end
  end
end

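class AsynchronousReplication
  def initialize(primary, replicas)
    @primary = primary
    @replicas = replicas
    @replication_queue = Queue.new # thread-safe; pop blocks until a job arrives
  end

  def write(key, value)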
    result = @primary.write(key, value)
    
    # Queue replication without blocking
    @replication_queue.push(
      key: key,
      value: value,
      timestamp: result[:timestamp]
    )

    result
  end

  def replication_worker
    loop do
      job = @replication_queue.pop
      @replicas.each do |replica|
        retry_with_backoff { replica.write(job[:key], job[:value]) }
      end
    end
  end
end

Chain Replication organizes replicas in a sequence where writes propagate through the chain before acknowledging. The head receives writes and passes them down the chain, with the tail acknowledging completion. Reads come from the tail, which has all committed writes. This provides strong consistency with high read throughput, but writes experience cumulative latency from each hop in the chain.
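
As a rough in-process sketch of the write path just described, the classes below forward each write from head to tail and serve reads from the tail. `ChainNode` and `Chain` are hypothetical names, and network hops are modeled as plain method calls.

```ruby
# Illustrative chain replication; node and method names are hypothetical.
class ChainNode
  attr_reader :name, :store
  attr_accessor :successor

  def initialize(name)
    @name = name
    @store = {}
  end

  # Apply locally, then forward down the chain. The acknowledgment
  # originates at the tail and is returned back through each hop.
  def write(key, value)
    @store[key] = value
    if successor
      successor.write(key, value)
    else
      { acked_by: name }
    end
  end
end

class Chain
  def initialize(nodes)
    nodes.each_cons(2) { |node, following| node.successor = following }
    @head = nodes.first
    @tail = nodes.last
  end

  # Writes always enter at the head.
  def write(key, value)
    @head.write(key, value)
  end

  # Reads come from the tail, which only holds fully replicated writes.
  def read(key)
    @tail.store[key]
  end
end

chain = Chain.new(%w[head middle tail].map { |n| ChainNode.new(n) })
chain.write("balance", 100) # => { acked_by: "tail" }
chain.read("balance")       # => 100
```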

Peer-to-Peer Replication distributes data across nodes without designated masters, using consistent hashing to determine which nodes store each key. Each data item replicates to multiple nodes. This approach provides high availability and scales horizontally, but requires sophisticated conflict resolution and anti-entropy mechanisms to handle network partitions and concurrent updates.
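
One way to picture the key-to-node mapping is a hash ring with virtual nodes. The sketch below is an assumption-laden illustration: `HashRing`, the choice of MD5, and the default of 64 virtual nodes per physical node are all picked for the example rather than taken from a specific system.

```ruby
require 'digest'

# Illustrative consistent-hash ring; names and vnode count are assumptions.
class HashRing
  def initialize(nodes, vnodes: 64)
    points = nodes.flat_map do |node|
      (0...vnodes).map { |i| [hash_of("#{node}-#{i}"), node] }
    end
    @ring = points.sort_by(&:first) # ring positions in ascending order
  end

  # Walk clockwise from the key's position and collect the first
  # `count` distinct nodes; these hold the replicas for the key.
  def nodes_for(key, count)
    position = hash_of(key)
    start = @ring.bsearch_index { |point, _node| point >= position } || 0
    owners = []
    @ring.size.times do |offset|
      node = @ring[(start + offset) % @ring.size][1]
      owners << node unless owners.include?(node)
      break if owners.size == count
    end
    owners
  end

  private

  def hash_of(value)
    Digest::MD5.hexdigest(value.to_s).to_i(16)
  end
end

ring = HashRing.new(%w[node1 node2 node3 node4])
ring.nodes_for("user:42", 3) # three distinct nodes responsible for this key
```

Removing a node that does not own a key leaves that key's replica set unchanged, which is the property that keeps data movement small when membership changes.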

Statement-Based vs Row-Based Replication affects how updates propagate in database systems. Statement-based replication transmits SQL commands that slaves execute, reducing network bandwidth but risking divergence when statements have non-deterministic effects. Row-based replication transmits actual data changes, guaranteeing consistency but consuming more bandwidth. Modern systems often use a hybrid approach, selecting the most efficient method for each statement.
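
The divergence risk can be shown with a toy model in which a "statement" is a lambda and `node_clock` stands in for any node-local input, such as the current time. The method name `replicate_demo` and the clock values are hypothetical.

```ruby
# Illustrative model of statement-based vs row-based replication.
# A statement is modeled as a lambda; all names are hypothetical.
def replicate_demo
  # A statement whose result depends on where and when it runs.
  stamp = ->(db, node_clock) { db[:updated_at] = node_clock }

  primary = {}
  statement_replica = {}
  row_replica = {}

  # The primary executes the statement at its local time.
  stamp.call(primary, 100)

  # Statement-based: the replica re-executes the statement slightly later.
  stamp.call(statement_replica, 105)

  # Row-based: the replica applies the row values the primary produced.
  row_replica.merge!(primary)

  { primary: primary, statement_replica: statement_replica, row_replica: row_replica }
end

result = replicate_demo
result[:row_replica] == result[:primary]       # => true
result[:statement_replica] == result[:primary] # => false
```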

Design Considerations

Selecting a replication strategy requires evaluating application requirements against the characteristics and trade-offs of each approach. The decision impacts system behavior during normal operation and failure scenarios, affecting user experience and operational complexity.

Consistency Requirements drive fundamental strategy decisions. Applications requiring strong consistency, like financial systems where stale reads could cause incorrect transactions, need synchronous replication or quorum-based approaches. Applications tolerating eventual consistency, like social media feeds where slight delays in propagating updates are acceptable, benefit from asynchronous replication's performance advantages.

Write vs Read Patterns influence topology selection. Read-heavy applications benefit from master-slave configurations that scale read capacity across multiple slaves. Write-heavy applications with geographic distribution might need multi-master replication to avoid funneling all writes through a single location, despite the added conflict resolution complexity.

class AdaptiveReplication
  def initialize
    @strategy = :master_slave
    @write_rate = RateCounter.new
    @read_rate = RateCounter.new
  end

  def write(key, value)
    @write_rate.increment
    adapt_strategy
    
    case @strategy
    when :master_slave
      master_slave_write(key, value)
    when :multi_master
      multi_master_write(key, value)
    end
  end

  private

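  def adapt_strategy
    writes = @write_rate.current
    return if writes.zero? # no write traffic observed yet; keep the current strategy

    read_write_ratio = @read_rate.current.to_f / writes

    # THRESHOLD is an application-defined write-rate limit
    @strategy = if read_write_ratio > 10
                  :master_slave # Optimize for read scaling
                elsif writes > THRESHOLD
                  :multi_master # Distribute write load
                else
                  @strategy     # Neither condition met: keep the current strategy
                end
  end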
end

Geographic Distribution impacts latency and availability decisions. Systems serving global users experience significant cross-region latency. Synchronous replication across regions adds hundreds of milliseconds to write operations. Multi-master setups with asynchronous cross-region replication keep writes local, but require handling conflicts from concurrent updates in different regions.

Failure Tolerance goals determine replication factor and consistency level. Systems requiring high availability during failures need enough replicas that losing some still maintains quorum. Setting W + R > N guarantees consistency but means the system becomes unavailable when too many replicas fail. Setting lower quorum values maintains availability during partitions but risks inconsistent reads.
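
The availability side of this trade-off is simple arithmetic. The helper below is a hypothetical illustration that reports, for a given N, W, and R, whether the quorums overlap and how many replica failures reads and writes can survive.

```ruby
# Illustrative helper; the method name and result keys are assumptions.
def quorum_tolerance(n:, w:, r:)
  {
    overlapping: w + r > n,          # read and write sets must intersect
    write_failures_tolerated: n - w, # writes still succeed with this many replicas down
    read_failures_tolerated: n - r   # reads still succeed with this many replicas down
  }
end

quorum_tolerance(n: 5, w: 3, r: 3)
# => { overlapping: true, write_failures_tolerated: 2, read_failures_tolerated: 2 }

quorum_tolerance(n: 5, w: 1, r: 1)
# => { overlapping: false, write_failures_tolerated: 4, read_failures_tolerated: 4 }
```

The second configuration stays available with four replicas down, but a read may miss the one replica that holds the latest write.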

Operational Complexity varies significantly between strategies. Master-slave replication provides straightforward operation and failover procedures, though promoting a slave to master requires coordination. Multi-master configurations increase complexity through conflict resolution requirements and the need to handle split-brain scenarios where multiple nodes believe they're the master.

Data Loss Tolerance affects synchronous vs asynchronous decisions. Asynchronous replication risks losing recent writes if the primary fails before replication completes. Systems where data loss is unacceptable must use synchronous replication to durably commit writes to multiple nodes before acknowledging. The durability-performance trade-off depends on how much latency users will tolerate.

class ConfigurableReplication
  def initialize(config)
    @consistency = config[:consistency] # :strong, :eventual, :causal
    @durability = config[:durability]   # :none, :single, :quorum
    @availability = config[:availability] # :best_effort, :high
    @replicas = config.fetch(:replicas)   # must be set before select_strategy reads it
    @strategy = select_strategy(config)
  end

  private

  def select_strategy(config)
    if config[:consistency] == :strong && config[:availability] == :high
      raise ConfigurationError, "Cannot guarantee both strong consistency and high availability"
    end

    if config[:consistency] == :strong
      QuorumBasedReplication.new(
        write_quorum: majority(@replicas.size),
        read_quorum: majority(@replicas.size)
      )
    elsif config[:availability] == :high
      MultiMasterReplication.new(
        conflict_resolver: LastWriteWins.new
      )
    else
      MasterSlaveReplication.new(async: true)
    end
  end

  def majority(n)
    (n / 2) + 1
  end
end

Bandwidth and Storage Costs grow with replication factor and synchronization frequency. Each replica requires storage for the full dataset or its partition. Synchronization traffic consumes network bandwidth, with synchronous replication generating round-trip messages for each write. Compression and incremental updates reduce costs but add processing overhead.

Application-Level Impact depends on how the application interacts with the database. Applications issuing a write then immediately reading it back require read-your-writes consistency. Master-slave configurations provide it only when the read goes to the master; reads served by asynchronously updated slaves, like any eventually consistent system, must handle it explicitly through session affinity or version tracking. Applications with complex transactions might require distributed transaction support across replicas.
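
Version tracking for read-your-writes can be sketched as follows. The `VersionedStore` and `Session` classes are hypothetical, and `catch_up` simply copies state to stand in for replication completing; the point is only that a session remembers the version of its last write and avoids replicas that have not reached it.

```ruby
# Illustrative session-level read-your-writes; all classes are hypothetical.
class VersionedStore
  attr_reader :version

  def initialize
    @data = {}
    @version = 0
  end

  # Returns the version number assigned to this write.
  def write(key, value)
    @data[key] = value
    @version += 1
  end

  def read(key)
    @data[key]
  end

  def snapshot
    @data.dup
  end

  # Copy state from the primary (stands in for replication catching up).
  def catch_up(primary)
    @data = primary.snapshot
    @version = primary.version
  end
end

class Session
  def initialize(primary, replica)
    @primary = primary
    @replica = replica
    @last_written_version = 0
  end

  def write(key, value)
    @last_written_version = @primary.write(key, value)
  end

  # Use the replica only if it has applied this session's latest write.
  def read(key)
    source = @replica.version >= @last_written_version ? @replica : @primary
    source.read(key)
  end
end

primary = VersionedStore.new
replica = VersionedStore.new
session = Session.new(primary, replica)
session.write("name", "Alice")
session.read("name") # => "Alice", served by the primary because the replica lags
```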

Ruby Implementation

Ruby applications interact with replicated databases through client libraries that abstract replication topology. Different database systems expose replication features through varying APIs, but common patterns emerge for handling consistency levels, failover, and conflict resolution.

ActiveRecord with Read Replicas provides built-in support for directing reads to slave databases while sending writes to the master. Rails applications configure read replicas in database.yml and use automatic connection switching:

# config/database.yml
production:
  primary:
    adapter: postgresql
    host: primary-db.example.com
    database: app_production
  replica:
    adapter: postgresql
    host: replica-db.example.com
    database: app_production
    replica: true

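# Read replica routing (assumes ApplicationRecord declares
# connects_to database: { writing: :primary, reading: :replica })
class User < ApplicationRecord
  # Runs on the replica only inside connected_to(role: :reading) or when
  # automatic role switching is enabled; otherwise it uses the primary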
  def self.recently_active
    where("last_seen_at > ?", 1.hour.ago)
  end

  # Force primary for consistency
  def self.create_with_read(attributes)
    transaction do
      user = create(attributes)
      # Read from primary within transaction
      ActiveRecord::Base.connected_to(role: :writing) do
        User.find(user.id) # Ensures read-your-writes
      end
    end
  end
end

# Explicit connection control
ActiveRecord::Base.connected_to(role: :reading) do
  User.recently_active.each { |u| process(u) }
end

ActiveRecord::Base.connected_to(role: :writing) do
  User.create(name: "New User")
end

Redis Replication in Ruby uses the redis gem with sentinel support for automatic failover. Sentinel monitors the master and promotes a slave when the master fails:

require 'redis'

# Connect through Sentinel for automatic failover
redis = Redis.new(
  url: "redis://master-name",
  sentinels: [
    { host: "sentinel1.example.com", port: 26379 },
    { host: "sentinel2.example.com", port: 26379 },
    { host: "sentinel3.example.com", port: 26379 }
  ],
  role: :master,
  reconnect_attempts: 3
)

# Write to master
redis.set("user:123:name", "Alice")

# Reads can go to a replica by opening a second connection with role: :slave
redis_read = Redis.new(
  url: "redis://master-name",
  sentinels: [...],
  role: :slave,
  reconnect_attempts: 3
)

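# Handling replication lag
# Offsets are byte positions in the replication stream; INFO returns them as strings
def read_with_consistency(redis, redis_read, key, max_lag: 100)
  value = redis_read.get(key)
  master_offset = redis.info("replication")["master_repl_offset"].to_i
  replica_offset = redis_read.info("replication")["slave_repl_offset"].to_i
  lag = master_offset - replica_offset

  if lag > max_lag
    # Fall back to master for consistent read
    redis.get(key)
  else
    value
  end
end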

MongoDB Replica Sets use the mongo gem with read preference controls for consistency-performance trade-offs:

require 'mongo'

client = Mongo::Client.new(
  ['mongodb1.example.com:27017', 
   'mongodb2.example.com:27017',
   'mongodb3.example.com:27017'],
  replica_set: 'myapp',
  database: 'production'
)

# Write concern controls replication acknowledgment
collection = client[:users]

# Wait for majority acknowledgment (synchronous replication)
collection.insert_one(
  { name: "Alice", email: "alice@example.com" },
  write_concern: { w: :majority, j: true, wtimeout: 5000 }
)

# Read from primary for strong consistency
users = collection.find({}, read: { mode: :primary })

# Read from nearest secondary for lower latency
recent_posts = client[:posts].find(
  { created_at: { '$gt': 1.hour.ago } },
  read: { mode: :nearest }
)

# Read from secondary with max staleness
analytics_data = client[:events].find(
  {},
  read: { 
    mode: :secondary_preferred,
    max_staleness: 120 # seconds
  }
)

Custom Replication Logic for application-level replication control handles scenarios where database-level replication doesn't meet requirements:

class MultiDatacenterReplication
  def initialize(primary_db, secondary_dbs)
    @primary = primary_db
    @secondaries = secondary_dbs
    @async_queue = Queue.new
    start_replication_worker
  end

  def write(table, record)
    # Synchronous write to primary
    result = @primary.insert(table, record)
    
    # Asynchronous replication to secondaries
    @secondaries.each do |dc, db|
      @async_queue.push(
        datacenter: dc,
        table: table,
        record: record,
        timestamp: result[:timestamp]
      )
    end

    result
  end

  def read(table, id, prefer_local: true)
    if prefer_local && local_datacenter
      db = @secondaries[local_datacenter]
      db.find(table, id)
    else
      @primary.find(table, id)
    end
  end

  private

  def start_replication_worker
    @worker = Thread.new do
      loop do
        job = @async_queue.pop
        replicate_with_retry(job)
      end
    end
  end

  def replicate_with_retry(job, max_attempts: 3)
    attempts = 0
    begin
      db = @secondaries[job[:datacenter]]
      db.insert(job[:table], job[:record])
    rescue => e
      attempts += 1
      if attempts < max_attempts
        sleep(2 ** attempts) # Exponential backoff
        retry
      else
        log_replication_failure(job, e)
        send_alert("Replication failed after #{attempts} attempts")
      end
    end
  end

  def local_datacenter
    # Determine datacenter from request context
    Thread.current[:datacenter]
  end
end

Conflict-Free Replicated Data Types can be implemented in plain Ruby to get eventual consistency with automatic conflict resolution:

class GCounter
  # Grow-only counter CRDT
  def initialize(replica_id)
    @replica_id = replica_id
    @counts = Hash.new(0)
  end

  def increment(amount = 1)
    @counts[@replica_id] += amount
  end

  def value
    @counts.values.sum
  end

  def merge(other)
    other.counts.each do |replica, count|
      @counts[replica] = [@counts[replica], count].max
    end
  end

  protected

  attr_reader :counts
end

require 'securerandom'

class LWWRegister
  # Last-Writer-Wins register
  def initialize(value = nil)
    @replica_id = SecureRandom.uuid # identity of this replica; never changes
    @value = value
    @timestamp = Time.now.to_f
    @writer_id = @replica_id        # replica that produced the current value
  end

  def set(value)
    @value = value
    @timestamp = Time.now.to_f
    @writer_id = @replica_id
  end

  def get
    @value
  end

  def merge(other)
    # Later timestamp wins; ties break on writer ID so all replicas agree
    if other.timestamp > @timestamp ||
       (other.timestamp == @timestamp && other.writer_id > @writer_id)
      @value = other.value
      @timestamp = other.timestamp
      @writer_id = other.writer_id
    end
  end

  protected

  attr_reader :timestamp, :writer_id, :value
end

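require 'set'
require 'securerandom'

# ORSet - Observed-Remove Set
class ORSet
  def initialize
    @elements = {}   # element => Set of unique add tags
    @tombstones = {} # element => Set of tags observed when it was removed
    @replica_id = SecureRandom.uuid
  end

  def add(element)
    tag = [@replica_id, Time.now.to_f, SecureRandom.hex(8)].join(':')
    (@elements[element] ||= Set.new) << tag
  end

  def remove(element)
    # Tombstone only the tags this replica has observed, so a concurrent
    # add on another replica survives the merge
    observed = @elements.delete(element)
    (@tombstones[element] ||= Set.new).merge(observed) if observed
  end

  def include?(element)
    @elements.key?(element) && !@elements[element].empty?
  end

  def merge(other)
    other.tombstones.each do |element, tags|
      (@tombstones[element] ||= Set.new).merge(tags)
    end

    (@elements.keys | other.elements.keys).each do |element|
      tags = (@elements[element] || Set.new) | (other.elements[element] || Set.new)
      live = tags - (@tombstones[element] || Set.new)
      # Without tombstones, a removed element would reappear after merging
      if live.empty?
        @elements.delete(element)
      else
        @elements[element] = live
      end
    end
  end

  protected

  attr_reader :elements, :tombstones
end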

Performance Considerations

Replication strategies directly impact system performance through write latency, read throughput, network utilization, and failure recovery time. Understanding these characteristics enables optimization for specific workload patterns.

Write Latency increases with replication requirements. Asynchronous replication adds minimal latency, typically microseconds to queue the replication request. Synchronous replication to a local replica adds milliseconds for the round-trip network time. Cross-region synchronous replication adds 50-300ms depending on geographic distance. Quorum-based writes with W=3 in a 5-node cluster complete when the third-fastest node acknowledges, so the two slowest nodes do not delay the write, but latency is still set by the slowest member of the fastest three, which raises tail latencies compared with waiting for a single node.
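
The effect of the write quorum on latency can be worked through with a few numbers. The helper name and the per-replica latencies below are hypothetical; the only assumption is that a quorum write completes when the W-th fastest acknowledgment arrives.

```ruby
# Illustrative: a quorum write completes when the W-th fastest replica responds.
def quorum_write_latency(replica_latencies_ms, w)
  replica_latencies_ms.sort[w - 1]
end

latencies = [2, 3, 5, 40, 120] # hypothetical per-replica round trips in ms

quorum_write_latency(latencies, 1) # => 2    (any single acknowledgment)
quorum_write_latency(latencies, 3) # => 5    (majority quorum)
quorum_write_latency(latencies, 5) # => 120  (all replicas, fully synchronous)
```

With these numbers, the majority quorum hides both slow replicas, while requiring all five acknowledgments exposes the write to the slowest one.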

require 'benchmark'

class ReplicationBenchmark
  def compare_strategies
    master_slave = MasterSlaveReplication.new(async: true)
    synchronous = SynchronousReplication.new(replicas: 3)
    quorum = QuorumReplication.new(replicas: 5, write_quorum: 3)

    results = {}
    
    results[:async] = Benchmark.measure do
      1000.times { |i| master_slave.write("key#{i}", "value#{i}") }
    end

    results[:sync] = Benchmark.measure do
      1000.times { |i| synchronous.write("key#{i}", "value#{i}") }
    end

    results[:quorum] = Benchmark.measure do
      1000.times { |i| quorum.write("key#{i}", "value#{i}") }
    end

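    # Illustrative orders of magnitude, not measured results; real numbers
    # depend on network round-trip time and storage latency:
    # async:   ~10ms for 1000 writes (0.01ms per write)
    # sync:    ~3000ms for 1000 writes (3ms per write)
    # quorum:  ~4500ms for 1000 writes (4.5ms per write)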
    
    results
  end
end

Read Performance scales with replica count for eventually consistent reads. A master-slave setup with 5 slaves can serve 5x more read queries than a single node, assuming even load distribution. Strongly consistent reads must query the master or achieve quorum, limiting scalability. Read latency remains low for local replicas but increases for geographically distributed reads.

Network Bandwidth consumption depends on replication mode and data change rate. Asynchronous replication can batch changes, reducing per-write overhead. Synchronous replication generates immediate traffic for each write. In a system with 1000 writes/second at 1KB each, the primary receives about 1MB/second of writes; replicating to 3 nodes means it sends about 3MB/second outbound while each replica receives about 1MB/second inbound, for roughly 3MB/second of replication traffic system-wide before protocol overhead.

Replication Lag affects read consistency and failover safety. Monitoring lag metrics helps identify performance bottlenecks and data loss risk. Lag increases under heavy write load, slow network conditions, or when replicas fall behind on processing:

class ReplicationMonitor
  def initialize(primary, replicas)
    @primary = primary
    @replicas = replicas
  end

  def check_lag
    primary_position = @primary.replication_position
    
    @replicas.map do |replica|
      begin
        replica_position = replica.replication_position
        lag = primary_position - replica_position
        
        {
          replica: replica.id,
          lag_bytes: lag,
          lag_seconds: estimate_time_lag(lag),
          status: lag < 1_048_576 ? :healthy : :lagging # 1 MiB threshold, no ActiveSupport needed
        }
      rescue => e
        {
          replica: replica.id,
          status: :unreachable,
          error: e.message
        }
      end
    end
  end

  private

  def estimate_time_lag(byte_lag)
    # Estimate based on recent write rate; nil when the primary is idle
    recent_rate = @primary.bytes_written_per_second
    return nil if recent_rate.zero?

    byte_lag.to_f / recent_rate
  end
end

Write Amplification occurs in multi-master setups where each node propagates writes to all other nodes. With N nodes, each write generates N-1 replication messages. A 10-node cluster produces 90 messages for every 10 original writes. This growth limits practical cluster sizes; hierarchical replication topologies reduce the number of messages each node must send.
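
The message count follows directly from the N-1 fan-out. A small, hypothetical helper makes the growth easy to check for different cluster sizes:

```ruby
# Replication messages generated when every write is forwarded to all peers.
# The method name is illustrative.
def replication_messages(nodes:, writes:)
  writes * (nodes - 1)
end

replication_messages(nodes: 10, writes: 10) # => 90
replication_messages(nodes: 3, writes: 10)  # => 20
```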

Conflict Resolution Overhead impacts multi-master performance. Last-write-wins resolution adds minimal overhead, just comparing timestamps. Vector clock comparison requires examining multiple version vectors. Application-level merge functions can be expensive, particularly for complex data structures. CRDT merges are deterministic and need no coordination, but their cost grows with the metadata they carry, such as one counter per replica or one tag per added element.

Cache Coherency between replicas affects read performance. Invalidation-based caching across replicas requires coordination messages on each write. Read-through caches at each replica reduce query load but risk serving stale data. Cache TTLs balance consistency and performance, with shorter TTLs increasing database load.

Bulk Operations benefit from batch replication. Instead of replicating each write individually, batching 100 writes into a single replication message cuts the message count by 100x, although the payload bytes stay roughly the same. However, batching increases replication lag and data loss risk:

class BatchReplication
  def initialize(replicas, batch_size: 100, batch_timeout: 1.0)
    @replicas = replicas
    @batch_size = batch_size
    @batch_timeout = batch_timeout
    @batch = []
    @last_flush = Time.now
    @mutex = Mutex.new
  end

  def write(key, value)
    @mutex.synchronize do
      @batch << { key: key, value: value, timestamp: Time.now }
      
      if should_flush?
        flush_batch
      end
    end
  end

  private

  def should_flush?
    @batch.size >= @batch_size ||
    (Time.now - @last_flush) >= @batch_timeout
  end

  def flush_batch
    return if @batch.empty?
    
    batch_copy = @batch.dup
    @batch.clear
    @last_flush = Time.now

    @replicas.each do |replica|
      Thread.new { replica.write_batch(batch_copy) }
    end
  end
end

Connection Pooling for replicas requires careful sizing. Each application server maintains connections to primary and replica databases. A pool too small causes connection contention and increased latency. A pool too large exhausts database connection limits. Typical configurations use 5-20 connections per application instance, with replicas supporting more read connections than the primary supports write connections.

Real-World Applications

Production deployments combine multiple replication strategies to meet different requirements within the same system. Understanding how real systems apply replication patterns illuminates practical trade-offs and operational considerations.

Global E-Commerce Platforms deploy multi-region replication to serve users worldwide with low latency. Product catalogs replicate asynchronously across regions since slight staleness in product listings is acceptable. Inventory counts use synchronous replication within a region for consistency, preventing overselling. Order processing writes to the local region's primary database, which replicates to other regions for disaster recovery:

class EcommerceReplication
  def initialize
    @product_catalog = AsyncMultiRegionReplication.new
    @inventory = SyncRegionalReplication.new
    @orders = LocalPrimaryReplication.new(cross_region_backup: true)
  end

  def create_order(user_id, items)
    # Check inventory with local strong consistency
    @inventory.transaction do
      items.each do |item|
        available = @inventory.read(item[:sku], consistency: :strong)
        raise OutOfStock unless available >= item[:quantity]
        
        @inventory.decrement(item[:sku], item[:quantity])
      end

      # Record order in local region
      order = @orders.write(
        user_id: user_id,
        items: items,
        timestamp: Time.now
      )

      # Asynchronously replicate to other regions
      @orders.replicate_async(order, regions: :all)
      
      order
    end
  end

  def get_product(sku)
    # Read from local replica for low latency
    @product_catalog.read(
      sku,
      region: current_region,
      max_staleness: 5.minutes
    )
  end
end

Social Media Applications use eventual consistency for most data, accepting temporary inconsistencies in exchange for performance and availability. Posts replicate asynchronously across data centers. Like counts use CRDTs to merge concurrent updates. User authentication requires strong consistency for security:

class SocialMediaReplication
  def create_post(user_id, content)
    # Write to local data center
    post = @posts_db.insert(
      user_id: user_id,
      content: content,
      timestamp: Time.now.to_f
    )

    # Replicate to other data centers async
    propagate_to_regions(post)
    
    # Update follower feeds asynchronously
    enqueue_feed_fanout(user_id, post[:id])

    post
  end

  def increment_likes(post_id, user_id)
    # CRDT counter for conflict-free merge
    @like_counter.increment(post_id, replica_id: current_datacenter)
    
    # Propagate to other data centers
    propagate_counter_state(post_id)
  end

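  def authenticate(username, password)
    # Read from master for strong consistency
    user = @auth_db.find(
      username,
      consistency: :strong,
      encryption: true
    )

    # Assumes find returns nil for an unknown username; fail the same
    # way a wrong password would instead of raising on nil
    return false unless user

    verify_password(user, password)
  end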
end
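
The CRDT behind a like counter can be illustrated with a grow-only counter (G-Counter). The sketch below is a minimal in-memory version, not the implementation behind `@like_counter`; the `GCounter` class and its methods are hypothetical names introduced for illustration. Each data center increments only its own slot, and merging takes the per-slot maximum, so replicas converge regardless of the order in which they exchange state.

```ruby
# Grow-only counter: one slot per replica, merged by per-slot maximum.
class GCounter
  attr_reader :counts

  def initialize(counts = {})
    @counts = Hash.new(0).update(counts)
  end

  # Each replica increments only its own slot.
  def increment(replica_id, by = 1)
    @counts[replica_id] += by
    self
  end

  # The counter value is the sum over all replica slots.
  def value
    @counts.values.sum
  end

  # Taking the per-replica maximum makes merge commutative,
  # associative, and idempotent.
  def merge(other)
    merged = @counts.merge(other.counts) { |_replica, mine, theirs| [mine, theirs].max }
    GCounter.new(merged)
  end
end

us_east = GCounter.new.increment(:us_east).increment(:us_east)
eu_west = GCounter.new.increment(:eu_west)

merged = us_east.merge(eu_west)
puts merged.value                 # 3
puts merged.merge(eu_west).value  # 3 (merging the same state twice changes nothing)
```

A G-Counter only grows. Supporting unlikes would need a second counter for decrements (a PN-Counter), which is outside this sketch.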

Financial Systems prioritize consistency and durability over performance. Transaction processing uses synchronous replication to multiple nodes before acknowledging, ensuring transactions survive failures. Audit logs replicate synchronously and immutably. Reporting databases receive asynchronous replication for read-only analytics:

class FinancialTransactionReplication
  def process_transaction(from_account, to_account, amount)
    audit_entry = nil

    # Synchronous replication to 3 nodes with journaled (durable) writes;
    # wtimeout is given in milliseconds
    @transaction_db.transaction(
      write_concern: { w: 3, j: true, wtimeout: 5000 }
    ) do
      debit_account(from_account, amount)
      credit_account(to_account, amount)

      audit_entry = log_transaction(
        type: :transfer,
        from: from_account,
        to: to_account,
        amount: amount,
        timestamp: Time.now
      )

      # Synchronously replicate audit log
      @audit_log.write(
        audit_entry,
        replicas: :all_sync,
        durable: true
      )
    end

    # Queue the analytics copy only after the transaction has committed
    # (assumes the block raises on failure), so an aborted transfer
    # never reaches the reporting database
    @analytics_replication.queue_write(audit_entry)
  end

  def query_balance(account)
    # Always read from primary for accurate balance
    @transaction_db.find(
      account,
      consistency: :strong,
      isolation_level: :serializable
    )
  end
end
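
The idea of acknowledging a write only after enough replicas confirm it can be sketched without a database. Everything below is hypothetical and in-memory: `MemoryReplica`, `InsufficientAcks`, and `replicate_sync` are illustration names, and a real node would persist and flush the write before confirming.

```ruby
# Hypothetical in-memory replica standing in for a database node.
class MemoryReplica
  attr_reader :data

  def initialize(healthy: true)
    @healthy = healthy
    @data = {}
  end

  # Returns true when the write was applied, false when the node is down.
  def write(key, value)
    return false unless @healthy

    @data[key] = value
    true
  end
end

class InsufficientAcks < StandardError; end

# Apply the write to every replica and succeed only if at least
# `required_acks` of them confirm it.
def replicate_sync(replicas, key, value, required_acks:)
  acks = replicas.count { |replica| replica.write(key, value) }
  raise InsufficientAcks, "#{acks}/#{required_acks} acknowledgments" if acks < required_acks

  acks
end

replicas = [MemoryReplica.new, MemoryReplica.new, MemoryReplica.new(healthy: false)]
puts replicate_sync(replicas, "txn-1", 100, required_acks: 2)  # 2

begin
  replicate_sync(replicas, "txn-2", 50, required_acks: 3)
rescue InsufficientAcks => e
  puts "rejected: #{e.message}"  # rejected: 2/3 acknowledgments
end
```

Note that the rejected write still landed on the two healthy replicas. This sketch does not roll it back; a production system would need a transaction protocol or a repair step to handle that case.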

Content Delivery Networks use hierarchical replication to distribute content globally. Origin servers hold the canonical content. Regional edge servers replicate frequently accessed content. Edge servers use TTL-based caching with on-demand fetching for cache misses:

class CDNReplication
  def initialize
    @origin = OriginServer.new
    @edge_servers = EdgeServerPool.new
    @replication_policy = ReplicationPolicy.new
  end

  def get_content(path, edge_location)
    edge_server = @edge_servers.closest_to(edge_location)
    
    # Try edge server first
    content = edge_server.read(path)
    
    if content && !content.expired?
      return content
    end

    # Cache miss or expired - fetch from origin
    content = @origin.read(path)
    
    # Populate edge cache
    edge_server.write(
      path,
      content,
      ttl: @replication_policy.ttl_for(path)
    )

    # Prefetch to nearby edges if popular
    if @replication_policy.should_prefetch?(path)
      prefetch_to_region(path, edge_location)
    end

    content
  end

  private

  def prefetch_to_region(path, location)
    nearby_edges = @edge_servers.in_region(location)
    
    Thread.new do
      content = @origin.read(path)
      nearby_edges.each do |edge|
        edge.write(path, content, ttl: 1.hour)
      end
    end
  end
end
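
The `expired?` check that `get_content` relies on can be illustrated with a small TTL-bound cache entry. This is a sketch under stated assumptions: `CachedContent` is a hypothetical class, the TTL is a plain number of seconds, and the current time is injectable so expiry can be checked deterministically.

```ruby
# Hypothetical cache entry that records when it was stored and how long
# it may be served.
class CachedContent
  attr_reader :body

  def initialize(body, ttl:, stored_at: Time.now)
    @body = body
    @ttl = ttl              # seconds
    @stored_at = stored_at
  end

  # The entry expires once its age reaches the TTL.
  def expired?(now = Time.now)
    now - @stored_at >= @ttl
  end
end

stored = Time.at(0)
entry = CachedContent.new("<html>...</html>", ttl: 60, stored_at: stored)
puts entry.expired?(stored + 30)  # false
puts entry.expired?(stored + 60)  # true
```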

Time-Series Databases for monitoring and metrics use partitioned replication. Recent data replicates to all nodes for real-time alerting. Historical data replicates to fewer nodes for archival. Downsampling reduces resolution of old data to save storage:

class TimeSeriesReplication
  def write_metric(metric_name, value, timestamp = Time.now)
    partition = time_partition(timestamp)
    
    case partition
    when :recent # Last 24 hours
      @timeseries_db.write(
        metric_name,
        value,
        timestamp,
        replication: :all_nodes,
        resolution: :full
      )
    when :daily # Last 30 days
      @timeseries_db.write(
        metric_name,
        downsample(value),
        timestamp,
        replication: :some_nodes,
        resolution: :minute
      )
    when :archive # Older than 30 days
      @timeseries_db.write(
        metric_name,
        downsample(value, factor: 60),
        timestamp,
        replication: :archive_nodes,
        resolution: :hour
      )
    end
  end

  private

  def time_partition(timestamp)
    age = Time.now - timestamp
    
    if age < 24.hours
      :recent
    elsif age < 30.days
      :daily
    else
      :archive
    end
  end
end
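
Downsampling is usually applied to a series of points rather than to a single value. The sketch below shows one common approach, averaging raw points into fixed-width time buckets. `downsample_points` is a hypothetical helper, not the `downsample` method called in the class above, and the `[timestamp, value]` pair format is an assumption.

```ruby
# Average raw points into fixed-width time buckets.
# points:         array of [unix_timestamp_seconds, value] pairs
# bucket_seconds: bucket width, e.g. 60 for per-minute resolution
def downsample_points(points, bucket_seconds:)
  points
    .group_by { |timestamp, _value| timestamp - (timestamp % bucket_seconds) }
    .map { |bucket_start, bucket| [bucket_start, bucket.sum { |_ts, value| value } / bucket.size.to_f] }
    .sort_by(&:first)
end

raw = [[0, 10], [20, 20], [40, 30], [60, 40], [90, 60]]
p downsample_points(raw, bucket_seconds: 60)
# => [[0, 20.0], [60, 50.0]]
```

Averaging suits gauges such as CPU usage. Counters or latency percentiles would call for a different aggregate (sum, max, or a sketch structure).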

Reference

Replication Strategy Comparison

| Strategy | Write Latency | Read Scalability | Consistency | Complexity |
|---|---|---|---|---|
| Master-Slave Async | Low | High | Eventual | Low |
| Master-Slave Sync | High | High | Strong | Medium |
| Multi-Master | Low | High | Eventual | High |
| Quorum-Based | Medium | Medium | Tunable | Medium |
| Chain Replication | High | High | Strong | Medium |
| Peer-to-Peer | Low | Very High | Eventual | High |

Consistency Models

| Model | Guarantees | Use Cases | Trade-offs |
|---|---|---|---|
| Strong | All reads see latest write | Financial transactions, inventory | High latency, reduced availability |
| Eventual | Replicas converge over time | Social feeds, catalogs | Temporary inconsistency |
| Causal | Preserves cause-effect order | Collaborative editing | Complex implementation |
| Read-Your-Writes | Readers see their own writes | User profiles, settings | Session stickiness required |
| Monotonic Reads | Reads never go backward in time | Analytics, monitoring | May see stale data |

Quorum Configurations

| Configuration | Read Performance | Write Performance | Consistency |
|---|---|---|---|
| W=N, R=1 | Fastest reads | Slowest writes | Strong consistency |
| W=1, R=N | Slowest reads | Fastest writes | Strong consistency |
| W=majority, R=majority | Balanced | Balanced | Strong consistency |
| W=1, R=1 | Fastest reads | Fastest writes | Eventual consistency |
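
The consistency column follows from one rule: with N replicas, a read quorum R and a write quorum W share at least one replica exactly when R + W > N. The sketch below checks each configuration from the table for N = 5; `quorum_overlap?` and `majority` are hypothetical helper names.

```ruby
# Read and write quorums overlap in at least one replica when R + W > N.
def quorum_overlap?(n:, w:, r:)
  r + w > n
end

# Smallest group that is more than half of n nodes.
def majority(n)
  n / 2 + 1
end

n = 5
configs = {
  "W=N, R=1"               => { w: n, r: 1 },
  "W=1, R=N"               => { w: 1, r: n },
  "W=majority, R=majority" => { w: majority(n), r: majority(n) },
  "W=1, R=1"               => { w: 1, r: 1 }
}

configs.each do |label, quorum|
  kind = quorum_overlap?(n: n, **quorum) ? "strong" : "eventual"
  puts "#{label}: #{kind}"
end
# W=N, R=1: strong
# W=1, R=N: strong
# W=majority, R=majority: strong
# W=1, R=1: eventual
```

Overlap is a necessary condition rather than a full guarantee. Features such as sloppy quorums or hinted handoff can weaken it in practice.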

Conflict Resolution Strategies

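| Strategy | Mechanism | Convergence | Data Loss Risk |
|---|---|---|---|
| Last-Write-Wins | Timestamp comparison | Guaranteed | Concurrent updates may be lost |
| Vector Clocks | Causal ordering; detects concurrent writes | Once conflicting versions are merged | Low; conflicting versions are kept for manual merge |
| CRDTs | Commutative operations | Guaranteed | None |
| Application Merge | Custom logic | Depends on logic | Depends on logic |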
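
Vector clocks resolve nothing by themselves; they only tell the system whether one version descends from another or whether two versions are concurrent and need a merge. A minimal comparison, assuming clocks are plain hashes of node id to counter (`compare_clocks` is a hypothetical helper):

```ruby
# Compare two vector clocks given as hashes of node id => counter.
# Returns :equal, :before, :after, or :concurrent.
def compare_clocks(a, b)
  nodes = a.keys | b.keys
  a_behind = nodes.any? { |node| a.fetch(node, 0) < b.fetch(node, 0) }
  b_behind = nodes.any? { |node| b.fetch(node, 0) < a.fetch(node, 0) }

  if a_behind && b_behind
    :concurrent   # neither version saw the other: a conflict to merge
  elsif a_behind
    :before       # a happened before b, so b supersedes a
  elsif b_behind
    :after        # b happened before a, so a supersedes b
  else
    :equal
  end
end

p compare_clocks({ a: 2, b: 1 }, { a: 2, b: 3 })  # :before
p compare_clocks({ a: 3, b: 1 }, { a: 2, b: 3 })  # :concurrent
p compare_clocks({ a: 1 }, { a: 1 })              # :equal
```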

Replication Lag Metrics

| Metric | Description | Healthy Range | Action Threshold |
|---|---|---|---|
| Bytes Behind | Data volume not yet replicated | < 1 MB | > 10 MB |
| Time Behind | Estimated time lag | < 1 second | > 10 seconds |
| Transactions Behind | Number of operations pending | < 100 | > 1000 |
| Apply Rate | Replication speed | Near write rate | < 50% of write rate |
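
The Healthy Range and Action Threshold columns can be turned into a simple classifier for monitoring. In this sketch the constant and method names are hypothetical, megabytes are taken as decimal, and values between the two thresholds are treated as a warning band, which is an assumption rather than something the table specifies.

```ruby
# Thresholds taken from the lag metrics table (bytes, seconds, operations).
LAG_THRESHOLDS = {
  bytes_behind:        { healthy: 1_000_000, action: 10_000_000 },
  seconds_behind:      { healthy: 1,         action: 10 },
  transactions_behind: { healthy: 100,       action: 1_000 }
}.freeze

# Returns :healthy, :warning, or :action_required for a lag measurement.
def classify_lag(metric, value)
  limits = LAG_THRESHOLDS.fetch(metric)
  return :healthy if value < limits[:healthy]
  return :action_required if value > limits[:action]

  :warning
end

p classify_lag(:seconds_behind, 0.4)        # :healthy
p classify_lag(:seconds_behind, 4)          # :warning
p classify_lag(:bytes_behind, 25_000_000)   # :action_required
```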

ActiveRecord Connection Roles

| Role | Purpose | Connection Target | Usage |
|---|---|---|---|
| writing | Write operations | Primary database | Create, update, delete |
| reading | Read operations | Replica database | Select queries |
| replica | Explicit replica reads | Named replica | Specific replica selection |

Redis Replication Commands

| Command | Purpose | Example |
|---|---|---|
| REPLICAOF | Configure replication | REPLICAOF hostname port |
| INFO replication | Check replication status | INFO replication |
| ROLE | Get node role | ROLE |
| WAIT | Wait for replication | WAIT numreplicas timeout |

MongoDB Write Concerns

| Write Concern | Acknowledgment | Durability | Use Case |
|---|---|---|---|
| w: 1 | Single node | Low | High throughput |
| w: majority | Quorum | High | Consistency |
| j: true | Journal sync | Highest | Critical data |
| wtimeout: 5000 | Times out after 5 s (value in ms) | Varies | Prevent blocking |

Performance Optimization Techniques

| Technique | Benefit | Implementation | Trade-off |
|---|---|---|---|
| Connection Pooling | Reduce connection overhead | Maintain open connections | Memory usage |
| Batch Replication | Lower network overhead | Buffer then send | Increased lag |
| Compression | Reduce bandwidth | Compress replication stream | CPU usage |
| Parallel Apply | Faster replication | Multi-threaded apply | Complexity |
| Read-Local | Lower read latency | Route to nearest replica | Staleness |
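
As an illustration of the "buffer then send" row, the sketch below batches writes before forwarding them. `BatchReplicator` is a hypothetical class; the sender block stands in for whatever ships a batch to a replica, and the class is not thread-safe as written.

```ruby
# Buffers writes and forwards them to a replica in batches.
class BatchReplicator
  def initialize(batch_size:, &sender)
    @batch_size = batch_size
    @sender = sender   # called with an array of [key, value] pairs
    @buffer = []
  end

  # Buffer one write and send the batch once it is full.
  def enqueue(key, value)
    @buffer << [key, value]
    flush if @buffer.size >= @batch_size
  end

  # Send whatever is buffered. Calling this on a timer as well keeps a
  # quiet period from leaving writes stranded in the buffer.
  def flush
    return if @buffer.empty?

    batch = @buffer
    @buffer = []
    @sender.call(batch)
  end
end

sent = []
replicator = BatchReplicator.new(batch_size: 3) { |batch| sent << batch }
5.times { |i| replicator.enqueue("key#{i}", i) }

puts sent.length  # 1 (two writes are still buffered)
replicator.flush
puts sent.length  # 2
```

The two buffered writes show the trade-off from the table: until the batch fills or a flush runs, the replica lags behind by the contents of the buffer.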