Overview
Replication strategies define how data copies are created, maintained, and synchronized across multiple nodes in distributed systems. These strategies determine the consistency guarantees, availability characteristics, and performance trade-offs of a system. The choice of replication strategy directly impacts application behavior, scalability limits, and operational complexity.
Database replication addresses several critical requirements: fault tolerance through redundancy, improved read performance through load distribution, geographic data locality, and disaster recovery capabilities. Each replication strategy represents a different position on the consistency-availability-partition tolerance spectrum defined by the CAP theorem.
The fundamental challenge in replication involves maintaining data consistency across nodes while managing network latency, potential failures, and concurrent updates. Different strategies make different trade-offs between these concerns. A master-slave setup prioritizes consistency by channeling all writes through a single node, while multi-master configurations prioritize availability by accepting writes at any node.
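# Basic replication setup showing different strategies.
# Nodes are assumed to respond to #write(key, value); replicate_async,
# current_node, and broadcast_write are left to the concrete implementation.
class ReplicationManager
  attr_reader :strategy, :nodes

  def initialize(strategy:, nodes:)
    @strategy = strategy
    @nodes = nodes
    @primary = nodes.first
  end

  def write(key, value)
    case strategy
    when :master_slave
      # All writes go through the primary, then fan out in the background
      @primary.write(key, value)
      replicate_async(key, value)
    when :multi_master
      # Any node accepts the write and broadcasts it to its peers
      current_node.write(key, value)
      broadcast_write(key, value)
    else
      raise ArgumentError, "unknown strategy: #{strategy.inspect}"
    end
  end
end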
Modern distributed systems often combine multiple replication strategies. A system might use synchronous replication within a data center for strong consistency while employing asynchronous replication across geographic regions for disaster recovery. Understanding the implications of each strategy enables informed architectural decisions.
Key Principles
Replication strategies operate on several fundamental principles that govern their behavior and characteristics. These principles establish the theoretical foundation for understanding how different approaches handle data consistency, failure scenarios, and performance requirements.
Consistency Models define the guarantees about what values readers observe after writes complete. Strong consistency ensures all readers see the same data at any point in time, requiring coordination between replicas. Eventual consistency allows temporary divergence between replicas, guaranteeing only that replicas will converge to the same state given enough time without new updates. Causal consistency preserves the order of causally related operations while allowing concurrent operations to be seen in different orders.
Replication Lag represents the time delay between when data is written to one replica and when it appears on other replicas. Synchronous replication minimizes lag by blocking writes until replicas acknowledge receipt, while asynchronous replication accepts lag in exchange for better write performance. The acceptable lag duration depends on application requirements and user expectations.
Quorum-Based Replication uses voting mechanisms to determine successful operations. A system with N replicas requires W acknowledgments for writes and consults R replicas for reads. When W + R > N, every read set overlaps every write set, so at least one replica consulted by a read holds the latest acknowledged write; returning the highest-versioned response then yields strongly consistent reads. Setting W = N and R = 1 optimizes for read performance with slower writes, while W = 1 and R = N optimizes for write performance with slower reads.
class QuorumReplication
def initialize(replicas, write_quorum, read_quorum)
@replicas = replicas
@w = write_quorum
@r = read_quorum
validate_quorum
end
def write(key, value, version)
responses = @replicas.map do |replica|
Thread.new { replica.write(key, value, version) }
end.map(&:value)
successful = responses.count { |r| r[:status] == :success }
successful >= @w
end
def read(key)
responses = @replicas.sample(@r).map do |replica|
Thread.new { replica.read(key) }
end.map(&:value)
# Return value with highest version number
responses.max_by { |r| r[:version] }[:value]
end
private
def validate_quorum
n = @replicas.size
raise "Invalid quorum" unless @w + @r > n
raise "Write quorum too large" unless @w <= n
raise "Read quorum too large" unless @r <= n
end
end
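The overlap argument behind W + R > N can be checked by brute force for small clusters. The following is a minimal sketch for illustration; `quorums_always_overlap?` is a hypothetical helper, not part of any library, and it enumerates every possible write set and read set.

```ruby
# Returns true when every possible write set of size w shares at least
# one replica with every possible read set of size r.
def quorums_always_overlap?(n, w, r)
  replicas = (0...n).to_a
  replicas.combination(w).all? do |write_set|
    replicas.combination(r).all? do |read_set|
      !(write_set & read_set).empty?
    end
  end
end

puts quorums_always_overlap?(5, 3, 3) # W + R = 6 exceeds N = 5
puts quorums_always_overlap?(5, 2, 3) # W + R = 5 does not exceed N = 5
```

With N = 5, W = 2, and R = 3, a write acknowledged by replicas 0 and 1 can be missed entirely by a read that consults replicas 2, 3, and 4, which is why the strict inequality matters.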
Conflict Resolution becomes necessary when multiple replicas accept concurrent writes to the same data. Last-write-wins uses timestamps to resolve conflicts, but requires synchronized clocks across nodes. Vector clocks track causal relationships between updates, enabling detection of concurrent writes that require application-level resolution. CRDTs (Conflict-free Replicated Data Types) provide data structures with commutative operations that guarantee convergence without explicit conflict resolution.
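To make the vector clock idea concrete, here is a small sketch that represents a clock as a Hash of node id to counter. The function names `compare_clocks` and `merge_clocks` are assumptions chosen for illustration. Two updates are concurrent when each clock is ahead of the other on at least one node.

```ruby
# Compares two vector clocks (Hash of node_id => counter).
# Returns :before, :after, :equal, or :concurrent.
def compare_clocks(a, b)
  nodes = a.keys | b.keys
  a_behind = nodes.any? { |n| a.fetch(n, 0) < b.fetch(n, 0) }
  b_behind = nodes.any? { |n| b.fetch(n, 0) < a.fetch(n, 0) }

  return :concurrent if a_behind && b_behind
  return :before if a_behind
  return :after if b_behind
  :equal
end

# Merging takes the element-wise maximum of the two clocks.
def merge_clocks(a, b)
  (a.keys | b.keys).to_h { |n| [n, [a.fetch(n, 0), b.fetch(n, 0)].max] }
end

p compare_clocks({ "a" => 2, "b" => 0 }, { "a" => 1, "b" => 1 })
p merge_clocks({ "a" => 2, "b" => 0 }, { "a" => 1, "b" => 1 })
```

A `:concurrent` result is the signal that neither write causally precedes the other, so the application (or a CRDT) has to decide how to combine them.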
Topology and Flow Control determine how updates propagate through the replication network. Chain replication passes writes through a sequence of nodes before acknowledging, providing strong consistency with good read throughput. Ring topologies distribute data and replication responsibility across nodes in a circle. Tree topologies create hierarchical replication with primary-secondary relationships at each level.
Failure Detection and Recovery mechanisms identify when replicas become unavailable and coordinate recovery processes. Heartbeat protocols exchange periodic messages to detect node failures. Consensus algorithms like Raft ensure the system maintains a consistent view of which nodes are active. When a failed node recovers, it must reconcile its state with current replicas through catch-up replication or snapshot-based recovery.
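A heartbeat-based detector can be sketched in a few lines. The class below is a hypothetical, timeout-only detector (`HeartbeatMonitor` and its methods are names invented for this example); production systems usually layer consensus or adaptive timeouts on top of this basic idea.

```ruby
# Timeout-based failure detector. The current time is passed in explicitly
# so the logic can be exercised without sleeping.
class HeartbeatMonitor
  def initialize(timeout:)
    @timeout = timeout
    @last_seen = {}
  end

  # Record a heartbeat received from node_id at time `now` (seconds).
  def heartbeat(node_id, now)
    @last_seen[node_id] = now
  end

  # Nodes whose last heartbeat is older than the timeout are suspected.
  def suspected(now)
    @last_seen.select { |_, seen| now - seen > @timeout }.keys
  end
end

monitor = HeartbeatMonitor.new(timeout: 3.0)
monitor.heartbeat(:node_a, 0.0)
monitor.heartbeat(:node_b, 0.0)
monitor.heartbeat(:node_a, 4.0) # node_b stays silent
p monitor.suspected(5.0)
```

A suspected node is not necessarily dead; it may only be slow or partitioned, which is why failover decisions are normally confirmed by more than one observer.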
Durability Guarantees specify when writes are considered permanent. Synchronous replication to persistent storage provides immediate durability. Write-ahead logs record updates before applying them, enabling recovery after crashes. Periodic checkpoints reduce recovery time by creating consistent snapshots.
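The write-ahead logging idea can be illustrated with an in-memory stand-in. This is a sketch under simplifying assumptions: `WalStore` is a hypothetical class, the log is a plain Array, and a real implementation would append to a file and flush it to disk before acknowledging the write.

```ruby
# Every update is appended to the log before it is applied, so the state
# can be rebuilt after a crash by replaying the log from the start.
class WalStore
  attr_reader :log, :data

  def initialize(log = [])
    @log = log
    @data = {}
    replay
  end

  def put(key, value)
    @log << [:put, key, value] # 1. record the intent
    @data[key] = value         # 2. apply it to the state
  end

  private

  # Rebuild state from whatever the log already contains.
  def replay
    @log.each { |(_op, key, value)| @data[key] = value }
  end
end

store = WalStore.new
store.put("a", 1)
store.put("a", 2)

# Simulate a crash: in-memory state is lost, but the log survives.
recovered = WalStore.new(store.log.dup)
p recovered.data
```

A checkpoint would correspond to saving `data` and truncating the log entries it already reflects, which shortens the replay.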
Implementation Approaches
Different replication strategies serve distinct operational requirements and make different trade-offs between consistency, availability, and performance. Selecting an approach requires analyzing application needs, failure tolerance requirements, and acceptable complexity levels.
Master-Slave Replication designates one node as the master that handles all write operations, with one or more slave nodes receiving copies of the data. The master serializes all writes, providing a single authoritative write order. Slaves serve read requests, distributing query load across multiple nodes. Because all updates flow through a single point, write conflicts cannot occur, although reads from asynchronously updated slaves may be stale. The master is also a single point of failure for write availability.
class MasterSlaveReplication
def initialize(master, slaves)
@master = master
@slaves = slaves
@replication_lag = {}
end
def write(key, value)
result = @master.write(key, value)
# Asynchronous replication to slaves
@slaves.each do |slave|
Thread.new do
begin
slave.write(key, value)
@replication_lag[slave.id] = Time.now - result[:timestamp]
rescue => e
handle_slave_failure(slave, e)
end
end
end
result
end
def read(key, consistency: :eventual)
case consistency
when :strong
@master.read(key)
when :eventual
select_slave.read(key)
end
end
private
def select_slave
# Least-lag selection; slaves with no recorded lag yet sort last
@slaves.min_by { |s| @replication_lag[s.id] || Float::INFINITY }
end
end
Multi-Master Replication allows writes at any node, increasing write availability and reducing latency for geographically distributed users. Each master propagates its writes to other masters asynchronously. This strategy requires conflict resolution when concurrent writes modify the same data. Systems handle conflicts through last-write-wins policies, application callbacks, or CRDT data structures that guarantee convergence.
Synchronous vs Asynchronous Replication represents a fundamental trade-off. Synchronous replication blocks write operations until replicas acknowledge receipt, guaranteeing consistency but increasing latency and reducing availability when replicas are slow or unreachable. Asynchronous replication acknowledges writes immediately and replicates in the background, providing better performance and availability but risking data loss if the primary fails before replication completes.
class SynchronousReplication
def write(key, value)
transaction do
primary_result = @primary.write(key, value)
replica_results = @replicas.map do |replica|
Thread.new { replica.write(key, value) }
end.map(&:value)
# All must succeed or rollback
if replica_results.all? { |r| r[:status] == :success }
commit
primary_result
else
rollback
raise ReplicationError, "Failed to replicate to all nodes"
end
end
end
end
class AsynchronousReplication
def write(key, value)
result = @primary.write(key, value)
# Queue replication without blocking
@replication_queue.push(
key: key,
value: value,
timestamp: result[:timestamp]
)
result
end
def replication_worker
loop do
job = @replication_queue.pop
@replicas.each do |replica|
retry_with_backoff { replica.write(job[:key], job[:value]) }
end
end
end
end
Chain Replication organizes replicas in a sequence where writes propagate through the chain before acknowledging. The head receives writes and passes them down the chain, with the tail acknowledging completion. Reads come from the tail, which has all committed writes. This provides strong consistency with high read throughput, but writes experience cumulative latency from each hop in the chain.
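The write path through a chain can be sketched as follows. `ChainNode` is a hypothetical in-process class used only for illustration; it ignores failures and chain reconfiguration, which a real implementation has to handle.

```ruby
# Each node stores the write, then forwards it to its successor.
# The write is acknowledged only once the tail has applied it.
class ChainNode
  attr_reader :store
  attr_accessor :successor

  def initialize
    @store = {}
    @successor = nil
  end

  def write(key, value)
    @store[key] = value
    successor ? successor.write(key, value) : :ack
  end

  def read(key)
    @store[key]
  end
end

head = ChainNode.new
middle = ChainNode.new
tail = ChainNode.new
head.successor = middle
middle.successor = tail

ack = head.write("k", "v") # travels head -> middle -> tail
value = tail.read("k")     # reads are served by the tail
p [ack, value]
```

Because the acknowledgment only comes back after the tail has stored the value, any value readable at the tail is already present on every node before it in the chain.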
Peer-to-Peer Replication distributes data across nodes without designated masters, using consistent hashing to determine which nodes store each key. Each data item replicates to multiple nodes. This approach provides high availability and scales horizontally, but requires sophisticated conflict resolution and anti-entropy mechanisms to handle network partitions and concurrent updates.
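Key placement on a hash ring can be sketched like this. The class name `HashRing`, the node names, and the choice of MD5 as the hash function are assumptions for illustration, and virtual nodes are omitted for brevity.

```ruby
require 'digest'

# Maps each key to `replicas` distinct nodes by walking clockwise
# around a hash ring, starting at the first node at or after the key.
class HashRing
  def initialize(nodes, replicas: 3)
    @replicas = [replicas, nodes.size].min
    @ring = nodes.map { |n| [position(n), n] }.sort
  end

  def nodes_for(key)
    pos = position(key)
    start = @ring.index { |p, _| p >= pos } || 0 # wrap around past the end
    @ring.rotate(start).first(@replicas).map(&:last)
  end

  private

  def position(value)
    Digest::MD5.hexdigest(value.to_s).to_i(16)
  end
end

ring = HashRing.new(%w[node-a node-b node-c node-d], replicas: 3)
p ring.nodes_for("user:123")
```

Adding or removing a node only changes ownership for keys adjacent to that node on the ring, which is what makes this placement scheme attractive for clusters that grow and shrink.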
Statement-Based vs Row-Based Replication affects how updates propagate in database systems. Statement-based replication transmits SQL commands that slaves execute, reducing network bandwidth but risking divergence when statements have non-deterministic effects. Row-based replication transmits actual data changes, guaranteeing consistency but consuming more bandwidth. Modern systems often use a hybrid approach, selecting the most efficient method for each statement.
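The divergence risk can be shown with a toy model. In this sketch, `Replica` is a hypothetical Struct, and a per-node `clock` value stands in for any non-deterministic function (such as a current-timestamp or random-number call) that a statement might evaluate locally.

```ruby
# Statement-based: each node re-executes the statement itself.
# Row-based: the primary executes once and ships the resulting row.
Replica = Struct.new(:rows, :clock) do
  def execute(statement)
    # A non-deterministic "statement": stamps the row with the local clock
    rows[statement[:key]] = "#{statement[:value]}@#{clock}"
  end

  def apply_row(key, row)
    rows[key] = row
  end
end

primary = Replica.new({}, 100)
replica = Replica.new({}, 105) # clock differs from the primary's

statement = { key: "order:1", value: "created" }

# Statement-based replication: both sides evaluate the clock locally.
primary.execute(statement)
replica.execute(statement)
diverged = primary.rows != replica.rows

# Row-based replication: the replica copies the primary's result.
replica.apply_row("order:1", primary.rows["order:1"])
converged = primary.rows == replica.rows
p [diverged, converged]
```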
Design Considerations
Selecting a replication strategy requires evaluating application requirements against the characteristics and trade-offs of each approach. The decision impacts system behavior during normal operation and failure scenarios, affecting user experience and operational complexity.
Consistency Requirements drive fundamental strategy decisions. Applications requiring strong consistency, like financial systems where stale reads could cause incorrect transactions, need synchronous replication or quorum-based approaches. Applications tolerating eventual consistency, like social media feeds where slight delays in propagating updates are acceptable, benefit from asynchronous replication's performance advantages.
Write vs Read Patterns influence topology selection. Read-heavy applications benefit from master-slave configurations that scale read capacity across multiple slaves. Write-heavy applications with geographic distribution might need multi-master replication to avoid funneling all writes through a single location, despite the added conflict resolution complexity.
class AdaptiveReplication
  THRESHOLD = 1_000 # write-rate cutoff; illustrative value, tune per workload

  def initialize
    @strategy = :master_slave
    @write_rate = RateCounter.new
    @read_rate = RateCounter.new
  end

  def write(key, value)
    @write_rate.increment
    adapt_strategy
    case @strategy
    when :master_slave
      master_slave_write(key, value)
    when :multi_master
      multi_master_write(key, value)
    end
  end

  private

  def adapt_strategy
    writes = @write_rate.current
    # Float division, with a guard against dividing by zero
    read_write_ratio = writes.zero? ? Float::INFINITY : @read_rate.current.to_f / writes
    @strategy = if read_write_ratio > 10
                  :master_slave # Optimize for read scaling
                elsif writes > THRESHOLD
                  :multi_master # Distribute write load
                else
                  @strategy # Neither condition holds: keep the current strategy
                end
  end
end
Geographic Distribution impacts latency and availability decisions. Systems serving global users experience significant cross-region latency. Synchronous replication across regions adds hundreds of milliseconds to write operations. Multi-master setups with asynchronous cross-region replication keep writes local, but require handling conflicts from concurrent updates in different regions.
Failure Tolerance goals determine replication factor and consistency level. Systems requiring high availability during failures need enough replicas that losing some still maintains quorum. Setting W + R > N guarantees consistency, but writes fail once fewer than W replicas are reachable and reads fail once fewer than R are reachable. Setting lower quorum values maintains availability during partitions but risks inconsistent reads.
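The relationship between quorum sizes and tolerated failures is simple arithmetic, sketched below. `quorum_tolerance` is a hypothetical helper introduced for this example.

```ruby
# For N replicas with write quorum W and read quorum R, reports whether
# reads are guaranteed to see the latest write and how many replica
# failures each kind of operation can survive.
def quorum_tolerance(n:, w:, r:)
  {
    consistent: w + r > n, # read and write sets must overlap
    write_failures_tolerated: n - w,
    read_failures_tolerated: n - r
  }
end

p quorum_tolerance(n: 5, w: 3, r: 3)
p quorum_tolerance(n: 5, w: 5, r: 1)
p quorum_tolerance(n: 5, w: 1, r: 1)
```

Majority quorums (W = R = 3 with N = 5) survive two failures for both reads and writes, whereas W = 5 cannot tolerate a single unavailable replica on the write path.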
Operational Complexity varies significantly between strategies. Master-slave replication provides straightforward operation and failover procedures, though promoting a slave to master requires coordination. Multi-master configurations increase complexity through conflict resolution requirements and the need to handle split-brain scenarios where multiple nodes believe they're the master.
Data Loss Tolerance affects synchronous vs asynchronous decisions. Asynchronous replication risks losing recent writes if the primary fails before replication completes. Systems where data loss is unacceptable must use synchronous replication to durably commit writes to multiple nodes before acknowledging. The durability-performance trade-off depends on how much latency users will tolerate.
class ConfigurableReplication
  def initialize(config)
    @replicas = config.fetch(:replicas)   # needed to size the quorums below
    @consistency = config[:consistency]   # :strong, :eventual, :causal
    @durability = config[:durability]     # :none, :single, :quorum
    @availability = config[:availability] # :best_effort, :high
    @strategy = select_strategy(config)
  end

  private

  def select_strategy(config)
    if config[:consistency] == :strong && config[:availability] == :high
      raise ConfigurationError, "Cannot guarantee both strong consistency and high availability"
    end

    if config[:consistency] == :strong
      # Majority quorums for both reads and writes satisfy W + R > N
      quorum = majority(@replicas.size)
      QuorumReplication.new(@replicas, quorum, quorum)
    elsif config[:availability] == :high
      MultiMasterReplication.new(
        conflict_resolver: LastWriteWins.new
      )
    else
      MasterSlaveReplication.new(async: true)
    end
  end

  def majority(n)
    (n / 2) + 1
  end
end
Bandwidth and Storage Costs grow with replication factor and synchronization frequency. Each replica requires storage for the full dataset or its partition. Synchronization traffic consumes network bandwidth, with synchronous replication generating round-trip messages for each write. Compression and incremental updates reduce costs but add processing overhead.
Application-Level Impact depends on how the application interacts with the database. Applications issuing a write then immediately reading it back require read-your-writes consistency. Reading from the master provides this naturally, but reads served by asynchronously updated slaves or other eventually consistent replicas must handle it explicitly through session affinity or version tracking. Applications with complex transactions might require distributed transaction support across replicas.
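Version tracking for read-your-writes can be sketched as follows. This is a simplified model with assumed names: `Session` is hypothetical, and each node is a plain Hash with a `:version` counter (the last applied write) and a `:data` store.

```ruby
# The session remembers the version of its own last write and only reads
# from the replica once the replica has caught up to that version.
class Session
  def initialize(primary, replica)
    @primary = primary
    @replica = replica
    @last_written_version = 0
  end

  def write(key, value)
    @primary[:version] += 1
    @primary[:data][key] = value
    @last_written_version = @primary[:version]
  end

  def read(key)
    caught_up = @replica[:version] >= @last_written_version
    source = caught_up ? @replica : @primary
    source[:data][key]
  end
end

primary = { version: 0, data: {} }
replica = { version: 0, data: {} } # asynchronous copy, currently behind

session = Session.new(primary, replica)
session.write("name", "Alice")
before_catch_up = session.read("name") # replica is stale, primary answers

# Replication catches up
replica[:data] = primary[:data].dup
replica[:version] = primary[:version]
after_catch_up = session.read("name") # now served by the replica
```

A session that has not written anything is free to read from the replica immediately, so only clients with recent writes pay the cost of hitting the primary.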
Ruby Implementation
Ruby applications interact with replicated databases through client libraries that abstract replication topology. Different database systems expose replication features through varying APIs, but common patterns emerge for handling consistency levels, failover, and conflict resolution.
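ActiveRecord with Read Replicas provides built-in support for directing reads to replica databases while sending writes to the primary. Rails applications declare the replica in database.yml, map roles to databases with connects_to, and switch connections either explicitly with connected_to or through the optional automatic role-switching middleware: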
# config/database.yml
production:
primary:
adapter: postgresql
host: primary-db.example.com
database: app_production
replica:
adapter: postgresql
host: replica-db.example.com
database: app_production
replica: true
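# Read replica routing. Assumes ApplicationRecord declares
#   connects_to database: { writing: :primary, reading: :replica }
class User < ApplicationRecord
# Reads use the replica only inside connected_to(role: :reading) or when
# automatic role switching is enabled; otherwise they go to the primary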
def self.recently_active
where("last_seen_at > ?", 1.hour.ago)
end
# Force primary for consistency
def self.create_with_read(attributes)
transaction do
user = create(attributes)
# Read from primary within transaction
ActiveRecord::Base.connected_to(role: :writing) do
User.find(user.id) # Ensures read-your-writes
end
end
end
end
# Explicit connection control
ActiveRecord::Base.connected_to(role: :reading) do
User.recently_active.each { |u| process(u) }
end
ActiveRecord::Base.connected_to(role: :writing) do
User.create(name: "New User")
end
Redis Replication in Ruby uses the redis gem with sentinel support for automatic failover. Sentinel monitors the master and promotes a slave when the master fails:
require 'redis'
# Connect through Sentinel for automatic failover
redis = Redis.new(
url: "redis://master-name",
sentinels: [
{ host: "sentinel1.example.com", port: 26379 },
{ host: "sentinel2.example.com", port: 26379 },
{ host: "sentinel3.example.com", port: 26379 }
],
role: :master,
reconnect_attempts: 3
)
# Write to master
redis.set("user:123:name", "Alice")
# Reads can be served by a replica by connecting with role: :slave
redis_read = Redis.new(
url: "redis://master-name",
sentinels: [...],
role: :slave,
reconnect_attempts: 3
)
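# Handling replication lag. INFO returns strings, so offsets are converted
# with to_i; the difference is a byte count, not a duration. The clients are
# passed in because top-level locals are not visible inside a method body.
def read_with_consistency(primary, replica, key, max_lag: 100)
  primary_offset = primary.info("replication")["master_repl_offset"].to_i
  replica_offset = replica.info("replication")["slave_repl_offset"].to_i
  if primary_offset - replica_offset > max_lag
    # Replica is too far behind: fall back to master for a consistent read
    primary.get(key)
  else
    replica.get(key)
  end
end

read_with_consistency(redis, redis_read, "user:123:name")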
MongoDB Replica Sets use the mongo gem with read preference controls for consistency-performance trade-offs:
require 'mongo'
client = Mongo::Client.new(
['mongodb1.example.com:27017',
'mongodb2.example.com:27017',
'mongodb3.example.com:27017'],
replica_set: 'myapp',
database: 'production'
)
# Write concern controls replication acknowledgment
collection = client[:users]
# Wait for majority acknowledgment (synchronous replication)
collection.insert_one(
{ name: "Alice", email: "alice@example.com" },
write_concern: { w: :majority, j: true, wtimeout: 5000 }
)
# Read from primary for strong consistency
users = collection.find({}, read: { mode: :primary })
# Read from nearest secondary for lower latency
recent_posts = client[:posts].find(
{ created_at: { '$gt': 1.hour.ago } },
read: { mode: :nearest }
)
# Read from secondary with max staleness
analytics_data = client[:events].find(
{},
read: {
mode: :secondary_preferred,
max_staleness: 120 # seconds
}
)
Custom Replication Logic for application-level replication control handles scenarios where database-level replication doesn't meet requirements:
class MultiDatacenterReplication
def initialize(primary_db, secondary_dbs)
@primary = primary_db
@secondaries = secondary_dbs
@async_queue = Queue.new
start_replication_worker
end
def write(table, record)
# Synchronous write to primary
result = @primary.insert(table, record)
# Asynchronous replication to secondaries
@secondaries.each do |dc, db|
@async_queue.push(
datacenter: dc,
table: table,
record: record,
timestamp: result[:timestamp]
)
end
result
end
def read(table, id, prefer_local: true)
if prefer_local && local_datacenter
db = @secondaries[local_datacenter]
db.find(table, id)
else
@primary.find(table, id)
end
end
private
def start_replication_worker
@worker = Thread.new do
loop do
job = @async_queue.pop
replicate_with_retry(job)
end
end
end
def replicate_with_retry(job, max_attempts: 3)
attempts = 0
begin
db = @secondaries[job[:datacenter]]
db.insert(job[:table], job[:record])
rescue => e
attempts += 1
if attempts < max_attempts
sleep(2 ** attempts) # Exponential backoff
retry
else
log_replication_failure(job, e)
send_alert("Replication failed after #{attempts} attempts")
end
end
end
def local_datacenter
# Determine datacenter from request context
Thread.current[:datacenter]
end
end
Conflict-Free Replicated Data Types implementation for eventual consistency with automatic conflict resolution:
class GCounter
# Grow-only counter CRDT
def initialize(replica_id)
@replica_id = replica_id
@counts = Hash.new(0)
end
def increment(amount = 1)
@counts[@replica_id] += amount
end
def value
@counts.values.sum
end
def merge(other)
other.counts.each do |replica, count|
@counts[replica] = [@counts[replica], count].max
end
end
protected
attr_reader :counts
end
require 'securerandom'
class LWWRegister
# Last-Writer-Wins register
def initialize(value = nil)
@value = value
@timestamp = Time.now.to_f
@replica_id = SecureRandom.uuid
end
def set(value)
@value = value
@timestamp = Time.now.to_f
end
def get
@value
end
def merge(other)
if other.timestamp > @timestamp ||
(other.timestamp == @timestamp && other.replica_id > @replica_id)
@value = other.value
@timestamp = other.timestamp
@replica_id = other.replica_id
end
end
protected
attr_reader :timestamp, :replica_id, :value
end
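require 'set'
require 'securerandom'

# ORSet - Observed-Remove Set
class ORSet
  def initialize
    @elements = {}   # element => Set of unique add tags still live
    @tombstones = {} # element => Set of add tags that have been removed
    @replica_id = SecureRandom.uuid
  end

  def add(element)
    tag = [@replica_id, Time.now.to_f, SecureRandom.hex(8)].join(':')
    (@elements[element] ||= Set.new) << tag
  end

  # Tombstone only the tags observed locally, so a concurrent add on
  # another replica (which carries a fresh tag) survives the merge.
  # Without tombstones, a merge would resurrect removed elements.
  def remove(element)
    observed = @elements.delete(element) || Set.new
    @tombstones[element] = (@tombstones[element] || Set.new) | observed
  end

  def include?(element)
    @elements.key?(element) && !@elements[element].empty?
  end

  def merge(other)
    other.tombstones.each do |element, tags|
      @tombstones[element] = (@tombstones[element] || Set.new) | tags
    end
    (@elements.keys | other.elements.keys).each do |element|
      tags = (@elements[element] || Set.new) | (other.elements[element] || Set.new)
      live = tags - (@tombstones[element] || Set.new)
      if live.empty?
        @elements.delete(element)
      else
        @elements[element] = live
      end
    end
  end

  protected

  attr_reader :elements, :tombstones
end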
Performance Considerations
Replication strategies directly impact system performance through write latency, read throughput, network utilization, and failure recovery time. Understanding these characteristics enables optimization for specific workload patterns.
Write Latency increases with replication requirements. Asynchronous replication adds minimal latency, typically microseconds to queue the replication request. Synchronous replication to a local replica adds roughly one network round trip, typically on the order of milliseconds. Cross-region synchronous replication adds 50-300ms depending on geographic distance. Quorum-based writes with W=3 in a 5-node cluster complete when the third-fastest replica acknowledges, so latency is set by the slowest member of the fastest majority rather than by the slowest replica overall.
require 'benchmark'
class ReplicationBenchmark
def compare_strategies
master_slave = MasterSlaveReplication.new(async: true)
synchronous = SynchronousReplication.new(replicas: 3)
quorum = QuorumReplication.new(replicas: 5, write_quorum: 3)
results = {}
results[:async] = Benchmark.measure do
1000.times { |i| master_slave.write("key#{i}", "value#{i}") }
end
results[:sync] = Benchmark.measure do
1000.times { |i| synchronous.write("key#{i}", "value#{i}") }
end
results[:quorum] = Benchmark.measure do
1000.times { |i| quorum.write("key#{i}", "value#{i}") }
end
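# Illustrative orders of magnitude, not measured results; real numbers
# depend on hardware, network, and replica placement:
# async: ~10ms for 1000 writes (0.01ms per write)
# sync: ~3000ms for 1000 writes (3ms per write)
# quorum: ~4500ms for 1000 writes (4.5ms per write)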
results
end
end
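The effect of the write quorum on latency can be modeled without any network at all. In the sketch below, `write_latency` is a hypothetical helper and the latency values are made up for illustration: a write that needs W acknowledgments finishes when the W-th fastest acknowledgment arrives.

```ruby
# Given the acknowledgment latency of each replica (in ms), a write that
# needs `w` acknowledgments completes when the w-th fastest one arrives.
def write_latency(ack_latencies_ms, w)
  ack_latencies_ms.sort[w - 1]
end

latencies = [2, 3, 4, 9, 120] # five replicas, one of them slow

quorum_ms = write_latency(latencies, 3)              # majority quorum
all_ms    = write_latency(latencies, latencies.size) # fully synchronous
one_ms    = write_latency(latencies, 1)              # single acknowledgment
p [one_ms, quorum_ms, all_ms]
```

In this model the single slow replica only hurts the fully synchronous configuration; the majority quorum is insulated from it as long as at least W replicas respond quickly.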
Read Performance scales with replica count for eventually consistent reads. A master-slave setup with 5 slaves can serve roughly 5x the read queries of a single node, assuming even load distribution. Strongly consistent reads must query the master or achieve quorum, limiting scalability. Read latency remains low for local replicas but increases for geographically distributed reads.
Network Bandwidth consumption depends on replication mode and data change rate. Asynchronous replication batches changes, reducing overhead. Synchronous replication generates immediate traffic for each write. In a system with 1000 writes/second at 1KB each, the primary ingests about 1MB/second. Replicating to 3 replicas generates about 3MB/second outbound at the primary and 1MB/second inbound at each replica, totaling about 3MB/second of replication traffic system-wide before protocol overhead.
Replication Lag affects read consistency and failover safety. Monitoring lag metrics helps identify performance bottlenecks and data loss risk. Lag increases under heavy write load, slow network conditions, or when replicas fall behind on processing:
class ReplicationMonitor
def initialize(primary, replicas)
@primary = primary
@replicas = replicas
end
def check_lag
primary_position = @primary.replication_position
@replicas.map do |replica|
begin
replica_position = replica.replication_position
lag = primary_position - replica_position
{
replica: replica.id,
lag_bytes: lag,
lag_seconds: estimate_time_lag(lag),
status: lag < 1_048_576 ? :healthy : :lagging # 1 MiB threshold, no ActiveSupport needed
}
rescue => e
{
replica: replica.id,
status: :unreachable,
error: e.message
}
end
end
end
private
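def estimate_time_lag(byte_lag)
# Estimate based on recent write rate; nil when the primary is idle,
# because the rate would otherwise cause a division by zero
recent_rate = @primary.bytes_written_per_second
recent_rate.zero? ? nil : byte_lag.to_f / recent_rate
end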
end
Write Amplification occurs in multi-master setups where each node propagates writes to all other nodes. With N nodes, each write generates N-1 replication messages. A 10-node cluster produces 90 messages for every 10 original writes. This limits practical cluster sizes and benefits from hierarchical replication topologies.
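The message count in a full mesh follows directly from the N-1 fan-out. The helper below, `replication_messages`, is a hypothetical name used only to spell out that arithmetic.

```ruby
# In a full mesh, every write is forwarded to each of the other N - 1 nodes.
def replication_messages(nodes:, writes:)
  writes * (nodes - 1)
end

[3, 5, 10].each do |n|
  messages = replication_messages(nodes: n, writes: 10)
  puts "#{n} nodes: #{messages} replication messages per 10 writes"
end
```

If every node also accepts writes at the same rate, the cluster-wide message rate grows with N * (N - 1), which is the quadratic growth that makes large full-mesh clusters impractical.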
Conflict Resolution Overhead impacts multi-master performance. Last-write-wins resolution adds minimal overhead, just comparing timestamps. Vector clock comparison requires examining one counter per participating node. Application-level merge functions can be expensive, particularly for complex data structures. CRDT merges need no coordination, but their cost and metadata overhead grow with the size of the state being merged, for example one counter entry per replica or one tag per set element.
Cache Coherency between replicas affects read performance. Invalidation-based caching across replicas requires coordination messages on each write. Read-through caches at each replica reduce query load but risk serving stale data. Cache TTLs balance consistency and performance, with shorter TTLs increasing database load.
Bulk Operations benefit from batch replication. Instead of replicating each write individually, batching 100 writes into a single replication message cuts the number of replication messages by 100x (a 99% reduction), although the payload bytes stay roughly the same. However, batching increases replication lag and data loss risk:
class BatchReplication
def initialize(replicas, batch_size: 100, batch_timeout: 1.0)
@replicas = replicas
@batch_size = batch_size
@batch_timeout = batch_timeout
@batch = []
@last_flush = Time.now
@mutex = Mutex.new
end
def write(key, value)
@mutex.synchronize do
@batch << { key: key, value: value, timestamp: Time.now }
if should_flush?
flush_batch
end
end
end
private
def should_flush?
@batch.size >= @batch_size ||
(Time.now - @last_flush) >= @batch_timeout
end
def flush_batch
return if @batch.empty?
batch_copy = @batch.dup
@batch.clear
@last_flush = Time.now
@replicas.each do |replica|
Thread.new { replica.write_batch(batch_copy) }
end
end
end
Connection Pooling for replicas requires careful sizing. Each application server maintains connections to primary and replica databases. A pool too small causes connection contention and increased latency. A pool too large exhausts database connection limits. Typical configurations use 5-20 connections per application instance, with replicas supporting more read connections than the primary supports write connections.
Real-World Applications
Production deployments combine multiple replication strategies to meet different requirements within the same system. Understanding how real systems apply replication patterns illuminates practical trade-offs and operational considerations.
Global E-Commerce Platforms deploy multi-region replication to serve users worldwide with low latency. Product catalogs replicate asynchronously across regions since slight staleness in product listings is acceptable. Inventory counts use synchronous replication within a region for consistency, preventing overselling. Order processing writes to the local region's primary database, which replicates to other regions for disaster recovery:
class EcommerceReplication
def initialize
@product_catalog = AsyncMultiRegionReplication.new
@inventory = SyncRegionalReplication.new
@orders = LocalPrimaryReplication.new(cross_region_backup: true)
end
def create_order(user_id, items)
# Check inventory with local strong consistency
@inventory.transaction do
items.each do |item|
available = @inventory.read(item[:sku], consistency: :strong)
raise OutOfStock unless available >= item[:quantity]
@inventory.decrement(item[:sku], item[:quantity])
end
# Record order in local region
order = @orders.write(
user_id: user_id,
items: items,
timestamp: Time.now
)
# Asynchronously replicate to other regions
@orders.replicate_async(order, regions: :all)
order
end
end
def get_product(sku)
# Read from local replica for low latency
@product_catalog.read(
sku,
region: current_region,
max_staleness: 5.minutes
)
end
end
Social Media Applications use eventual consistency for most data, accepting temporary inconsistencies in exchange for performance and availability. Posts replicate asynchronously across data centers. Like counts use CRDTs to merge concurrent updates. User authentication requires strong consistency for security:
class SocialMediaReplication
def create_post(user_id, content)
# Write to local data center
post = @posts_db.insert(
user_id: user_id,
content: content,
timestamp: Time.now.to_f
)
# Replicate to other data centers async
propagate_to_regions(post)
# Update follower feeds asynchronously
enqueue_feed_fanout(user_id, post[:id])
post
end
def increment_likes(post_id, user_id)
# CRDT counter for conflict-free merge
@like_counter.increment(post_id, replica_id: current_datacenter)
# Propagate to other data centers
propagate_counter_state(post_id)
end
def authenticate(username, password)
# Read from master for strong consistency
user = @auth_db.find(
username,
consistency: :strong,
encryption: true
)
verify_password(user, password)
end
end
Financial Systems prioritize consistency and durability over performance. Transaction processing uses synchronous replication to multiple nodes before acknowledging, ensuring transactions survive failures. Audit logs replicate synchronously and immutably. Reporting databases receive asynchronous replication for read-only analytics:
class FinancialTransactionReplication
def process_transaction(from_account, to_account, amount)
# Synchronous replication to 3 nodes with durable storage
@transaction_db.transaction(
write_concern: { w: 3, j: true, wtimeout: 5000 }
) do
debit = debit_account(from_account, amount)
credit = credit_account(to_account, amount)
audit_log = log_transaction(
type: :transfer,
from: from_account,
to: to_account,
amount: amount,
timestamp: Time.now
)
# Synchronously replicate audit log
@audit_log.write(
audit_log,
replicas: :all_sync,
durable: true
)
# Asynchronously replicate to analytics DB
@analytics_replication.queue_write(audit_log)
end
end
def query_balance(account)
# Always read from primary for accurate balance
@transaction_db.find(
account,
consistency: :strong,
isolation_level: :serializable
)
end
end
Content Delivery Networks use hierarchical replication to distribute content globally. Origin servers hold the canonical content. Regional edge servers replicate frequently accessed content. Edge servers use TTL-based caching with on-demand fetching for cache misses:
class CDNReplication
  def initialize
    @origin = OriginServer.new
    @edge_servers = EdgeServerPool.new
    @replication_policy = ReplicationPolicy.new
  end
  def get_content(path, edge_location)
    edge_server = @edge_servers.closest_to(edge_location)
    # Try edge server first
    content = edge_server.read(path)
    return content if content && !content.expired?
    # Cache miss or expired - fetch from origin
    content = @origin.read(path)
    # Populate edge cache
    edge_server.write(
      path,
      content,
      ttl: @replication_policy.ttl_for(path)
    )
    # Prefetch to nearby edges if popular
    if @replication_policy.should_prefetch?(path)
      prefetch_to_region(path, content, edge_location)
    end
    content
  end
  private
  # Pushes the already-fetched content to nearby edges in the background,
  # reusing the policy TTL instead of a hardcoded value
  def prefetch_to_region(path, content, location)
    nearby_edges = @edge_servers.in_region(location)
    ttl = @replication_policy.ttl_for(path)
    Thread.new do
      nearby_edges.each do |edge|
        edge.write(path, content, ttl: ttl)
      end
    end
  end
end
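The `content.expired?` check above relies on each cached object knowing when it was fetched and how long it may live. A minimal sketch of such an entry, assuming TTLs are expressed in seconds; `CachedContent` and its fields are hypothetical names, not part of any CDN library:

```ruby
# Hypothetical cache entry: body plus the metadata needed for TTL expiry.
CachedContent = Struct.new(:body, :fetched_at, :ttl) do
  # An entry expires once its age reaches the TTL (in seconds).
  # `now` is injectable so the check can be tested deterministically.
  def expired?(now = Time.now)
    now - fetched_at >= ttl
  end
end

entry = CachedContent.new("<html>...</html>", Time.now, 300)
entry.expired? # false until 300 seconds have passed
```

Passing the clock in as an argument keeps the expiry rule testable without sleeping.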
Time-Series Databases for monitoring and metrics use partitioned replication. Recent data replicates to all nodes for real-time alerting. Historical data replicates to fewer nodes for archival. Downsampling reduces resolution of old data to save storage:
# 24.hours and 30.days below come from ActiveSupport
require "active_support"
require "active_support/core_ext/numeric/time"
class TimeSeriesReplication
  def write_metric(metric_name, value, timestamp = Time.now)
    partition = time_partition(timestamp)
    case partition
    when :recent # Last 24 hours
      @timeseries_db.write(
        metric_name,
        value,
        timestamp,
        replication: :all_nodes,
        resolution: :full
      )
    when :daily # Last 30 days
      @timeseries_db.write(
        metric_name,
        downsample(value),
        timestamp,
        replication: :some_nodes,
        resolution: :minute
      )
    when :archive # Older than 30 days
      @timeseries_db.write(
        metric_name,
        downsample(value, factor: 60),
        timestamp,
        replication: :archive_nodes,
        resolution: :hour
      )
    end
  end
  private
  def time_partition(timestamp)
    age = Time.now - timestamp # Age in seconds
    if age < 24.hours
      :recent
    elsif age < 30.days
      :daily
    else
      :archive
    end
  end
end
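The example above calls a `downsample` helper without defining it. One possible shape for it, sketched under the assumption that points are `[Time, Numeric]` pairs and that averaging within fixed-width time buckets is an acceptable reduction; `downsample_points` and `bucket_seconds` are hypothetical names:

```ruby
# Hypothetical helper: reduce resolution by averaging all points that
# fall into the same fixed-width time bucket.
# points: array of [Time, Numeric] pairs; bucket_seconds: bucket width.
def downsample_points(points, bucket_seconds)
  buckets = points.group_by { |time, _value| time.to_i / bucket_seconds }
  buckets.sort.map do |bucket, group|
    values = group.map { |_time, value| value }
    # Each output point is stamped with the start of its bucket.
    [Time.at(bucket * bucket_seconds).utc, values.sum.to_f / values.size]
  end
end

base = Time.at(1_700_000_040).utc # aligned to a minute boundary
raw = [[base, 10.0], [base + 20, 20.0], [base + 60, 30.0], [base + 90, 50.0]]
per_minute = downsample_points(raw, 60)
# per_minute holds two points: the averages 15.0 and 40.0
```

Averaging is only one choice. Monitoring systems often keep the minimum, maximum, and count per bucket as well, so that spikes remain visible after downsampling.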
Reference
Replication Strategy Comparison
| Strategy | Write Latency | Read Scalability | Consistency | Complexity |
|---|---|---|---|---|
| Master-Slave Async | Low | High | Eventual | Low |
| Master-Slave Sync | High | High | Strong | Medium |
| Multi-Master | Low | High | Eventual | High |
| Quorum-Based | Medium | Medium | Tunable | Medium |
| Chain Replication | High | Low (reads served by tail) | Strong | Medium |
| Peer-to-Peer | Low | Very High | Eventual | High |
Consistency Models
| Model | Guarantees | Use Cases | Trade-offs |
|---|---|---|---|
| Strong | All reads see latest write | Financial transactions, inventory | High latency, reduced availability |
| Eventual | Replicas converge over time | Social feeds, catalogs | Temporary inconsistency |
| Causal | Preserves cause-effect order | Collaborative editing | Complex implementation |
| Read-Your-Writes | A client always sees its own writes | User profiles, settings | Session stickiness or version tracking required |
| Monotonic Reads | Reads never go backward in time | Analytics, monitoring | May see stale data |
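Read-your-writes and monotonic reads are both session guarantees, and one way to provide them is to track the newest log position a session has seen. A minimal sketch, assuming each replica can report the last log position it has applied; `Replica`, `Session`, and `applied_version` are hypothetical names:

```ruby
# Hypothetical replica handle: applied_version is the last log position applied.
Replica = Struct.new(:name, :applied_version)

class Session
  attr_reader :min_version

  def initialize
    @min_version = 0
  end

  # Record the position returned by a write (read-your-writes)
  # or observed by a read (monotonic reads). Never moves backward.
  def observe(version)
    @min_version = [@min_version, version].max
  end

  # Use any replica that has caught up to this session; else the primary.
  def choose_node(replicas, primary)
    replicas.find { |r| r.applied_version >= @min_version } || primary
  end
end

primary = Replica.new("primary", 12)
session = Session.new
session.observe(11) # the session just wrote at position 11
node = session.choose_node([Replica.new("r1", 9), Replica.new("r2", 12)], primary)
# node is r2, the only replica that has applied position 11
```

Falling back to the primary trades read scalability for correctness whenever every replica lags behind the session.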
Quorum Configurations
| Configuration | Read Performance | Write Performance | Consistency |
|---|---|---|---|
| W=N, R=1 | Fastest reads | Slowest writes | Strong consistency |
| W=1, R=N | Slowest reads | Fastest writes | Strong consistency |
| W=majority, R=majority | Balanced | Balanced | Strong consistency |
| W=1, R=1 | Fastest reads | Fastest writes | Eventual consistency |
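The rows above follow from the overlap rule R + W > N: when it holds, every read quorum shares at least one node with every write quorum. A small sketch of that check; `quorum_consistency` and `majority` are illustrative names, and the result describes quorum overlap only, not any particular database's guarantees:

```ruby
# Classify a quorum configuration for n replicas.
# r + w > n means every read quorum intersects every write quorum.
def quorum_consistency(n:, w:, r:)
  unless (1..n).cover?(w) && (1..n).cover?(r)
    raise ArgumentError, "w and r must be between 1 and n"
  end

  r + w > n ? :strong : :eventual
end

# Smallest group that is more than half of n nodes.
def majority(n)
  n / 2 + 1
end

n = 5
quorum_consistency(n: n, w: n, r: 1)                       # W=N, R=1
quorum_consistency(n: n, w: majority(n), r: majority(n))   # majority quorums
quorum_consistency(n: n, w: 1, r: 1)                       # no overlap guarantee
```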
Conflict Resolution Strategies
| Strategy | Mechanism | Convergence | Data Loss Risk |
|---|---|---|---|
| Last-Write-Wins | Timestamp comparison | Guaranteed | Updates may be lost |
| Vector Clocks | Causal ordering, detects concurrent writes | Depends on merge | Low: conflicting versions are kept until merged |
| CRDTs | Commutative operations | Guaranteed | None |
| Application Merge | Custom logic | Depends | Depends on logic |
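Two of these mechanisms are compact enough to sketch. The first function compares vector clocks to tell ordered updates from concurrent ones; the second merges a grow-only counter (G-Counter), one of the simplest CRDTs. Both assume clocks and counters are hashes of node id to integer, and the function names are illustrative:

```ruby
# Compare two vector clocks (node id => counter).
# Returns :before, :after, :equal, or :concurrent.
def compare_clocks(a, b)
  nodes = a.keys | b.keys
  a_le_b = nodes.all? { |n| a.fetch(n, 0) <= b.fetch(n, 0) }
  b_le_a = nodes.all? { |n| b.fetch(n, 0) <= a.fetch(n, 0) }

  if a_le_b && b_le_a
    :equal
  elsif a_le_b
    :before
  elsif b_le_a
    :after
  else
    :concurrent # neither dominates: a real conflict that needs a merge
  end
end

# G-Counter merge: element-wise maximum. The operation is commutative,
# associative, and idempotent, so replicas converge in any merge order.
def merge_g_counter(a, b)
  (a.keys | b.keys).to_h { |n| [n, [a.fetch(n, 0), b.fetch(n, 0)].max] }
end

compare_clocks({ "a" => 2, "b" => 1 }, { "a" => 1, "b" => 2 }) # concurrent updates
merge_g_counter({ "a" => 3 }, { "a" => 2, "b" => 4 })          # per-node maximum
```

Last-write-wins would silently pick one side of the concurrent pair, which is where its data loss risk comes from.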
Replication Lag Metrics
| Metric | Description | Healthy Range | Action Threshold |
|---|---|---|---|
| Bytes Behind | Data volume not yet replicated | < 1 MB | > 10 MB |
| Time Behind | Estimated time lag | < 1 second | > 10 seconds |
| Transactions Behind | Number of operations pending | < 100 | > 1000 |
| Apply Rate | Replication speed | Near write rate | < 50% of write rate |
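A monitoring check can apply the thresholds above directly. The sketch below covers the three absolute metrics; the intermediate `:warning` state for values between the healthy range and the action threshold is an assumption, since the table does not name that zone, and the metric keys are hypothetical:

```ruby
# Thresholds copied from the table above; tune them per workload.
LAG_THRESHOLDS = {
  bytes_behind:        { healthy: 1_000_000, action: 10_000_000 }, # 1 MB / 10 MB
  seconds_behind:      { healthy: 1,         action: 10 },
  transactions_behind: { healthy: 100,       action: 1000 }
}.freeze

# Classify one lag reading as :healthy, :warning, or :action_required.
def classify_lag(metric, value)
  limits = LAG_THRESHOLDS.fetch(metric)
  if value < limits[:healthy]
    :healthy
  elsif value > limits[:action]
    :action_required
  else
    :warning # assumed label for the zone between the two thresholds
  end
end

# Worst status across a hash of readings (nil for an empty sample).
def overall_status(sample)
  statuses = sample.map { |metric, value| classify_lag(metric, value) }
  %i[action_required warning healthy].find { |s| statuses.include?(s) }
end

overall_status({ bytes_behind: 500_000, seconds_behind: 5 })
```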
ActiveRecord Connection Roles
| Role | Purpose | Connection Target | Usage |
|---|---|---|---|
| writing | Write operations | Primary database | Create, update, delete |
| reading | Read operations | Replica database | Select queries |
| custom (e.g. replica) | Application-defined role | Database mapped to that role in connects_to | Routing reads to a specific named replica |
Redis Replication Commands
| Command | Purpose | Example |
|---|---|---|
| REPLICAOF | Configure replication | REPLICAOF hostname port |
| INFO replication | Check replication status | INFO replication |
| ROLE | Get node role | ROLE |
| WAIT | Wait for replication | WAIT numreplicas timeout |
MongoDB Write Concerns
| Write Concern | Acknowledgment | Durability | Use Case |
|---|---|---|---|
| w: 1 | Single node | Low | High throughput |
| w: majority | Quorum | High | Consistency |
| j: true | Journal sync | Highest | Critical data |
| wtimeout: 5000 | Timeout 5s | Varies | Prevent blocking |
Performance Optimization Techniques
| Technique | Benefit | Implementation | Trade-off |
|---|---|---|---|
| Connection Pooling | Reduce connection overhead | Maintain open connections | Memory usage |
| Batch Replication | Lower network overhead | Buffer then send | Increased lag |
| Compression | Reduce bandwidth | Compress replication stream | CPU usage |
| Parallel Apply | Faster replication | Multi-threaded apply | Complexity |
| Read-Local | Lower read latency | Route to nearest replica | Staleness |
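Batch replication is the easiest of these techniques to illustrate. A minimal sketch, assuming writes are `[key, value]` pairs and the transport is any block that accepts an array of them; `BatchReplicator` is a hypothetical class, and it is not thread-safe as written:

```ruby
# Buffers writes and ships them to a replica in batches.
# Not thread-safe: guard enqueue/flush with a Mutex for concurrent writers.
class BatchReplicator
  def initialize(batch_size:, &sender)
    @batch_size = batch_size
    @sender = sender # called with an array of [key, value] pairs
    @buffer = []
  end

  # Buffer one write; ship the batch once it reaches batch_size.
  def enqueue(key, value)
    @buffer << [key, value]
    flush if @buffer.size >= @batch_size
  end

  # Send whatever is buffered. Calling this on a timer as well
  # keeps replication lag bounded when traffic is low.
  def flush
    return if @buffer.empty?

    batch = @buffer
    @buffer = []
    @sender.call(batch)
  end
end

sent = []
replicator = BatchReplicator.new(batch_size: 2) { |batch| sent << batch }
replicator.enqueue("a", 1) # buffered only
replicator.enqueue("b", 2) # reaches batch_size, so the batch is sent
```

The buffer is exactly where the "Increased lag" trade-off comes from: a write sits locally until the batch fills or a flush fires.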