Overview
The CAP theorem, also known as Brewer's theorem, states that a distributed data store cannot simultaneously provide more than two out of three guarantees: Consistency, Availability, and Partition tolerance. Eric Brewer introduced this concept in 2000, and Seth Gilbert and Nancy Lynch later proved it formally in 2002.
The theorem addresses fundamental constraints in distributed systems where data is replicated across multiple nodes. When network failures occur (partitions), the system must choose between maintaining consistency across all nodes or remaining available for read and write operations. This choice has significant implications for system design and behavior.
The three properties are:
Consistency: Every read receives the most recent write or an error. All nodes see the same data at the same time.
Availability: Every request receives a non-error response, without guarantee that it contains the most recent write. The system remains operational for reads and writes.
Partition Tolerance: The system continues to operate despite network partitions that prevent communication between nodes.
In practice, partition tolerance is not optional for distributed systems operating across networks. Network failures occur in real-world deployments, making the practical choice between consistency (CP systems) and availability (AP systems) when partitions happen.
# Conceptual representation of CAP trade-offs
class DistributedSystem
def initialize(mode)
@mode = mode # :cp or :ap
end
def handle_partition
case @mode
when :cp
# Reject writes to maintain consistency
raise PartitionError, "System unavailable during partition"
when :ap
# Accept writes, allow temporary inconsistency
accept_write_with_eventual_consistency
end
end
end
Key Principles
The CAP theorem rests on specific definitions of its three properties that developers must understand to apply the theorem correctly.
Consistency in CAP Context
Consistency in the CAP theorem refers to linearizability, a specific consistency model stronger than eventual consistency or other weaker models. Linearizability requires that operations appear to occur instantaneously at some point between their invocation and completion. Once a write completes, all subsequent reads must return that value or a newer one. This differs from ACID consistency in databases, which concerns maintaining invariants and constraints within a single database.
Consider a distributed counter across two nodes. With linearizability, if Node A increments the counter from 5 to 6, and this operation completes, then any read from any node must return 6 or higher. No node can return 5 after the increment completes, even if network delays occur.
Availability in CAP Context
Availability means every request to a non-failing node must result in a response. The theorem defines this strictly: the system cannot refuse requests or delay indefinitely. A system that responds with stale data remains available, while a system that refuses requests to maintain consistency is not available in CAP terms.
This definition differs from common operational availability metrics like "five nines" (99.999% uptime). CAP availability is binary per request: either the system responds or it does not. Operational availability measures the percentage of time the system responds.
Partition Tolerance in CAP Context
Partition tolerance means the system continues operating when network partitions prevent some nodes from communicating. A partition splits the network into subgroups where nodes within a subgroup can communicate, but nodes in different subgroups cannot. The system must handle these scenarios because networks are unreliable.
Partition tolerance is mandatory for distributed systems. The meaningful choice is between consistency and availability when partitions occur. Systems cannot avoid partitions in environments spanning multiple data centers, cloud availability zones, or unreliable networks.
The Two-Out-Of-Three Choice
When a partition occurs, the system faces a binary decision:
CP systems sacrifice availability to maintain consistency. During a partition, nodes that cannot communicate with others refuse requests rather than risk serving stale data. This ensures all successful reads return the most recent write.
AP systems sacrifice consistency to maintain availability. During a partition, all nodes continue serving requests, accepting that different nodes may have different data temporarily. The system achieves eventual consistency after the partition heals.
Network Partitions Are Inevitable
Real-world networks experience various failure modes: broken cables, failed switches, routing issues, firewall misconfigurations, and cloud provider outages. These failures create partitions where some nodes become unreachable. Systems must account for these scenarios rather than assume perfect networks.
The duration of partitions varies from milliseconds to hours. Short partitions may go unnoticed by users, while extended partitions require explicit handling. The CAP theorem forces architects to decide how their system behaves during these inevitable events.
Design Considerations
Selecting between CP and AP models requires analyzing application requirements, user expectations, and business constraints. The choice affects system behavior, implementation complexity, and operational characteristics.
When to Choose CP (Consistency over Availability)
Financial systems, inventory management, and systems maintaining critical invariants typically require CP characteristics. An ATM that dispenses cash must ensure the account balance is accurate before completing the transaction. Displaying stale balance data and allowing withdrawals could enable overdrafts or duplicate spending.
Configuration management systems often choose consistency. If an application reads configuration determining access control rules, stale configuration could grant unauthorized access. Refusing requests during partitions provides better security than serving stale, potentially incorrect data.
Distributed locking and leader election require consistency. Multiple nodes cannot believe they hold the same lock simultaneously. These coordination primitives sacrifice availability to guarantee correctness.
# CP system example: Strong consistency requirement
class BankAccount
def initialize(balance, consensus_system)
@balance = balance
@consensus = consensus_system
end
def withdraw(amount)
# Must achieve consensus before proceeding
lock = @consensus.acquire_lock(account_id)
unless lock
raise UnavailableError, "Cannot acquire lock during partition"
end
current_balance = @consensus.read_committed_value(:balance)
if current_balance >= amount
new_balance = current_balance - amount
@consensus.write_committed_value(:balance, new_balance)
lock.release
new_balance
else
lock.release
raise InsufficientFundsError
end
end
end
When to Choose AP (Availability over Consistency)
Social media feeds, caching layers, and analytics systems often prioritize availability. Users accept temporary inconsistencies in exchange for uninterrupted service. A user seeing a slightly outdated follower count causes minimal harm compared to the entire service becoming unavailable.
Session storage and shopping carts benefit from availability. A user adding items to a cart expects the operation to succeed immediately. If different data center nodes have slightly different cart contents temporarily, the system can reconcile differences when the user checks out.
Content delivery networks and DNS prioritize availability. Users receiving slightly stale cached content is preferable to failed requests. These systems use short TTLs and eventual consistency to converge on current data.
# AP system example: Prioritizing availability
class ShoppingCart
def initialize(user_id, replica_set)
@user_id = user_id
@replicas = replica_set
end
def add_item(item_id, quantity)
# Write to any available replica
available_replica = @replicas.find { |r| r.healthy? }
unless available_replica
# Fall back to local cache if all replicas unavailable
return add_to_local_cache(item_id, quantity)
end
# Write succeeds even if other replicas unreachable
available_replica.write(
user_id: @user_id,
item_id: item_id,
quantity: quantity,
timestamp: Time.now.to_f
)
# Asynchronously propagate to other replicas
@replicas.each do |replica|
replica.async_replicate(user_id: @user_id) if replica != available_replica
end
true
end
def get_items
# Return merged results from all available replicas
items_by_replica = @replicas.select(&:healthy?).map(&:read)
merge_with_conflict_resolution(items_by_replica)
end
end
Hybrid Approaches and Tunable Consistency
Some systems allow tuning consistency versus availability per operation. Databases like Cassandra provide consistency levels ranging from ONE (high availability) to ALL (strong consistency). Applications can choose appropriate levels for each operation type.
Read-modify-write operations often require strong consistency, while simple reads may accept eventual consistency. A social media application might require strong consistency when creating posts but accept eventual consistency when loading feeds.
# Tunable consistency example
class TunableDataStore
def write(key, value, consistency_level: :quorum)
case consistency_level
when :one
# Write to one node, return immediately
@replicas.first.write(key, value)
when :quorum
# Write to majority before returning
required = (@replicas.size / 2) + 1
write_to_n_replicas(key, value, required)
when :all
# Write to all nodes before returning
write_to_all_replicas(key, value)
end
end
def read(key, consistency_level: :quorum)
case consistency_level
when :one
# Read from first available node
@replicas.find(&:healthy?).read(key)
when :quorum
# Read from majority, return most recent
required = (@replicas.size / 2) + 1
values = read_from_n_replicas(key, required)
resolve_conflicts(values)
when :all
# Read from all nodes, return most recent
values = read_from_all_replicas(key)
resolve_conflicts(values)
end
end
end
Operational Trade-offs
CP systems require more operational complexity during failures. Administrators must monitor partition detection, understand when the system becomes unavailable, and have procedures for recovering from split-brain scenarios where different subsets of nodes cannot communicate.
AP systems require conflict resolution mechanisms. When partitions heal, the system must merge divergent data states. This requires application-level logic to resolve conflicts, tombstones for deletions, and vector clocks or similar mechanisms to track causality.
Implementation Approaches
Different architectural patterns and algorithms implement CAP trade-offs. The choice affects performance, complexity, and failure handling characteristics.
Consensus Protocols for CP Systems
Consensus protocols like Raft and Paxos enable CP systems by ensuring nodes agree on state changes before committing them. These protocols elect a leader that coordinates writes. Followers reject writes during partitions when they cannot reach the leader, sacrificing availability for consistency.
Raft divides time into terms, each with at most one leader. Candidates become leaders by receiving votes from a majority of nodes. Write requests go to the leader, which replicates them to followers. Once a majority acknowledges the write, the leader commits it. This ensures consistency because any future leader must have all committed writes.
# Simplified Raft-inspired consensus
class ConsensusNode
attr_reader :state, :current_term, :voted_for
def initialize(node_id, cluster_members)
@node_id = node_id
@cluster = cluster_members
@state = :follower
@current_term = 0
@voted_for = nil
@log = []
@commit_index = 0
end
def request_vote(term, candidate_id, last_log_index, last_log_term)
# Reject if candidate's term is older
return false if term < @current_term
# Update term if candidate's term is newer
if term > @current_term
@current_term = term
@voted_for = nil
@state = :follower
end
# Grant vote if haven't voted and candidate's log is up-to-date
if @voted_for.nil? && log_is_current?(last_log_index, last_log_term)
@voted_for = candidate_id
return true
end
false
end
def append_entries(term, leader_id, prev_log_index, prev_log_term, entries, leader_commit)
# Reject if leader's term is older
return false if term < @current_term
# Accept leader and update term
@current_term = term
@state = :follower
# Verify log consistency
unless log_matches?(prev_log_index, prev_log_term)
return false
end
# Append new entries
append_to_log(entries, prev_log_index)
# Update commit index
if leader_commit > @commit_index
@commit_index = [leader_commit, @log.size - 1].min
end
true
end
def client_write(data)
unless @state == :leader
raise NotLeaderError, "Forward to current leader"
end
# Append to log
entry = { term: @current_term, data: data }
@log << entry
# Replicate to majority
replicas_acked = replicate_to_followers(entry)
if replicas_acked >= majority_size
@commit_index = @log.size - 1
apply_to_state_machine(entry)
return true
else
# Could not reach majority - system unavailable
raise UnavailableError, "Cannot achieve consensus"
end
end
private
def majority_size
(@cluster.size / 2) + 1
end
end
Last Write Wins for AP Systems
AP systems using eventual consistency often employ Last Write Wins (LWW) conflict resolution. Each write includes a timestamp. When conflicts arise, the write with the latest timestamp wins. This strategy is simple but can lose concurrent updates.
LWW works well for operations that naturally overwrite previous values, such as updating user profiles or configuration settings. It performs poorly for operations that should accumulate, like incrementing counters or adding items to sets.
CRDTs for AP Systems
Conflict-free Replicated Data Types (CRDTs) enable AP systems to merge concurrent updates without conflicts. CRDTs define data structures with merge operations that are commutative, associative, and idempotent. Replicas can apply updates in any order and converge to the same state.
G-Counter (Grow-only Counter) CRDTs track increments across replicas. Each replica maintains a vector of counts, one per replica. The total count is the sum of all vector elements. When replicas merge, they take the maximum count for each position.
# CRDT G-Counter implementation
class GCounter
def initialize(replica_id, num_replicas)
@replica_id = replica_id
@counts = Array.new(num_replicas, 0)
end
def increment(amount = 1)
@counts[@replica_id] += amount
end
def value
@counts.sum
end
def merge(other_counter)
@counts.each_with_index do |count, i|
@counts[i] = [count, other_counter.counts[i]].max
end
end
protected
attr_reader :counts
end
# Usage across partitions
counter_a = GCounter.new(0, 3)
counter_b = GCounter.new(1, 3)
counter_a.increment(5)
counter_b.increment(3)
# After partition heals, merge states
counter_a.merge(counter_b)
counter_a.value # => 8, no data lost
Quorum-Based Replication
Quorum systems balance consistency and availability by requiring acknowledgment from multiple replicas. A write quorum of W replicas must acknowledge writes. A read quorum of R replicas must respond to reads. Setting W + R > N (total replicas) ensures reads always see the latest write.
With N=5, W=3, R=3, the system tolerates two replica failures. Writes require three replicas, and reads query three replicas, guaranteeing overlap with the most recent write. Setting W=3, R=1 prioritizes read availability over read consistency.
# Quorum-based storage
class QuorumStore
def initialize(replicas, write_quorum, read_quorum)
@replicas = replicas
@write_quorum = write_quorum
@read_quorum = read_quorum
end
def write(key, value)
version = generate_version
successful_writes = 0
@replicas.each do |replica|
begin
replica.write(key, value, version)
successful_writes += 1
break if successful_writes >= @write_quorum
rescue ReplicaError
next
end
end
if successful_writes >= @write_quorum
true
else
raise InsufficientReplicasError, "Could not reach write quorum"
end
end
def read(key)
responses = []
@replicas.each do |replica|
begin
responses << replica.read(key)
break if responses.size >= @read_quorum
rescue ReplicaError
next
end
end
if responses.size >= @read_quorum
# Return value with highest version
responses.max_by { |r| r[:version] }[:value]
else
raise InsufficientReplicasError, "Could not reach read quorum"
end
end
end
Compensating Transactions
Some systems use compensating transactions to provide eventual consistency while appearing available. The system accepts operations immediately and executes them asynchronously. If conflicts or failures occur, the system reverses the operation with a compensating transaction.
Payment systems may immediately confirm a purchase, then asynchronously verify the charge with the payment processor. If verification fails, the system cancels the order and notifies the customer. This approach maximizes availability while maintaining consistency through compensation.
Practical Examples
Concrete scenarios demonstrate how CAP trade-offs manifest in real systems and guide implementation decisions.
Distributed Counter Service
A web analytics service tracks page view counts across multiple data centers. The system must decide how to handle network partitions between data centers.
CP approach: The system designates one data center as the leader. All increments route to the leader, which replicates to followers. During a partition isolating the leader, the system rejects increments from isolated data centers. This guarantees accurate counts but makes the service unavailable in some regions during partitions.
class CPAnalyticsCounter
def initialize(node_id, is_leader, peer_nodes)
@node_id = node_id
@is_leader = is_leader
@peers = peer_nodes
@counts = {}
end
def increment(page_id)
unless @is_leader
# Forward to leader
leader = @peers.find(&:leader?)
unless leader&.reachable?
raise UnavailableError, "Cannot reach leader"
end
return leader.increment(page_id)
end
# Leader increments locally
@counts[page_id] ||= 0
@counts[page_id] += 1
# Replicate to majority of followers
acks = replicate_to_followers(page_id, @counts[page_id])
unless acks >= majority_required
# Rollback and reject
@counts[page_id] -= 1
raise UnavailableError, "Cannot reach quorum"
end
@counts[page_id]
end
def get_count(page_id)
@counts[page_id] || 0
end
end
AP approach: Each data center maintains its own counter. Increments succeed locally without coordination. The system periodically exchanges counts between data centers and merges them. During partitions, each data center continues accepting increments. Counts may temporarily diverge but eventually converge.
class APAnalyticsCounter
def initialize(node_id, peer_nodes)
@node_id = node_id
@peers = peer_nodes
@local_counts = {} # This node's increments
@merged_counts = {} # Merged view from all nodes
end
def increment(page_id)
# Always succeeds locally
@local_counts[page_id] ||= 0
@local_counts[page_id] += 1
# Asynchronously propagate to peers
@peers.each do |peer|
peer.async_notify_increment(page_id, @node_id) if peer.reachable?
end
# Return local view (may be incomplete)
get_count(page_id)
end
def get_count(page_id)
local = @local_counts[page_id] || 0
remote = @merged_counts[page_id] || 0
local + remote
end
def merge_remote_counts(node_id, counts)
counts.each do |page_id, count|
@merged_counts[page_id] ||= 0
@merged_counts[page_id] += count
end
end
def sync_with_peers
# Periodic full sync to handle missed increments
@peers.each do |peer|
next unless peer.reachable?
peer_counts = peer.get_local_counts
merge_remote_counts(peer.node_id, peer_counts)
end
end
end
Shopping Cart in Multi-Region Deployment
An e-commerce platform deploys across three geographic regions: US, EU, and ASIA. Users add items to carts, which must remain available during network issues between regions.
The system chooses an AP approach because cart operations should never fail, and temporary inconsistencies are acceptable. Each region maintains cart state locally. When users add items, the operation succeeds immediately in their region. The system asynchronously replicates cart changes to other regions.
During a partition between US and EU regions, a user's session may migrate from US to EU due to CDN routing. The EU region may not have the latest cart state. The system merges cart contents from both regions using timestamps and conflict resolution rules:
class DistributedCart
def initialize(user_id, region_id, peer_regions)
@user_id = user_id
@region_id = region_id
@peers = peer_regions
@items = {} # item_id => { quantity, timestamp, region_id }
end
def add_item(item_id, quantity)
timestamp = Time.now.to_f
@items[item_id] = {
quantity: quantity,
timestamp: timestamp,
region_id: @region_id
}
# Replicate asynchronously
@peers.each do |peer|
peer.async_replicate_cart_item(@user_id, item_id, quantity, timestamp, @region_id)
end
true # Always succeeds
end
def remove_item(item_id)
timestamp = Time.now.to_f
# Mark as tombstone rather than deleting
@items[item_id] = {
quantity: 0,
timestamp: timestamp,
region_id: @region_id,
deleted: true
}
# Replicate deletion
@peers.each do |peer|
peer.async_replicate_cart_removal(@user_id, item_id, timestamp, @region_id)
end
true
end
def merge_remote_update(item_id, quantity, timestamp, source_region_id, deleted: false)
existing = @items[item_id]
# No existing entry - accept remote update
unless existing
@items[item_id] = {
quantity: quantity,
timestamp: timestamp,
region_id: source_region_id,
deleted: deleted
}
return
end
# Compare timestamps for conflict resolution
if timestamp > existing[:timestamp]
# Remote update is newer
@items[item_id] = {
quantity: quantity,
timestamp: timestamp,
region_id: source_region_id,
deleted: deleted
}
elsif timestamp == existing[:timestamp]
# Concurrent updates - use region ID as tiebreaker
if source_region_id < existing[:region_id]
@items[item_id] = {
quantity: quantity,
timestamp: timestamp,
region_id: source_region_id,
deleted: deleted
}
end
# Otherwise keep existing
end
# Older updates are ignored
end
def get_items
@items.reject { |_, item| item[:deleted] }
.transform_values { |item| item[:quantity] }
end
end
Session Management Across Data Centers
A web application maintains user sessions across multiple data centers. Sessions store authentication tokens, user preferences, and temporary state. The system must decide how to handle sessions during data center partitions.
CP approach creates problems: if the session data center becomes unreachable, users cannot authenticate until connectivity restores. This makes the application effectively unavailable.
AP approach with sticky sessions: users consistently route to one data center based on geographic proximity. Sessions replicate asynchronously to other data centers. During partitions, users continue accessing their local data center with current session state. If a data center fails entirely, users fail over to another data center with slightly stale session state, requiring re-authentication at worst.
class ReplicatedSession
def initialize(session_id, primary_dc, replica_dcs)
@session_id = session_id
@primary_dc = primary_dc
@replicas = replica_dcs
@data = {}
@version = 0
end
def set(key, value)
@version += 1
@data[key] = { value: value, version: @version }
# Write to primary (local) immediately
@primary_dc.write_session(@session_id, key, value, @version)
# Async replicate to backups
@replicas.each do |replica|
replica.async_replicate_session(@session_id, key, value, @version)
end
true
end
def get(key)
# Read from local store
item = @data[key]
item ? item[:value] : nil
end
def sync_from_replica(replica_data)
# Merge data from replica using version numbers
replica_data.each do |key, item|
local_item = @data[key]
if !local_item || item[:version] > local_item[:version]
@data[key] = item
@version = [@version, item[:version]].max
end
end
end
end
Inventory Management System
An inventory system tracks product quantities across warehouses. The business requirement is that products cannot oversell (go negative). This necessitates a CP approach for decrement operations.
class CPInventorySystem
def initialize(product_id, warehouses, consensus_service)
@product_id = product_id
@warehouses = warehouses
@consensus = consensus_service
end
def reserve_stock(quantity, warehouse_id)
# Must acquire distributed lock
lock = @consensus.acquire_lock("inventory:#{@product_id}")
unless lock
raise UnavailableError, "Cannot acquire lock during partition"
end
begin
# Read current inventory with strong consistency
current_stock = @consensus.read_committed(
"inventory:#{@product_id}:#{warehouse_id}"
)
if current_stock >= quantity
new_stock = current_stock - quantity
@consensus.write_committed(
"inventory:#{@product_id}:#{warehouse_id}",
new_stock
)
return { success: true, remaining: new_stock }
else
return { success: false, error: "Insufficient stock" }
end
ensure
lock.release
end
end
def check_stock(warehouse_id)
# Reads can be eventually consistent
@warehouses[warehouse_id].read_local("inventory:#{@product_id}")
end
end
Ruby Implementation
Ruby applications interact with distributed systems implementing various CAP trade-offs. Ruby provides libraries and patterns for building both CP and AP systems.
Using Redis for AP Caching
Redis operates as an AP system when deployed in master-replica configuration without wait commands. Writes succeed on the master without waiting for replica acknowledgment. During partitions, stale reads may occur from replicas.
require 'redis'
class APCacheService
def initialize
@master = Redis.new(url: ENV['REDIS_MASTER_URL'])
@replicas = [
Redis.new(url: ENV['REDIS_REPLICA_1_URL']),
Redis.new(url: ENV['REDIS_REPLICA_2_URL'])
]
end
def set(key, value, ttl: 3600)
# Write to master, don't wait for replication
@master.setex(key, ttl, value)
true
rescue Redis::CannotConnectError
# Master unavailable - system becomes unavailable for writes
false
end
def get(key)
# Try master first
begin
return @master.get(key)
rescue Redis::CannotConnectError
# Fall back to replicas
end
# Try replicas in order
@replicas.each do |replica|
begin
return replica.get(key)
rescue Redis::CannotConnectError
next
end
end
# All nodes unavailable
nil
end
def get_with_health_check(key)
# Select healthy replica
available_nodes = [@master] + @replicas.select do |r|
r.ping == 'PONG' rescue false
end
return nil if available_nodes.empty?
# Read from random healthy node
available_nodes.sample.get(key)
end
end
Using etcd for CP Configuration
etcd provides strong consistency using the Raft consensus algorithm. Ruby applications use etcd for configuration, service discovery, and distributed locks.
require 'etcdv3'
class CPConfigurationService
def initialize
@etcd = Etcdv3.new(
endpoints: ENV['ETCD_ENDPOINTS'].split(','),
command_timeout: 5
)
end
def get_config(key)
response = @etcd.get(key)
response.kvs.first&.value
rescue GRPC::Unavailable, GRPC::DeadlineExceeded
# etcd cluster unavailable or can't reach quorum
raise ConfigUnavailableError, "Cannot reach etcd cluster"
end
def set_config(key, value)
@etcd.put(key, value)
true
rescue GRPC::Unavailable
# Cannot reach quorum - refuse operation
raise ConfigUnavailableError, "Cannot reach etcd quorum"
end
def acquire_lock(key, ttl: 30)
lease = @etcd.lease_grant(ttl)
# Attempt to acquire lock using compare-and-swap
success = @etcd.transaction do |txn|
txn.compare = [
txn.create_revision(key) == 0 # Key doesn't exist
]
txn.success = [
txn.put(key, Process.pid.to_s, lease: lease.id)
]
txn.failure = []
end
if success.succeeded
Lock.new(@etcd, key, lease)
else
@etcd.lease_revoke(lease.id)
nil
end
rescue GRPC::Unavailable
raise LockUnavailableError, "Cannot acquire lock during partition"
end
end
class Lock
def initialize(etcd, key, lease)
@etcd = etcd
@key = key
@lease = lease
end
def release
@etcd.lease_revoke(@lease.id)
end
def keep_alive
@etcd.lease_keep_alive_once(@lease.id)
end
end
Building a CRDT in Ruby
Ruby applications can implement CRDTs for AP characteristics with automatic conflict resolution.
# PN-Counter (Positive-Negative Counter) CRDT
class PNCounter
def initialize(replica_id, num_replicas)
@replica_id = replica_id
@positive = Array.new(num_replicas, 0)
@negative = Array.new(num_replicas, 0)
end
def increment(amount = 1)
@positive[@replica_id] += amount
end
def decrement(amount = 1)
@negative[@replica_id] += amount
end
def value
@positive.sum - @negative.sum
end
def merge(other)
@positive.each_with_index do |count, i|
@positive[i] = [count, other.positive[i]].max
end
@negative.each_with_index do |count, i|
@negative[i] = [count, other.negative[i]].max
end
self
end
def to_h
{
replica_id: @replica_id,
positive: @positive,
negative: @negative
}
end
def self.from_h(hash)
counter = allocate
counter.instance_variable_set(:@replica_id, hash[:replica_id])
counter.instance_variable_set(:@positive, hash[:positive])
counter.instance_variable_set(:@negative, hash[:negative])
counter
end
protected
attr_reader :positive, :negative
end
# Usage in distributed system
replica_1 = PNCounter.new(0, 3)
replica_2 = PNCounter.new(1, 3)
# Concurrent operations during partition
replica_1.increment(10)
replica_1.decrement(3)
replica_2.increment(5)
replica_2.decrement(2)
# Merge after partition heals
replica_1.merge(replica_2)
replica_1.value # => 10
Implementing Circuit Breaker for Partition Handling
Circuit breakers detect partition-like failures and fail fast rather than timing out repeatedly.
class CircuitBreaker
STATES = [:closed, :open, :half_open].freeze
def initialize(failure_threshold: 5, timeout: 60, half_open_attempts: 3)
@failure_threshold = failure_threshold
@timeout = timeout
@half_open_attempts = half_open_attempts
@state = :closed
@failure_count = 0
@last_failure_time = nil
@half_open_successes = 0
end
def call
case @state
when :open
if time_to_retry?
attempt_reset
yield
else
raise CircuitOpenError, "Circuit breaker is open"
end
when :half_open
execute_half_open { yield }
when :closed
execute_closed { yield }
end
end
private
def execute_closed
yield.tap { on_success }
rescue => e
on_failure
raise e
end
def execute_half_open
yield.tap do
@half_open_successes += 1
if @half_open_successes >= @half_open_attempts
reset_circuit
end
end
rescue => e
trip_circuit
raise e
end
def on_success
@failure_count = 0
end
def on_failure
@failure_count += 1
@last_failure_time = Time.now
trip_circuit if @failure_count >= @failure_threshold
end
def trip_circuit
@state = :open
@last_failure_time = Time.now
end
def attempt_reset
@state = :half_open
@half_open_successes = 0
end
def reset_circuit
@state = :closed
@failure_count = 0
end
def time_to_retry?
@last_failure_time && (Time.now - @last_failure_time) >= @timeout
end
end
# Usage with distributed service
class DistributedService
def initialize
@circuit_breaker = CircuitBreaker.new(
failure_threshold: 3,
timeout: 30
)
end
def call_remote_service(data)
@circuit_breaker.call do
# Call that may fail during partitions
HTTParty.post('https://remote-service/api', body: data, timeout: 5)
end
rescue CircuitOpenError
# Circuit is open - fail fast with cached/default response
get_cached_response(data)
end
end
Real-World Applications
Production systems demonstrate CAP trade-offs in practice across industries and architectures.
Cassandra - Tunable AP Database
Apache Cassandra provides tunable consistency levels allowing per-operation CAP trade-offs. Applications using Cassandra through Ruby clients choose consistency based on operation criticality.
Social media timelines use eventual consistency (CL=ONE) for high availability. Users accept slight delays in seeing new posts. Account authentication uses quorum reads (CL=QUORUM) to prevent login with stale credentials. Financial transactions use strong consistency (CL=ALL) to prevent duplicate charges.
Cassandra's implementation uses consistent hashing for data distribution and vector clocks for conflict detection. During partitions, different consistency levels behave differently: ONE continues operating with potentially stale data, QUORUM refuses operations if the quorum becomes unreachable, ALL requires all replicas.
DynamoDB - AP with Conditional CP
Amazon DynamoDB defaults to eventual consistency for reads, providing high availability. Applications requiring stronger consistency use strongly consistent reads, trading some availability. DynamoDB's conditional writes provide CP characteristics for operations requiring atomicity.
E-commerce sites use DynamoDB for product catalogs with eventual consistency. Catalog reads remain available during network issues, accepting slightly stale data. Order placement uses conditional writes to prevent overselling, choosing consistency over availability for critical operations.
PostgreSQL with Replication - CP Database
PostgreSQL with synchronous replication provides CP characteristics. The primary database waits for replicas to acknowledge writes before committing. During partitions, if replicas become unreachable, writes fail to maintain consistency.
Financial services use PostgreSQL for transactional data requiring strong consistency. Account balances, transaction histories, and audit logs must remain consistent. The system rejects writes during partitions rather than risk data inconsistency.
Consul - CP Service Discovery
HashiCorp Consul provides service discovery and configuration with strong consistency using Raft consensus. Services register with Consul and query for other services. During partitions, services cannot register or update if isolated from the Consul cluster.
Microservice architectures accept reduced availability during partitions in exchange for consistent service routing. Stale service discovery could route traffic to failed instances or outdated endpoints, causing cascading failures.
Memcached Cluster - AP Cache
Memcached clusters provide no consistency guarantees across nodes. Each node operates independently. During partitions, clients may read stale data or fail to invalidate caches across all nodes. Applications accept this trade-off for cache availability.
Web applications use Memcached for session storage and page fragments. Stale cached data causes minor issues like showing outdated page views or requiring re-login. The alternative - unavailable cache during partitions - would degrade performance severely.
Reference
CAP Theorem Properties
| Property | Definition | System Behavior |
|---|---|---|
| Consistency | All nodes see the same data simultaneously | Reads return the most recent write or error |
| Availability | Every request receives a response | System accepts all requests to non-failing nodes |
| Partition Tolerance | System operates despite network failures | Continues functioning when nodes cannot communicate |
System Classifications
| Type | Guarantees | Trade-offs | Use Cases |
|---|---|---|---|
| CP | Consistency + Partition Tolerance | Refuses requests during partitions | Financial systems, inventory, configuration |
| AP | Availability + Partition Tolerance | Serves potentially stale data | Social media, caching, session storage |
| CA | Consistency + Availability | Cannot handle partitions | Single-node systems only |
Consistency Models
| Model | Definition | Latency | Use Case |
|---|---|---|---|
| Linearizability | Operations appear instantaneous | High | Coordination, locking |
| Sequential | Operations execute in some total order | Medium | General purpose |
| Causal | Causally related operations ordered | Medium | Collaborative editing |
| Eventual | All replicas converge eventually | Low | High availability systems |
| Read Your Writes | Clients see their own writes | Low | Session data |
Consensus Algorithms
| Algorithm | Leader Election | Partition Behavior | Complexity |
|---|---|---|---|
| Raft | Yes | Rejects writes without majority | Moderate |
| Paxos | Yes | Rejects writes without majority | High |
| ZAB | Yes | Rejects writes without quorum | Moderate |
| Viewstamped Replication | Yes | Rejects writes without quorum | Moderate |
Conflict Resolution Strategies
| Strategy | Description | Data Loss Risk | Complexity |
|---|---|---|---|
| Last Write Wins | Newest timestamp wins | Concurrent updates lost | Low |
| Vector Clocks | Track causality per replica | None if combined with merge | Medium |
| CRDTs | Mathematically convergent types | None by design | Medium |
| Application Logic | Custom merge rules | Depends on implementation | High |
Quorum Configuration Trade-offs
| Configuration | R | W | N | Read Latency | Write Latency | Consistency |
|---|---|---|---|---|---|---|
| Strong | 3 | 3 | 5 | Medium | Medium | Strong |
| Balanced | 2 | 3 | 5 | Low | Medium | Eventually strong |
| Read Optimized | 1 | 5 | 5 | Low | High | Eventually strong |
| Write Optimized | 5 | 1 | 5 | High | Low | Weak |
Ruby Libraries for Distributed Systems
| Library | CAP Profile | Use Case | Key Features |
|---|---|---|---|
| redis-rb | AP | Caching, sessions | Master-replica replication |
| etcdv3 | CP | Configuration, locking | Raft consensus |
| cassandra-driver | Tunable | Distributed database | Per-operation consistency levels |
| bunny | Depends on RabbitMQ | Message queuing | Clustering with mirrored queues |
| connection_pool | N/A | Connection management | Handles failover logic |
Operational Monitoring Metrics
| Metric | Purpose | Alert Threshold |
|---|---|---|
| Partition Detection Rate | Track network failures | Baseline + 2 std dev |
| Quorum Achievement Rate | Monitor consensus health | Below 99.9% |
| Replication Lag | Measure data freshness | Above 1 second |
| Failed Request Rate | Track availability | Above 0.1% |
| Conflict Resolution Rate | Monitor AP system health | Sudden increases |
| Leader Election Time | CP system recovery speed | Above 10 seconds |