Overview
Lock-free programming refers to concurrent programming techniques that allow multiple threads to access shared data structures without using traditional mutual exclusion locks. Instead of blocking threads when conflicts occur, lock-free algorithms use atomic operations and careful memory ordering to ensure correctness while allowing continuous progress.
The approach emerged from research into non-blocking synchronization in the 1980s and 1990s, addressing the fundamental limitations of lock-based concurrency: deadlock, priority inversion, convoying, and the inability to guarantee system-wide progress when a thread holding a lock fails or is delayed. Lock-free algorithms guarantee that at least one thread makes progress within a finite number of steps, regardless of the execution state of other threads.
Lock-free programming operates on hardware-level atomic instructions, particularly Compare-And-Swap (CAS), which atomically compares a memory location to an expected value and updates it to a new value only if it matches. Modern processors provide these primitives as single CPU instructions, making them the foundation for building higher-level concurrent data structures without locks.
# Traditional lock-based counter
class LockedCounter
def initialize
@count = 0
@mutex = Mutex.new
end
def increment
@mutex.synchronize { @count += 1 }
end
end
# Lock-free counter using atomic operations
require 'concurrent'
class LockFreeCounter
def initialize
@count = Concurrent::AtomicFixnum.new(0)
end
def increment
@count.increment
end
end
The distinction between lock-free, wait-free, and obstruction-free algorithms defines different progress guarantees. Lock-free guarantees system-wide progress but individual threads may starve. Wait-free guarantees every thread completes in bounded steps. Obstruction-free guarantees progress only when a thread runs in isolation. Most practical implementations target lock-free properties as the optimal balance between complexity and guarantees.
Key Principles
Lock-free programming rests on atomic read-modify-write operations provided by processor architectures. The Compare-And-Swap (CAS) operation forms the primary building block, atomically reading a memory location, comparing it to an expected value, and writing a new value only if the comparison succeeds. This atomic triplet executes without interruption, providing the linearization point for lock-free algorithms.
# Conceptual CAS operation
def compare_and_swap(location, expected, new_value)
atomically do
current = location.read
if current == expected
location.write(new_value)
return true
else
return false
end
end
end
Memory ordering defines the visibility and ordering of memory operations across threads. Sequential consistency provides the strongest guarantee where operations appear to execute in program order, but modern processors use weaker models for performance. Acquire-release semantics create synchronization points: acquire operations prevent later operations from moving before them, while release operations prevent earlier operations from moving after them. Lock-free algorithms must explicitly specify memory ordering to ensure correctness across architectures.
The ABA problem represents a fundamental challenge in lock-free programming. A thread reads value A from a location, another thread changes it to B and back to A, and the first thread's CAS succeeds despite intervening modifications. This false positive corrupts data structures that depend on the absence of modifications. Solutions include tagged pointers (combining value with version counter), hazard pointers (preventing premature reclamation), or garbage collection.
# ABA problem demonstration
class Stack
Node = Struct.new(:value, :next)
def initialize
@head = Concurrent::AtomicReference.new(nil)
end
def push(value)
loop do
old_head = @head.get
new_node = Node.new(value, old_head)
return if @head.compare_and_set(old_head, new_node)
end
end
def pop
loop do
old_head = @head.get
return nil unless old_head
next_node = old_head.next
# ABA problem: old_head might have been removed, reused, and
# replaced with different node at same memory location
return old_head.value if @head.compare_and_set(old_head, next_node)
end
end
end
Retry loops characterize lock-free algorithm structure. Operations attempt their work optimistically, assuming no conflicts. When CAS fails due to concurrent modifications, the operation restarts. The loop continues until success, with each iteration representing a potential linearization point. The algorithm must ensure that repeated failures don't prevent other threads from making progress, distinguishing lock-free from merely obstruction-free approaches.
Linearization points define the atomic instant where an operation takes effect. In lock-free structures, identifying linearization points proves more complex than in locked code where the lock acquisition serves as an obvious synchronization boundary. Each successful CAS represents a potential linearization point, but failed operations may linearize at different points depending on observed state. Correct linearization point identification enables reasoning about concurrent behavior and proving correctness.
Memory reclamation presents the central challenge in lock-free memory management. Threads cannot immediately free nodes accessed by other threads without risking use-after-free errors. Garbage-collected languages like Ruby simplify this problem, as the GC handles reclamation safely. Languages requiring manual memory management employ techniques like hazard pointers (threads announce pointers they're accessing), epoch-based reclamation (threads mark entry/exit from critical sections), or reference counting schemes.
False sharing degrades performance when multiple threads modify variables on the same cache line. Processors maintain cache coherence at cache line granularity (typically 64 bytes), so concurrent modifications to nearby memory locations cause cache line ping-ponging between cores even when modifying logically independent data. Lock-free structures must pad shared variables to separate cache lines, trading memory for performance.
Ruby Implementation
Ruby provides atomic operations through the concurrent-ruby gem, which wraps platform-specific atomic primitives. The gem offers atomic references, numeric types, and boolean values as building blocks for lock-free structures. Ruby's Global Interpreter Lock (GIL) in MRI limits true parallelism, but atomic operations remain valuable for coordinating with C extensions, ensuring memory visibility across threads, and preparing code for truly parallel Ruby implementations.
require 'concurrent'
# Atomic numeric operations
atomic_num = Concurrent::AtomicFixnum.new(0)
atomic_num.increment # => 1
atomic_num.decrement # => 0
atomic_num.compare_and_set(0, 5) # => true
atomic_num.update { |v| v * 2 } # Retries until successful
# Atomic reference for object storage
atomic_ref = Concurrent::AtomicReference.new(nil)
atomic_ref.set([1, 2, 3])
atomic_ref.compare_and_set([1, 2, 3], [4, 5, 6])
old_value = atomic_ref.get_and_set([7, 8, 9])
The AtomicReference class implements compare-and-swap for object references, enabling lock-free data structures. The compare_and_set method uses object identity comparison, checking if the current reference equals the expected reference using equal? semantics. This identity-based comparison avoids the ABA problem for objects since Ruby's garbage collector ensures that reclaimed objects don't reappear at the same memory address.
class LockFreeStack
Node = Struct.new(:value, :next)
def initialize
@head = Concurrent::AtomicReference.new(nil)
end
def push(value)
loop do
old_head = @head.get
new_node = Node.new(value, old_head)
break if @head.compare_and_set(old_head, new_node)
# CAS failed, another thread modified head, retry
end
end
def pop
loop do
old_head = @head.get
return nil unless old_head
next_node = old_head.next
# Successfully update head to next node
return old_head.value if @head.compare_and_set(old_head, next_node)
# CAS failed, retry with current head
end
end
def empty?
@head.get.nil?
end
end
The concurrent-ruby gem also provides higher-level abstractions built on atomic primitives. Concurrent::Array offers thread-safe array operations, Concurrent::Hash provides lock-free hash table access, and Concurrent::Map implements a lock-free concurrent map using atomic operations internally. These structures handle the complexity of lock-free implementation while exposing simple interfaces.
# High-level concurrent collections
concurrent_map = Concurrent::Map.new
concurrent_map.put_if_absent(:key, 'value')
concurrent_map.compute(:counter) { |old| (old || 0) + 1 }
# Atomic boolean for flags
flag = Concurrent::AtomicBoolean.new(false)
flag.make_true # Atomically set to true
flag.make_false # Atomically set to false
Ruby's thread-safe queue implementations in the Thread::Queue and Thread::SizedQueue classes use internal locking rather than lock-free algorithms, prioritizing simplicity and avoiding the GIL complications. For truly lock-free queues in performance-critical code, developers must implement custom structures using AtomicReference primitives or interface with native extensions that bypass the GIL.
The update method on atomic types implements the common retry pattern concisely. It accepts a block that transforms the current value, retrying the entire block if a concurrent modification occurs. This pattern handles the CAS retry loop internally, simplifying code that performs atomic updates.
# Atomic update with retry pattern
counter = Concurrent::AtomicFixnum.new(10)
# Manual CAS retry loop
loop do
current = counter.get
new_value = current * 2
break if counter.compare_and_set(current, new_value)
end
# Equivalent using update
counter.update { |v| v * 2 }
# More complex transformations
stats = Concurrent::AtomicReference.new({ count: 0, sum: 0 })
stats.update do |current|
{
count: current[:count] + 1,
sum: current[:sum] + value
}
end
JRuby and TruffleRuby provide better concurrent performance characteristics than MRI by eliminating the GIL, allowing true parallel execution of Ruby code. In these implementations, lock-free structures deliver their full performance benefits, with atomic operations executing as actual lock-free processor instructions rather than being serialized by a global lock.
Practical Examples
Lock-free counters represent the simplest lock-free data structure, requiring only atomic increment and decrement operations. Multiple threads can increment a counter simultaneously without coordination beyond the atomic operation itself. The structure demonstrates the basic retry pattern where failed operations immediately retry until success.
require 'concurrent'
class MetricsCollector
def initialize
@request_count = Concurrent::AtomicFixnum.new(0)
@error_count = Concurrent::AtomicFixnum.new(0)
@total_latency = Concurrent::AtomicFixnum.new(0)
end
def record_request(latency_ms, error: false)
@request_count.increment
@error_count.increment if error
@total_latency.update { |current| current + latency_ms }
end
def stats
count = @request_count.get
errors = @error_count.get
latency = @total_latency.get
{
requests: count,
errors: errors,
error_rate: count > 0 ? errors.to_f / count : 0.0,
avg_latency: count > 0 ? latency.to_f / count : 0.0
}
end
def reset
@request_count.set(0)
@error_count.set(0)
@total_latency.set(0)
end
end
# Usage across multiple threads
collector = MetricsCollector.new
threads = 10.times.map do
Thread.new do
1000.times do
latency = rand(10..500)
error = rand < 0.05
collector.record_request(latency, error: error)
end
end
end
threads.each(&:join)
puts collector.stats
# => {:requests=>10000, :errors=>~500, :error_rate=>~0.05, :avg_latency=>~255}
Lock-free stacks implement a LIFO data structure using a single atomic reference to the head node. Push operations create a new node pointing to the current head, then atomically swap it in. Pop operations atomically replace the head with its next pointer. Both operations retry on concurrent modifications until success.
class LockFreeStack
Node = Struct.new(:value, :next)
def initialize
@head = Concurrent::AtomicReference.new(nil)
@size = Concurrent::AtomicFixnum.new(0)
end
def push(value)
new_node = Node.new(value, nil)
loop do
old_head = @head.get
new_node.next = old_head
if @head.compare_and_set(old_head, new_node)
@size.increment
return value
end
# Retry if another thread modified head
end
end
def pop
loop do
old_head = @head.get
return nil unless old_head
next_node = old_head.next
if @head.compare_and_set(old_head, next_node)
@size.decrement
return old_head.value
end
# Retry if another thread modified head
end
end
def peek
head = @head.get
head ? head.value : nil
end
def size
@size.get
end
def empty?
@head.get.nil?
end
end
# Demonstrate concurrent access
stack = LockFreeStack.new
# Multiple producers
producers = 5.times.map do |i|
Thread.new do
100.times { |j| stack.push("#{i}-#{j}") }
end
end
# Multiple consumers
consumed = Concurrent::Array.new
consumers = 5.times.map do
Thread.new do
50.times do
value = stack.pop
consumed << value if value
end
end
end
producers.each(&:join)
consumers.each(&:join)
puts "Stack size: #{stack.size}"
puts "Consumed items: #{consumed.size}"
Lock-free queues require more complexity than stacks due to maintaining both head and tail pointers. The Michael-Scott queue algorithm uses atomic operations on both ends, with dummy nodes to simplify boundary conditions. Enqueue adds nodes at the tail, dequeue removes from the head, and both operations require CAS loops with careful ordering.
class LockFreeQueue
Node = Struct.new(:value, :next) do
def initialize(value, next_node = nil)
super(value, Concurrent::AtomicReference.new(next_node))
end
end
def initialize
dummy = Node.new(nil)
@head = Concurrent::AtomicReference.new(dummy)
@tail = Concurrent::AtomicReference.new(dummy)
end
def enqueue(value)
new_node = Node.new(value)
loop do
tail = @tail.get
next_node = tail.next.get
# Check if tail is still consistent
if tail == @tail.get
if next_node.nil?
# Try to link new node at end of list
if tail.next.compare_and_set(nil, new_node)
# Enqueue done, try to swing tail to new node
@tail.compare_and_set(tail, new_node)
return
end
else
# Tail is behind, try to advance it
@tail.compare_and_set(tail, next_node)
end
end
end
end
def dequeue
loop do
head = @head.get
tail = @tail.get
next_node = head.next.get
# Check if head is still consistent
if head == @head.get
if head == tail
# Queue is empty or tail is behind
return nil if next_node.nil?
# Tail is behind, try to advance it
@tail.compare_and_set(tail, next_node)
else
# Read value before CAS
value = next_node.value
# Try to swing head to next node
if @head.compare_and_set(head, next_node)
return value
end
end
end
end
end
def empty?
head = @head.get
tail = @tail.get
next_node = head.next.get
head == tail && next_node.nil?
end
end
# Producer-consumer pattern
queue = LockFreeQueue.new
# Producers add items
producers = 3.times.map do |producer_id|
Thread.new do
100.times do |i|
queue.enqueue("P#{producer_id}-#{i}")
sleep(0.001)
end
end
end
# Consumers process items
results = Concurrent::Array.new
consumers = 3.times.map do
Thread.new do
loop do
item = queue.dequeue
break unless item
results << item
sleep(0.001)
end
end
end
producers.each(&:join)
sleep(0.5) # Let consumers drain queue
consumers.each(&:kill)
puts "Processed #{results.size} items"
Lock-free object pools manage reusable objects without locking, useful for reducing allocation overhead in performance-critical code. The pool maintains a stack of available objects, with acquire operations popping from the stack and release operations pushing back. The structure combines lock-free stack principles with object lifecycle management.
class LockFreeObjectPool
Node = Struct.new(:object, :next)
def initialize(size, &factory)
@head = Concurrent::AtomicReference.new(nil)
@factory = factory
@available = Concurrent::AtomicFixnum.new(0)
# Pre-populate pool
size.times { release(factory.call) }
end
def acquire
loop do
old_head = @head.get
return @factory.call unless old_head
next_node = old_head.next
if @head.compare_and_set(old_head, next_node)
@available.decrement
return old_head.object
end
end
end
def release(object)
loop do
old_head = @head.get
new_node = Node.new(object, old_head)
if @head.compare_and_set(old_head, new_node)
@available.increment
return
end
end
end
def available_count
@available.get
end
end
# Example: Connection pool
ConnectionStub = Struct.new(:id) do
def reset
# Reset connection state
end
end
pool = LockFreeObjectPool.new(10) { |i| ConnectionStub.new(i) }
threads = 20.times.map do
Thread.new do
100.times do
conn = pool.acquire
# Use connection
sleep(0.01)
conn.reset
pool.release(conn)
end
end
end
threads.each(&:join)
puts "Pool size: #{pool.available_count}"
Design Considerations
Lock-free programming makes sense when lock contention significantly degrades performance and the algorithm complexity remains manageable. High contention scenarios with short critical sections benefit most, as the overhead of acquiring locks dominates execution time. Low contention workloads see minimal benefit and may perform worse due to retry overhead.
Traditional locks provide simpler reasoning and implementation at the cost of potential blocking. A thread acquiring a lock guarantees exclusive access, making state changes straightforward to reason about. Lock-free algorithms require understanding memory ordering, ABA problems, and linearization points, significantly increasing implementation complexity. Choose locks for complex state transitions or when correctness matters more than maximum throughput.
The presence of garbage collection strongly influences lock-free algorithm viability. Languages with GC like Ruby automatically solve memory reclamation problems that plague manual memory management. Without GC, implementing safe lock-free structures requires sophisticated techniques like hazard pointers or epoch-based reclamation, adding substantial complexity. Manual memory management often tips the balance toward simpler lock-based approaches.
Real-time systems and latency-sensitive applications benefit from lock-free guarantees. Locks can cause unbounded delays through priority inversion or preemption of lock holders. Lock-free algorithms guarantee system-wide progress regardless of thread scheduling, eliminating worst-case blocking scenarios. However, individual operations may still experience high latency due to repeated retries under extreme contention.
Algorithm composability differs significantly between approaches. Multiple lock-based operations compose through acquiring multiple locks in consistent order, preventing deadlock through lock ordering disciplines. Lock-free operations compose poorly; combining multiple CAS operations into a larger atomic operation typically requires redesigning the entire algorithm. This limitation restricts lock-free techniques to isolated data structures rather than complex transactional scenarios.
Debugging complexity increases dramatically with lock-free code. Race conditions manifest inconsistently and only under specific timing conditions, making reproduction difficult. Traditional debugging tools designed for sequential code provide limited insight into concurrent interactions. Testing requires stress testing with multiple threads and intentional scheduling perturbations to expose bugs that might occur rarely in production.
# Lock-based approach: Simple but blocks threads
class LockedCache
def initialize
@cache = {}
@mutex = Mutex.new
end
def get(key)
@mutex.synchronize { @cache[key] }
end
def put(key, value)
@mutex.synchronize { @cache[key] = value }
end
end
# Lock-free approach: More complex but non-blocking
class LockFreeCache
def initialize
@cache = Concurrent::AtomicReference.new({})
end
def get(key)
@cache.get[key]
end
def put(key, value)
loop do
old_map = @cache.get
new_map = old_map.merge(key => value)
break if @cache.compare_and_set(old_map, new_map)
end
end
end
Read-heavy workloads with infrequent writes favor lock-free structures using atomic references. Readers access the reference without synchronization, while writers create new structures and atomically swap them in. This copy-on-write pattern works well for configuration data, routing tables, or caches where reads vastly outnumber writes and stale reads remain acceptable for brief periods.
Write-heavy workloads require careful evaluation. Lock-free structures under high write contention experience many CAS failures and retries, potentially performing worse than well-designed locks. Coarse-grained locks may outperform lock-free alternatives when contention causes excessive retry overhead. Profile real workloads before committing to lock-free implementations for write-dominated scenarios.
Performance Considerations
Lock-free algorithms eliminate the overhead of lock acquisition and release, reducing latency in uncontended cases. A CAS operation executes in tens of nanoseconds, while lock operations require hundreds of nanoseconds for the synchronization overhead. This advantage disappears under contention when CAS operations repeatedly fail and retry, potentially performing worse than locks that queue waiting threads efficiently.
Cache coherence traffic dominates lock-free performance characteristics. Every CAS operation, successful or not, generates cache coherence messages to ensure atomic visibility across cores. Under high contention, multiple cores hammer the same cache line with CAS operations, causing cache line bouncing that saturates memory bandwidth. This coherence traffic often limits scalability more than the algorithm itself.
The retry penalty scales with contention levels. Each failed CAS wastes CPU cycles re-reading state and recomputing updates. Under extreme contention, threads spend more time retrying than making progress. Exponential backoff can reduce contention by having threads wait briefly after failures, trading latency for throughput, but determining optimal backoff strategies remains difficult.
# Measuring contention impact
require 'benchmark'
require 'concurrent'
def benchmark_counter(counter_class, thread_count, iterations)
counter = counter_class.new
Benchmark.measure do
threads = thread_count.times.map do
Thread.new do
iterations.times { counter.increment }
end
end
threads.each(&:join)
end
end
class LockedCounter
def initialize
@count = 0
@mutex = Mutex.new
end
def increment
@mutex.synchronize { @count += 1 }
end
end
class LockFreeCounter
def initialize
@count = Concurrent::AtomicFixnum.new(0)
end
def increment
@count.increment
end
end
# Low contention: 2 threads
puts "Low contention (2 threads):"
puts "Locked: #{benchmark_counter(LockedCounter, 2, 100_000)}"
puts "Lock-free: #{benchmark_counter(LockFreeCounter, 2, 100_000)}"
# High contention: 8 threads
puts "\nHigh contention (8 threads):"
puts "Locked: #{benchmark_counter(LockedCounter, 8, 100_000)}"
puts "Lock-free: #{benchmark_counter(LockFreeCounter, 8, 100_000)}"
Memory ordering requirements impose performance costs. Sequential consistency provides simplest semantics but requires expensive memory barriers. Acquire-release semantics reduce overhead by allowing some reordering while maintaining synchronization points. Relaxed ordering offers maximum performance but requires expert understanding to use correctly. Ruby's atomic operations typically use acquire-release semantics as a reasonable balance.
False sharing severely degrades multi-threaded performance when unrelated variables share cache lines. Two threads modifying separate variables on the same cache line invalidate each other's caches repeatedly, serializing what should be parallel operations. Lock-free structures must pad shared state to separate cache lines, wasting memory to preserve performance.
# False sharing mitigation through padding
class PaddedCounter
CACHE_LINE_SIZE = 64
# Pad to separate cache lines
attr_accessor :count, :_pad1, :_pad2, :_pad3, :_pad4, :_pad5, :_pad6, :_pad7
def initialize
@count = Concurrent::AtomicFixnum.new(0)
# Padding ensures count sits on its own cache line
end
end
Read scaling differs significantly from write scaling. Lock-free structures using atomic loads scale reads to arbitrary thread counts since reads don't modify cache lines. Write operations scale poorly due to cache coherence requirements. Designs should separate read and write paths, using techniques like per-thread counters that aggregate periodically to achieve better write scalability.
Specialized hardware instructions beyond CAS can improve performance. Double-width CAS (DCAS) atomically updates two memory locations, simplifying some algorithms. Load-linked/store-conditional (LL/SC) primitives avoid ABA problems inherently. However, portability suffers when relying on platform-specific instructions, and abstractions like Ruby's concurrent-ruby hide these details, providing consistent but potentially suboptimal performance across platforms.
Measuring lock-free performance requires careful methodology. Microbenchmarks often show lock-free advantages that disappear in real applications. Contention patterns, memory allocation rates, and system load all affect relative performance. Always benchmark with realistic workloads on target hardware under expected load conditions before concluding lock-free provides benefits.
Common Pitfalls
The ABA problem corrupts data structures when values change and revert between checks. A thread reads value A, intending to CAS it to C. Another thread changes A to B then back to A. The first thread's CAS succeeds despite missing the B state, potentially violating invariants. Solutions include tagged pointers that increment version counters with each change, making A-B-A transitions distinguishable.
# ABA problem in stack
class VulnerableStack
Node = Struct.new(:value, :next)
def initialize
@head = Concurrent::AtomicReference.new(nil)
end
def pop
loop do
old_head = @head.get
return nil unless old_head
next_node = old_head.next
# Between these lines, another thread might:
# 1. Pop this node
# 2. Pop another node
# 3. Push this node back
# Now old_head points to reused node with different next
return old_head.value if @head.compare_and_set(old_head, next_node)
end
end
end
# ABA-safe version with version counter
class SafeStack
class VersionedNode
attr_reader :value, :next, :version
def initialize(value, next_node, version = 0)
@value = value
@next = next_node
@version = version
end
end
def initialize
@head = Concurrent::AtomicReference.new(nil)
end
def pop
loop do
old_head = @head.get
return nil unless old_head
next_node = old_head.next
# Version counter makes reused nodes distinguishable
new_next = next_node ? VersionedNode.new(
next_node.value,
next_node.next,
next_node.version + 1
) : nil
return old_head.value if @head.compare_and_set(old_head, new_next)
end
end
end
Memory ordering violations cause subtle bugs when operations appear reordered. Compilers and processors reorder operations for optimization unless explicit ordering constraints exist. Reading a flag then accessing data it protects may see stale data if the read reorders before initialization completes. Ruby's atomic operations provide sufficient ordering for most use cases, but understanding the guarantees remains critical.
Livelock occurs when threads continuously retry operations without making progress. Under extreme contention, threads may spin indefinitely retrying CAS operations that repeatedly fail. Unlike deadlock where threads block permanently, livelocked threads consume CPU without useful work. Exponential backoff or yielding after failed attempts can mitigate livelock by reducing contention temporarily.
# Livelock prevention with backoff
class BackoffCounter
def initialize
@count = Concurrent::AtomicFixnum.new(0)
end
def increment
attempts = 0
loop do
current = @count.get
return if @count.compare_and_set(current, current + 1)
# Exponential backoff after failures
attempts += 1
if attempts > 3
sleep(2 ** (attempts - 3) * 0.001) # 1ms, 2ms, 4ms...
end
end
end
end
Reference equality vs value equality confusion causes CAS failures. Ruby's compare_and_set uses reference equality (equal?) not value equality (==). Two arrays containing identical elements fail comparison if they're different objects. This behavior prevents accidental matches but requires careful handling of value types.
atomic_ref = Concurrent::AtomicReference.new([1, 2, 3])
# This fails: different array object with same values
atomic_ref.compare_and_set([1, 2, 3], [4, 5, 6]) # => false
# This succeeds: same array object
old_array = atomic_ref.get
atomic_ref.compare_and_set(old_array, [4, 5, 6]) # => true
Neglecting memory visibility leads to infinite loops or stale reads. Without proper synchronization, one thread's writes may never become visible to other threads. While Ruby's GIL masks some visibility issues in MRI, code must remain correct for GIL-free implementations. Always use atomic operations or proper synchronization for shared state.
Incorrect retry logic corrupts state when operations have side effects. If the block passed to an atomic update has side effects beyond the returned value, retries execute those effects multiple times. Keep update blocks pure, computing new state from current state without external effects.
counter = Concurrent::AtomicFixnum.new(0)
log = []
# Wrong: side effects in update block
counter.update do |v|
log << "Update: #{v}" # Executes multiple times on retry
v + 1
end
# Right: separate side effects from update
old_value = counter.get
counter.update { |v| v + 1 }
log << "Update: #{old_value}"
Unbounded retry loops waste CPU under contention. Operations that continuously retry without success consume resources without progress. Implement retry limits or backoff strategies to prevent resource exhaustion. Consider falling back to locking if retry counts exceed thresholds indicating pathological contention.
Testing challenges hide concurrency bugs. Lock-free code may work correctly under low thread counts but fail under production load. Race conditions appear rarely and non-deterministically. Stress tests with high thread counts, deliberate scheduling jitter, and memory sanitizers help expose issues, but cannot prove correctness. Formal verification or careful reasoning about linearization points provides stronger guarantees than testing alone.
Reference
Atomic Operations
| Operation | Description | Use Case |
|---|---|---|
| get | Read current value | Snapshot shared state |
| set | Write new value | Initialize or reset state |
| compare_and_set | Atomic CAS operation | Core lock-free primitive |
| get_and_set | Swap with new value | Reset with old value retrieval |
| update | Retry block until success | Atomic transformations |
| increment | Atomic add 1 | Counters and statistics |
| decrement | Atomic subtract 1 | Reference counting |
Progress Guarantees
| Guarantee | Definition | Trade-offs |
|---|---|---|
| Wait-free | Every thread completes in bounded steps | Most complex, highest overhead |
| Lock-free | System makes progress, individual threads may starve | Balanced complexity and guarantees |
| Obstruction-free | Progress when thread runs alone | Simplest, weakest guarantees |
| Blocking | Uses locks, threads may wait indefinitely | Simplest implementation, blocking delays |
Memory Ordering Models
| Model | Visibility Guarantee | Performance | Complexity |
|---|---|---|---|
| Sequential Consistency | Total order across threads | Slowest | Simplest reasoning |
| Acquire-Release | Synchronization at marked points | Moderate | Moderate reasoning |
| Relaxed | Minimal ordering constraints | Fastest | Expert-level reasoning |
Common Lock-Free Data Structures
| Structure | Complexity | ABA Vulnerable | Best Use Case |
|---|---|---|---|
| Counter | Simple | No | Metrics and statistics |
| Stack | Simple | Yes | LIFO work queues |
| Queue | Moderate | Yes | FIFO work distribution |
| Hash Table | Complex | Yes | Concurrent lookups |
| Skip List | Complex | Yes | Ordered concurrent access |
Concurrent-Ruby Atomic Types
| Class | Purpose | Key Methods |
|---|---|---|
| AtomicFixnum | Thread-safe integer | increment, decrement, update |
| AtomicBoolean | Thread-safe boolean | make_true, make_false, value |
| AtomicReference | Thread-safe object reference | get, set, compare_and_set |
| AtomicMarkableReference | Reference with boolean mark | get, set, compare_and_set |
ABA Problem Mitigation
| Technique | Mechanism | Trade-offs |
|---|---|---|
| Garbage Collection | GC prevents address reuse | Language-dependent |
| Version Counters | Tag values with monotonic counter | Increased memory, counter overflow |
| Hazard Pointers | Announce accessed pointers | Manual memory management complexity |
| Epoch-Based Reclamation | Batch reclamation by epoch | Delayed reclamation, memory overhead |
Performance Characteristics
| Scenario | Lock-Based | Lock-Free | Winner |
|---|---|---|---|
| Low contention | Fast | Faster | Lock-free |
| High read contention | Moderate | Fast | Lock-free |
| High write contention | Moderate | Slow | Lock-based |
| Complex transactions | Fast | Very complex | Lock-based |
| Predictable latency | Variable | Variable with retries | Depends on contention |
| Simple operations | Overhead | Minimal overhead | Lock-free |
Debugging Checklist
| Check | Issue Detected | Resolution |
|---|---|---|
| Reference vs value equality | CAS failing unexpectedly | Use reference equality consistently |
| Side effects in update blocks | Duplicate effects on retry | Move side effects outside update |
| Memory visibility | Infinite loops, stale reads | Use proper atomic operations |
| False sharing | Poor multi-core scaling | Pad variables to separate cache lines |
| Livelock | High CPU, no progress | Add backoff or retry limits |
| ABA corruption | Rare data structure corruption | Use version counters or hazard pointers |
Ruby Implementation Decision Matrix
| Scenario | Recommendation | Rationale |
|---|---|---|
| Simple counter/flag | AtomicFixnum/AtomicBoolean | Built-in, optimal performance |
| Reference swapping | AtomicReference | Clean semantics for object references |
| Complex state | Lock-based with Mutex | Simpler reasoning, less error-prone |
| High read, low write | Atomic with copy-on-write | Readers never block |
| Queue/Stack | Concurrent collections or custom lock-free | Depends on performance requirements |
| MRI production code | Consider lock-based | GIL limits parallelism benefits |
| JRuby/TruffleRuby | Lock-free where appropriate | True parallelism enables benefits |