Overview
A race condition occurs when multiple threads or processes access shared resources concurrently, and the final state depends on the precise timing and ordering of execution. The outcome becomes non-deterministic, varying between executions despite identical inputs. This unpredictability makes race conditions among the most challenging bugs to detect and fix in concurrent systems.
The term originates from hardware design, where multiple signals "race" to affect a circuit's state. In software, threads race to read, modify, or write shared data. The thread that wins the race determines the result, creating behavior that changes between runs.
Race conditions manifest in various forms. A classic example involves two threads incrementing a shared counter:
counter = 0
threads = 2.times.map { Thread.new { 10_000.times { counter += 1 } } }
threads.each(&:join) # Wait for both threads instead of sleeping for a fixed time
puts counter # Expected: 20000; a run that loses updates prints less (15432, 18901, etc.)
The increment operation counter += 1 decomposes into three steps: read the current value, add one, write the result. When threads interleave these steps, updates get lost. Thread A reads 5, Thread B reads 5, both write 6, and one increment disappears.
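The lost update described above can be replayed deterministically by writing the three steps out by hand in a single thread. This is only an illustrative sketch; the names `a_read` and `b_read` are made up for the example.

```ruby
# Replay the interleaving from the text in one thread so the outcome is reproducible.
counter = 5

a_read = counter      # Thread A: read (sees 5)
b_read = counter      # Thread B: read (also sees 5)
counter = a_read + 1  # Thread A: add one, write 6
counter = b_read + 1  # Thread B: add one, write 6, overwriting A's update

puts counter          # 6: two increments ran, but the value advanced by one
```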
Race conditions cause data corruption, incorrect calculations, security vulnerabilities, and application crashes. Unlike deterministic bugs that reproduce consistently, race conditions appear sporadically, often only under specific load conditions or hardware configurations. A system may pass all tests yet fail catastrophically in production when timing aligns differently.
The consequences extend beyond incorrect values. Race conditions in authentication systems grant unauthorized access. Race conditions in financial transactions duplicate payments or lose funds. Race conditions in file operations corrupt data or expose sensitive information. The non-deterministic nature makes diagnosis difficult, as adding logging or debugging often changes timing enough to hide the condition.
Key Principles
Race conditions arise from the intersection of three factors: shared mutable state, concurrent execution, and non-atomic operations. Understanding these foundations clarifies why race conditions occur and how to prevent them.
Shared Mutable State exists when multiple execution contexts access the same memory location or resource. This includes variables in shared memory, files on disk, database records, or network sockets. Immutable data eliminates race conditions regardless of concurrency, since no thread modifies state. The risk emerges only when threads both read and write the same location.
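As a small sketch of the immutability point, a deeply frozen structure can be shared between threads without a lock, and an accidental write fails loudly instead of racing. The `settings` hash and its keys are hypothetical.

```ruby
# Freeze the hash, the nested array, and each string so nothing reachable is mutable.
settings = { retries: 3, hosts: ["a.example", "b.example"].map(&:freeze).freeze }.freeze

readers = 4.times.map do
  Thread.new { settings[:hosts].map(&:upcase) } # read-only access, no lock needed
end
results = readers.map(&:value)

# Any attempt to mutate the shared structure raises instead of racing.
mutation_error =
  begin
    settings[:hosts] << "c.example"
    nil
  rescue FrozenError => e
    e
  end

puts results.uniq.length   # 1: every reader saw the same data
puts mutation_error.class  # FrozenError
```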
Concurrent Execution means multiple threads or processes execute simultaneously or with interleaved operations. On single-core systems, the operating system rapidly switches between threads, creating the appearance of simultaneity. On multi-core systems, threads truly execute in parallel. Both scenarios create opportunities for race conditions when threads access shared state.
Non-Atomic Operations break into multiple steps at the hardware or runtime level. An atomic operation completes entirely or not at all, with no intermediate states visible to other threads. The increment x += 1 looks atomic in source code but compiles to separate load, increment, and store instructions. Between any two instructions, the scheduler may switch threads, allowing another thread to see or modify inconsistent state.
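On MRI, the separate steps can be inspected by disassembling a small snippet. This sketch assumes MRI: `RubyVM::InstructionSequence` is an implementation detail that other Ruby runtimes do not provide, and the exact instruction names vary between versions.

```ruby
# MRI-specific: RubyVM is not available on JRuby or TruffleRuby.
if defined?(RubyVM::InstructionSequence)
  listing = RubyVM::InstructionSequence.compile("x = 0\nx += 1").disasm
  puts listing # look for a local read, an add (opt_plus), and a local write
else
  listing = nil
  puts "RubyVM::InstructionSequence is not available on this runtime"
end
```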
Critical Sections represent code segments that access shared resources and must not execute concurrently. When one thread enters a critical section, other threads must wait. The critical section should be minimal—only the operations that actually require mutual exclusion. Expanding critical sections unnecessarily reduces parallelism and degrades performance.
Mutual Exclusion prevents multiple threads from simultaneously executing critical sections. Synchronization primitives like mutexes, semaphores, and monitors implement mutual exclusion. A thread acquires a lock before entering a critical section and releases it upon exit. Other threads attempting to acquire the same lock block until it becomes available.
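A minimal sketch of mutual exclusion applied to the shared counter from the overview: each increment runs inside a `Mutex#synchronize` block, so the read, the addition, and the write cannot interleave with another thread's increment. The variable names are illustrative.

```ruby
counter = 0
lock = Mutex.new

threads = 2.times.map do
  Thread.new do
    10_000.times do
      # The read, the addition, and the write happen as one unit under the lock.
      lock.synchronize { counter += 1 }
    end
  end
end
threads.each(&:join)

puts counter # 20000: no increment can be lost while the lock is held
```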
Memory Visibility concerns when changes made by one thread become visible to others. Modern processors and compilers reorder operations for performance, potentially making writes visible out of order. Without proper synchronization, one thread may never observe another thread's updates. Memory barriers and synchronization primitives enforce visibility guarantees.
Atomicity Guarantees specify which operations complete indivisibly. Hardware provides atomic operations for primitive types at the word size (32 or 64 bits). Higher-level atomic operations require explicit synchronization. Ruby's Global Interpreter Lock provides some atomicity for specific operations but does not eliminate all race conditions.
The Check-Then-Act pattern exemplifies race conditions. Code checks a condition, then acts on the result:
if @balance >= amount
@balance -= amount # Balance may have changed since check
end
Between the check and the action, another thread may modify the balance, invalidating the check. The solution combines check and action into a single atomic operation protected by synchronization.
Ruby Implementation
Ruby's threading model creates specific considerations for race conditions. The Global Interpreter Lock (GIL, also called the Global VM Lock) in MRI Ruby prevents multiple Ruby threads from executing Ruby code in parallel, but this does not eliminate race conditions. The GIL protects the Ruby interpreter's internal state, not application-level data structures. The interpreter can switch threads between the steps of a compound operation such as a read-modify-write, so race conditions still emerge. Implementations without a GIL, such as JRuby and TruffleRuby, run threads in parallel and widen these windows further.
Thread Creation and Management in Ruby uses the Thread class:
threads = 10.times.map do
Thread.new do
# Work that may access shared state
end
end
threads.each(&:join) # Wait for all threads to complete
Each thread executes independently with access to shared variables in the outer scope. Without synchronization, concurrent access to these variables creates race conditions.
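One way to sidestep shared variables altogether is to have each thread return its result and collect the results with `Thread#value`. The following is a sketch of that idea; the slicing scheme is just an assumption for the example.

```ruby
numbers = (1..1_000).to_a

# Each thread works on its own slice and returns a result; nothing shared is mutated.
workers = numbers.each_slice(250).map do |slice|
  Thread.new { slice.sum }
end

# Thread#value joins the thread and returns the value of its block.
total = workers.map(&:value).sum

puts total # 500500
```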
Mutex Synchronization provides mutual exclusion through the Mutex class:
class BankAccount
def initialize(balance)
@balance = balance
@mutex = Mutex.new
end
def withdraw(amount)
@mutex.synchronize do
if @balance >= amount
@balance -= amount
amount
else
0
end
end
end
def balance
@mutex.synchronize { @balance }
end
end
account = BankAccount.new(1000)
threads = 100.times.map do
Thread.new { account.withdraw(10) }
end
threads.each(&:join)
puts account.balance # Consistent: 0
The synchronize block ensures only one thread executes the critical section at a time. Both reads and writes require synchronization: reading without a lock can observe stale or inconsistent values.
Thread-Safe Data Structures prevent common race conditions. Ruby provides Queue and SizedQueue for thread-safe producer-consumer patterns:
queue = Queue.new
producer = Thread.new do
10.times do |i|
queue << i
sleep 0.1
end
end
consumer = Thread.new do
10.times do
value = queue.pop # Blocks until item available
puts "Processed: #{value}"
end
end
[producer, consumer].each(&:join)
The Queue handles synchronization internally, eliminating race conditions in enqueue and dequeue operations.
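SizedQueue adds back-pressure to the same pattern: a producer blocks when the queue is full. The sketch below also shows one common way to stop the consumer, a sentinel object; the `done` sentinel is a convention chosen for this example, not part of the Queue API.

```ruby
queue = SizedQueue.new(2) # at most two items may be waiting at once
done = Object.new         # hypothetical sentinel that marks the end of the stream

producer = Thread.new do
  5.times { |i| queue << i } # blocks whenever the queue already holds two items
  queue << done
end

consumer = Thread.new do
  received = []
  loop do
    item = queue.pop         # blocks until an item is available
    break if item.equal?(done)
    received << item
  end
  received                   # returned to the caller through Thread#value
end

producer.join
result = consumer.value
p result # [0, 1, 2, 3, 4]
```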
Condition Variables coordinate threads beyond simple mutual exclusion. A condition variable allows threads to wait for specific conditions while releasing locks:
class ThreadSafeQueue
def initialize
@items = []
@mutex = Mutex.new
@resource_available = ConditionVariable.new
end
def push(item)
@mutex.synchronize do
@items << item
@resource_available.signal # Wake one waiting thread
end
end
def pop
@mutex.synchronize do
while @items.empty?
@resource_available.wait(@mutex) # Release lock and wait
end
@items.shift
end
end
end
The wait method atomically releases the mutex and blocks the thread. When signaled, it reacquires the mutex before returning. This prevents race conditions in complex synchronization patterns.
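The same wait-in-a-loop idiom supports other coordination patterns. As a sketch, here is a one-shot latch that releases waiters once a fixed number of events have occurred; `SimpleLatch` is a hypothetical name, not a standard-library class.

```ruby
# A one-shot latch: waiters block until count_down has been called `count` times.
class SimpleLatch
  def initialize(count)
    @count = count
    @mutex = Mutex.new
    @zero = ConditionVariable.new
  end

  def count_down
    @mutex.synchronize do
      @count -= 1 if @count > 0
      @zero.broadcast if @count.zero? # wake every waiter
    end
  end

  def wait
    @mutex.synchronize do
      # Re-check in a loop: a wakeup alone does not prove the condition holds.
      @zero.wait(@mutex) until @count.zero?
    end
  end
end

latch = SimpleLatch.new(3)
finished = Queue.new

workers = 3.times.map do |i|
  Thread.new do
    finished << i
    latch.count_down
  end
end

latch.wait          # returns only after all three workers have counted down
puts finished.size  # 3
workers.each(&:join)
```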
Thread-Local Storage isolates state per thread, avoiding shared state entirely:
require 'securerandom'
Thread.current[:request_id] = SecureRandom.uuid
# Each thread has its own request_id
10.times.map do
Thread.new do
Thread.current[:request_id] = SecureRandom.uuid
# Use Thread.current[:request_id] without conflicts
end
end
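Thread-local variables eliminate race conditions for per-thread state but do not help with genuinely shared resources. Note that values stored with Thread#[] are scoped to the current fiber; for values that must be visible to every fiber running on a thread, use Thread#thread_variable_set and Thread#thread_variable_get.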
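One subtlety worth a quick demonstration: storage accessed through `Thread#[]` is local to the current fiber, while `Thread#thread_variable_set` and `Thread#thread_variable_get` are local to the thread as a whole. A short sketch with illustrative key names:

```ruby
result = Thread.new do
  Thread.current[:request_id] = "fiber-scoped"                      # fiber-local
  Thread.current.thread_variable_set(:request_id, "thread-scoped")  # thread-local

  # A new fiber on the same thread starts with its own fiber-local storage.
  Fiber.new do
    [Thread.current[:request_id], Thread.current.thread_variable_get(:request_id)]
  end.resume
end.value

p result # [nil, "thread-scoped"]
```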
Atomic Operations in Ruby require the concurrent-ruby gem for proper atomic primitives:
require 'concurrent'
counter = Concurrent::AtomicFixnum.new(0)
threads = 10.times.map do
Thread.new do
10_000.times { counter.increment }
end
end
threads.each(&:join)
puts counter.value # Always 100000
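Depending on the runtime, AtomicFixnum is backed by native atomic operations (for example on JRuby, or on MRI with the concurrent-ruby-ext extension) or by a lock-based fallback. Either way, each increment is indivisible from the caller's point of view.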
Common Pitfalls
Race conditions hide in unexpected places, often masked by timing or system behavior that changes between environments. Several patterns consistently produce race conditions.
Lazy Initialization Without Synchronization creates race conditions when multiple threads initialize a resource:
class DatabaseConnection
def connection
@connection ||= establish_connection
end
end
The ||= operator performs check-then-act: check if @connection is nil, then assign if needed. Multiple threads may see nil simultaneously, each establishing a separate connection. The solution uses a mutex or eager initialization:
class DatabaseConnection
def initialize
@mutex = Mutex.new
end
def connection
return @connection if @connection
@mutex.synchronize do
@connection ||= establish_connection
end
end
end
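The double-checked locking pattern checks before acquiring the lock for performance, then checks again inside the synchronized block. The unsynchronized first check relies on the runtime making a fully initialized @connection visible to other threads; where that guarantee is in doubt, take the lock on every call.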
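A simpler variant takes the lock on every call and skips the unsynchronized first check. The sketch below counts how often the resource is built to show that only one thread performs the initialization; `LazyResource` and its `build` method are hypothetical stand-ins for the connection holder above.

```ruby
class LazyResource
  attr_reader :builds

  def initialize
    @mutex = Mutex.new
    @builds = 0
    @resource = nil
  end

  def resource
    # Always take the lock: simpler than double-checked locking, at the cost of
    # one lock acquisition per call.
    @mutex.synchronize { @resource ||= build }
  end

  private

  def build
    @builds += 1 # only ever runs while the lock is held
    sleep 0.01   # simulate slow setup to widen the race window
    Object.new
  end
end

holder = LazyResource.new
seen = 20.times.map { Thread.new { holder.resource } }.map(&:value)

puts holder.builds                  # 1
puts seen.uniq(&:object_id).length  # 1: every thread got the same object
```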
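Compound Operations on Collections remain racy even when each individual operation is synchronized, because the check and the update are separate steps. Wrapping the whole compound operation in one lock removes the race but has a cost: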
@cache = {}
@mutex = Mutex.new
def get_or_compute(key)
@mutex.synchronize do
if @cache.key?(key)
@cache[key]
else
value = expensive_computation(key)
@cache[key] = value
value
end
end
end
This is safe, but the single mutex is held while expensive_computation runs, so every other caller waits, even for keys that are already cached. A better approach uses finer-grained locking or concurrent data structures that handle this pattern efficiently.
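One possible trade-off, sketched below, is to hold the lock only for the hash lookups and run the slow computation outside it. Two threads may then compute the same key at once, but the first stored value wins and nobody blocks on someone else's computation. The class name and the `MISSING` sentinel are assumptions made for this example.

```ruby
class ComputeOutsideLockCache
  MISSING = Object.new.freeze # distinguishes "no entry" from a cached nil

  def initialize
    @cache = {}
    @mutex = Mutex.new
  end

  def get_or_compute(key)
    cached = @mutex.synchronize { @cache.fetch(key, MISSING) }
    return cached unless cached.equal?(MISSING)

    value = yield(key) # slow work runs without the lock; two threads may both do it

    # First writer wins; later writers discard their result and return the stored one.
    @mutex.synchronize { @cache.fetch(key) { @cache[key] = value } }
  end
end

cache = ComputeOutsideLockCache.new
calls = Queue.new

results = 8.times.map do
  Thread.new do
    cache.get_or_compute(:answer) do |key|
      calls << key
      sleep 0.01 # simulate an expensive computation
      Object.new
    end
  end
end.map(&:value)

puts results.uniq(&:object_id).length # 1: every caller gets the same stored value
puts calls.size                       # between 1 and 8, depending on timing
```

Whether duplicated work is acceptable depends on how expensive and how side-effect-free the computation is; per-key locks avoid the duplication at the cost of more bookkeeping.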
Inconsistent Lock Ordering causes deadlocks, a related concurrency problem:
# Thread 1
@lock_a.synchronize do
@lock_b.synchronize do
# Critical section
end
end
# Thread 2
@lock_b.synchronize do
@lock_a.synchronize do
# Critical section
end
end
Thread 1 holds lock A and waits for lock B. Thread 2 holds lock B and waits for lock A. Neither can proceed. The solution acquires locks in a consistent order across all threads.
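A consistent order can be derived from any stable key. The sketch below sorts the two locks by a numeric id before acquiring them, so transfers in opposite directions cannot deadlock; `OrderedAccount` and `transfer` are hypothetical names used only for illustration.

```ruby
class OrderedAccount
  attr_reader :id, :lock
  attr_accessor :balance

  def initialize(id, balance)
    @id = id
    @balance = balance
    @lock = Mutex.new
  end
end

# Always lock the account with the smaller id first, whichever way the money flows.
def transfer(from, to, amount)
  first, second = [from, to].sort_by(&:id)
  first.lock.synchronize do
    second.lock.synchronize do
      from.balance -= amount
      to.balance += amount
    end
  end
end

a = OrderedAccount.new(1, 1_000)
b = OrderedAccount.new(2, 1_000)

threads = [
  Thread.new { 500.times { transfer(a, b, 1) } }, # A -> B
  Thread.new { 500.times { transfer(b, a, 1) } }  # B -> A, the opposite direction
]
threads.each(&:join)

puts a.balance + b.balance # 2000
puts a.balance             # 1000: the transfers cancel out
```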
False Safety from Ruby's GIL leads developers to assume thread safety where none exists:
@counter = 0
100.times.map do
Thread.new { 10_000.times { @counter += 1 } }
end.each(&:join)
# @counter may be less than 1,000,000 despite the GIL
The GIL prevents parallel execution but does not prevent thread interleaving within operations. The increment still decomposes into read-modify-write steps, allowing race conditions.
Time-of-Check to Time-of-Use (TOCTOU) separates validation from action:
if File.exist?(path) && !File.directory?(path)
contents = File.read(path) # File may change between check and read
end
The file system state may change between the check and the read. An attacker could replace the file with a symlink pointing to sensitive data. The solution performs operations atomically or validates after reading.
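A sketch of the "open first, then validate" approach: the checks run against the open handle, so they describe the file that will actually be read. The helper name `read_regular_file` is hypothetical, and returning nil for every failure is a simplification chosen for the example.

```ruby
require "tmpdir"

# Open first, then inspect the handle: the checks describe the file actually opened.
def read_regular_file(path)
  File.open(path, File::RDONLY) do |file|
    return nil unless file.stat.file? # stat of the open handle, not of the path
    file.read
  end
rescue Errno::ENOENT, Errno::EISDIR, Errno::EACCES
  nil # missing, a directory, or unreadable: treat all as "no contents"
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "note.txt")
  File.write(path, "hello")

  p read_regular_file(path)                       # "hello"
  p read_regular_file(dir)                        # nil (a directory)
  p read_regular_file(File.join(dir, "missing"))  # nil
end
```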
Non-Obvious Shared State occurs when objects appear independent but share internal state:
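original = [+"a", +"b", +"c"] # Unary plus yields mutable strings
duplicate = original.dup # Shallow copy: a new array holding the same String objects
t1 = Thread.new { duplicate.each { |s| s << "x" } }
t2 = Thread.new { original.each { |s| s << "y" } }
[t1, t2].each(&:join)
# Both threads appended to the same three strings; the order of suffixes depends on scheduling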
The dup method creates a shallow copy. If array elements are mutable objects, both arrays share those objects, creating race conditions on the shared elements.
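When a thread needs a truly independent copy, one option is a deep copy through `Marshal`. This is a sketch, and it only works for objects that Marshal can serialize (not procs, IO objects, or similar).

```ruby
original = [+"a", +"b"]                          # unary plus yields mutable strings
shallow  = original.dup                          # new array, same String objects
deep     = Marshal.load(Marshal.dump(original))  # new array, new String objects

shallow.first << "!" # mutates the string that `original` also references
deep.last << "?"     # touches only the deep copy

p original # ["a!", "b"]
p deep     # ["a", "b?"]
```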
Error Handling & Edge Cases
Detecting and recovering from race conditions requires specific strategies since traditional debugging techniques often fail. Race conditions produce symptoms—incorrect data, crashes, hangs—without obvious causes in stack traces or logs.
Detection Through Invariant Violations catches race conditions by checking data structure consistency. Define invariants that must always hold, then verify them:
class BankAccount
def initialize(balance)
@balance = balance
@transaction_log = [{ type: :deposit, amount: balance }] # Seed the log so it sums to the opening balance
@mutex = Mutex.new
end
def deposit(amount)
@mutex.synchronize do
@balance += amount
@transaction_log << { type: :deposit, amount: amount }
verify_invariants
end
end
def withdraw(amount)
@mutex.synchronize do
@balance -= amount
@transaction_log << { type: :withdraw, amount: amount }
verify_invariants
end
end
private
def verify_invariants
computed_balance = @transaction_log.sum do |tx|
tx[:type] == :deposit ? tx[:amount] : -tx[:amount]
end
raise "Balance invariant violated" unless computed_balance == @balance
end
end
Invariant checks catch race conditions immediately rather than allowing corruption to propagate. In production, log violations instead of raising exceptions.
Stress Testing increases the probability of triggering race conditions by maximizing thread contention:
def stress_test(iterations: 10_000, threads: 100)
errors = []
mutex = Mutex.new
thread_pool = threads.times.map do
Thread.new do
iterations.times do
begin
yield
rescue => e
mutex.synchronize { errors << e }
end
end
end
end
thread_pool.each(&:join)
errors
end
# Usage
errors = stress_test do
account.withdraw(1)
end
if errors.any?
puts "Race condition detected: #{errors.size} failures"
end
Stress testing amplifies timing windows, making intermittent race conditions more likely to manifest. Run tests repeatedly with varying thread counts and delays.
Deterministic Replay for race conditions uses thread scheduling control:
# This requires external tools or modified thread schedulers
# Conceptual approach:
class DeterministicScheduler
def initialize
@schedule = []
@current_step = 0
end
def record_schedule
Thread.current[:step] = @schedule.size
@schedule << Thread.current.object_id
end
def replay_schedule
expected_thread = @schedule[@current_step]
sleep 0.001 until Thread.current.object_id == expected_thread
@current_step += 1
end
end
In practice, deterministic replay requires specialized tools, such as the rr record-and-replay debugger on Linux, or custom thread instrumentation.
Timeout Protection prevents indefinite hangs from deadlocks:
require 'timeout'
def safe_synchronized_operation
Timeout.timeout(5) do
@mutex.synchronize do
# Critical section
end
end
rescue Timeout::Error
logger.error "Operation timed out - possible deadlock"
raise
end
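Timeouts detect but do not solve deadlocks. Log timeout events to identify patterns suggesting lock contention or deadlock scenarios. Keep in mind that Timeout.timeout interrupts the block by raising in the running thread, which can happen at any point inside the critical section, so the protected code must tolerate an abrupt exit.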
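An alternative that never interrupts code inside the critical section is to poll `Mutex#try_lock` until a deadline. The sketch below is one way to do that; `synchronize_with_deadline`, `LockTimeout`, and the polling interval are all assumptions made for this example.

```ruby
class LockTimeout < StandardError; end # hypothetical error class for this sketch

def synchronize_with_deadline(mutex, timeout:, poll_interval: 0.005)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until mutex.try_lock # never blocks; returns false if the lock is already held
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise LockTimeout, "lock not acquired within #{timeout}s"
    end
    sleep poll_interval
  end

  begin
    yield
  ensure
    mutex.unlock
  end
end

mutex = Mutex.new
held = Queue.new
release = Queue.new

holder = Thread.new do
  mutex.synchronize do
    held << true # tell the main thread the lock is taken
    release.pop  # keep holding it until told to let go
  end
end
held.pop

outcome =
  begin
    synchronize_with_deadline(mutex, timeout: 0.05) { :acquired }
  rescue LockTimeout
    :timed_out
  end
p outcome # :timed_out

release << true
holder.join
p synchronize_with_deadline(mutex, timeout: 0.05) { :acquired } # :acquired
```

The timeout only fires while waiting for the lock, so the critical section itself always runs to completion once entered.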
Diagnostic Logging captures thread interleaving without changing timing significantly:
class ThreadSafeLogger
def initialize
@mutex = Mutex.new
end
def log(message)
@mutex.synchronize do
timestamp = Time.now.strftime("%Y-%m-%d %H:%M:%S.%6N")
thread_id = Thread.current.object_id
puts "[#{timestamp}] Thread #{thread_id}: #{message}"
end
end
end
LOGGER = ThreadSafeLogger.new # A constant is visible inside method bodies; a top-level local is not
def process_with_logging(item)
LOGGER.log "Starting processing: #{item}"
# Process item
LOGGER.log "Finished processing: #{item}"
end
Thread-safe logging reveals execution order across threads. Buffer logs in memory during test runs to minimize I/O impact on timing.
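In-memory buffering might look like the following sketch: events are pushed onto a Queue while the threads run, and formatting and output happen only afterwards. `BufferedTrace` is a hypothetical helper, not a library class.

```ruby
class BufferedTrace
  def initialize
    @events = Queue.new # Queue is safe to push to from many threads
  end

  def record(message)
    # Monotonic clock: unaffected by wall-clock adjustments during the run.
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @events << [now, Thread.current.object_id, message]
  end

  # Call after the threads have finished; formatting and I/O happen here.
  def drain
    entries = []
    entries << @events.pop until @events.empty?
    entries.sort_by(&:first)
  end
end

trace = BufferedTrace.new
threads = 3.times.map do |i|
  Thread.new do
    trace.record("start #{i}")
    trace.record("finish #{i}")
  end
end
threads.each(&:join)

entries = trace.drain
entries.each do |time, thread_id, message|
  puts format("%.6f %d %s", time, thread_id, message)
end
puts entries.length # 6
```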
Graceful Degradation handles detected race conditions without crashing:
class RobustCache
def initialize
@cache = {}
@mutex = Mutex.new
@computation_locks = {}
end
def fetch(key)
return @cache[key] if @cache.key?(key)
computation_lock = @mutex.synchronize do
@computation_locks[key] ||= Mutex.new
end
computation_lock.synchronize do
return @cache[key] if @cache.key?(key)
begin
value = yield
@cache[key] = value
rescue => e
logger.error "Cache computation failed: #{e.message}"
nil # Return nil rather than propagating error
end
end
end
end
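Per-key locks prevent multiple threads from computing the same value while allowing concurrent computation of different keys. Note that @cache itself is read and written outside @mutex; that relies on MRI's GIL keeping individual Hash operations intact, so on runtimes with parallel threads those accesses need a lock as well. The example also assumes a logger method is available and never removes entries from @computation_locks.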
Security Implications
Race conditions create security vulnerabilities beyond data corruption. Attackers exploit timing windows to bypass security checks, escalate privileges, or access unauthorized resources.
Time-of-Check Time-of-Use (TOCTOU) Attacks exploit the gap between security checks and resource access:
# Vulnerable code
def process_file(user_id, filename)
path = "/uploads/#{user_id}/#{filename}"
if File.exist?(path) && File.owned?(path) && !File.symlink?(path)
# Security check passes
sleep 0.1 # Simulated processing delay
contents = File.read(path) # File may have changed
process(contents)
end
end
An attacker replaces the file with a symlink to /etc/passwd between the check and read. The application reads sensitive data despite security checks. One mitigation opens the file first and validates the open handle:
def process_file_safely(user_id, filename)
path = "/uploads/#{user_id}/#{filename}"
File.open(path, File::RDONLY | File::NOFOLLOW) do |file|
stat = file.stat
if stat.owned? && !stat.symlink?
contents = file.read
process(contents)
end
end
rescue Errno::ELOOP
# Symlink detected
raise SecurityError, "Symlink not allowed"
end
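Opening the file with NOFOLLOW prevents following a symlink in the final path component; symlinks in parent directories are still resolved, so the upload directory itself must not be attacker-writable. Checking ownership on the open file descriptor eliminates the TOCTOU window.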
Session Race Conditions in authentication systems allow session hijacking:
# Vulnerable session check
class SessionManager
def validate_and_use_session(session_id)
session = @sessions[session_id]
if session && session[:expires_at] > Time.now
# Valid session
session[:expires_at] = Time.now + 3600
return session[:user_id]
end
nil
end
end
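Without synchronization, multiple requests with the same session ID can validate and update the session at the same time, so one request may extend a session that another thread is in the middle of expiring or invalidating. An attacker who steals a session ID can race to use it before the victim's request (for example, a logout) invalidates it. The solution performs the lookup, the expiry check, and the extension as one atomic step: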
class SecureSessionManager
def initialize
@sessions = {}
@mutex = Mutex.new
end
def validate_and_use_session(session_id)
@mutex.synchronize do
session = @sessions[session_id]
return nil unless session
return nil if session[:expires_at] <= Time.now
# Atomically extend expiration within lock
session[:expires_at] = Time.now + 3600
session[:user_id]
end
end
end
Privilege Escalation Through Race Conditions occurs when authorization checks race with state changes:
class DocumentAccess
def grant_temporary_access(user_id, document_id, duration)
@temporary_access[user_id] ||= []
@temporary_access[user_id] << {
document: document_id,
expires: Time.now + duration
}
end
def can_access?(user_id, document_id)
access_grants = @temporary_access[user_id] || []
access_grants.any? do |grant|
grant[:document] == document_id && grant[:expires] > Time.now
end
end
end
# Vulnerable usage
if access_manager.can_access?(user_id, doc_id)
# Time passes - access may expire
document = Document.find(doc_id) # Race window
document.delete # User performs privileged operation
end
Between the authorization check and the privileged operation, access may expire or be revoked. An attacker rapidly requests operations hoping to win the race. The solution combines check and action atomically:
class SecureDocumentAccess
def perform_with_authorization(user_id, document_id)
@mutex.synchronize do
unless can_access_locked?(user_id, document_id)
raise UnauthorizedError
end
yield # Execute operation within lock
end
end
private
def can_access_locked?(user_id, document_id)
# Authorization check within lock
end
end
# Usage
access_manager.perform_with_authorization(user_id, doc_id) do
Document.find(doc_id).delete
end
Double-Spend Attacks in financial systems exploit race conditions:
class Wallet
def spend(user_id, amount)
balance = get_balance(user_id)
if balance >= amount
# Race window: multiple spends may pass this check
new_balance = balance - amount
set_balance(user_id, new_balance)
true
else
false
end
end
end
An attacker submits multiple simultaneous spend requests. Each reads the same balance, passes the check, and subtracts, spending more than the balance allows. The solution uses database transactions with row-level locking or compare-and-swap operations.
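The same principle can be shown in memory with a mutex standing in for the row lock. This is a sketch only: `LockedWallet` is hypothetical, and a real system would rely on the database's own locking rather than a per-process mutex.

```ruby
class LockedWallet # in-memory stand-in for a wallet row in a database
  attr_reader :balance

  def initialize(balance)
    @balance = balance
    @mutex = Mutex.new
  end

  def spend(amount)
    @mutex.synchronize do
      return false if @balance < amount # check ...
      @balance -= amount                # ... and act, with no gap between them
      true
    end
  end
end

wallet = LockedWallet.new(100)

# Ten simultaneous attempts to spend 30 from a balance of 100.
outcomes = 10.times.map { Thread.new { wallet.spend(30) } }.map(&:value)

puts outcomes.count(true) # 3
puts wallet.balance       # 10
```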
Testing Approaches
Testing race conditions requires techniques beyond standard unit tests. Race conditions may not manifest in sequential test execution but appear only under specific concurrent scenarios.
Property-Based Testing verifies invariants hold across random thread interleavings:
require 'concurrent'
def test_counter_invariant
counter = ThreadSafeCounter.new(0)
100.times do
increments_per_thread = rand(100..500)
expected_final = increments_per_thread * 10 # 10 threads, so the expected total is exact
threads = 10.times.map do
Thread.new do
increments_per_thread.times { counter.increment }
end
end
threads.each(&:join)
assert_equal expected_final, counter.value,
"Counter invariant violated: expected #{expected_final}, got #{counter.value}"
counter.reset
end
end
Run property tests many times with varying parameters to explore different execution paths. The random variation increases the probability of triggering race conditions.
Interleaving Injection manually forces specific thread execution orders:
class InterleavingTest
def setup
@step_mutex = Mutex.new
@step_cv = ConditionVariable.new
@current_step = 0
end
def wait_for_step(expected_step)
@step_mutex.synchronize do
@step_cv.wait(@step_mutex) until @current_step == expected_step
end
end
def advance_step
@step_mutex.synchronize do
@current_step += 1
@step_cv.broadcast
end
end
def test_specific_interleaving
shared_value = 0
thread1 = Thread.new do
wait_for_step(0)
temp = shared_value
advance_step
wait_for_step(2)
shared_value = temp + 1
advance_step
end
thread2 = Thread.new do
wait_for_step(1)
temp = shared_value
advance_step
wait_for_step(3)
shared_value = temp + 1
advance_step
end
# No kick-off call is needed: @current_step starts at 0, so thread1 proceeds immediately
[thread1, thread2].each(&:join)
assert_equal 1, shared_value, "Forced interleaving should lose one update (two increments, final value 1)"
end
end
This technique tests specific interleavings that trigger known race conditions, confirming fixes prevent them.
Fuzzing Concurrent Operations randomly varies operation order and timing:
def concurrent_fuzz_test(operations, iterations: 1000, threads: 10)
iterations.times do
thread_pool = threads.times.map do
Thread.new do
operation = operations.sample
sleep(rand * 0.001) # Random small delay
operation.call
end
end
thread_pool.each(&:join)
yield if block_given? # Verify invariants after each iteration
end
end
# Usage
account = BankAccount.new(1000)
operations = [
-> { account.deposit(10) },
-> { account.withdraw(10) },
-> { account.balance }
]
concurrent_fuzz_test(operations) do
assert account.balance >= 0, "Negative balance detected"
end
Fuzzing explores a wide range of concurrent scenarios, increasing coverage of possible race conditions.
Happens-Before Verification checks that operations occur in required order:
class HappensBeforeTracker
def initialize
@events = []
@mutex = Mutex.new
end
def record(event)
@mutex.synchronize do
@events << { event: event, thread: Thread.current.object_id, time: Time.now }
end
end
def verify_order(before_event, after_event)
before_time = @events.find { |e| e[:event] == before_event }&.fetch(:time)
after_time = @events.find { |e| e[:event] == after_event }&.fetch(:time)
raise "Event #{before_event} must happen before #{after_event}" if
before_time && after_time && before_time > after_time
end
end
# Usage in test
tracker = HappensBeforeTracker.new
thread1 = Thread.new do
initialize_resource
tracker.record(:initialized)
end
thread2 = Thread.new do
sleep 0.01 # Gives thread1 a head start; a sleep does not guarantee ordering
use_resource
tracker.record(:used)
end
[thread1, thread2].each(&:join)
tracker.verify_order(:initialized, :used)
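The tracker above only observes ordering after the fact. Where the order is a requirement rather than something to verify, a blocking handoff can enforce it. A minimal sketch using Queue as the signal; the `ready` and `events` names are illustrative.

```ruby
ready = Queue.new
events = Queue.new

initializer = Thread.new do
  events << :initialized
  ready << true # publish: everything above is done
end

user = Thread.new do
  ready.pop     # blocks until the initializer has published
  events << :used
end

[initializer, user].each(&:join)

order = []
order << events.pop until events.empty?
p order # [:initialized, :used]
```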
Static Analysis Integration detects potential race conditions without execution:
# While Ruby lacks comprehensive static analysis for race conditions,
# patterns can be checked using custom linters or code review
class RaceConditionLinter
def check_unprotected_access(ast)
shared_vars = find_shared_variables(ast)
shared_vars.each do |var|
accesses = find_variable_accesses(ast, var)
accesses.each do |access|
unless protected_by_mutex?(access)
warn "Potential race condition: unprotected access to #{var} at line #{access.line}"
end
end
end
end
end
Static analysis complements dynamic testing by identifying suspicious patterns during code review.
Reference
Thread Safety Guarantees
| Operation | Thread Safe | Notes |
|---|---|---|
| Variable read | No | May see stale values |
| Variable write | No | May interleave with other writes |
| Integer arithmetic | No | Not atomic despite GIL |
| Array append | No | Internal array operations not atomic |
| Hash access | No | May corrupt during concurrent writes |
| String concatenation | No | String mutation not thread-safe |
| Queue push/pop | Yes | Synchronized internally |
| Mutex lock/unlock | Yes | Provides mutual exclusion |
| Atomic operations | Yes | Indivisible by contract; native or lock-based depending on runtime |
Ruby Concurrency Primitives
| Primitive | Purpose | Key Methods |
|---|---|---|
| Thread | Concurrent execution | new, join, kill, value |
| Mutex | Mutual exclusion | lock, unlock, synchronize |
| ConditionVariable | Thread coordination | wait, signal, broadcast |
| Queue | Thread-safe queue | push, pop, empty? |
| SizedQueue | Bounded queue | push, pop, max, num_waiting |
| Monitor | Reentrant object-level locking | synchronize, try_enter, new_cond |
Common Race Condition Patterns
| Pattern | Description | Prevention |
|---|---|---|
| Check-then-act | Validate then modify based on stale data | Atomic check-and-modify operation |
| Read-modify-write | Read value, compute, write result | Mutex around entire operation |
| Lazy initialization | Multiple threads initialize same resource | Double-checked locking or eager init |
| Lost update | Concurrent writes overwrite each other | Optimistic locking or mutex |
| Dirty read | Read partially updated data | Consistent locking strategy |
Detection Tools and Techniques
| Technique | Effectiveness | Overhead |
|---|---|---|
| Manual code review | Medium | Low |
| Stress testing | High | Medium |
| Property-based testing | High | Medium |
| Static analysis | Medium | Low |
| Runtime detection tools | High | High |
| Invariant checking | High | Medium |
Synchronization Strategy Selection
| Scenario | Recommended Approach | Alternative |
|---|---|---|
| Simple counter | AtomicFixnum | Mutex-protected variable |
| Producer-consumer | Queue or SizedQueue | ConditionVariable with array |
| Complex state machine | Mutex with state variable | Actor model library |
| Read-heavy data | Read-write lock | Immutable data with copy-on-write |
| Initialization once | Mutex with double-check | Eager initialization |
| Thread coordination | ConditionVariable | Sleep polling with backoff |
Thread-Safe Ruby Gems
| Gem | Purpose | Key Features |
|---|---|---|
| concurrent-ruby | Concurrency utilities | Atomic types, thread pools, futures |
| thread_safe | Thread-safe collections (deprecated in favor of concurrent-ruby) | Cache, array, hash implementations |
| celluloid | Actor-based concurrency | Supervision, async calls, timers |
| concurrent-ruby-ext | Native extensions | Performance-optimized atomic ops |
| monitor | Standard library monitor | Object-level synchronization |