CrackedRuby

Overview

A race condition occurs when multiple threads or processes access shared resources concurrently, and the final state depends on the precise timing and ordering of execution. The outcome becomes non-deterministic, varying between executions despite identical inputs. This unpredictability makes race conditions among the most challenging bugs to detect and fix in concurrent systems.

The term originates from hardware design, where multiple signals "race" to affect a circuit's state. In software, threads race to read, modify, or write shared data. The thread that wins the race determines the result, creating behavior that changes between runs.

Race conditions manifest in various forms. A classic example involves two threads incrementing a shared counter:

counter = 0

threads = Array.new(2) do
  Thread.new { 10_000.times { counter += 1 } }
end

threads.each(&:join)  # Wait for both threads; a fixed sleep only guesses
puts counter  # Expected: 20000. Lost updates can make it lower (e.g. 15432), most visibly on JRuby or TruffleRuby

The increment operation counter += 1 decomposes into three steps: read the current value, add one, write the result. When threads interleave these steps, updates get lost. Thread A reads 5, Thread B reads 5, both write 6, and one increment disappears.
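The lost update described above can be reproduced on every run by forcing the interleaving. The following is a minimal sketch rather than part of the original example: lost_update_demo and the two queues are hypothetical names, and the queues serve only as signals that pin down the order of the reads and writes.

```ruby
# Forces the order: A reads, B reads and writes, A writes its stale result.
def lost_update_demo(start)
  counter = start
  a_has_read = Queue.new     # signals that thread A finished its read
  b_has_written = Queue.new  # signals that thread B finished its increment

  thread_a = Thread.new do
    seen = counter           # step 1: read the current value
    a_has_read << true
    b_has_written.pop        # pause here while thread B runs a full increment
    counter = seen + 1       # steps 2 and 3: add one, write back a stale result
  end

  thread_b = Thread.new do
    a_has_read.pop           # start only after A has read
    seen = counter
    counter = seen + 1
    b_has_written << true
  end

  [thread_a, thread_b].each(&:join)
  counter
end

puts lost_update_demo(5)  # prints 6: two increments ran, but only one survived
```

In real code nothing forces this order, which is why the symptom appears only occasionally.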

Race conditions cause data corruption, incorrect calculations, security vulnerabilities, and application crashes. Unlike deterministic bugs that reproduce consistently, race conditions appear sporadically, often only under specific load conditions or hardware configurations. A system may pass all tests yet fail catastrophically in production when timing aligns differently.

The consequences extend beyond incorrect values. Race conditions in authentication systems grant unauthorized access. Race conditions in financial transactions duplicate payments or lose funds. Race conditions in file operations corrupt data or expose sensitive information. The non-deterministic nature makes diagnosis difficult, as adding logging or debugging often changes timing enough to hide the condition.

Key Principles

Race conditions arise from the intersection of three factors: shared mutable state, concurrent execution, and non-atomic operations. Understanding these foundations clarifies why race conditions occur and how to prevent them.

Shared Mutable State exists when multiple execution contexts access the same memory location or resource. This includes variables in shared memory, files on disk, database records, or network sockets. Immutable data eliminates race conditions regardless of concurrency, since no thread modifies state. The risk emerges only when threads both read and write the same location.

Concurrent Execution means multiple threads or processes execute simultaneously or with interleaved operations. On single-core systems, the operating system rapidly switches between threads, creating the appearance of simultaneity. On multi-core systems, threads truly execute in parallel. Both scenarios create opportunities for race conditions when threads access shared state.

Non-Atomic Operations break into multiple steps at the hardware or runtime level. An atomic operation completes entirely or not at all, with no intermediate states visible to other threads. The increment x += 1 looks atomic in source code but compiles to separate load, increment, and store instructions. Between any two instructions, the scheduler may switch threads, allowing another thread to see or modify inconsistent state.

Critical Sections represent code segments that access shared resources and must not execute concurrently. When one thread enters a critical section, other threads must wait. The critical section should be minimal—only the operations that actually require mutual exclusion. Expanding critical sections unnecessarily reduces parallelism and degrades performance.

Mutual Exclusion prevents multiple threads from simultaneously executing critical sections. Synchronization primitives like mutexes, semaphores, and monitors implement mutual exclusion. A thread acquires a lock before entering a critical section and releases it upon exit. Other threads attempting to acquire the same lock block until it becomes available.

Memory Visibility concerns when changes made by one thread become visible to others. Modern processors and compilers reorder operations for performance, potentially making writes visible out of order. Without proper synchronization, one thread may never observe another thread's updates. Memory barriers and synchronization primitives enforce visibility guarantees.

Atomicity Guarantees specify which operations complete indivisibly. Hardware provides atomic operations for primitive types at the word size (32 or 64 bits). Higher-level atomic operations require explicit synchronization. Ruby's Global Interpreter Lock provides some atomicity for specific operations but does not eliminate all race conditions.

The Check-Then-Act pattern exemplifies race conditions. Code checks a condition, then acts on the result:

if @balance >= amount
  @balance -= amount  # Balance may have changed since check
end

Between the check and the action, another thread may modify the balance, invalidating the check. The solution combines check and action into a single atomic operation protected by synchronization.

Ruby Implementation

Ruby's threading model creates specific considerations for race conditions. The Global Interpreter Lock (GIL) in MRI Ruby, officially called the Global VM Lock (GVL), prevents multiple Ruby threads from executing Ruby code in parallel, but this does not eliminate race conditions. The GIL protects the Ruby interpreter's internal state, not application-level data structures. The scheduler can switch threads between the steps of a compound operation, such as a check followed by an update, so race conditions still emerge.

Thread Creation and Management in Ruby uses the Thread class:

threads = 10.times.map do
  Thread.new do
    # Work that may access shared state
  end
end

threads.each(&:join)  # Wait for all threads to complete

Each thread executes independently with access to shared variables in the outer scope. Without synchronization, concurrent access to these variables creates race conditions.
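One way to sidestep shared variables is to have each thread return its result through Thread#value instead of writing to the outer scope. A minimal sketch, where parallel_sum and its slices parameter are hypothetical names chosen for illustration:

```ruby
# Each thread sums its own chunk and hands the result back via Thread#value,
# so no thread writes to a variable that another thread can see.
def parallel_sum(numbers, slices: 4)
  return 0 if numbers.empty?

  chunk_size = (numbers.size.to_f / slices).ceil
  threads = numbers.each_slice(chunk_size).map do |chunk|
    Thread.new { chunk.sum }
  end

  threads.sum(&:value)  # value joins the thread, then returns the block's result
end

puts parallel_sum((1..100).to_a)  # prints 5050
```

On MRI the GIL means this brings no speedup for CPU-bound work. The point of the sketch is the absence of shared mutable state.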

Mutex Synchronization provides mutual exclusion through the Mutex class:

class BankAccount
  def initialize(balance)
    @balance = balance
    @mutex = Mutex.new
  end

  def withdraw(amount)
    @mutex.synchronize do
      if @balance >= amount
        @balance -= amount
        amount
      else
        0
      end
    end
  end

  def balance
    @mutex.synchronize { @balance }
  end
end

account = BankAccount.new(1000)

threads = 100.times.map do
  Thread.new { account.withdraw(10) }
end

threads.each(&:join)
puts account.balance  # Consistent: 0

The synchronize block ensures only one thread executes the critical section at a time. Both reads and writes require synchronization: a read without the lock may observe stale or inconsistent values.

Thread-Safe Data Structures prevent common race conditions. Ruby provides Queue and SizedQueue for thread-safe producer-consumer patterns:

queue = Queue.new

producer = Thread.new do
  10.times do |i|
    queue << i
    sleep 0.1
  end
end

consumer = Thread.new do
  10.times do
    value = queue.pop  # Blocks until item available
    puts "Processed: #{value}"
  end
end

[producer, consumer].each(&:join)

The Queue handles synchronization internally, eliminating race conditions in enqueue and dequeue operations.
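The same Queue can feed a small pool of workers. The sketch below is an illustration under stated assumptions: process_all and its workers parameter are hypothetical names, items are assumed never to be nil or false, and Queue#close is used so that idle workers stop instead of blocking forever.

```ruby
# Distributes items to a fixed number of workers through a Queue and
# gathers the results through a second Queue.
def process_all(items, workers: 3)
  jobs = Queue.new
  results = Queue.new

  items.each { |item| jobs << item }
  jobs.close  # once closed and drained, pop returns nil instead of blocking

  pool = Array.new(workers) do
    Thread.new do
      while (item = jobs.pop)  # assumes items are never nil or false
        results << yield(item)
      end
    end
  end

  pool.each(&:join)
  Array.new(results.size) { results.pop }  # completion order, not input order
end

squares = process_all([1, 2, 3, 4]) { |n| n * n }
p squares.sort  # prints [1, 4, 9, 16]
```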

Condition Variables coordinate threads beyond simple mutual exclusion. A condition variable allows threads to wait for specific conditions while releasing locks:

class ThreadSafeQueue
  def initialize
    @items = []
    @mutex = Mutex.new
    @resource_available = ConditionVariable.new
  end

  def push(item)
    @mutex.synchronize do
      @items << item
      @resource_available.signal  # Wake one waiting thread
    end
  end

  def pop
    @mutex.synchronize do
      while @items.empty?
        @resource_available.wait(@mutex)  # Release lock and wait
      end
      @items.shift
    end
  end
end

The wait method atomically releases the mutex and blocks the thread. When signaled, it reacquires the mutex before returning. This prevents race conditions in complex synchronization patterns.
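The same wait-in-a-loop pattern supports other coordination shapes. As a sketch, the hypothetical Latch class below (not part of Ruby's standard library) blocks waiters until count_down has been called a fixed number of times, and uses broadcast because several threads may be waiting.

```ruby
# A one-shot countdown latch: wait blocks until count_down has been
# called the configured number of times.
class Latch
  def initialize(count)
    @count = count
    @mutex = Mutex.new
    @reached_zero = ConditionVariable.new
  end

  def count_down
    @mutex.synchronize do
      @count -= 1 if @count.positive?
      @reached_zero.broadcast if @count.zero?  # wake every waiter
    end
  end

  def wait
    @mutex.synchronize do
      # Re-check the condition in a loop to guard against spurious wakeups
      @reached_zero.wait(@mutex) until @count.zero?
    end
  end
end

latch = Latch.new(3)
finished = Queue.new
3.times { |i| Thread.new { finished << i; latch.count_down } }
latch.wait
puts finished.size  # prints 3
```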

Thread-Local Storage isolates state per thread, avoiding shared state entirely:

require 'securerandom'

Thread.current[:request_id] = SecureRandom.uuid

# Each thread has its own request_id
10.times.map do
  Thread.new do
    Thread.current[:request_id] = SecureRandom.uuid
    # Use Thread.current[:request_id] without conflicts
  end
end

Thread-local variables eliminate race conditions for per-thread state but do not help with genuinely shared resources.
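One detail worth knowing: values stored with Thread#[] are local to the current fiber, not to the whole thread, while Thread#thread_variable_set stores values that every fiber in the thread can see. The sketch below illustrates the difference; scopes_seen_from_fiber is a hypothetical helper name.

```ruby
# Returns what a new Fiber inside the same thread can see of each store.
def scopes_seen_from_fiber
  Thread.new do
    Thread.current[:scope] = "set via Thread#[]"
    Thread.current.thread_variable_set(:scope, "set via thread_variable_set")

    Fiber.new do
      [Thread.current[:scope], Thread.current.thread_variable_get(:scope)]
    end.resume
  end.value
end

p scopes_seen_from_fiber  # prints [nil, "set via thread_variable_set"]
```

This matters for code that runs inside fibers, such as Enumerator-based iteration or fiber-based servers.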

Atomic Operations in Ruby require the concurrent-ruby gem for proper atomic primitives:

require 'concurrent'

counter = Concurrent::AtomicFixnum.new(0)

threads = 10.times.map do
  Thread.new do
    10_000.times { counter.increment }
  end
end

threads.each(&:join)
puts counter.value  # Always 100000

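AtomicFixnum makes each increment an indivisible operation. Depending on the platform, concurrent-ruby backs it with native atomic instructions (for example on JRuby, or on MRI with the concurrent-ruby-ext extension) or with an internal mutex, so callers get atomicity without managing locks themselves.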

Common Pitfalls

Race conditions hide in unexpected places, often masked by timing or system behavior that changes between environments. Several patterns consistently produce race conditions.

Lazy Initialization Without Synchronization creates race conditions when multiple threads initialize a resource:

class DatabaseConnection
  def connection
    @connection ||= establish_connection
  end
end

The ||= operator performs check-then-act: check if @connection is nil, then assign if needed. Multiple threads may see nil simultaneously, each establishing a separate connection. The solution uses a mutex or eager initialization:

class DatabaseConnection
  def initialize
    @mutex = Mutex.new
  end

  def connection
    return @connection if @connection
    
    @mutex.synchronize do
      @connection ||= establish_connection
    end
  end
end

The double-checked locking pattern checks before acquiring the lock for performance, then checks again inside the synchronized block.

Compound Operations on Collections expose race conditions even with synchronized collections:

@cache = {}
@mutex = Mutex.new

def get_or_compute(key)
  @mutex.synchronize do
    if @cache.key?(key)
      @cache[key]
    else
      value = expensive_computation(key)
      @cache[key] = value
      value
    end
  end
end

This code is correct, but it performs poorly if expensive_computation is slow: the single mutex is held for the whole computation, so every other caller blocks, even for keys that are already cached. A better approach uses finer-grained locking or concurrent data structures that handle this pattern efficiently.

Inconsistent Lock Ordering causes deadlocks, a related concurrency problem:

# Thread 1
@lock_a.synchronize do
  @lock_b.synchronize do
    # Critical section
  end
end

# Thread 2
@lock_b.synchronize do
  @lock_a.synchronize do
    # Critical section
  end
end

Thread 1 holds lock A and waits for lock B. Thread 2 holds lock B and waits for lock A. Neither can proceed. The solution acquires locks in a consistent order across all threads.
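One way to enforce a consistent order is to sort the locks by a stable key before acquiring them. The sketch below uses object_id as that key. with_both_locks is a hypothetical helper, and it assumes the two arguments are distinct mutexes (passing the same Mutex twice would raise ThreadError).

```ruby
# Acquires two distinct mutexes in a global order (by object_id), whatever
# order the caller names them in, then runs the block with both held.
def with_both_locks(lock_x, lock_y)
  first, second = [lock_x, lock_y].sort_by(&:object_id)
  first.synchronize do
    second.synchronize { yield }
  end
end

lock_a = Mutex.new
lock_b = Mutex.new
transfers = 0

t1 = Thread.new { 500.times { with_both_locks(lock_a, lock_b) { transfers += 1 } } }
t2 = Thread.new { 500.times { with_both_locks(lock_b, lock_a) { transfers += 1 } } }
[t1, t2].each(&:join)
puts transfers  # prints 1000
```

Both threads name the locks in opposite order, yet each acquires them in the same underlying sequence, so the circular wait cannot form.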

False Safety from Ruby's GIL leads developers to assume thread safety where none exists:

@counter = 0

100.times.map do
  Thread.new { 10_000.times { @counter += 1 } }
end.each(&:join)

# @counter may end up below 1,000,000 even with the GIL

The GIL prevents parallel execution but does not prevent thread interleaving within operations. The increment still decomposes into read-modify-write steps, allowing race conditions.

Time-of-Check to Time-of-Use (TOCTOU) separates validation from action:

if File.exist?(path) && !File.directory?(path)
  contents = File.read(path)  # File may change between check and read
end

The file system state may change between the check and the read. An attacker could replace the file with a symlink pointing to sensitive data. The solution performs operations atomically or validates after reading.

Non-Obvious Shared State occurs when objects appear independent but share internal state:

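original = [+"a", +"b", +"c"]  # Unary + yields mutable (unfrozen) strings
duplicate = original.dup       # Shallow copy: new array, same String objects

t1 = Thread.new { duplicate.each { |s| s << "x" } }
t2 = Thread.new { original.each { |s| s << "y" } }
[t1, t2].each(&:join)

# Both threads mutated the same three String objects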

The dup method creates a shallow copy. If array elements are mutable objects, both arrays share those objects, creating race conditions on the shared elements.
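When threads need truly independent copies, a deep copy avoids the shared elements. A minimal sketch using a Marshal round trip follows; deep_copy is a hypothetical helper, and the approach assumes the object graph can be marshaled (no procs, IO objects, or singleton methods).

```ruby
# Deep copy through a Marshal round trip. Assumes the object graph is
# marshalable (no procs, IO objects, or singleton methods).
def deep_copy(object)
  Marshal.load(Marshal.dump(object))
end

original = [+"a", +"b"]
shallow = original.dup
deep = deep_copy(original)

puts shallow[0].equal?(original[0])  # prints true: same String object
puts deep[0].equal?(original[0])     # prints false: independent String object
```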

Error Handling & Edge Cases

Detecting and recovering from race conditions requires specific strategies since traditional debugging techniques often fail. Race conditions produce symptoms—incorrect data, crashes, hangs—without obvious causes in stack traces or logs.

Detection Through Invariant Violations catches race conditions by checking data structure consistency. Define invariants that must always hold, then verify them:

class BankAccount
  def initialize(balance)
    @balance = balance
    @transaction_log = [{ type: :deposit, amount: balance }]  # Seed the log so it sums to the opening balance
    @mutex = Mutex.new
  end

  def deposit(amount)
    @mutex.synchronize do
      @balance += amount
      @transaction_log << { type: :deposit, amount: amount }
      verify_invariants
    end
  end

  def withdraw(amount)
    @mutex.synchronize do
      @balance -= amount
      @transaction_log << { type: :withdraw, amount: amount }
      verify_invariants
    end
  end

  private

  def verify_invariants
    computed_balance = @transaction_log.sum do |tx|
      tx[:type] == :deposit ? tx[:amount] : -tx[:amount]
    end
    
    raise "Balance invariant violated" unless computed_balance == @balance
  end
end

Invariant checks catch race conditions immediately rather than allowing corruption to propagate. In production, log violations instead of raising exceptions.

Stress Testing increases the probability of triggering race conditions by maximizing thread contention:

def stress_test(iterations: 10_000, threads: 100)
  errors = []
  mutex = Mutex.new

  thread_pool = threads.times.map do
    Thread.new do
      iterations.times do
        begin
          yield
        rescue => e
          mutex.synchronize { errors << e }
        end
      end
    end
  end

  thread_pool.each(&:join)
  errors
end

# Usage
errors = stress_test do
  account.withdraw(1)
end

if errors.any?
  puts "Race condition detected: #{errors.size} failures"
end

Stress testing amplifies timing windows, making intermittent race conditions more likely to manifest. Run tests repeatedly with varying thread counts and delays.

Deterministic Replay for race conditions uses thread scheduling control:

# This requires external tools or modified thread schedulers
# Conceptual approach:

class DeterministicScheduler
  def initialize
    @schedule = []
    @current_step = 0
  end

  def record_schedule
    Thread.current[:step] = @schedule.size
    @schedule << Thread.current.object_id
  end

  def replay_schedule
    # Re-read the schedule on every poll; another thread advances @current_step
    sleep 0.001 until Thread.current.object_id == @schedule[@current_step]
    @current_step += 1
  end
end

In practice, deterministic replay requires specialized tools, such as the record-and-replay debugger rr or custom thread instrumentation.

Timeout Protection prevents indefinite hangs from deadlocks:

require 'timeout'

def safe_synchronized_operation
  Timeout.timeout(5) do
    @mutex.synchronize do
      # Critical section
    end
  end
rescue Timeout::Error
  logger.error "Operation timed out - possible deadlock"
  raise
end

Timeouts detect but do not solve deadlocks. Log timeout events to identify patterns suggesting lock contention or deadlock scenarios.
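Timeout.timeout works by raising an exception in the running thread, so it can also interrupt the critical section itself once the lock is held. An alternative sketch, assuming that polling is acceptable, bounds only the lock acquisition by using Mutex#try_lock. The method name synchronize_with_deadline and its interval parameter are hypothetical.

```ruby
# Tries to take the mutex until the deadline passes. Returns true if the
# block ran, false if the lock could not be acquired in time.
def synchronize_with_deadline(mutex, seconds, interval: 0.01)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds

  until mutex.try_lock
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval  # back off briefly before polling again
  end

  begin
    yield
    true
  ensure
    mutex.unlock  # runs only on the path where try_lock succeeded
  end
end

mutex = Mutex.new
acquired = synchronize_with_deadline(mutex, 1.0) { puts "in critical section" }
puts acquired  # prints true, since nothing else holds the mutex
```

A false return is the signal to log a possible deadlock or contention problem, without the block having been interrupted partway through.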

Diagnostic Logging captures thread interleaving. The logger's own lock adds some synchronization, so keep each log call cheap to limit its effect on timing:

class ThreadSafeLogger
  def initialize
    @mutex = Mutex.new
  end

  def log(message)
    @mutex.synchronize do
      timestamp = Time.now.strftime("%Y-%m-%d %H:%M:%S.%6N")
      thread_id = Thread.current.object_id
      puts "[#{timestamp}] Thread #{thread_id}: #{message}"
    end
  end
end

LOGGER = ThreadSafeLogger.new  # A constant is visible inside method bodies; a top-level local is not

def process_with_logging(item)
  LOGGER.log "Starting processing: #{item}"
  # Process item
  LOGGER.log "Finished processing: #{item}"
end

Thread-safe logging reveals execution order across threads. Buffer logs in memory during test runs to minimize I/O impact on timing.
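The in-memory buffering suggested above might look like the following sketch. TraceBuffer is a hypothetical class, and it records a monotonic clock reading rather than wall-clock time so that entries sort reliably.

```ruby
# Collects trace entries in memory; nothing is written until drain is called.
class TraceBuffer
  def initialize
    @entries = Queue.new  # Queue is thread-safe, so log needs no extra lock
  end

  def log(message)
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @entries << [now, Thread.current.object_id, message]
  end

  # Call after the worker threads have been joined.
  def drain
    Array.new(@entries.size) { @entries.pop }.sort_by(&:first)
  end
end

trace = TraceBuffer.new
Array.new(3) { |i| Thread.new { trace.log("worker #{i} ran") } }.each(&:join)
trace.drain.each do |time, thread_id, message|
  puts format("[%.6f] Thread %d: %s", time, thread_id, message)
end
```

Formatting and output happen once, after the threads finish, so the logging path inside each thread is a single queue push.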

Graceful Degradation handles detected race conditions without crashing:

class RobustCache
  def initialize
    @cache = {}
    @mutex = Mutex.new
    @computation_locks = {}
  end

  def fetch(key)
    return @cache[key] if @cache.key?(key)

    computation_lock = @mutex.synchronize do
      @computation_locks[key] ||= Mutex.new
    end

    computation_lock.synchronize do
      return @cache[key] if @cache.key?(key)

      begin
        value = yield
        @cache[key] = value
      rescue => e
        logger.error "Cache computation failed: #{e.message}"
        nil  # Return nil rather than propagating error
      end
    end
  end
end

Per-key locks prevent multiple threads from computing the same value while allowing concurrent computation of different keys.

Security Implications

Race conditions create security vulnerabilities beyond data corruption. Attackers exploit timing windows to bypass security checks, escalate privileges, or access unauthorized resources.

Time-of-Check Time-of-Use (TOCTOU) Attacks exploit the gap between security checks and resource access:

# Vulnerable code
def process_file(user_id, filename)
  path = "/uploads/#{user_id}/#{filename}"
  
  if File.exist?(path) && File.owned?(path) && !File.symlink?(path)
    # Security check passes
    sleep 0.1  # Simulated processing delay
    contents = File.read(path)  # File may have changed
    process(contents)
  end
end

An attacker replaces the file with a symlink to /etc/passwd between the check and read. The application reads sensitive data despite security checks. Solutions include:

def process_file_safely(user_id, filename)
  path = "/uploads/#{user_id}/#{filename}"
  
  File.open(path, File::RDONLY | File::NOFOLLOW) do |file|
    stat = file.stat
    
    if stat.owned? && !stat.symlink?
      contents = file.read
      process(contents)
    end
  end
rescue Errno::ELOOP
  # Symlink detected
  raise SecurityError, "Symlink not allowed"
end

Opening the file with NOFOLLOW prevents symlink following for the final path component (symlinks in parent directories are still followed). Checking ownership on the open file descriptor eliminates the TOCTOU window.

Session Race Conditions in authentication systems allow session hijacking:

# Vulnerable session check
class SessionManager
  def validate_and_use_session(session_id)
    session = @sessions[session_id]
    
    if session && session[:expires_at] > Time.now
      # Valid session
      session[:expires_at] = Time.now + 3600
      
      return session[:user_id]
    end
    
    nil
  end
end

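Multiple requests with the same session ID may pass the expiry check simultaneously, for example just before the session expires or while another request is invalidating it. An attacker who has stolen a session ID can race to use it before the victim's request invalidates it. The solution performs the check and the update as one atomic operation: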

class SecureSessionManager
  def initialize
    @sessions = {}
    @mutex = Mutex.new
  end

  def validate_and_use_session(session_id)
    @mutex.synchronize do
      session = @sessions[session_id]
      
      return nil unless session
      return nil if session[:expires_at] <= Time.now
      
      # Atomically extend expiration within lock
      session[:expires_at] = Time.now + 3600
      session[:user_id]
    end
  end
end

Privilege Escalation Through Race Conditions occurs when authorization checks race with state changes:

class DocumentAccess
  def initialize
    @temporary_access = {}  # user_id => array of access grants
  end

  def grant_temporary_access(user_id, document_id, duration)
    @temporary_access[user_id] ||= []
    @temporary_access[user_id] << {
      document: document_id,
      expires: Time.now + duration
    }
  end

  def can_access?(user_id, document_id)
    access_grants = @temporary_access[user_id] || []
    
    access_grants.any? do |grant|
      grant[:document] == document_id && grant[:expires] > Time.now
    end
  end
end

# Vulnerable usage
if access_manager.can_access?(user_id, doc_id)
  # Time passes - access may expire
  document = Document.find(doc_id)  # Race window
  document.delete  # User performs privileged operation
end

Between the authorization check and the privileged operation, access may expire or be revoked. An attacker rapidly requests operations hoping to win the race. The solution combines check and action atomically:

class SecureDocumentAccess
  def perform_with_authorization(user_id, document_id)
    @mutex.synchronize do
      unless can_access_locked?(user_id, document_id)
        raise UnauthorizedError
      end
      
      yield  # Execute operation within lock
    end
  end

  private

  def can_access_locked?(user_id, document_id)
    # Authorization check within lock
  end
end

# Usage
access_manager.perform_with_authorization(user_id, doc_id) do
  Document.find(doc_id).delete
end

Double-Spend Attacks in financial systems exploit race conditions:

class Wallet
  def spend(user_id, amount)
    balance = get_balance(user_id)
    
    if balance >= amount
      # Race window: multiple spends may pass this check
      new_balance = balance - amount
      set_balance(user_id, new_balance)
      true
    else
      false
    end
  end
end

An attacker submits multiple simultaneous spend requests. Each reads the same balance, passes the check, and subtracts, spending more than the balance allows. The solution uses database transactions with row-level locking or compare-and-swap operations.
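As an in-memory stand-in for row-level locking, the sketch below makes the balance check and the debit a single critical section. LockedWallet is a hypothetical class. A real system would rely on the database's own locking, since a process-local Mutex does not protect against other processes or servers.

```ruby
# Check and debit happen under one lock, so two spends can never both
# pass the balance check against the same stale value.
class LockedWallet
  def initialize(balances)
    @balances = balances.dup
    @mutex = Mutex.new
  end

  def spend(user_id, amount)
    @mutex.synchronize do
      balance = @balances.fetch(user_id, 0)
      return false if balance < amount  # the lock is released on return

      @balances[user_id] = balance - amount
      true
    end
  end

  def balance(user_id)
    @mutex.synchronize { @balances.fetch(user_id, 0) }
  end
end

wallet = LockedWallet.new("alice" => 100)
results = Array.new(20) { Thread.new { wallet.spend("alice", 10) } }.map(&:value)
puts results.count(true)      # prints 10
puts wallet.balance("alice")  # prints 0
```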

Testing Approaches

Testing race conditions requires techniques beyond standard unit tests. Race conditions may not manifest in sequential test execution but appear only under specific concurrent scenarios.

Property-Based Testing verifies invariants hold across random thread interleavings:

require 'concurrent'

def test_counter_invariant
  counter = ThreadSafeCounter.new(0)
  
  100.times do
    increments_per_thread = rand(100..500)
    expected_final = increments_per_thread * 10  # 10 threads, so the expected total is exact
    
    threads = 10.times.map do
      Thread.new do
        increments_per_thread.times { counter.increment }
      end
    end
    
    threads.each(&:join)
    
    assert_equal expected_final, counter.value,
      "Counter invariant violated: expected #{expected_final}, got #{counter.value}"
      
    counter.reset
  end
end

Run property tests many times with varying parameters to explore different execution paths. The random variation increases the probability of triggering race conditions.
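The test above relies on a ThreadSafeCounter class that is not shown. A minimal sketch of what such a class could look like, assuming only the three methods the test calls (increment, value, reset) and a Mutex from the standard library:

```ruby
# Hypothetical counter matching the interface used by the property test.
class ThreadSafeCounter
  def initialize(initial = 0)
    @initial = initial
    @value = initial
    @mutex = Mutex.new
  end

  def increment
    @mutex.synchronize { @value += 1 }
  end

  def value
    @mutex.synchronize { @value }
  end

  def reset
    @mutex.synchronize { @value = @initial }
  end
end

counter = ThreadSafeCounter.new(0)
Array.new(10) { Thread.new { 1_000.times { counter.increment } } }.each(&:join)
puts counter.value  # prints 10000
```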

Interleaving Injection manually forces specific thread execution orders:

class InterleavingTest
  def setup
    @step_mutex = Mutex.new
    @step_cv = ConditionVariable.new
    @current_step = -1  # No step is active until the test's first advance_step call
  end

  def wait_for_step(expected_step)
    @step_mutex.synchronize do
      @step_cv.wait(@step_mutex) until @current_step == expected_step
    end
  end

  def advance_step
    @step_mutex.synchronize do
      @current_step += 1
      @step_cv.broadcast
    end
  end

  def test_specific_interleaving
    shared_value = 0

    thread1 = Thread.new do
      wait_for_step(0)
      temp = shared_value
      advance_step
      
      wait_for_step(2)
      shared_value = temp + 1
      advance_step
    end

    thread2 = Thread.new do
      wait_for_step(1)
      temp = shared_value
      advance_step
      
      wait_for_step(3)
      shared_value = temp + 1
      advance_step
    end

    advance_step  # Start step 0
    [thread1, thread2].each(&:join)

    assert_equal 1, shared_value, "Forced interleaving should reproduce the lost update"
  end
end

This technique tests specific interleavings that trigger known race conditions, confirming fixes prevent them.

Fuzzing Concurrent Operations randomly varies operation order and timing:

def concurrent_fuzz_test(operations, iterations: 1000, threads: 10)
  iterations.times do
    thread_pool = threads.times.map do
      Thread.new do
        operation = operations.sample
        sleep(rand * 0.001)  # Random small delay
        operation.call
      end
    end
    
    thread_pool.each(&:join)
    
    yield if block_given?  # Verify invariants after each iteration
  end
end

# Usage
account = BankAccount.new(1000)  # Assumes a BankAccount that defines deposit, withdraw, and balance

operations = [
  -> { account.deposit(10) },
  -> { account.withdraw(10) },
  -> { account.balance }
]

concurrent_fuzz_test(operations) do
  assert account.balance >= 0, "Negative balance detected"
end

Fuzzing explores a wide range of concurrent scenarios, increasing coverage of possible race conditions.

Happens-Before Verification checks that operations occur in required order:

class HappensBeforeTracker
  def initialize
    @events = []
    @mutex = Mutex.new
  end

  def record(event)
    @mutex.synchronize do
      @events << { event: event, thread: Thread.current.object_id, time: Time.now }
    end
  end

  def verify_order(before_event, after_event)
    before_time = @events.find { |e| e[:event] == before_event }&.fetch(:time)
    after_time = @events.find { |e| e[:event] == after_event }&.fetch(:time)
    
    raise "Event #{before_event} must happen before #{after_event}" if 
      before_time && after_time && before_time > after_time
  end
end

# Usage in test
tracker = HappensBeforeTracker.new

thread1 = Thread.new do
  initialize_resource
  tracker.record(:initialized)
end

thread2 = Thread.new do
  sleep 0.01  # Makes it likely, but does not guarantee, that thread1 runs first
  use_resource
  tracker.record(:used)
end

[thread1, thread2].each(&:join)
tracker.verify_order(:initialized, :used)

Static Analysis Integration detects potential race conditions without execution:

# While Ruby lacks comprehensive static analysis for race conditions,
# patterns can be checked using custom linters or code review

class RaceConditionLinter
  def check_unprotected_access(ast)
    shared_vars = find_shared_variables(ast)
    
    shared_vars.each do |var|
      accesses = find_variable_accesses(ast, var)
      
      accesses.each do |access|
        unless protected_by_mutex?(access)
          warn "Potential race condition: unprotected access to #{var} at line #{access.line}"
        end
      end
    end
  end
end

Static analysis complements dynamic testing by identifying suspicious patterns during code review.

Reference

Thread Safety Guarantees

| Operation | Thread Safe | Notes |
| --- | --- | --- |
| Variable read | No | May see stale values |
| Variable write | No | May interleave with other writes |
| Integer arithmetic | No | Not atomic despite GIL |
| Array append | No | Internal array operations not atomic |
| Hash access | No | May corrupt during concurrent writes |
| String concatenation | No | String mutation not thread-safe |
| Queue push/pop | Yes | Synchronized internally |
| Mutex lock/unlock | Yes | Provides mutual exclusion |
| Atomic operations | Yes | Hardware-level atomicity |

Ruby Concurrency Primitives

| Primitive | Purpose | Key Methods |
| --- | --- | --- |
| Thread | Concurrent execution | new, join, kill, value |
| Mutex | Mutual exclusion | lock, unlock, synchronize |
| ConditionVariable | Thread coordination | wait, signal, broadcast |
| Queue | Thread-safe queue | push, pop, empty? |
| SizedQueue | Bounded queue | push, pop, max, num_waiting |
| Monitor | Object-level locking | synchronize, new_cond (wait, signal) |
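Monitor differs from Mutex in one respect that the table does not show: it is reentrant, so a thread that already holds it can enter it again. The sketch below illustrates this with a hypothetical Inventory class, then shows that nesting the same Mutex raises ThreadError.

```ruby
require "monitor"

class Inventory
  def initialize
    @stock = Hash.new(0)
    @lock = Monitor.new
  end

  def add(name, quantity)
    @lock.synchronize { @stock[name] += quantity }
  end

  def add_many(quantities)
    @lock.synchronize do
      # add re-enters the monitor this thread already holds
      quantities.each { |name, quantity| add(name, quantity) }
    end
  end

  def count(name)
    @lock.synchronize { @stock[name] }
  end
end

inventory = Inventory.new
inventory.add_many({ bolts: 3, nuts: 5 })
puts inventory.count(:nuts)  # prints 5

mutex = Mutex.new
begin
  mutex.synchronize { mutex.synchronize { } }
rescue ThreadError => e
  puts "Mutex is not reentrant: #{e.class}"  # prints "Mutex is not reentrant: ThreadError"
end
```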

Common Race Condition Patterns

| Pattern | Description | Prevention |
| --- | --- | --- |
| Check-then-act | Validate then modify based on stale data | Atomic check-and-modify operation |
| Read-modify-write | Read value, compute, write result | Mutex around entire operation |
| Lazy initialization | Multiple threads initialize same resource | Double-checked locking or eager init |
| Lost update | Concurrent writes overwrite each other | Optimistic locking or mutex |
| Dirty read | Read partially updated data | Consistent locking strategy |

Detection Tools and Techniques

| Technique | Effectiveness | Overhead |
| --- | --- | --- |
| Manual code review | Medium | Low |
| Stress testing | High | Medium |
| Property-based testing | High | Medium |
| Static analysis | Medium | Low |
| Runtime detection tools | High | High |
| Invariant checking | High | Medium |

Synchronization Strategy Selection

| Scenario | Recommended Approach | Alternative |
| --- | --- | --- |
| Simple counter | AtomicFixnum | Mutex-protected variable |
| Producer-consumer | Queue or SizedQueue | ConditionVariable with array |
| Complex state machine | Mutex with state variable | Actor model library |
| Read-heavy data | Read-write lock | Immutable data with copy-on-write |
| Initialization once | Mutex with double-check | Eager initialization |
| Thread coordination | ConditionVariable | Sleep polling with backoff |

Thread-Safe Ruby Gems

| Gem | Purpose | Key Features |
| --- | --- | --- |
| concurrent-ruby | Concurrency utilities | Atomic types, thread pools, futures |
| thread_safe | Thread-safe collections | Cache, array, hash implementations |
| celluloid | Actor-based concurrency | Supervision, async calls, timers |
| concurrent-ruby-ext | Native extensions | Performance-optimized atomic ops |
| monitor | Standard library monitor | Object-level synchronization |