CrackedRuby - Concurrency vs Parallelism

Overview

Concurrency and parallelism represent different approaches to handling multiple tasks in software systems. Concurrency describes a program's structure where multiple tasks make progress by interleaving their execution, while parallelism describes actual simultaneous execution of multiple tasks on multiple processors or cores.

The confusion between these concepts stems from their similar outcomes in improving program responsiveness and throughput. A concurrent program manages multiple tasks that may run on a single processor through context switching, giving the appearance of simultaneous execution. A parallel program executes multiple tasks truly simultaneously on multiple processors or cores.

Rob Pike's formulation captures the distinction: "Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once." This difference matters because concurrent programs focus on program structure and task coordination, while parallel programs focus on performance through simultaneous execution.

Consider a web server handling multiple client requests. A concurrent design manages multiple requests through an event loop or thread pool, switching between requests as they wait for I/O operations. Each request appears to progress simultaneously, but only one executes at any instant on a single core. A parallel design processes multiple requests simultaneously on different cores, achieving true simultaneous execution.

The distinction affects system design, performance characteristics, and debugging approaches. Concurrent systems require synchronization mechanisms to coordinate task switching and shared resource access. Parallel systems need these same mechanisms but also contend with the complexity of true simultaneous execution, including cache coherency, memory visibility, and processor coordination.

Key Principles

Concurrency focuses on program structure. A concurrent program decomposes work into tasks that can progress independently, whether they execute simultaneously or not. The tasks coordinate through communication and synchronization, managing shared resources and ordering constraints. The operating system or runtime schedules these tasks, interleaving their execution on available processors.

Parallelism focuses on simultaneous execution. A parallel program executes multiple operations at the same instant, requiring multiple execution units. The program distributes work across processors, cores, or machines to complete operations faster than sequential execution would allow. Parallelism directly trades additional hardware resources for reduced execution time.

Concurrency enables parallelism but does not guarantee it. A well-structured concurrent program can execute in parallel when sufficient processors exist, but concurrent execution can occur on a single processor through interleaving. A program needs concurrent structure before it can achieve parallel execution, but concurrency alone does not provide parallel performance benefits.

Task independence determines parallelization potential. Tasks with minimal interdependencies parallelize more effectively because they require less synchronization overhead. Heavy coordination between tasks limits parallel speedup through synchronization costs and reduced processor utilization. The ratio of independent computation to synchronization determines the maximum parallel efficiency achievable.

Amdahl's Law constrains parallel performance. The sequential portions of a program limit the maximum speedup achievable through parallelization. If S represents the sequential portion (as a fraction) and P represents the parallelizable portion, then maximum speedup with N processors equals 1 / (S + P/N). A program with 10% sequential work cannot exceed 10x speedup regardless of available processors.
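
The formula can be checked numerically. The following is a minimal sketch; `amdahl_speedup` is a hypothetical helper name chosen for this illustration, not a library function.

```ruby
# Amdahl's Law: speedup = 1 / (S + P/N), where S is the sequential fraction,
# P = 1 - S is the parallelizable fraction, and N is the processor count.
# amdahl_speedup is a hypothetical helper written for this illustration.
def amdahl_speedup(sequential_fraction, processors)
  parallel_fraction = 1.0 - sequential_fraction
  1.0 / (sequential_fraction + parallel_fraction / processors)
end

[1, 2, 4, 16, 1024].each do |n|
  puts format("N = %4d  speedup = %.2fx", n, amdahl_speedup(0.10, n))
end
# The printed speedup approaches, but never reaches, 1 / 0.10 = 10x
```

Doubling the processor count yields smaller and smaller gains as the sequential 10% comes to dominate the total run time.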

Memory models affect correctness. Concurrent and parallel programs must account for how different processors observe memory updates. Without proper synchronization, one processor may not see updates made by another processor due to caching and compiler optimizations. Memory barriers and synchronization primitives establish ordering guarantees that ensure correct program behavior.

Scheduling strategies determine concurrency behavior. Cooperative scheduling requires tasks to explicitly yield control, providing deterministic execution order but risking starvation if tasks fail to yield. Preemptive scheduling interrupts tasks periodically, preventing starvation but introducing timing-dependent behavior. The scheduler's decisions affect system responsiveness, fairness, and throughput.
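
Cooperative scheduling can be sketched with a small round-robin loop over fibers. This is an illustration only, assuming Ruby 3.x; `make_task` and `run_round_robin` are hypothetical names, and a real scheduler would also handle I/O and errors.

```ruby
# A tiny round-robin scheduler: each task is a Fiber that must call
# Fiber.yield to give other tasks a turn (cooperative scheduling).
# make_task and run_round_robin are hypothetical helper names.
def make_task(name, steps, log)
  Fiber.new do
    steps.times do |i|
      log << "#{name} step #{i + 1}"
      Fiber.yield  # Explicitly hand control back to the scheduler
    end
  end
end

def run_round_robin(tasks)
  until tasks.empty?
    task = tasks.shift
    task.resume
    tasks << task if task.alive?  # Requeue unfinished tasks at the back
  end
end

log = []
run_round_robin([make_task("A", 2, log), make_task("B", 3, log)])
puts log
```

The execution order is fully determined by where each task yields. A task that never called `Fiber.yield` would run to completion before any other task got a turn, which is the starvation risk described above.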

Resource contention creates bottlenecks. Multiple tasks competing for shared resources introduce serialization points that limit parallel performance. Lock contention, memory bandwidth saturation, and I/O capacity constraints can reduce parallel efficiency below theoretical maximum. Identifying and minimizing contention points determines actual parallel performance.

Ruby Implementation

Ruby's approach to concurrency and parallelism has evolved significantly across versions and implementations. MRI (Matz's Ruby Interpreter), the standard Ruby implementation, uses a Global Interpreter Lock (GIL), officially called the Global VM Lock (GVL), that prevents threads from executing Ruby code in parallel. The GIL protects the interpreter's internals but serializes Ruby code execution even on multi-core systems.

The Thread class provides Ruby's primary concurrency abstraction. Threads created with Thread.new share the same memory space and can access shared objects, but the GIL ensures only one thread executes Ruby code at a time. This design makes threads effective for I/O-bound operations where threads can release the GIL during I/O waits, but ineffective for CPU-bound parallelism.

# Concurrent execution with threads
threads = 5.times.map do |i|
  Thread.new do
    puts "Thread #{i} starting"
    sleep(1)  # Releases the GIL while waiting, as blocking I/O does
    puts "Thread #{i} finishing"
  end
end

threads.each(&:join)
# Threads interleave execution but GIL prevents parallel CPU work

Process forking provides true parallelism by creating separate Ruby interpreter instances. Each process has its own GIL and memory space, enabling parallel CPU work at the cost of higher memory overhead and more complex inter-process communication. The Process module handles fork operations.

# Parallel execution with processes
pids = 5.times.map do |i|
  fork do
    result = expensive_calculation(i)
    puts "Process #{i}: #{result}"
  end
end

pids.each { |pid| Process.wait(pid) }
# Each process runs independently with true parallelism
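
Because forked processes do not share memory, results have to be sent back to the parent explicitly. One common approach, sketched here under the assumption of a Unix-like system where `fork` is available, is a pipe per child with `Marshal` for serialization. `square` and `parallel_map_with_fork` are hypothetical names used only for this illustration.

```ruby
# Each child writes its result into a pipe; the parent reads all of them.
# square is a hypothetical stand-in for an expensive CPU-bound calculation.
def square(n)
  n * n
end

def parallel_map_with_fork(inputs)
  workers = inputs.map do |input|
    reader, writer = IO.pipe
    pid = fork do
      reader.close                               # Child only writes
      writer.write(Marshal.dump(square(input)))  # Serialize the result
      writer.close
    end
    writer.close                                 # Parent only reads
    [pid, reader]
  end

  workers.map do |pid, reader|
    result = Marshal.load(reader.read)  # read returns once the child closes its end
    reader.close
    Process.wait(pid)
    result
  end
end

p parallel_map_with_fork([1, 2, 3, 4])
```

The serialization step is the inter-process communication cost mentioned above: it is negligible for small results but can dominate when children return large objects.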

Ractors, introduced in Ruby 3.0 as an experimental feature whose API may change between releases, provide an actor-based parallelism model that allows true parallel execution of Ruby code. Each Ractor has its own GIL and cannot share most objects with other Ractors, communicating instead through message passing. This isolation enables parallel execution while maintaining thread safety.

# Parallel execution with Ractors
ractors = 5.times.map do |i|
  Ractor.new(i) do |num|
    result = expensive_calculation(num)
    Ractor.yield(result)
  end
end

results = ractors.map { |r| r.take }
# Ractors execute in true parallelism without GIL contention

The Mutex class provides mutual exclusion for shared resource access. Threads competing for a mutex will block until the lock becomes available, serializing access to protected code sections. This mechanism prevents race conditions but introduces contention that limits parallel performance.

# Thread-safe counter with Mutex
counter = 0
mutex = Mutex.new

threads = 10.times.map do
  Thread.new do
    1000.times do
      mutex.synchronize do
        counter += 1
      end
    end
  end
end

threads.each(&:join)
puts counter  # => 10000 (safe increment)

Queue and SizedQueue provide thread-safe, concurrent data structures for producer-consumer patterns. Multiple threads can safely push and pop items without explicit locking. SizedQueue adds blocking behavior when the queue reaches capacity, providing backpressure mechanisms.

# Producer-consumer with Queue
require 'thread'

queue = Queue.new

# Producer threads
producers = 3.times.map do |i|
  Thread.new do
    5.times do |j|
      queue.push("Item #{i}-#{j}")
      sleep(0.1)
    end
  end
end

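# Consumer threads
consumers = 2.times.map do
  Thread.new do
    # pop returns nil once the queue is closed and empty, ending the loop
    while (item = queue.pop)
      puts "Processing: #{item}"
      sleep(0.2)
    end
  end
end

producers.each(&:join)
queue.close  # No more items: consumers drain the queue and then exit
consumers.each(&:join)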
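
The backpressure behavior of SizedQueue can be seen in a small sketch. The capacity of 2 and the timings are arbitrary values chosen for illustration.

```ruby
# A SizedQueue with capacity 2: push blocks while the queue is full,
# so a fast producer cannot run ahead of a slow consumer.
queue = SizedQueue.new(2)
max_depth = 0
processed = []

producer = Thread.new do
  10.times do |i|
    queue.push(i)                            # Blocks while 2 items are waiting
    max_depth = [max_depth, queue.size].max  # Track the deepest backlog seen
  end
  queue.close                                # Signals that no more items will arrive
end

consumer = Thread.new do
  while (item = queue.pop)                   # pop returns nil once closed and drained
    sleep(0.01)                              # Simulate slow processing
    processed << item
  end
end

[producer, consumer].each(&:join)
puts "Processed #{processed.size} items, queue depth never exceeded #{max_depth}"
```

With an unbounded Queue the producer would enqueue all ten items immediately. The bounded queue caps memory use at the cost of making the producer wait.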

The concurrent-ruby gem extends Ruby's concurrency primitives with production-grade abstractions. Concurrent::Future provides promise-style asynchronous execution, Concurrent::ThreadPoolExecutor manages worker thread pools (Concurrent::FixedThreadPool is its fixed-size variant), and Concurrent::Map offers a thread-safe hash.

require 'concurrent-ruby'

# Thread pool for concurrent execution
pool = Concurrent::FixedThreadPool.new(5)

futures = 10.times.map do |i|
  Concurrent::Future.execute(executor: pool) do
    expensive_operation(i)
  end
end

results = futures.map(&:value)  # Blocks until all complete
pool.shutdown
pool.wait_for_termination

Fiber provides cooperative concurrency where the programmer explicitly controls task switching through Fiber.yield. Unlike threads, fibers don't run in parallel and don't require locking for shared state, but require explicit yielding to prevent starvation.

# Cooperative concurrency with Fiber
fiber1 = Fiber.new do
  puts "Fiber 1: Start"
  Fiber.yield
  puts "Fiber 1: Resume"
  Fiber.yield
  puts "Fiber 1: End"
end

fiber2 = Fiber.new do
  puts "Fiber 2: Start"
  Fiber.yield
  puts "Fiber 2: End"
end

fiber1.resume  # Fiber 1: Start
fiber2.resume  # Fiber 2: Start
fiber1.resume  # Fiber 1: Resume
fiber2.resume  # Fiber 2: End
fiber1.resume  # Fiber 1: End

Practical Examples

Web Server Request Handling

A web server demonstrates concurrency through handling multiple client connections simultaneously. Each request progresses independently, with the server switching between requests as they wait for database queries, external API calls, or file I/O.

require 'socket'
require 'thread'

server = TCPServer.new(3000)
puts "Server listening on port 3000"

# Concurrent request handling with threads
loop do
  Thread.new(server.accept) do |client|
    request = client.gets
    puts "Handling request: #{request}"
    
    # Simulate I/O-bound work (database query, API call)
    sleep(2)
    
    response = "HTTP/1.1 200 OK\r\n\r\nRequest processed\n"
    client.puts(response)
    client.close
  end
end
# Multiple requests progress concurrently
# I/O operations release GIL enabling effective concurrency

Image Processing Pipeline

Image processing demonstrates true parallelism where independent image transformations execute simultaneously on different CPU cores. Process-based parallelism avoids GIL limitations for CPU-intensive operations.

require 'mini_magick'

image_files = Dir.glob("images/*.jpg")
chunk_size = (image_files.length / 4.0).ceil
chunks = image_files.each_slice(chunk_size).to_a

# Parallel processing with forked processes
pids = chunks.map.with_index do |chunk, i|
  fork do
    chunk.each do |file|
      image = MiniMagick::Image.open(file)
      image.resize "800x600"
      image.write("processed/#{File.basename(file)}")
    end
    puts "Worker #{i} completed #{chunk.length} images"
  end
end

pids.each { |pid| Process.wait(pid) }
puts "All images processed"
# Each process runs in true parallelism on separate cores
# CPU-bound work benefits from parallel execution

Background Job Processing

Background job systems combine concurrency and parallelism. Workers process jobs concurrently within each process, while multiple worker processes achieve parallelism across cores.

require 'concurrent-ruby'

class JobWorker
  def initialize(queue, num_threads)
    @queue = queue
    @num_threads = num_threads
    @pool = Concurrent::FixedThreadPool.new(num_threads)
    @running = true
  end

  def start
    # Post one long-running consumer loop per pool thread
    @num_threads.times { @pool.post { process_jobs } }
  end

  def stop
    @running = false
    @pool.shutdown
    @pool.wait_for_termination
  end

  private

  def process_jobs
    while @running
      job = @queue.pop(true) rescue nil  # Non-blocking pop raises when empty
      unless job
        sleep(0.1)  # Back off instead of spinning on an empty queue
        next
      end

      case job[:type]
      when :email
        send_email(job[:data])
      when :report
        generate_report(job[:data])
      when :cleanup
        cleanup_resources(job[:data])
      end
    end
  end
end

# Run multiple worker processes, each with concurrent threads
pids = 4.times.map do
  fork do
    queue = Queue.new  # In production, use Redis or database
    worker = JobWorker.new(queue, 5)
    worker.start
    sleep  # Keep the process alive while the pool threads run
  end
end
pids.each { |pid| Process.wait(pid) }

Data Aggregation with Ractors

Data aggregation across large datasets benefits from Ractor-based parallelism. Each Ractor processes a subset of data independently, then results merge through message passing.

# Parallel data processing with Ractors
data_chunks = large_dataset.each_slice(1000).to_a

ractors = data_chunks.map do |chunk|
  Ractor.new(chunk) do |data|
    # Each Ractor processes its chunk in parallel
    result = data.reduce(Hash.new(0)) do |acc, item|
      acc[item[:category]] += item[:value]
      acc
    end
    result  # Return via message passing
  end
end

# Collect and merge results from all Ractors
final_result = ractors.reduce(Hash.new(0)) do |acc, ractor|
  partial = ractor.take
  partial.each { |k, v| acc[k] += v }
  acc
end

puts "Aggregated results: #{final_result}"
# Each Ractor achieves true parallelism on separate cores

Concurrent API Client

API clients demonstrate effective concurrent I/O operations. Multiple HTTP requests execute concurrently, with threads waiting during network I/O while others progress.

require 'net/http'
require 'json'
require 'concurrent-ruby'

class ConcurrentAPIClient
  def initialize(base_url, max_threads: 10)
    @base_url = base_url
    @pool = Concurrent::FixedThreadPool.new(max_threads)
  end
  
  def fetch_multiple(endpoints)
    futures = endpoints.map do |endpoint|
      Concurrent::Future.execute(executor: @pool) do
        fetch_endpoint(endpoint)
      end
    end
    
    futures.map(&:value)
  end
  
  private
  
  def fetch_endpoint(endpoint)
    uri = URI("#{@base_url}#{endpoint}")
    response = Net::HTTP.get_response(uri)
    JSON.parse(response.body) if response.is_a?(Net::HTTPSuccess)
  end
end

client = ConcurrentAPIClient.new("https://api.example.com")
endpoints = ["/users", "/posts", "/comments", "/tags"]
results = client.fetch_multiple(endpoints)
# Requests execute concurrently, overlapping network I/O waits

Parallel Computation with MapReduce

MapReduce patterns demonstrate parallelism in data processing. The map phase distributes work across processors, the reduce phase combines results, and both phases can execute in parallel when data partitions are independent.

require 'concurrent-ruby'  # For Concurrent.processor_count

class ParallelMapReduce
  # map_fn and reduce_fn must be shareable Procs: an ordinary Proc
  # cannot be passed into a Ractor
  def self.map_reduce(data, map_fn, reduce_fn)
    # Distribute data across Ractors for parallel map
    chunk_size = (data.length / Concurrent.processor_count.to_f).ceil
    chunks = data.each_slice(chunk_size).to_a

    map_ractors = chunks.map do |chunk|
      Ractor.new(chunk, map_fn) do |items, mapper|
        # flat_map because the mapper returns a list of pairs per item
        items.flat_map { |item| mapper.call(item) }
      end
    end

    # Collect mapped results
    mapped = map_ractors.flat_map(&:take)

    # Group for reduce phase
    grouped = mapped.group_by(&:first)

    # Parallel reduce across groups
    reduce_ractors = grouped.map do |key, values|
      Ractor.new(key, values, reduce_fn) do |k, vals, reducer|
        [k, reducer.call(vals.map(&:last))]
      end
    end

    reduce_ractors.map(&:take).to_h
  end
end

# Word count example (assumes Ruby 3.x)
# Ractor.make_shareable requires the lambda's self to be shareable,
# so each lambda is created with self set to nil
mapper = Ractor.make_shareable(
  nil.instance_eval { ->(line) { line.split.map { |word| [word, 1] } } }
)
reducer = Ractor.make_shareable(
  nil.instance_eval { ->(counts) { counts.sum } }
)

result = ParallelMapReduce.map_reduce(
  text_lines,
  mapper,
  reducer
)
# Both map and reduce execute in true parallelism

Design Considerations

I/O-bound vs CPU-bound workload characteristics determine the appropriate concurrency model. I/O-bound operations spend most execution time waiting for external resources like network responses, disk reads, or database queries. These workloads benefit from concurrent threading in MRI because threads can release the GIL during I/O waits, allowing other threads to execute. CPU-bound operations perform intensive calculations and cannot benefit from threading in MRI due to the GIL, requiring process-based parallelism or Ractors instead.

require 'net/http'

# I/O-bound: threads work well
def fetch_multiple_apis(urls)
  threads = urls.map do |url|
    Thread.new { Net::HTTP.get(URI(url)) }  # GIL released during network I/O
  end
  threads.map(&:value)
end

# CPU-bound: processes or Ractors required
def parallel_calculations(datasets)
  pids = datasets.map do |data|
    fork { calculate_intensive(data) }  # True parallelism needed
  end
  pids.each { |pid| Process.wait(pid) }
end
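
The benefit of threads for I/O-bound work can be measured directly. In this sketch `sleep` stands in for a blocking network call, and `simulated_io_call` is a hypothetical name. Exact timings vary by machine.

```ruby
require 'benchmark'

# sleep stands in for a blocking network call; like real blocking I/O,
# it releases the GIL so other threads can run while this one waits.
def simulated_io_call
  sleep(0.2)
end

sequential = Benchmark.realtime { 4.times { simulated_io_call } }
threaded = Benchmark.realtime do
  4.times.map { Thread.new { simulated_io_call } }.each(&:join)
end

puts format("Sequential: %.2fs, threaded: %.2fs", sequential, threaded)
# The threaded run should take roughly one wait (about 0.2s) instead of four
```

Replacing the sleep with a CPU-bound loop would remove the difference on MRI, because the waits would no longer overlap.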

Memory constraints influence process vs thread decisions. Each forked process has its own address space. Copy-on-write lets a child share the parent's pages at first, but memory use grows as each worker modifies data, so many workers can consume significant RAM: ten processes that each reach 500MB of private memory require 5GB in total. Threads share memory space, making them memory-efficient for high-concurrency scenarios. Applications handling thousands of concurrent connections typically use threads or async I/O rather than one process per connection.

Coordination overhead affects parallel efficiency. Tasks requiring frequent synchronization or communication experience overhead that reduces parallel speedup. Lock contention forces threads to wait sequentially, converting parallel work into serial execution. Message passing between processes or Ractors incurs marshaling and communication costs. Problems with minimal interdependencies parallelize more effectively than tightly coupled tasks.

Fault isolation requirements guide architecture choices. Threads share memory space, meaning one thread's memory corruption or exception can affect others. Processes provide complete isolation, containing failures within individual workers. Critical systems often use multiple worker processes with health checks and automatic restart, accepting higher memory costs for improved reliability.
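
Failure containment with processes can be sketched as follows, assuming a Unix-like system with `fork`. `run_isolated` and the job lambdas are hypothetical names. The sketch only detects failures, while a production supervisor would also restart failed workers.

```ruby
# Run each job in its own process and report which ones failed.
# The job lambdas are hypothetical; a crash in one cannot corrupt the others.
def run_isolated(jobs)
  pids = jobs.map do |name, job|
    pid = fork do
      begin
        job.call
      rescue StandardError
        exit!(1)  # Non-zero exit status marks the worker as failed
      end
    end
    [pid, name]
  end

  pids.to_h do |pid, name|
    _, status = Process.wait2(pid)  # wait2 returns the pid and its exit status
    [name, status.success?]
  end
end

results = run_isolated({
  "ok"    => -> { 1 + 1 },
  "crash" => -> { raise "boom" }
})
p results
```

The parent learns about the crash only through the exit status, which is what makes the isolation complete: no exception object or corrupted state crosses the process boundary.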

Existing infrastructure shapes implementation strategy. Applications already using thread pools or async I/O frameworks like EventMachine or Async should extend existing patterns rather than introducing conflicting models. Rails applications typically use threaded application servers like Puma, making thread-based concurrency natural. Background job systems like Sidekiq run a pool of threads inside each worker process and scale across cores by running several processes.

Scaling characteristics differ between models. Thread-based concurrency scales within a single process but cannot utilize multiple cores effectively for CPU work in MRI. Process-based parallelism scales across cores and machines but requires explicit work distribution and result collection. Hybrid approaches using multiple processes with thread pools per process provide both within-process concurrency and cross-core parallelism.

Testing and debugging complexity varies significantly. Concurrent programs introduce timing-dependent behaviors that make bugs intermittent and difficult to reproduce. Parallel programs compound this with true simultaneous execution, race conditions, and deadlock potential. Sequential code paths execute predictably. Concurrent paths require reasoning about all possible interleavings, and parallel execution adds state changes that happen at the same instant.

Library and gem compatibility affects options. Not all Ruby gems are thread-safe, with some using global state or mutable class variables without proper synchronization. Database connection pools must support concurrent access. Native extensions may have their own threading models that conflict with Ruby's. Ractors restrict object sharing, preventing use of most existing gems without modification.

Performance Considerations

The Global Interpreter Lock fundamentally limits thread-based parallelism in MRI. Ruby threads cannot execute Ruby code simultaneously regardless of available CPU cores. Only one thread holds the GIL at any time, forcing serial execution of Ruby code. This design makes threading ineffective for CPU-bound parallelism but acceptable for I/O-bound concurrency where threads release the GIL during blocking operations.

# CPU-bound work shows no speedup with threads
require 'benchmark'

def fibonacci(n)
  return n if n <= 1
  fibonacci(n-1) + fibonacci(n-2)
end

# Single-threaded
single_time = Benchmark.realtime do
  4.times { fibonacci(35) }
end

# Multi-threaded
multi_time = Benchmark.realtime do
  threads = 4.times.map do
    Thread.new { fibonacci(35) }
  end
  threads.each(&:join)
end

puts "Single: #{single_time}s, Multi: #{multi_time}s"
# Multi-threaded time approximately equals single-threaded
# No parallel speedup due to GIL

Process overhead creates minimum problem size thresholds. Forking processes incurs startup cost and memory duplication overhead that dominates execution time for small tasks. Parallel processing only improves performance when task execution time significantly exceeds process creation and coordination costs. Memory-mapped files and copy-on-write optimization reduce but don't eliminate this overhead.

# Small tasks: process overhead dominates
Benchmark.realtime do
  100.times { fork { 1 + 1 } }  # Expensive due to fork overhead
  Process.waitall               # Include child completion in the timing
end

# Large tasks: parallelism beneficial
Benchmark.realtime do
  4.times { fork { expensive_calculation } }  # Fork cost amortized
  Process.waitall
end

Lock contention creates serial bottlenecks in parallel code. Multiple threads or processes competing for shared resources through locks serialize execution at contention points. High contention reduces effective parallelism to near-sequential performance. Contention increases with core count and longer critical sections, paradoxically making parallel code slower as more processors become available.

# High contention limits parallel benefit
mutex = Mutex.new
counter = 0

threads = 8.times.map do
  Thread.new do
    10000.times do
      mutex.synchronize { counter += 1 }  # Frequent lock contention
    end
  end
end
threads.each(&:join)
# Threads spend most of their time waiting for the mutex
# Throughput is no better than a single thread doing all the increments
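
One way to reduce contention is to shrink how often the lock is taken. The sketch below is one possible restructuring of the counter example, in which each thread accumulates locally and merges once.

```ruby
# Each thread counts into a local variable and takes the lock only once,
# so the mutex is acquired 8 times in total instead of 80,000 times.
mutex = Mutex.new
counter = 0

threads = 8.times.map do
  Thread.new do
    local = 0
    10_000.times { local += 1 }             # No shared state touched here
    mutex.synchronize { counter += local }  # Single short critical section
  end
end
threads.each(&:join)

puts counter
```

The result is the same as before, but the critical section is entered once per thread. The same idea applies to any reduction: compute partial results privately and combine them under the lock at the end.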

Cache coherency overhead increases with core count. Processors maintain local caches of memory that must stay synchronized across cores. When one core modifies cached data, other cores must invalidate or update their copies. This coherency traffic creates overhead that grows with core count, particularly for false sharing where unrelated data shares cache lines.

Amdahl's Law quantifies parallel speedup limits. The sequential portion of code fundamentally limits maximum speedup regardless of available processors. Code with 20% sequential work cannot exceed 5x speedup even with infinite processors. Measuring sequential portions through profiling reveals parallelization potential before investing implementation effort.

Context switching overhead affects high-concurrency scenarios. Each thread or fiber requires stack memory and incurs context switch costs when the scheduler switches between them. Creating thousands of threads consumes significant memory and CPU cycles in switching overhead. Event-driven or fiber-based approaches reduce this overhead by keeping fewer kernel threads active.

Memory bandwidth saturation limits parallel scaling. Multiple processors accessing memory simultaneously can saturate memory bus bandwidth, particularly for memory-intensive workloads. Processors stall waiting for memory access, reducing actual CPU utilization. Cache-efficient algorithms that maximize data locality achieve better parallel scaling by reducing memory traffic.

I/O capacity creates throughput ceilings. Concurrent I/O operations share limited I/O bandwidth and operation queues. Disk throughput, network bandwidth, and database connection pools all impose upper bounds on concurrent request handling. Parallel execution cannot exceed these physical limits, making additional concurrency ineffective beyond capacity thresholds.

Common Pitfalls

Assuming threaded code achieves parallelism in MRI leads to performance surprises. Developers familiar with other languages expect threads to utilize multiple cores, but MRI's GIL prevents this. CPU-intensive multithreaded Ruby code often performs worse than sequential code due to thread management overhead without parallel execution benefits.

# Pitfall: expecting parallel speedup from threads
threads = 4.times.map do
  Thread.new do
    # CPU-intensive work gets no speedup
    1_000_000.times { Math.sqrt(rand) }
  end
end
threads.each(&:join)
# Runs no faster than sequential version due to GIL

Race conditions create intermittent, environment-dependent failures. Multiple threads or processes accessing shared state without synchronization produce timing-dependent outcomes. Bugs manifest inconsistently based on execution timing, making reproduction difficult. Tests may pass locally but fail in production under different load conditions.

# Race condition on shared state
@counter = 0

threads = 10.times.map do
  Thread.new do
    1000.times do
      temp = @counter    # Read
      temp += 1          # Modify
      @counter = temp    # Write
    end
  end
end
threads.each(&:join)

puts @counter  # May be less than 10000: updates can be lost to the race

Deadlocks occur when threads wait circularly for resources. Thread A holds Lock 1 and waits for Lock 2 while Thread B holds Lock 2 and waits for Lock 1. Both threads block indefinitely. Consistent lock ordering prevents deadlocks but requires discipline across the codebase.

# Deadlock potential with inconsistent lock ordering
mutex_a = Mutex.new
mutex_b = Mutex.new

thread1 = Thread.new do
  mutex_a.synchronize do
    sleep(0.1)  # Increase deadlock likelihood
    mutex_b.synchronize { puts "Thread 1" }
  end
end

thread2 = Thread.new do
  mutex_b.synchronize do
    sleep(0.1)
    mutex_a.synchronize { puts "Thread 2" }
  end
end

# Both threads may deadlock waiting for each other
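
Consistent lock ordering removes the circular wait. A minimal sketch of the same two-lock scenario, with both threads taking `mutex_a` before `mutex_b`:

```ruby
# Both threads acquire the locks in the same global order (a before b),
# so a circular wait cannot form.
mutex_a = Mutex.new
mutex_b = Mutex.new
finished = Queue.new  # Thread-safe collection of completed thread names

workers = ["Thread 1", "Thread 2"].map do |name|
  Thread.new do
    mutex_a.synchronize do
      sleep(0.05)
      mutex_b.synchronize { finished << name }
    end
  end
end

workers.each(&:join)
puts "Completed: #{finished.size} threads"
```

Whichever thread gets `mutex_a` first also gets `mutex_b` uncontended, finishes, and releases both. The second thread then proceeds, so the two threads serialize but never block each other permanently.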

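Shared mutable state without synchronization causes lost updates and inconsistent state. On MRI the GIL makes a single core operation such as Hash#[]= effectively atomic, but that is an implementation detail rather than a guarantee, and implementations without a GIL such as JRuby can corrupt unsynchronized collections outright. Compound operations, including incrementing a counter or a check-then-set on a cache, are never atomic and require synchronization for correctness.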

# Data corruption without synchronization
class UnsafeCache
  def initialize
    @cache = {}
  end
  
  def get(key)
    @cache[key]  # Unsynchronized read: may race with a concurrent write
  end
  
  def set(key, value)
    @cache[key] = value  # Unsynchronized write: unsafe without a GIL
  end
end

# Multiple threads writing to the shared hash
cache = UnsafeCache.new
threads = 100.times.map do |i|
  Thread.new { cache.set(i, "value#{i}") }
end
threads.each(&:join)
# MRI's GIL usually hides the problem; without a GIL, entries may be lost or the hash corrupted

Memory visibility issues cause one thread to miss updates from another. Modern processors cache memory locally and reorder operations for performance. Without synchronization primitives, updates made by one thread may not become visible to others. Synchronizing through mutexes, queues, or atomic operations establishes visibility guarantees. Ruby does not define a formal memory model, so an unsynchronized flag that happens to work on MRI may fail on other implementations.

# Visibility issue with unsynchronized flag
@stop_flag = false

worker = Thread.new do
  until @stop_flag  # No guarantee this read ever observes the update
    do_work
  end
end

sleep(5)
@stop_flag = true  # Unsynchronized write: visibility to the worker is not guaranteed
worker.join
# Usually works on MRI because of the GIL, but nothing guarantees it
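
One way to make the stop signal reliably visible is to route every read and write of the flag through a Mutex. `StopSignal` is a hypothetical helper class written for this sketch, and the short sleep stands in for a unit of work.

```ruby
# A stop flag whose reads and writes all go through a Mutex, so the
# worker is guaranteed to observe the update. StopSignal is a hypothetical
# helper class written for this illustration.
class StopSignal
  def initialize
    @mutex = Mutex.new
    @stopped = false
  end

  def stop!
    @mutex.synchronize { @stopped = true }
  end

  def stopped?
    @mutex.synchronize { @stopped }
  end
end

signal = StopSignal.new
iterations = 0

worker = Thread.new do
  until signal.stopped?
    iterations += 1
    sleep(0.01)  # Stand-in for one unit of work
  end
end

sleep(0.1)
signal.stop!
worker.join
puts "Worker stopped after #{iterations} iterations"
```

A closed Queue or an atomic boolean from a concurrency library would serve the same purpose. The point is that the flag is never read or written outside a synchronization primitive.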

Exception handling in threads requires explicit attention. An uncaught exception terminates only the thread that raised it and does not propagate to the main thread. Since Ruby 2.5 the failure is reported on stderr by default (Thread.report_on_exception), but the main thread continues running unless it joins the thread or checks its status.

# Thread failure that the main thread does not notice
thread = Thread.new do
  raise "Oops"  # Terminates only this thread (reported on stderr since Ruby 2.5)
end

sleep(1)
puts "Main thread continues unaware of failure"

# Correct: check thread status
begin
  thread.join  # Raises exception from thread
rescue => e
  puts "Thread failed: #{e.message}"
end
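
With several workers, the same idea extends to collecting every outcome rather than stopping at the first failure. A minimal sketch, in which the failing worker is contrived for illustration:

```ruby
# Collect failures from several worker threads instead of losing them.
# report_on_exception = false only silences the stderr message; the
# exception is still re-raised when the thread's value is requested.
threads = 3.times.map do |i|
  Thread.new do
    Thread.current.report_on_exception = false
    raise ArgumentError, "worker #{i} failed" if i == 1
    i * 10
  end
end

outcomes = threads.map do |t|
  begin
    [:ok, t.value]  # value joins the thread and returns its result
  rescue StandardError => e
    [:error, e.message]
  end
end

p outcomes
```

Each thread's exception is raised in the caller only when `value` (or `join`) is invoked on that thread, so wrapping each call individually keeps one failure from hiding the results of the others.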

Fork safety issues arise from threads and file descriptors. Only the thread that calls fork exists in the child process. Every other thread vanishes mid-operation, which can leave shared data half-updated and locks inside native libraries permanently held. Open file descriptors persist across fork, requiring cleanup in child processes to avoid unintended resource sharing.

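# Unsafe fork with active threads
mutex = Mutex.new
thread = Thread.new do
  mutex.synchronize { sleep(10) }  # Holds the lock mid-operation
end
sleep(0.1)  # Give the thread time to take the lock

fork do
  # Only the forking thread exists here: the lock holder is gone and its
  # work is unfinished. MRI releases Ruby mutexes that vanished threads
  # held, but locks inside native libraries can stay held forever
  mutex.synchronize { puts "Child sees whatever state the other thread left" }
end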

Over-parallelization degrades performance through coordination overhead. For CPU-bound work, creating more threads or processes than available cores adds context switching and memory consumption without performance benefit. I/O-bound work can use more threads than cores because most of them are waiting at any moment. Task granularity must balance parallel opportunity against coordination costs.
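
A bounded worker pool keeps the number of workers fixed regardless of how many tasks arrive. The sketch below uses only the standard library. `process_with_pool` is a hypothetical name, and sizing the pool from `Etc.nprocessors` is an assumption that suits CPU-bound work, whereas I/O-bound pools are often larger.

```ruby
require 'etc'

# Process many tasks with a fixed number of workers instead of one
# thread per task. Items must be non-nil, because nil marks end of queue.
def process_with_pool(items, pool_size: Etc.nprocessors, &block)
  tasks = Queue.new
  items.each { |item| tasks << item }
  tasks.close  # Workers exit once the queue drains

  results = Queue.new
  workers = pool_size.times.map do
    Thread.new do
      while (item = tasks.pop)  # pop returns nil when closed and empty
        results << block.call(item)
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }  # Completion order, not input order
end

squares = process_with_pool((1..100).to_a) { |n| n * n }
puts squares.sum
```

On MRI these threads still share one GIL, so the pool bounds overhead rather than adding CPU parallelism. The same structure applies to forked workers or Ractors when the work is CPU-bound.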

Reference

Core Concepts Comparison

Aspect	Concurrency	Parallelism
Definition	Managing multiple tasks	Executing multiple tasks simultaneously
Focus	Program structure	Actual execution
Single Core	Possible through interleaving	Not possible
Multi-Core	Possible with or without parallel execution	Requires multiple cores
Primary Benefit	Responsiveness, resource utilization	Performance, reduced latency
Coordination	Required for task switching	Required plus simultaneous execution complexity

Ruby Concurrency Primitives

Primitive	Parallelism	Memory Model	Use Case
Thread	No (GIL)	Shared	I/O-bound concurrency
Process	Yes	Isolated	CPU-bound parallelism
Ractor	Yes	Message passing	CPU-bound with isolation
Fiber	No	Shared, cooperative	Structured concurrency, generators
Queue	N/A	Thread-safe	Producer-consumer patterns
Mutex	N/A	Synchronization	Protecting shared resources

Thread Methods

Method	Purpose	Behavior
Thread.new	Create thread	Returns thread object immediately
Thread.current	Current thread	Returns executing thread
join	Wait for completion	Blocks until thread finishes
value	Get return value	Blocks until thread finishes, returns value
alive?	Check status	Returns true if thread is running or sleeping
kill	Terminate thread	Immediately stops thread execution
status	Get state	Returns run, sleep, aborting, false, or nil

Process Methods

Method	Purpose	Behavior
fork	Create child process	Returns pid in parent, nil in child
wait	Wait for child	Blocks until any child exits
waitpid	Wait for specific child	Blocks until specified child exits
exit	Terminate process	Immediately exits with status code
pid	Current process ID	Returns integer process identifier
daemon	Daemonize	Detaches from controlling terminal

Ractor Operations

Operation	Purpose	Behavior
Ractor.new	Create ractor	Returns ractor object
send	Send message	Queues message for ractor
receive	Receive message	Blocks until message available
take	Get return value	Blocks until ractor finishes
shareable?	Check shareability	Returns true if object can be shared

Synchronization Primitives

Primitive	Type	Characteristics
Mutex	Mutual exclusion	Blocks threads waiting for lock
Monitor	Reentrant mutex	Same thread can acquire multiple times
ConditionVariable	Condition waiting	Wait for condition with timeout
Queue	Thread-safe queue	Blocking push/pop operations
SizedQueue	Bounded queue	Blocks when full or empty

Decision Matrix

Workload Type	Ruby Solution	Reasoning
I/O-bound, high concurrency	Threads	GIL released during I/O, memory efficient
CPU-bound, low count	Processes	True parallelism, isolated failures
CPU-bound, high count	Ractor	Parallel execution with lower overhead
Memory-constrained	Threads or Fibers	Shared memory space
Need isolation	Processes	Separate memory spaces
Event-driven	Fibers or EventMachine	Cooperative scheduling

Performance Characteristics

Approach	Startup Cost	Memory Overhead	CPU Utilization	Coordination Cost
Single Thread	None	Baseline	One core max	None
Multiple Threads	Low	Low	One core (GIL)	Mutex contention
Multiple Processes	High	High	Multiple cores	IPC overhead
Ractors	Medium	Medium	Multiple cores	Message passing
Fibers	Very low	Very low	One core	Manual yields

Common Patterns

Pattern	Implementation	Use Case
Thread Pool	Fixed threads processing queue	Web server request handling
Fork-Join	Fork workers, wait for completion	Parallel batch processing
Pipeline	Queues between stages	Data processing pipeline
Actor Model	Ractors with message passing	Concurrent stateful entities
Producer-Consumer	Queue with multiple readers/writers	Decoupling data generation from processing
Work Stealing	Idle workers take tasks from other workers' queues	Dynamic load balancing

GIL Behavior

Operation	GIL Status	Parallel Execution
Ruby code execution	Held	No
Blocking I/O	Released	Yes
sleep	Released	Yes
Native extension	Released if designed properly	Yes
C-level operations	Held or released by implementation	Varies
