Overview
Concurrency and parallelism represent different approaches to handling multiple tasks in software systems. Concurrency describes a program's structure where multiple tasks make progress by interleaving their execution, while parallelism describes actual simultaneous execution of multiple tasks on multiple processors or cores.
The confusion between these concepts stems from their similar outcomes in improving program responsiveness and throughput. A concurrent program manages multiple tasks that may run on a single processor through context switching, giving the appearance of simultaneous execution. A parallel program executes multiple tasks truly simultaneously on multiple processors or cores.
Rob Pike's formulation captures the distinction: "Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once." This difference matters because concurrent programs focus on program structure and task coordination, while parallel programs focus on performance through simultaneous execution.
Consider a web server handling multiple client requests. A concurrent design manages multiple requests through an event loop or thread pool, switching between requests as they wait for I/O operations. Each request appears to progress simultaneously, but only one executes at any instant on a single core. A parallel design processes multiple requests simultaneously on different cores, achieving true simultaneous execution.
The distinction affects system design, performance characteristics, and debugging approaches. Concurrent systems require synchronization mechanisms to coordinate task switching and shared resource access. Parallel systems need these same mechanisms but also contend with the complexity of true simultaneous execution, including cache coherency, memory visibility, and processor coordination.
Key Principles
Concurrency focuses on program structure. A concurrent program decomposes work into tasks that can progress independently, whether they execute simultaneously or not. The tasks coordinate through communication and synchronization, managing shared resources and ordering constraints. The operating system or runtime schedules these tasks, interleaving their execution on available processors.
Parallelism focuses on simultaneous execution. A parallel program executes multiple operations at the same instant, requiring multiple execution units. The program distributes work across processors, cores, or machines to complete operations faster than sequential execution would allow. Parallelism directly trades additional hardware resources for reduced execution time.
Concurrency enables parallelism but does not guarantee it. A well-structured concurrent program can execute in parallel when sufficient processors exist, but concurrent execution can occur on a single processor through interleaving. A program needs concurrent structure before it can achieve parallel execution, but concurrency alone does not provide parallel performance benefits.
Task independence determines parallelization potential. Tasks with minimal interdependencies parallelize more effectively because they require less synchronization overhead. Heavy coordination between tasks limits parallel speedup through synchronization costs and reduced processor utilization. The ratio of independent computation to synchronization determines the maximum parallel efficiency achievable.
Amdahl's Law constrains parallel performance. The sequential portions of a program limit the maximum speedup achievable through parallelization. If S represents the sequential portion (as a fraction) and P represents the parallelizable portion, then maximum speedup with N processors equals 1 / (S + P/N). A program with 10% sequential work cannot exceed 10x speedup regardless of available processors.
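The formula above can be evaluated directly to see how quickly the speedup flattens out. The following is a minimal sketch; the method name `amdahl_speedup` and the sample processor counts are illustrative assumptions, not part of any library.

```ruby
# Amdahl's Law: speedup = 1 / (S + P/N), where P = 1 - S.
# sequential_fraction is S (between 0.0 and 1.0); processors is N (>= 1).
def amdahl_speedup(sequential_fraction, processors)
  parallel_fraction = 1.0 - sequential_fraction
  1.0 / (sequential_fraction + parallel_fraction / processors)
end

# Speedup for a program with 10% sequential work at several core counts
[1, 2, 4, 8, 16, 1024].each do |n|
  puts format("N=%-5d speedup=%.2f", n, amdahl_speedup(0.10, n))
end
```

As N grows, the P/N term vanishes and the result approaches 1/S, which is the 10x ceiling described above.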
Memory models affect correctness. Concurrent and parallel programs must account for how different processors observe memory updates. Without proper synchronization, one processor may not see updates made by another processor due to caching and compiler optimizations. Memory barriers and synchronization primitives establish ordering guarantees that ensure correct program behavior.
Scheduling strategies determine concurrency behavior. Cooperative scheduling requires tasks to explicitly yield control, providing deterministic execution order but risking starvation if tasks fail to yield. Preemptive scheduling interrupts tasks periodically, preventing starvation but introducing timing-dependent behavior. The scheduler's decisions affect system responsiveness, fairness, and throughput.
Resource contention creates bottlenecks. Multiple tasks competing for shared resources introduce serialization points that limit parallel performance. Lock contention, memory bandwidth saturation, and I/O capacity constraints can reduce parallel efficiency below theoretical maximum. Identifying and minimizing contention points determines actual parallel performance.
Ruby Implementation
Ruby's approach to concurrency and parallelism has evolved significantly across versions and implementations. MRI (Matz's Ruby Interpreter), the standard Ruby implementation, uses a Global Interpreter Lock (GIL), which the Ruby documentation calls the Global VM Lock (GVL), that prevents Ruby threads from executing Ruby code in parallel. The GIL protects the interpreter's internal state, not application data, and serializes Ruby code execution even on multi-core systems.
The Thread class provides Ruby's primary concurrency abstraction. Threads created with Thread.new share the same memory space and can access shared objects, but the GIL ensures only one thread executes Ruby code at a time. This design makes threads effective for I/O-bound operations where threads can release the GIL during I/O waits, but ineffective for CPU-bound parallelism.
# Concurrent execution with threads
threads = 5.times.map do |i|
Thread.new do
puts "Thread #{i} starting"
sleep(1) # Stand-in for an I/O wait; sleeping also releases the GIL
puts "Thread #{i} finishing"
end
end
threads.each(&:join)
# Threads interleave execution but GIL prevents parallel CPU work
Process forking provides true parallelism by creating separate Ruby interpreter instances. Each process has its own GIL and memory space, enabling parallel CPU work at the cost of higher memory overhead and more complex inter-process communication. The Process module handles fork operations; fork relies on operating system support and is not available on Windows.
# Parallel execution with processes
pids = 5.times.map do |i|
fork do
result = expensive_calculation(i)
puts "Process #{i}: #{result}"
end
end
pids.each { |pid| Process.wait(pid) }
# Each process runs independently with true parallelism
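Because each forked process has its own memory space, a child cannot simply assign its result to a variable the parent will see. One common way to return results is a pipe plus `Marshal`. The sketch below assumes a platform with `fork`; the helper name `fork_map` is hypothetical.

```ruby
# Run the block once per item, each call in its own forked child process,
# and return the results in input order. Results must be Marshal-serializable.
def fork_map(items)
  children = items.map do |item|
    reader, writer = IO.pipe
    reader.binmode
    writer.binmode
    pid = fork do
      reader.close
      writer.write(Marshal.dump(yield(item)))
      writer.close
      exit!(0) # skip at_exit handlers inherited from the parent
    end
    writer.close # the parent keeps only the read end
    [pid, reader]
  end

  children.map do |pid, reader|
    payload = reader.read # read to EOF before waiting so a large result cannot stall the child
    reader.close
    Process.wait(pid)
    Marshal.load(payload)
  end
end

squares = fork_map([1, 2, 3, 4]) { |n| n * n }
p squares # => [1, 4, 9, 16]
```

If a child raises before writing, the parent reads an empty payload and `Marshal.load` fails, so a production version would also inspect the exit status returned by `Process.wait2`.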
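Ractors, introduced in Ruby 3.0 as an experimental feature, provide an actor-based parallelism model that allows true parallel execution of Ruby code. Each Ractor holds its own interpreter lock and cannot share most objects with other Ractors, communicating instead through message passing. This isolation enables parallel execution while maintaining thread safety. The examples here use the Ractor.yield and take API of Ruby 3.0 through 3.4; later releases revise this API, so check the documentation for the version in use.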
# Parallel execution with Ractors
ractors = 5.times.map do |i|
Ractor.new(i) do |num|
result = expensive_calculation(num)
Ractor.yield(result)
end
end
results = ractors.map { |r| r.take }
# Ractors execute in true parallelism without GIL contention
The Mutex class provides mutual exclusion for shared resource access. Threads competing for a mutex will block until the lock becomes available, serializing access to protected code sections. This mechanism prevents race conditions but introduces contention that limits parallel performance.
# Thread-safe counter with Mutex
counter = 0
mutex = Mutex.new
threads = 10.times.map do
Thread.new do
1000.times do
mutex.synchronize do
counter += 1
end
end
end
end
threads.each(&:join)
puts counter # => 10000 (safe increment)
Queue and SizedQueue provide thread-safe, concurrent data structures for producer-consumer patterns. Multiple threads can safely push and pop items without explicit locking. SizedQueue adds blocking behavior when the queue reaches capacity, providing backpressure mechanisms.
# Producer-consumer with Queue
queue = Queue.new
# Producer threads
producers = 3.times.map do |i|
  Thread.new do
    5.times do |j|
      queue.push("Item #{i}-#{j}")
      sleep(0.1)
    end
  end
end
# Consumer threads: pop returns nil once the queue is closed and empty
consumers = 2.times.map do
  Thread.new do
    while (item = queue.pop)
      puts "Processing: #{item}"
      sleep(0.2)
    end
  end
end
producers.each(&:join)
queue.close # No more items; consumers drain the queue and exit
consumers.each(&:join) # Waits for every item instead of killing consumers mid-job
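The backpressure behavior of SizedQueue can be sketched as follows. The method name `run_bounded_pipeline` and the capacity and timing values are illustrative assumptions; the point is that `push` blocks while the queue is full, so a fast producer cannot run arbitrarily far ahead of a slow consumer.

```ruby
# Push item_count integers through a bounded queue.
# Returns [items consumed in order, largest queue length the producer observed].
def run_bounded_pipeline(item_count:, capacity:)
  queue = SizedQueue.new(capacity)
  consumed = []
  max_seen = 0

  consumer = Thread.new do
    while (item = queue.pop) # nil once the queue is closed and drained
      sleep(0.001)           # simulate a slow consumer
      consumed << item
    end
  end

  item_count.times do |i|
    queue.push(i)            # blocks whenever the queue is at capacity
    max_seen = [max_seen, queue.size].max
  end
  queue.close
  consumer.join
  [consumed, max_seen]
end

consumed, max_seen = run_bounded_pipeline(item_count: 20, capacity: 3)
puts "consumed #{consumed.size} items, queue never held more than #{max_seen}"
```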
The concurrent-ruby gem extends Ruby's concurrency primitives with production-grade abstractions. Concurrent::Future provides promise-style asynchronous execution, Concurrent::ThreadPoolExecutor manages worker thread pools (Concurrent::FixedThreadPool is its fixed-size variant), and Concurrent::Map offers a thread-safe hash with atomic compound operations.
require 'concurrent' # The gem is named concurrent-ruby; the library is required as 'concurrent'
# Thread pool for concurrent execution
pool = Concurrent::FixedThreadPool.new(5)
futures = 10.times.map do |i|
Concurrent::Future.execute(executor: pool) do
expensive_operation(i)
end
end
results = futures.map(&:value) # Blocks until all complete
pool.shutdown
pool.wait_for_termination
Fiber provides cooperative concurrency where the programmer explicitly controls task switching through Fiber.yield. Unlike threads, fibers don't run in parallel and don't require locking for shared state, but require explicit yielding to prevent starvation.
# Cooperative concurrency with Fiber
fiber1 = Fiber.new do
puts "Fiber 1: Start"
Fiber.yield
puts "Fiber 1: Resume"
Fiber.yield
puts "Fiber 1: End"
end
fiber2 = Fiber.new do
puts "Fiber 2: Start"
Fiber.yield
puts "Fiber 2: End"
end
fiber1.resume # Fiber 1: Start
fiber2.resume # Fiber 2: Start
fiber1.resume # Fiber 1: Resume
fiber2.resume # Fiber 2: End
fiber1.resume # Fiber 1: End
Practical Examples
Web Server Request Handling
A web server demonstrates concurrency through handling multiple client connections simultaneously. Each request progresses independently, with the server switching between requests as they wait for database queries, external API calls, or file I/O.
require 'socket'
require 'thread'
server = TCPServer.new(3000)
puts "Server listening on port 3000"
# Concurrent request handling with threads
loop do
Thread.new(server.accept) do |client|
request = client.gets
puts "Handling request: #{request}"
# Simulate I/O-bound work (database query, API call)
sleep(2)
response = "HTTP/1.1 200 OK\r\n\r\nRequest processed\n"
client.puts(response)
client.close
end
end
# Multiple requests progress concurrently
# I/O operations release GIL enabling effective concurrency
Image Processing Pipeline
Image processing demonstrates true parallelism where independent image transformations execute simultaneously on different CPU cores. Process-based parallelism avoids GIL limitations for CPU-intensive operations.
require 'mini_magick'
image_files = Dir.glob("images/*.jpg")
chunk_size = (image_files.length / 4.0).ceil
chunks = image_files.each_slice(chunk_size).to_a
# Parallel processing with forked processes
pids = chunks.map.with_index do |chunk, i|
fork do
chunk.each do |file|
image = MiniMagick::Image.open(file)
image.resize "800x600"
image.write("processed/#{File.basename(file)}")
end
puts "Worker #{i} completed #{chunk.length} images"
end
end
pids.each { |pid| Process.wait(pid) }
puts "All images processed"
# Each process runs in true parallelism on separate cores
# CPU-bound work benefits from parallel execution
Background Job Processing
Background job systems combine concurrency and parallelism. Workers process jobs concurrently within each process, while multiple worker processes achieve parallelism across cores.
require 'concurrent'
class JobWorker
  def initialize(queue, num_threads)
    @queue = queue
    @num_threads = num_threads
    @pool = Concurrent::FixedThreadPool.new(num_threads)
    @running = true
  end
  # Start exactly one polling loop per pool thread
  def start
    @num_threads.times { @pool.post { process_jobs } }
  end
  def stop
    @running = false
    @pool.shutdown
    @pool.wait_for_termination
  end
  private
  def process_jobs
    while @running
      job = next_job
      if job.nil?
        sleep(0.1) # Queue empty: back off instead of spinning
        next
      end
      case job[:type]
      when :email
        send_email(job[:data])
      when :report
        generate_report(job[:data])
      when :cleanup
        cleanup_resources(job[:data])
      end
    end
  end
  # Non-blocking pop; returns nil when the queue is empty
  def next_job
    @queue.pop(true)
  rescue ThreadError
    nil
  end
end
# Run multiple worker processes, each with concurrent threads
4.times do
  fork do
    queue = Queue.new # In production, use Redis or database
    worker = JobWorker.new(queue, 5)
    worker.start
    sleep # Keep the child alive while the pool threads process jobs
  end
end
Process.waitall # Parent waits for the worker processes
Data Aggregation with Ractors
Data aggregation across large datasets benefits from Ractor-based parallelism. Each Ractor processes a subset of data independently, then results merge through message passing.
# Parallel data processing with Ractors
data_chunks = large_dataset.each_slice(1000).to_a
ractors = data_chunks.map do |chunk|
Ractor.new(chunk) do |data|
# Each Ractor processes its chunk in parallel
result = data.reduce(Hash.new(0)) do |acc, item|
acc[item[:category]] += item[:value]
acc
end
result # Return via message passing
end
end
# Collect and merge results from all Ractors
final_result = ractors.reduce(Hash.new(0)) do |acc, ractor|
partial = ractor.take
partial.each { |k, v| acc[k] += v }
acc
end
puts "Aggregated results: #{final_result}"
# Each Ractor achieves true parallelism on separate cores
Concurrent API Client
API clients demonstrate effective concurrent I/O operations. Multiple HTTP requests execute concurrently, with threads waiting during network I/O while others progress.
require 'concurrent' # Provides Concurrent::FixedThreadPool and Concurrent::Future
require 'net/http'
require 'json'
class ConcurrentAPIClient
def initialize(base_url, max_threads: 10)
@base_url = base_url
@pool = Concurrent::FixedThreadPool.new(max_threads)
end
def fetch_multiple(endpoints)
futures = endpoints.map do |endpoint|
Concurrent::Future.execute(executor: @pool) do
fetch_endpoint(endpoint)
end
end
futures.map(&:value)
end
private
def fetch_endpoint(endpoint)
uri = URI("#{@base_url}#{endpoint}")
response = Net::HTTP.get_response(uri)
JSON.parse(response.body) if response.is_a?(Net::HTTPSuccess)
end
end
client = ConcurrentAPIClient.new("https://api.example.com")
endpoints = ["/users", "/posts", "/comments", "/tags"]
results = client.fetch_multiple(endpoints)
# Requests execute concurrently, overlapping network I/O waits
Parallel Computation with MapReduce
MapReduce patterns demonstrate parallelism in data processing. The map phase distributes work across processors, the reduce phase combines results, and both phases can execute in parallel when data partitions are independent.
require 'etc'
class ParallelMapReduce
  # job must be shareable between Ractors and respond to map(item) and
  # reduce(values). A module with singleton methods qualifies; a lambda
  # cannot be passed to Ractor.new as-is because a Proc cannot be copied.
  def self.map_reduce(data, job)
    # Distribute data across Ractors for parallel map
    chunk_size = [(data.length / Etc.nprocessors.to_f).ceil, 1].max
    chunks = data.each_slice(chunk_size).to_a
    map_ractors = chunks.map do |chunk|
      Ractor.new(chunk, job) do |items, j|
        items.flat_map { |item| j.map(item) }
      end
    end
    # Collect mapped [key, value] pairs
    mapped = map_ractors.flat_map(&:take)
    # Group for reduce phase
    grouped = mapped.group_by(&:first)
    # Parallel reduce across groups (one Ractor per key keeps the sketch
    # short; batch keys per Ractor for real workloads)
    reduce_ractors = grouped.map do |key, pairs|
      Ractor.new(key, pairs, job) do |k, kv, j|
        [k, j.reduce(kv.map(&:last))]
      end
    end
    reduce_ractors.map(&:take).to_h
  end
end
# Word count example
module WordCount
  # Emits one [word, 1] pair per word in the line
  def self.map(line)
    line.split.map { |word| [word, 1] }
  end
  def self.reduce(counts)
    counts.sum
  end
end
result = ParallelMapReduce.map_reduce(text_lines, WordCount)
# Both map and reduce execute in true parallelism
Design Considerations
I/O-bound vs CPU-bound workload characteristics determine the appropriate concurrency model. I/O-bound operations spend most execution time waiting for external resources like network responses, disk reads, or database queries. These workloads benefit from concurrent threading in MRI because threads can release the GIL during I/O waits, allowing other threads to execute. CPU-bound operations perform intensive calculations and cannot benefit from threading in MRI due to the GIL, requiring process-based parallelism or Ractors instead.
# I/O-bound: threads work well
def fetch_multiple_apis
threads = urls.map do |url|
Thread.new { HTTP.get(url) } # GIL released during network I/O
end
threads.map(&:value)
end
# CPU-bound: processes or Ractors required
def parallel_calculations
pids = datasets.map do |data|
fork { calculate_intensive(data) } # True parallelism needed
end
pids.each { |pid| Process.wait(pid) }
end
Memory constraints influence process vs thread decisions. Each forked process gets its own copy of the parent's address space. Copy-on-write lets parent and child share pages until one of them writes, but memory use still grows as workers allocate and modify objects, so spawning many workers can consume significant RAM. In the worst case, ten processes each using 500MB require 5GB in total. Threads share memory space, making them memory-efficient for high-concurrency scenarios. Applications handling thousands of concurrent connections typically use threads or async I/O rather than one process per connection.
Coordination overhead affects parallel efficiency. Tasks requiring frequent synchronization or communication experience overhead that reduces parallel speedup. Lock contention forces threads to wait sequentially, converting parallel work into serial execution. Message passing between processes or Ractors incurs marshaling and communication costs. Problems with minimal interdependencies parallelize more effectively than tightly coupled tasks.
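The difference between frequent and infrequent coordination can be sketched with two ways of summing the same numbers. Both method names are hypothetical. Under MRI's GIL neither version gains CPU parallelism from threads, so this only illustrates the coordination pattern, which applies equally to processes and Ractors.

```ruby
# Every element takes a round trip through one shared lock.
def sum_with_shared_lock(numbers, workers: 4)
  total = 0
  lock = Mutex.new
  slice_size = [(numbers.size / workers.to_f).ceil, 1].max
  numbers.each_slice(slice_size).map do |slice|
    Thread.new do
      slice.each { |n| lock.synchronize { total += n } }
    end
  end.each(&:join)
  total
end

# Each worker accumulates privately; results merge once at the end.
def sum_with_local_totals(numbers, workers: 4)
  slice_size = [(numbers.size / workers.to_f).ceil, 1].max
  numbers.each_slice(slice_size).map do |slice|
    Thread.new { slice.sum } # no shared state while working
  end.sum(&:value)           # single merge step
end

numbers = (1..100_000).to_a
puts sum_with_shared_lock(numbers) == sum_with_local_totals(numbers)
```

The second version synchronizes once per worker instead of once per element, which is the kind of restructuring that reduces coordination overhead.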
Fault isolation requirements guide architecture choices. Threads share memory space, meaning one thread's memory corruption or exception can affect others. Processes provide complete isolation, containing failures within individual workers. Critical systems often use multiple worker processes with health checks and automatic restart, accepting higher memory costs for improved reliability.
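Process-level isolation can be illustrated with a small supervisor sketch: the child fails, and the parent only observes an exit status. The helper name `run_isolated` is an assumption for illustration, and the example requires a platform with `fork`.

```ruby
# Run the block in a child process and return its Process::Status.
# A failure inside the block cannot touch the parent's memory.
def run_isolated
  pid = fork do
    begin
      yield
      exit!(0)
    rescue StandardError
      exit!(1) # report failure through the exit status
    end
  end
  _pid, status = Process.wait2(pid)
  status
end

healthy = run_isolated { 2 + 2 }
crashed = run_isolated { raise "worker bug" }
puts "healthy worker ok? #{healthy.success?}"
puts "crashed worker ok? #{crashed.success?}"
puts "parent still running as #{Process.pid}"
```

A supervisor built on this idea would restart the worker when the status reports failure, which is the health-check-and-restart pattern described above.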
Existing infrastructure shapes implementation strategy. Applications already using thread pools or async I/O frameworks like EventMachine or Async should extend existing patterns rather than introducing conflicting models. Rails applications typically use threaded application servers like Puma, making thread-based concurrency natural. Background job systems like Sidekiq execute jobs on a pool of threads within each process, and deployments commonly run several such processes to use multiple cores.
Scaling characteristics differ between models. Thread-based concurrency scales within a single process but cannot utilize multiple cores effectively for CPU work in MRI. Process-based parallelism scales across cores and machines but requires explicit work distribution and result collection. Hybrid approaches using multiple processes with thread pools per process provide both within-process concurrency and cross-core parallelism.
Testing and debugging complexity varies significantly. Concurrent programs introduce timing-dependent behaviors that make bugs intermittent and difficult to reproduce. Parallel programs compound this with true simultaneous execution, race conditions, and deadlock potential. Sequential code paths execute predictably, while concurrent paths require reasoning about all possible interleavings and parallel execution introduces actual simultaneous state changes.
Library and gem compatibility affects options. Not all Ruby gems are thread-safe, with some using global state or mutable class variables without proper synchronization. Database connection pools must support concurrent access. Native extensions may have their own threading models that conflict with Ruby's. Ractors restrict object sharing, preventing use of most existing gems without modification.
Performance Considerations
The Global Interpreter Lock fundamentally limits thread-based parallelism in MRI. Ruby threads cannot execute Ruby code simultaneously regardless of available CPU cores. Only one thread holds the GIL at any time, forcing serial execution of Ruby code. This design makes threading ineffective for CPU-bound parallelism but acceptable for I/O-bound concurrency where threads release the GIL during blocking operations.
# CPU-bound work shows no speedup with threads
require 'benchmark'
def fibonacci(n)
return n if n <= 1
fibonacci(n-1) + fibonacci(n-2)
end
# Single-threaded
single_time = Benchmark.realtime do
4.times { fibonacci(35) }
end
# Multi-threaded
multi_time = Benchmark.realtime do
threads = 4.times.map do
Thread.new { fibonacci(35) }
end
threads.each(&:join)
end
puts "Single: #{single_time}s, Multi: #{multi_time}s"
# Multi-threaded time approximately equals single-threaded
# No parallel speedup due to GIL
Process overhead creates minimum problem size thresholds. Forking processes incurs startup cost and memory duplication overhead that dominates execution time for small tasks. Parallel processing only improves performance when task execution time significantly exceeds process creation and coordination costs. Memory-mapped files and copy-on-write optimization reduce but don't eliminate this overhead.
# Small tasks: process overhead dominates
Benchmark.realtime do
  100.times { fork { 1 + 1 } } # Expensive due to fork overhead
  Process.waitall # Include child completion in the measurement
end
# Large tasks: parallelism beneficial
Benchmark.realtime do
  4.times { |i| fork { expensive_calculation(i) } } # Fork cost amortized
  Process.waitall
end
Lock contention creates serial bottlenecks in parallel code. Multiple threads or processes competing for shared resources through locks serialize execution at contention points. High contention reduces effective parallelism to near-sequential performance. Contention increases with core count and longer critical sections, paradoxically making parallel code slower as more processors become available.
# High contention limits parallel benefit
mutex = Mutex.new
counter = 0
threads = 8.times.map do
Thread.new do
10000.times do
mutex.synchronize { counter += 1 } # Frequent lock contention
end
end
end
threads.each(&:join)
# Threads spend much of their time waiting for the mutex
# Under MRI the GIL already serializes this work; the lock adds further overhead
Cache coherency overhead increases with core count. Processors maintain local caches of memory that must stay synchronized across cores. When one core modifies cached data, other cores must invalidate or update their copies. This coherency traffic creates overhead that grows with core count, particularly for false sharing where unrelated data shares cache lines.
Amdahl's Law quantifies parallel speedup limits. The sequential portion of code fundamentally limits maximum speedup regardless of available processors. Code with 20% sequential work cannot exceed 5x speedup even with infinite processors. Measuring sequential portions through profiling reveals parallelization potential before investing implementation effort.
Context switching overhead affects high-concurrency scenarios. Each thread or fiber requires stack memory and incurs context switch costs when the scheduler switches between them. Creating thousands of threads consumes significant memory and CPU cycles in switching overhead. Event-driven or fiber-based approaches reduce this overhead by keeping fewer kernel threads active.
Memory bandwidth saturation limits parallel scaling. Multiple processors accessing memory simultaneously can saturate memory bus bandwidth, particularly for memory-intensive workloads. Processors stall waiting for memory access, reducing actual CPU utilization. Cache-efficient algorithms that maximize data locality achieve better parallel scaling by reducing memory traffic.
I/O capacity creates throughput ceilings. Concurrent I/O operations share limited I/O bandwidth and operation queues. Disk throughput, network bandwidth, and database connection pools all impose upper bounds on concurrent request handling. Parallel execution cannot exceed these physical limits, making additional concurrency ineffective beyond capacity thresholds.
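Since concurrency beyond a resource's capacity adds no throughput, it is common to cap the number of in-flight operations. The following sketch caps concurrency with a fixed set of threads pulling from a closed queue. The method name `map_with_limit` is hypothetical, and the right limit depends on the resource being protected.

```ruby
# Apply the block to every item using at most `limit` threads.
# Returns results in input order.
def map_with_limit(items, limit:)
  jobs = Queue.new
  items.each_with_index { |item, index| jobs << [item, index] }
  jobs.close # workers exit once the queue is drained

  workers = Array.new([limit, items.size].min) do
    Thread.new do
      done = []
      while (job = jobs.pop)
        item, index = job
        done << [index, yield(item)]
      end
      done # each worker returns its own results; no shared writes
    end
  end

  workers.flat_map(&:value).sort_by(&:first).map(&:last)
end

lengths = map_with_limit(%w[alpha beta gamma delta], limit: 2) do |word|
  sleep(0.01) # stand-in for a network or disk operation
  word.length
end
p lengths # => [5, 4, 5, 5]
```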
Common Pitfalls
Assuming threaded code achieves parallelism in MRI leads to performance surprises. Developers familiar with other languages expect threads to utilize multiple cores, but MRI's GIL prevents this. CPU-intensive multithreaded Ruby code often performs worse than sequential code due to thread management overhead without parallel execution benefits.
# Pitfall: expecting parallel speedup from threads
threads = 4.times.map do
Thread.new do
# CPU-intensive work gets no speedup
1_000_000.times { Math.sqrt(rand) }
end
end
threads.each(&:join)
# Runs no faster than sequential version due to GIL
Race conditions create intermittent, environment-dependent failures. Multiple threads or processes accessing shared state without synchronization produce timing-dependent outcomes. Bugs manifest inconsistently based on execution timing, making reproduction difficult. Tests may pass locally but fail in production under different load conditions.
# Race condition on shared state
@counter = 0
threads = 10.times.map do
Thread.new do
1000.times do
temp = @counter # Read
temp += 1 # Modify
@counter = temp # Write
end
end
end
threads.each(&:join)
puts @counter # May be less than 10000 when updates are lost; some runs still print 10000
Deadlocks occur when threads wait circularly for resources. Thread A holds Lock 1 and waits for Lock 2 while Thread B holds Lock 2 and waits for Lock 1. Both threads block indefinitely. Consistent lock ordering prevents deadlocks but requires discipline across the codebase.
# Deadlock potential with inconsistent lock ordering
mutex_a = Mutex.new
mutex_b = Mutex.new
thread1 = Thread.new do
mutex_a.synchronize do
sleep(0.1) # Increase deadlock likelihood
mutex_b.synchronize { puts "Thread 1" }
end
end
thread2 = Thread.new do
mutex_b.synchronize do
sleep(0.1)
mutex_a.synchronize { puts "Thread 2" }
end
end
# Both threads may deadlock waiting for each other
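Consistent lock ordering can be enforced in one place rather than by convention. The sketch below sorts the requested mutexes by `object_id` before acquiring them, so callers may list locks in any order. The helper names `with_locks`, `lock_in_order`, and `ordered_locking_demo` are illustrative assumptions.

```ruby
# Acquire all given mutexes in one global order, then run the block.
def with_locks(locks, &block)
  lock_in_order(locks.sort_by(&:object_id), &block)
end

# Recursively nest synchronize calls so every lock is released on exit.
def lock_in_order(ordered, &block)
  return block.call if ordered.empty?

  ordered.first.synchronize { lock_in_order(ordered.drop(1), &block) }
end

def ordered_locking_demo
  mutex_a = Mutex.new
  mutex_b = Mutex.new
  log = Queue.new
  # The two threads name the locks in opposite order, as in the example above
  t1 = Thread.new { with_locks([mutex_a, mutex_b]) { sleep(0.05); log << "Thread 1" } }
  t2 = Thread.new { with_locks([mutex_b, mutex_a]) { sleep(0.05); log << "Thread 2" } }
  [t1, t2].each(&:join)
  Array.new(log.size) { log.pop }
end

puts ordered_locking_demo.inspect
```

Both threads finish because neither can hold the second lock in the ordering while waiting for the first.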
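Shared mutable state without synchronization causes corruption. Objects modified by multiple threads simultaneously can end up with lost updates, partial updates, and inconsistent state. Even simple operations like incrementing a counter are non-atomic, requiring synchronization for correctness. On MRI the GIL happens to make a single built-in Hash or Array operation behave atomically, but this is an implementation detail: compound sequences such as check-then-set remain unsafe, and implementations without a GIL, such as JRuby and TruffleRuby, may lose or corrupt entries in unsynchronized collections.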
# Data corruption without synchronization
class UnsafeCache
def initialize
@cache = {}
end
def get(key)
@cache[key] # Concurrent reads may see partial state
end
def set(key, value)
@cache[key] = value # Unsynchronized write: unsafe on implementations without a GIL
end
end
# Multiple threads corrupting shared hash
cache = UnsafeCache.new
threads = 100.times.map do |i|
Thread.new { cache.set(i, "value#{i}") }
end
threads.each(&:join)
# Depending on the Ruby implementation, entries may be missing or incorrect
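A synchronized counterpart to the cache above might look like the following sketch. The class name `SynchronizedCache` and the `fetch_or_store` method are assumptions for illustration; the key point is that the check and the write happen under the same lock.

```ruby
class SynchronizedCache
  def initialize
    @cache = {}
    @lock = Mutex.new
  end

  def get(key)
    @lock.synchronize { @cache[key] }
  end

  def set(key, value)
    @lock.synchronize { @cache[key] = value }
  end

  # Check-then-set as one atomic step: the block runs at most once per key.
  def fetch_or_store(key)
    @lock.synchronize do
      @cache.key?(key) ? @cache[key] : (@cache[key] = yield)
    end
  end

  def size
    @lock.synchronize { @cache.size }
  end
end

cache = SynchronizedCache.new
threads = 100.times.map do |i|
  Thread.new { cache.set(i, "value#{i}") }
end
threads.each(&:join)
puts cache.size # => 100
```

The block passed to `fetch_or_store` runs while the lock is held, so it should be short and must not call back into the same cache, because Mutex is not reentrant. Monitor is the reentrant alternative.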
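Memory visibility issues cause one thread to miss updates from another. Modern processors cache memory locally and reorder operations for performance. Without synchronization primitives, updates made by one thread are not guaranteed to become visible to others promptly. Ruby has no volatile keyword; mutexes, queues, and atomic types such as those in concurrent-ruby establish visibility guarantees. On MRI the GIL makes the unsynchronized flag below work in practice, but that behavior is not guaranteed across Ruby implementations.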
# Visibility issue with unsynchronized flag
@stop_flag = false
worker = Thread.new do
until @stop_flag # May never see update
do_work
end
end
sleep(5)
@stop_flag = true # Update may not be visible to worker thread
worker.join
# Without synchronization, nothing guarantees when the worker observes the update
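One way to make the stop flag's visibility explicit is to read and write it under a mutex. The sketch below assumes a hypothetical `StopSignal` class and a `start_worker` helper, with a short sleep standing in for `do_work`.

```ruby
# A stop flag whose reads and writes all go through one mutex.
class StopSignal
  def initialize
    @lock = Mutex.new
    @stopped = false
  end

  def stop!
    @lock.synchronize { @stopped = true }
  end

  def stopped?
    @lock.synchronize { @stopped }
  end
end

# Returns a thread that loops until the signal is set.
# The thread's value is the number of iterations it completed.
def start_worker(signal)
  Thread.new do
    iterations = 0
    until signal.stopped?
      iterations += 1
      sleep(0.001) # stand-in for do_work
    end
    iterations
  end
end

signal = StopSignal.new
worker = start_worker(signal)
sleep(0.05)
signal.stop!
puts "worker finished after #{worker.value} iterations"
```

The concurrent-ruby gem's `Concurrent::AtomicBoolean` serves the same purpose without a hand-written class.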
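Exception handling in threads requires explicit attention. An uncaught exception terminates only the thread that raised it and does not propagate to the thread that created it. Since Ruby 2.5, Thread.report_on_exception defaults to true, so the failure is printed to standard error, but the main thread continues unaware of worker failures unless it calls join or value, or checks thread status.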
# Thread failure that the main thread does not observe
thread = Thread.new do
  raise "Oops" # Terminates this thread only; a report goes to stderr by default
end
sleep(1)
puts "Main thread continues unaware of failure"
# Correct: check thread status
begin
thread.join # Raises exception from thread
rescue => e
puts "Thread failed: #{e.message}"
end
Fork safety issues arise from threads and file descriptors. Only the forking thread exists in the child process, so work that other threads were doing stops midway, and locks they held, particularly inside native libraries, may never be released in the child. Open file descriptors persist across fork, requiring cleanup in child processes to avoid unintended resource sharing.
# Unsafe fork with active threads
mutex = Mutex.new
thread = Thread.new do
mutex.synchronize { sleep(10) }
end
fork do
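# The thread that took the mutex does not exist in the child. MRI typically
# releases Ruby-level mutexes owned by such threads, but locks held inside
# native libraries can stay locked and block the child here
mutex.synchronize { puts "Child reached the critical section" }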
end
Over-parallelization degrades performance through coordination overhead. For CPU-bound work, creating more threads or processes than available cores introduces extra context switching and memory consumption without performance benefit. I/O-bound work can use more threads than cores, up to the capacity of the resources it waits on. Task granularity must balance parallel opportunity against coordination costs.
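A simple guard against over-parallelization is to derive the worker count from the core count and the amount of work. The sketch below uses `Etc.nprocessors` from the standard library; the method name `worker_count` and the `io_multiplier` default of 4 are arbitrary assumptions that would need tuning against real measurements.

```ruby
require 'etc'

# Suggest how many workers to start for a batch of tasks.
# CPU-bound work gets at most one worker per core; I/O-bound work may use
# a multiple of the core count. Never more workers than tasks, never fewer than 1.
def worker_count(task_count, io_bound: false, io_multiplier: 4)
  cores = Etc.nprocessors
  limit = io_bound ? cores * io_multiplier : cores
  [[limit, task_count].min, 1].max
end

puts "CPU-bound batch of 1000 tasks: #{worker_count(1000)} workers"
puts "I/O-bound batch of 1000 tasks: #{worker_count(1000, io_bound: true)} workers"
puts "Batch of 2 tasks: #{worker_count(2)} workers"
```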
Reference
Core Concepts Comparison
| Aspect | Concurrency | Parallelism |
|---|---|---|
| Definition | Managing multiple tasks | Executing multiple tasks simultaneously |
| Focus | Program structure | Actual execution |
| Single Core | Possible through interleaving | Not possible |
| Multi-Core | Possible with or without parallel execution | Requires multiple cores |
| Primary Benefit | Responsiveness, resource utilization | Performance, reduced latency |
| Coordination | Required for task switching | Required plus simultaneous execution complexity |
Ruby Concurrency Primitives
| Primitive | Parallelism | Memory Model | Use Case |
|---|---|---|---|
| Thread | No (GIL) | Shared | I/O-bound concurrency |
| Process | Yes | Isolated | CPU-bound parallelism |
| Ractor | Yes | Message passing | CPU-bound with isolation |
| Fiber | No | Shared, cooperative | Structured concurrency, generators |
| Queue | N/A | Thread-safe | Producer-consumer patterns |
| Mutex | N/A | Synchronization | Protecting shared resources |
Thread Methods
| Method | Purpose | Behavior |
|---|---|---|
| Thread.new | Create thread | Returns thread object immediately |
| Thread.current | Current thread | Returns executing thread |
| join | Wait for completion | Blocks until thread finishes |
| value | Get return value | Blocks until thread finishes, returns value |
| alive? | Check status | Returns true if thread is running or sleeping |
| kill | Terminate thread | Stops the thread; its ensure blocks still run |
| status | Get state | Returns run, sleep, aborting, false, or nil |
Process Methods
| Method | Purpose | Behavior |
|---|---|---|
| fork | Create child process | Returns pid in parent, nil in child |
| wait | Wait for child | Blocks until a child exits; any child by default, or the given pid |
| waitpid | Wait for specific child | Same behavior as wait; conventionally called with a pid |
| exit | Terminate process | Raises SystemExit and runs at_exit handlers; exit! skips them |
| pid | Current process ID | Returns integer process identifier |
| daemon | Daemonize | Detaches from controlling terminal |
Ractor Operations
| Operation | Purpose | Behavior |
|---|---|---|
| Ractor.new | Create ractor | Returns ractor object |
| send | Send message | Queues message for ractor |
| receive | Receive message | Blocks until message available |
| take | Get yielded or return value | Blocks until the ractor yields a value or finishes (Ruby 3.0 to 3.4 API) |
| shareable? | Check shareability | Returns true if object can be shared |
Synchronization Primitives
| Primitive | Type | Characteristics |
|---|---|---|
| Mutex | Mutual exclusion | Blocks threads waiting for lock |
| Monitor | Reentrant mutex | Same thread can acquire multiple times |
| ConditionVariable | Condition waiting | Wait for condition with timeout |
| Queue | Thread-safe queue | push never blocks; pop blocks while empty |
| SizedQueue | Bounded queue | Blocks when full or empty |
Decision Matrix
| Workload Type | Ruby Solution | Reasoning |
|---|---|---|
| I/O-bound, high concurrency | Threads | GIL released during I/O, memory efficient |
| CPU-bound, low count | Processes | True parallelism, isolated failures |
| CPU-bound, high count | Ractor | Parallel execution with lower overhead than processes (experimental API) |
| Memory-constrained | Threads or Fibers | Shared memory space |
| Need isolation | Processes | Separate memory spaces |
| Event-driven | Fibers or EventMachine | Cooperative scheduling |
Performance Characteristics
| Approach | Startup Cost | Memory Overhead | CPU Utilization | Coordination Cost |
|---|---|---|---|---|
| Single Thread | None | Baseline | One core max | None |
| Multiple Threads | Low | Low | One core (GIL) | Mutex contention |
| Multiple Processes | High | High | Multiple cores | IPC overhead |
| Ractors | Medium | Medium | Multiple cores | Message passing |
| Fibers | Very low | Very low | One core | Manual yields |
Common Patterns
| Pattern | Implementation | Use Case |
|---|---|---|
| Thread Pool | Fixed threads processing queue | Web server request handling |
| Fork-Join | Fork workers, wait for completion | Parallel batch processing |
| Pipeline | Queues between stages | Data processing pipeline |
| Actor Model | Ractors with message passing | Concurrent stateful entities |
| Producer-Consumer | Queue with multiple readers/writers | Decoupling data generation from processing |
| Work Stealing | Per-worker queues; idle workers take tasks from busy ones | Dynamic load balancing |
GIL Behavior
| Operation | GIL Status | Parallel Execution |
|---|---|---|
| Ruby code execution | Held | No |
| Blocking I/O | Released | Yes |
| sleep | Released | Yes |
| Native extension | Released if designed properly | Yes |
| C-level operations | Held or released by implementation | Varies |