Overview
Processes and threads represent two distinct approaches to concurrent execution in operating systems. A process is an independent program in execution with its own memory space, system resources, and execution context. A thread is a lightweight execution unit within a process that shares the process's memory space and resources with other threads in the same process.
The process model emerged in early operating systems as a way to isolate running programs from each other. Each process receives its own virtual address space, file descriptors, and system resources. The operating system scheduler treats processes as independent entities, switching between them to create the illusion of parallel execution on single-core systems or actual parallelism on multi-core systems.
Threads appeared later to address the overhead of process-based concurrency. Creating a new process requires duplicating memory structures and system resources, which takes time and memory. Threads within the same process share most resources, making thread creation and context switching faster than process creation and switching. This shared-memory model also simplifies communication between concurrent execution units within the same application.
The choice between processes and threads affects application architecture, performance characteristics, fault isolation, and debugging complexity. Apache's prefork module and PostgreSQL both dedicate a process to each connection, while Apache's worker and event modules combine multiple processes with multiple threads per process. Understanding these differences guides architectural decisions in concurrent systems.
# Process creation in Ruby
pid = fork do
puts "Child process: #{Process.pid}"
sleep 2
end
puts "Parent process: #{Process.pid}"
Process.wait(pid)
# => Parent process: 12345
# => Child process: 12346
# Thread creation in Ruby
thread = Thread.new do
puts "Thread: #{Thread.current.object_id}"
sleep 2
end
puts "Main thread: #{Thread.main.object_id}"
thread.join
# => Main thread: 70123456789
# => Thread: 70123456790
Key Principles
Memory Isolation: Processes operate in separate virtual address spaces. Each process receives its own view of memory, including code, data, heap, and stack segments. The hardware memory management unit (MMU), configured by the operating system through page tables and memory protection bits, enforces this isolation. One process cannot directly access another process's memory without explicit inter-process communication mechanisms.
Threads within a process share the same virtual address space. All threads access the same heap memory, global variables, and code segments. Each thread maintains its own stack for local variables and function call frames, but these stacks exist within the shared address space. This shared memory enables fast communication but requires synchronization mechanisms to prevent race conditions.
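The difference can be observed directly. The following is a minimal sketch, assuming a Unix-like platform where fork is available; the helper name shared_vs_copied is hypothetical and exists only for this illustration.

```ruby
# Hypothetical helper: mutate a local variable from a thread, then from a forked child.
def shared_vs_copied
  counter = 0

  # The thread runs in the same address space, so its write is visible here.
  Thread.new { counter += 1 }.join
  after_thread = counter

  # The child mutates its own copy of the variable and then exits.
  pid = fork do
    counter += 100
    exit!(0) # leave immediately, skipping exit handlers inherited from the parent
  end
  Process.wait(pid)
  after_fork = counter

  { after_thread: after_thread, after_fork: after_fork }
end

result = shared_vs_copied
puts "after thread: #{result[:after_thread]}, after fork: #{result[:after_fork]}"
# => after thread: 1, after fork: 1
```

The thread's increment survives because both threads see the same variable; the child's increment is applied to a private copy and disappears when the child exits.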
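Resource Ownership: A process owns system resources including file descriptors, network sockets, environment variables, signal handlers, and working directory. When a process creates child processes through forking, the child initially receives copies of these resources. Subsequent changes in parent or child, such as closing a descriptor or changing the working directory, do not affect the other. One caveat: a duplicated file descriptor still refers to the same open file, so the file offset is shared between parent and child.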
Threads share these resources with all threads in the same process. Opening a file in one thread makes that file descriptor available to all threads. This sharing reduces resource consumption but requires careful coordination when multiple threads access shared resources concurrently.
Creation and Context Switching Cost: Process creation involves allocating a new virtual address space, copying page tables, duplicating file descriptor tables, and initializing process control blocks. On Unix systems, the fork system call uses copy-on-write optimization, where memory pages are only copied when modified. Despite this optimization, process creation remains more expensive than thread creation, typically requiring milliseconds.
Thread creation allocates only a new stack and thread control block. The operating system registers the new thread with the scheduler but does not create a new address space. Thread creation typically completes in microseconds. Context switching between processes requires saving and restoring the entire process state, including memory management structures. Thread context switches only save and restore CPU registers and stack pointers, making them faster than process switches.
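The gap can be measured rather than assumed. The sketch below, with a hypothetical helper named creation_cost, times the creation and teardown of empty threads and empty forked children. The absolute numbers depend heavily on the machine, the operating system, and how much memory the parent has mapped, so treat them as a rough comparison only.

```ruby
require 'benchmark'

# Hypothetical helper: average cost of creating and reaping `count` threads and processes.
def creation_cost(count = 50)
  thread_time = Benchmark.realtime do
    count.times.map { Thread.new {} }.each(&:join)
  end

  process_time = Benchmark.realtime do
    count.times.map { fork { exit!(0) } }.each { |pid| Process.wait(pid) }
  end

  { thread_us: thread_time / count * 1_000_000,
    process_us: process_time / count * 1_000_000 }
end

costs = creation_cost
puts format("thread: %.1f us each, process: %.1f us each",
            costs[:thread_us], costs[:process_us])
```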
Fault Isolation: When a process crashes due to a segmentation fault, null pointer dereference, or unhandled exception, only that process terminates. Other processes continue executing unaffected. This isolation protects system stability and prevents cascading failures in multi-process architectures.
Thread crashes typically terminate the entire process containing all threads. A segmentation fault raised in one thread kills the whole process by default, and because all threads share one address space, memory corrupted by one thread is corrupted for every other thread as well. This reduced isolation makes thread-based systems more vulnerable to single points of failure but simplifies error propagation within an application.
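Process-level isolation can be sketched as follows, assuming a Unix-like platform. The hypothetical helper crash_child simulates a hard crash by having the child send SIGKILL to itself; the parent observes the failure through the exit status and keeps running.

```ruby
# Hypothetical helper: run a child that dies from SIGKILL and report how it ended.
def crash_child
  pid = fork do
    Process.kill("KILL", Process.pid) # simulate a hard crash
    sleep                             # not reached in practice
  end
  _, status = Process.wait2(pid)
  { signaled: status.signaled?, signal: status.termsig }
end

outcome = crash_child
puts "child signaled=#{outcome[:signaled]} signal=#{outcome[:signal]}; parent still running"
```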
Communication Mechanisms: Inter-process communication requires explicit mechanisms such as pipes, message queues, shared memory segments, or network sockets. These mechanisms involve system calls and data copying between address spaces, adding latency and complexity. The operating system mediates all inter-process communication, enforcing security and isolation boundaries.
Threads communicate through shared memory variables. One thread writes to a variable, and another thread reads it directly without system calls or data copying. This zero-copy communication offers lower latency but requires synchronization primitives like mutexes, semaphores, or condition variables to coordinate access and prevent race conditions.
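A minimal sketch of both styles, assuming a Unix-like platform: the hypothetical helper message_from_child passes a string through a pipe, while message_from_thread hands the same kind of message over through an in-memory Queue.

```ruby
# Hypothetical helper: the child sends a message to the parent through a pipe.
def message_from_child
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write("hello from #{Process.pid}")
    writer.close
    exit!(0)
  end
  writer.close          # the parent keeps only the read end
  message = reader.read # blocks until the child closes its write end
  reader.close
  Process.wait(pid)
  [pid, message]
end

# Hypothetical helper: threads exchange a message through a thread-safe Queue.
def message_from_thread
  queue = Queue.new
  Thread.new { queue << "hello from thread" }.join
  queue.pop
end

pid, message = message_from_child
puts "pipe: #{message} (child was #{pid})"
puts "queue: #{message_from_thread}"
```

The pipe version needs system calls and copies the bytes through the kernel; the Queue version passes a reference to an object that both threads can already see.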
Concurrency Model: Processes provide true parallelism on multi-core systems. The operating system can schedule processes on different CPU cores simultaneously. Each process executes independently without interference from other processes at the instruction level.
Thread parallelism depends on the threading implementation. Native threads scheduled by the operating system can execute in parallel on multiple cores. Green threads or user-space threads managed by a runtime scheduler may execute concurrently on a single core through time-slicing but cannot achieve true parallelism. Ruby's Global Interpreter Lock affects thread parallelism in specific ways discussed in the Ruby Implementation section.
Ruby Implementation
Ruby provides both process and thread creation through standard library APIs. The Process module handles process operations, while the Thread class manages thread creation and synchronization.
Process Operations: Ruby's Process module wraps Unix-style process management. The fork method creates a child process by duplicating the current process. Called without a block, fork returns twice: the child continues execution immediately after the call with a nil return value, while the parent receives the child's process ID. Called with a block, as below, the child runs the block and then exits.
# Basic process forking
child_pid = fork do
puts "Child executing: PID #{Process.pid}"
exit 42
end
puts "Parent executing: PID #{Process.pid}, spawned #{child_pid}"
pid, status = Process.wait2(child_pid)
puts "Child #{pid} exited with status #{status.exitstatus}"
# => Parent executing: PID 1000, spawned 1001
# => Child executing: PID 1001
# => Child 1001 exited with status 42
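For comparison, the block-less form of fork can be sketched as follows. The helper name fork_and_report and the exit status 7 are arbitrary choices for this illustration.

```ruby
# Hypothetical helper: fork without a block and branch on the return value.
def fork_and_report
  pid = fork
  if pid.nil?
    # Child: fork returned nil. Exit immediately with a recognizable status.
    exit!(7)
  else
    # Parent: fork returned the child's PID.
    _, status = Process.wait2(pid)
    { child_pid: pid, exit_status: status.exitstatus }
  end
end

info = fork_and_report
puts "child #{info[:child_pid]} exited with #{info[:exit_status]}"
```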
The spawn method provides more control over child process execution, including setting environment variables, redirecting file descriptors, and executing different programs:
# Spawn a new program
pid = Process.spawn(
{"CUSTOM_VAR" => "value"},
"ruby", "-e", "puts ENV['CUSTOM_VAR']",
out: "/tmp/output.log",
err: "/tmp/error.log"
)
Process.wait(pid)
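Redirection also accepts IO objects, which makes it possible to capture a child's output in the parent instead of writing it to a log file. A minimal sketch, assuming the hypothetical helper name spawn_and_capture and using RbConfig.ruby to locate the running interpreter:

```ruby
require 'rbconfig'

# Hypothetical helper: run a child Ruby program and capture its stdout via a pipe.
def spawn_and_capture(env_value)
  reader, writer = IO.pipe
  pid = Process.spawn(
    { "CUSTOM_VAR" => env_value },
    RbConfig.ruby, "-e", "print ENV['CUSTOM_VAR']",
    out: writer
  )
  writer.close # the child holds its own copy of the write end
  output = reader.read
  reader.close
  _, status = Process.wait2(pid)
  [output, status.success?]
end

output, ok = spawn_and_capture("value")
puts "child printed #{output.inspect}, success=#{ok}"
```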
Thread Management: Ruby's Thread class creates native threads scheduled by the operating system. Each Thread instance represents an independent execution context within the Ruby process:
# Thread creation with parameters
threads = 5.times.map do |i|
Thread.new(i) do |index|
sleep rand(0.1..0.5)
puts "Thread #{index} completed"
index * 2
end
end
results = threads.map(&:value)
puts "Results: #{results}"
# => Thread 2 completed
# => Thread 0 completed
# => Thread 4 completed
# => Thread 1 completed
# => Thread 3 completed
# => Results: [0, 2, 4, 6, 8]
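The value method blocks until the thread completes and returns the last expression evaluated in the thread block. The join method also blocks, but it returns the thread object itself rather than the block's result; when given a timeout, it returns nil if the thread has not finished in time.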
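The return values of the two methods can be compared in a small sketch. The helper name join_versus_value and the sleep durations are arbitrary choices for the illustration.

```ruby
# Hypothetical helper: compare join (with and without a timeout) against value.
def join_versus_value
  thread = Thread.new do
    sleep 0.2
    :done
  end

  timed_out = thread.join(0.01) # nil: the thread is still sleeping
  joined = thread.join          # the Thread object itself
  { timed_out: timed_out, joined_is_thread: joined.equal?(thread), value: thread.value }
end

p join_versus_value
```

The timed join returns nil, the plain join returns the thread, and value returns :done.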
Global Interpreter Lock: CRuby (MRI), the reference implementation, uses a Global VM Lock (GVL), commonly called the Global Interpreter Lock (GIL), that prevents multiple threads from executing Ruby code simultaneously. Only one thread can execute Ruby bytecode at any given moment, even on multi-core systems. This design simplifies interpreter implementation and ensures thread safety for internal data structures. Alternative implementations such as JRuby and TruffleRuby do not have this lock and can run threads in parallel.
The GIL releases during I/O operations, allowing other threads to execute Ruby code while one thread waits for I/O. This makes threads effective for I/O-bound workloads despite the GIL:
require 'net/http'
# I/O-bound work benefits from threads despite GIL
start_time = Time.now
threads = 10.times.map do |i|
Thread.new do
uri = URI("https://httpbin.org/delay/1")
response = Net::HTTP.get(uri)
response.length
end
end
results = threads.map(&:value)
elapsed = Time.now - start_time
puts "Fetched 10 URLs in #{elapsed.round(2)} seconds"
# => Fetched 10 URLs in 1.23 seconds (not 10 seconds)
For CPU-bound work, the GIL prevents parallelism. Multiple threads do not improve performance and may degrade it due to lock contention:
# CPU-bound work does not benefit from threads
def calculate_prime(n)
(2..n).select do |i|
(2..Math.sqrt(i)).none? { |d| i % d == 0 }
end.count
end
start_time = Time.now
threads = 4.times.map do
Thread.new { calculate_prime(50000) }
end
threads.each(&:join)
threaded_time = Time.now - start_time
start_time = Time.now
4.times { calculate_prime(50000) }
sequential_time = Time.now - start_time
puts "Threaded: #{threaded_time.round(2)}s"
puts "Sequential: #{sequential_time.round(2)}s"
# => Threaded: 8.45s
# => Sequential: 8.12s (threads slower due to GIL contention)
Process-Based Parallelism: For CPU-bound parallelism in Ruby, processes provide true parallel execution. Each process runs its own Ruby interpreter without GIL interference:
require 'parallel'
# Using processes for CPU parallelism
start_time = Time.now
results = Parallel.map([50000] * 4, in_processes: 4) do |n|
calculate_prime(n)
end
process_time = Time.now - start_time
puts "Process-based: #{process_time.round(2)}s"
# => Process-based: 2.15s (actual parallelism on 4 cores)
Thread Synchronization: Ruby provides Mutex for mutual exclusion, ConditionVariable for thread coordination, and Queue for thread-safe data structures:
# Thread-safe counter with Mutex
class Counter
def initialize
@count = 0
@mutex = Mutex.new
end
def increment
@mutex.synchronize do
current = @count
sleep 0.001 # Simulate work
@count = current + 1
end
end
def value
@mutex.synchronize { @count }
end
end
counter = Counter.new
threads = 10.times.map do
Thread.new { 100.times { counter.increment } }
end
threads.each(&:join)
puts "Final count: #{counter.value}"
# => Final count: 1000 (correct with mutex)
Without the mutex, race conditions cause incorrect results:
# Race condition without synchronization
class UnsafeCounter
def initialize
@count = 0
end
def increment
current = @count
sleep 0.001
@count = current + 1
end
def value
@count
end
end
counter = UnsafeCounter.new
threads = 10.times.map do
Thread.new { 100.times { counter.increment } }
end
threads.each(&:join)
puts "Final count: #{counter.value}"
# => Final count: far below 1000, often close to 100 (incorrect due to race conditions; varies between runs)
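Queue offers an alternative to locking: instead of protecting shared state, hand every update to a single thread that owns it. A minimal sketch, assuming the hypothetical helper name count_with_queue and a :done sentinel chosen for this example:

```ruby
# Hypothetical helper: count events by funnelling them through a Queue to one consumer.
def count_with_queue(producers: 10, events_each: 100)
  queue = Queue.new
  total = 0

  # Only this thread ever touches `total`, so no Mutex is needed.
  consumer = Thread.new do
    while (event = queue.pop) != :done
      total += event
    end
  end

  workers = producers.times.map do
    Thread.new { events_each.times { queue << 1 } }
  end
  workers.each(&:join)

  queue << :done # sentinel: tells the consumer to stop
  consumer.join
  total
end

puts "Final count: #{count_with_queue}"
# => Final count: 1000
```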
Design Considerations
Workload Characteristics: I/O-bound workloads benefit from threads when operations spend significant time waiting for external resources like network responses, disk reads, or database queries. Threads allow the application to handle multiple I/O operations concurrently without blocking. The GIL releases during I/O, enabling effective concurrency.
CPU-bound workloads in Ruby require processes for parallel execution. Operations that perform extensive calculations, data transformations, or algorithmic processing do not benefit from threads due to the GIL. Each process runs independently on separate CPU cores, achieving true parallelism.
Mixed workloads require hybrid approaches. A web application handling API requests might use processes for CPU-intensive request processing while using threads within each process for concurrent database queries.
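The hybrid shape can be sketched in a few lines, assuming a Unix-like platform. In this hypothetical helper, hybrid_run, each forked worker starts several threads that simulate blocking I/O with sleep, then reports how many tasks it finished through a pipe. Threads are started only inside the children, after the fork.

```ruby
# Hypothetical helper: forked workers, each running several threads of simulated I/O.
def hybrid_run(processes: 2, threads_per_process: 3)
  workers = processes.times.map do
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      threads = threads_per_process.times.map do
        Thread.new { sleep 0.05; 1 } # stand-in for a blocking I/O call
      end
      finished = threads.sum(&:value)
      writer.write(finished.to_s)
      writer.close
      exit!(0)
    end
    writer.close
    [pid, reader]
  end

  # Collect each worker's count and reap the worker.
  workers.sum do |pid, reader|
    count = reader.read.to_i
    reader.close
    Process.wait(pid)
    count
  end
end

puts "tasks finished: #{hybrid_run}"
# => tasks finished: 6
```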
Fault Tolerance Requirements: Applications requiring strong fault isolation should use processes. Each process failure affects only that process, allowing other processes to continue serving requests. Web servers like Unicorn and Puma (in clustered mode) use process-based concurrency to isolate request failures.
Shared-fate systems where component failures should terminate the entire application can use threads. A background job processor might use threads where any thread failure indicates a serious problem requiring full application restart.
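In Ruby, shared fate is a choice rather than the default for ordinary exceptions: an unhandled exception ends only the thread that raised it and is re-raised in whichever thread later calls join or value. Setting Thread.abort_on_exception = true makes any such failure bring down the whole process instead. The sketch below shows the default behavior; the helper name failure_in_one_thread is hypothetical.

```ruby
# Hypothetical helper: one worker thread fails while a sibling keeps running.
def failure_in_one_thread
  failing = Thread.new do
    Thread.current.report_on_exception = false # keep stderr quiet for the demo
    raise ArgumentError, "bad input"
  end
  healthy = Thread.new { sleep 0.05; :ok }

  error = begin
    failing.join # re-raises the thread's exception in the caller
    nil
  rescue ArgumentError => e
    e.message
  end

  { error: error, healthy: healthy.value }
end

p failure_in_one_thread
```

The failing thread's error surfaces at join, and the healthy thread still returns :ok.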
Memory Constraints: Threads consume less memory than processes. A thread requires approximately 1-2 MB for its stack, while each additional Ruby process typically adds 20-50 MB or more of private memory once its copy-on-write pages diverge from the parent's. Systems with memory constraints or those needing many concurrent execution units favor threads.
Process-based systems trade memory for isolation and parallelism. A web server using 10 processes consumes significantly more memory than one using 10 threads, but provides better CPU utilization for mixed workloads and stronger failure isolation.
Communication Patterns: Applications with frequent inter-unit communication favor threads. Shared memory communication has minimal overhead compared to inter-process communication. A data processing pipeline where stages pass data between steps benefits from thread-based implementation.
Applications with infrequent communication or those requiring strong boundaries between units work well with processes. Message-passing architectures using queues or message brokers suit process-based designs where each process handles requests independently.
Debugging and Development: Thread-based concurrency complicates debugging due to race conditions, deadlocks, and non-deterministic execution order. Reproducing thread-related bugs requires careful instrumentation and understanding of memory models. Development cycles lengthen when dealing with thread safety issues.
Process-based systems are often simpler to debug. Each process executes independently with its own state, so a failure can usually be reproduced by examining one process in isolation. State inspection examines only one process's memory space, and a crash in one process does not disturb debugging tools attached to another.
Deployment and Scaling: Horizontal scaling differs between processes and threads. Process-based systems scale by adding more processes across multiple machines. Load balancers distribute work between processes running on different servers. This scaling model integrates naturally with containerized deployments where each container runs independent processes.
Thread-based systems scale vertically within a single machine's resources. Adding more threads increases concurrency up to the limits of available CPU cores and memory. Horizontal scaling requires running multiple thread-using processes across machines, creating a two-level scaling architecture.
Performance Considerations
Startup Latency: Thread creation latency ranges from 10-100 microseconds depending on operating system and hardware. Creating 100 threads adds approximately 1-10 milliseconds to application startup. Process creation latency ranges from 1-10 milliseconds per process. Creating 10 processes adds 10-100 milliseconds to startup.
For applications requiring rapid startup, such as command-line tools or serverless functions, these differences affect user experience. Process-heavy architectures incur noticeable startup delays. Thread-based approaches start faster but may not achieve desired parallelism for CPU-bound work.
Context Switch Overhead: Thread context switches complete in 1-5 microseconds, involving saving and restoring CPU registers and stack pointers. Process context switches require 5-20 microseconds, including TLB flushes and page table updates. Under high concurrency with frequent context switches, these differences compound.
A web server handling 10,000 requests per second with 10 workers experiences approximately 1,000 context switches per worker per second. With threads, this overhead consumes approximately 5 milliseconds per second per worker (0.5% CPU). With processes, it consumes approximately 20 milliseconds per second per worker (2% CPU).
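The arithmetic behind these percentages can be written out explicitly. The helper below is a hypothetical illustration that assumes, as the paragraph does, one context switch per request and an even spread of requests across workers.

```ruby
# Hypothetical helper: percentage of one CPU-second a worker spends on context switches.
def switch_overhead_percent(requests_per_second:, workers:, switch_cost_us:)
  switches_per_worker = requests_per_second / workers.to_f # switches per second per worker
  busy_us = switches_per_worker * switch_cost_us           # microseconds spent switching
  busy_us / 1_000_000 * 100
end

threads = switch_overhead_percent(requests_per_second: 10_000, workers: 10, switch_cost_us: 5)
processes = switch_overhead_percent(requests_per_second: 10_000, workers: 10, switch_cost_us: 20)
puts "threads: #{threads.round(2)}%, processes: #{processes.round(2)}%"
# => threads: 0.5%, processes: 2.0%
```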
Memory Footprint: A typical Ruby process consumes 40-100 MB of memory for application code, loaded gems, and runtime structures. Each additional process duplicates this baseline, though shared library code remains in shared memory. Ten processes consume approximately 400-1000 MB total.
Threads within a process share the base memory. Each thread adds only its stack (1-2 MB) plus thread-local storage. Ten threads in one process consume approximately 50-120 MB total, roughly one-tenth the memory of ten processes.
Memory-intensive applications amplify these differences. A Rails application loading numerous gems and caching data in memory might require 200 MB per process. A process-based deployment with 20 workers consumes 4 GB of memory for the application alone, while a thread-based approach with 20 threads consumes approximately 250 MB.
Throughput Characteristics: For I/O-bound workloads, threads and processes achieve similar throughput. A web application making database queries spends most time waiting for I/O. Both approaches allow concurrent I/O operations, achieving comparable requests-per-second rates. Threads may show slightly better throughput due to lower context switch overhead.
For CPU-bound workloads, processes achieve higher throughput when work exceeds one core's capacity. Computing image transformations or running complex algorithms benefits from parallel execution across multiple cores. Processes provide linear speedup up to the number of cores, while threads remain constrained by the GIL.
Mixed workloads show nuanced performance characteristics. A web application with both quick database lookups and occasional heavy computation achieves best throughput with a hybrid approach: multiple processes (one per core) each running multiple threads for I/O concurrency.
Scalability Limits: Thread-based systems hit scalability limits around 1,000-10,000 threads per process, depending on available memory and operating system limits. Beyond this point, context switch overhead degrades performance. The operating system's thread scheduler struggles to efficiently manage thousands of threads.
Process-based systems scale to hundreds of processes per machine before system resources become constrained. Process count is typically limited by available memory rather than by scheduling overhead. Distributed systems scale process-based architectures horizontally across machines, achieving thousands to millions of concurrent workers.
Resource Contention: Threads sharing memory structures create contention. Multiple threads accessing the same Mutex serialize execution, creating bottlenecks. High lock contention under concurrent load can reduce throughput to levels worse than sequential execution:
# Contention example
mutex = Mutex.new
shared_data = []
start = Time.now # start timing before the threads begin running
threads = 20.times.map do
  Thread.new do
    1000.times do
      mutex.synchronize do
        # Short critical section but high contention
        shared_data << Thread.current.object_id
      end
    end
  end
end
threads.each(&:join)
elapsed = Time.now - start
puts "Completed #{shared_data.size} appends in #{elapsed.round(4)}s under contention"
# Timing varies by machine; every append is serialized through the one mutex
Processes avoid shared memory contention but face other resource conflicts. Multiple processes writing to the same file require file locking. Multiple processes that each open their own database connections contend for the database server's limited connection slots.
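File locking between processes can be sketched with File#flock, assuming a Unix-like platform with advisory locks. The helper name locked_appends and the temporary file are hypothetical details of this example.

```ruby
require 'tempfile'

# Hypothetical helper: several processes append to one file under an exclusive lock.
def locked_appends(processes: 4, lines_each: 25)
  Tempfile.create("shared_log") do |file|
    path = file.path
    pids = processes.times.map do |worker|
      fork do
        File.open(path, "a") do |log|
          lines_each.times do |i|
            log.flock(File::LOCK_EX) # wait for exclusive access
            log.puts("worker #{worker} line #{i}")
            log.flush                # push the line out before releasing the lock
            log.flock(File::LOCK_UN)
          end
        end
        exit!(0)
      end
    end
    pids.each { |pid| Process.wait(pid) }
    File.readlines(path).size
  end
end

puts "lines written: #{locked_appends}"
# => lines written: 100
```

The lock is advisory: it only coordinates processes that also call flock on the same file.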
Common Pitfalls
Race Conditions in Shared State: Thread-based systems frequently encounter race conditions where multiple threads access shared state without proper synchronization. The race condition occurs when execution order determines program correctness, and that order is non-deterministic:
# Race condition in shared counter
class TaskQueue
def initialize
@queue = []
@processed = 0
end
def add_task(task)
@queue << task
end
def process_task
if @queue.any?
task = @queue.shift
perform_work(task)
@processed += 1 # Race condition here
end
end
def perform_work(task)
sleep 0.01 # Simulate work
end
def stats
{ queue_size: @queue.size, processed: @processed }
end
end
queue = TaskQueue.new
100.times { |i| queue.add_task(i) }
threads = 10.times.map do
Thread.new do
10.times { queue.process_task }
end
end
threads.each(&:join)
puts queue.stats
# => {queue_size: 0, processed: 87} is one possible result (should be 100); the count varies by run and Ruby implementation
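The increment operation @processed += 1 is a read-modify-write sequence rather than a single atomic step. Two threads can read the same value, increment it, and both write back the same result, losing one increment. The check-then-act pair of @queue.any? followed by @queue.shift has the same flaw: another thread can empty the queue between the check and the shift.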
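One way to remove both races is to let a thread-safe Queue hold the tasks and to guard the counter with a Mutex. The class below is a hypothetical rewrite for illustration, not the only possible fix.

```ruby
# Hypothetical thread-safe variant of the TaskQueue above.
class SafeTaskQueue
  def initialize
    @queue = Queue.new # thread-safe FIFO from the core library
    @processed = 0
    @mutex = Mutex.new
  end

  def add_task(task)
    @queue << task
  end

  def process_task
    task = begin
      @queue.pop(true) # non-blocking pop; raises ThreadError when empty
    rescue ThreadError
      return
    end
    perform_work(task)
    @mutex.synchronize { @processed += 1 }
  end

  def perform_work(_task)
    sleep 0.001 # simulate work
  end

  def stats
    { queue_size: @queue.size, processed: @mutex.synchronize { @processed } }
  end
end

# Hypothetical driver mirroring the example above: 10 threads, 10 tasks each.
def run_safe_queue
  queue = SafeTaskQueue.new
  100.times { |i| queue.add_task(i) }
  10.times.map { Thread.new { 10.times { queue.process_task } } }.each(&:join)
  queue.stats
end

p run_safe_queue
```

Every task is taken exactly once and every increment is counted, so the stats report an empty queue and 100 processed tasks.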
Deadlock in Resource Acquisition: Threads acquiring multiple locks in different orders create deadlock potential. Thread A holds Lock 1 and waits for Lock 2 while Thread B holds Lock 2 and waits for Lock 1. Both threads block indefinitely:
# Deadlock scenario
mutex_a = Mutex.new
mutex_b = Mutex.new
thread1 = Thread.new do
mutex_a.synchronize do
sleep 0.1
puts "Thread 1 trying to get B"
mutex_b.synchronize do
puts "Thread 1 has both"
end
end
end
thread2 = Thread.new do
mutex_b.synchronize do
sleep 0.1
puts "Thread 2 trying to get A"
mutex_a.synchronize do
puts "Thread 2 has both"
end
end
end
thread1.join
thread2.join
# => Thread 1 trying to get B
# => Thread 2 trying to get A
# => (hangs forever in deadlock)
Always acquire locks in a consistent order across all threads to prevent deadlock.
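A consistent order can be enforced mechanically instead of by convention. The sketch below sorts the mutexes by object_id before locking; that ordering key and the helper names with_both and ordered_locking_demo are assumptions made for this example.

```ruby
# Hypothetical helper: always lock the mutex with the smaller object_id first.
def with_both(first, second)
  ordered = [first, second].sort_by(&:object_id)
  ordered[0].synchronize do
    ordered[1].synchronize { yield }
  end
end

# The two threads name the mutexes in opposite orders but cannot deadlock.
def ordered_locking_demo
  mutex_a = Mutex.new
  mutex_b = Mutex.new
  log = Queue.new

  t1 = Thread.new { with_both(mutex_a, mutex_b) { sleep 0.05; log << :thread1 } }
  t2 = Thread.new { with_both(mutex_b, mutex_a) { sleep 0.05; log << :thread2 } }
  [t1, t2].each(&:join)
  Array.new(log.size) { log.pop }.sort
end

p ordered_locking_demo
# => [:thread1, :thread2]
```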
Fork Safety Issues: Forking a multi-threaded process creates subtle problems. The child process inherits only the calling thread. Other threads disappear in the child, but locks and other synchronization primitives are copied in whatever state they were in. If a thread that does not exist in the child held a lock when the fork occurred, that lock can remain held forever in the child:
# Fork safety problem
mutex = Mutex.new
data = []
thread = Thread.new do
loop do
mutex.synchronize do
data << Time.now
sleep 0.1
end
end
end
sleep 0.5 # Let thread run
pid = fork do
# Mutex might be locked from parent thread that no longer exists
mutex.synchronize do # May hang if parent thread held lock
puts "Child accessing data: #{data.size}"
end
end
Process.wait(pid)
thread.kill
Fork only single-threaded processes, or reinitialize all synchronization primitives after forking. In hybrid designs, fork the worker processes first and start threads inside each worker afterward.
Memory Leaks in Long-Running Threads: Threads holding references to objects prevent garbage collection. A thread maintaining a local variable referencing a large data structure keeps that memory allocated even when no other code uses it:
# Memory leak in thread
def process_data_threaded
threads = 10.times.map do
Thread.new do
large_data = Array.new(1_000_000) { rand } # 8 MB array
loop do
# Thread runs forever, large_data never freed
process_item(large_data.sample)
sleep 1
end
end
end
# Threads never joined, continue holding memory
end
# Leaks 80 MB that remains allocated until process terminates
process_data_threaded
Explicitly join or kill threads when they complete their work. Avoid long-running threads holding large data structures. Use thread pools with finite thread lifetimes.
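Signal Handling Complications: At the operating-system level, a signal sent to a multi-threaded process can be delivered to any thread that has not blocked it. CRuby runs trap handlers on the main thread, but a handler can still fire while the main thread holds a lock or is in the middle of a critical operation, and code inside a handler cannot lock a Mutex (doing so raises ThreadError). Handling signals safely in multi-threaded programs means keeping handlers minimal and passing the event on to ordinary code: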
# Unsafe signal handling
trap('INT') do
puts "Interrupted!"
exit
end
threads = 5.times.map do
Thread.new do
loop { perform_work }
end
end
threads.each(&:join)
# Ctrl-C runs the handler on the main thread; exit then kills the worker threads, possibly mid-operation
Single-threaded processes, or processes that hand signals off to a dedicated signal-handling thread, avoid much of this complexity because the signal is acted on at a predictable point.
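A common way to keep handlers minimal is the self-pipe pattern: the handler writes one byte to a pipe, and ordinary code notices the byte and does the real work. The sketch below assumes a Unix-like platform; the helper name wait_for_signal, the use of USR1, and the self-sent signal are choices made for this illustration.

```ruby
# Hypothetical helper: returns true once the signal has been observed by ordinary code.
def wait_for_signal(signal = "USR1", timeout: 2)
  reader, writer = IO.pipe
  # The handler does the bare minimum: one non-blocking write, no locks.
  previous = trap(signal) { writer.write_nonblock(".") }

  Process.kill(signal, Process.pid)              # stand-in for an external signal
  ready = IO.select([reader], nil, nil, timeout) # ordinary code waits on the pipe
  !ready.nil?
ensure
  trap(signal, previous || "DEFAULT") # restore the earlier handler
  reader&.close
  writer&.close
end

puts "signal observed: #{wait_for_signal}"
# => signal observed: true
```

In a real server, the code that reads from the pipe would ask worker threads to finish their current job and shut down, rather than calling exit inside the handler.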
Process Zombie Accumulation: Forking processes without waiting for them creates zombie processes. Zombies remain in the process table, consuming process IDs until the parent waits on them. Creating many zombies can exhaust available process IDs:
# Zombie creation
1000.times do
fork do
sleep 0.1
exit
end
end
# Parent continues without waiting
# Until the parent waits, each exited child stays in the process table as a zombie
sleep 10
# With enough zombies the process table or per-user process limit fills up,
# and new process creation fails
Always wait for child processes using Process.wait or Process.detach for fire-and-forget children. Set up signal handlers to reap zombies asynchronously.
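Process.detach returns a thread that waits on the child, so fire-and-forget children are reaped without blocking the parent, and the exit status is still available if needed. A minimal sketch with a hypothetical helper named spawn_and_reap:

```ruby
# Hypothetical helper: start short-lived children and reap every one of them.
def spawn_and_reap(count = 5)
  waiters = count.times.map do |i|
    pid = fork { exit!(i) } # each child exits with its index as status
    Process.detach(pid)     # returns a thread that waits on this child
  end
  # Each waiter thread's value is the child's Process::Status.
  waiters.map { |waiter| waiter.value.exitstatus }
end

p spawn_and_reap
# => [0, 1, 2, 3, 4]
```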
Practical Examples
Web Server Concurrency Models: A web server handling HTTP requests demonstrates practical process versus thread trade-offs. A simple process-per-request model forks a new process for each connection:
require 'socket'
# Process-per-request server
server = TCPServer.new(8080)
puts "Server listening on port 8080"
loop do
  client = server.accept
  pid = fork do
    request = client.gets
    puts "Process #{Process.pid} handling request"
    response = "HTTP/1.1 200 OK\r\n"
    response += "Content-Type: text/plain\r\n"
    response += "\r\n"
    response += "Handled by process #{Process.pid}\n"
    client.puts response
    client.close
  end
  client.close        # parent closes its copy; the child still holds the socket
  Process.detach(pid) # reap the child in the background to prevent zombies
end
This model provides strong isolation but consumes excessive resources under high load. A thread-per-request model reduces overhead:
# Thread-per-request server
server = TCPServer.new(8080)
puts "Server listening on port 8080"
loop do
client = server.accept
Thread.new(client) do |conn|
request = conn.gets
puts "Thread #{Thread.current.object_id} handling request"
response = "HTTP/1.1 200 OK\r\n"
response += "Content-Type: text/plain\r\n"
response += "\r\n"
response += "Handled by thread #{Thread.current.object_id}\n"
conn.puts response
conn.close
end
end
Production servers use hybrid approaches: a fixed pool of processes, each running multiple threads. This balances resource efficiency with fault isolation and CPU parallelism.
Parallel Data Processing: Processing large datasets benefits from parallel execution. A CSV processing pipeline demonstrates the trade-offs:
require 'csv'
require 'json' # needed for to_json in the process-based version
# Sequential processing baseline
def process_csv_sequential(filename)
results = []
CSV.foreach(filename, headers: true) do |row|
results << expensive_transformation(row)
end
results
end
# Thread-based parallel processing (limited by GIL)
def process_csv_threaded(filename, thread_count: 4)
rows = CSV.read(filename, headers: true)
chunks = rows.each_slice((rows.size / thread_count.to_f).ceil).to_a
threads = chunks.map do |chunk|
Thread.new do
chunk.map { |row| expensive_transformation(row) }
end
end
threads.flat_map(&:value)
end
# Process-based parallel processing (true parallelism)
def process_csv_processes(filename, process_count: 4)
rows = CSV.read(filename, headers: true)
chunk_size = (rows.size / process_count.to_f).ceil
# Write chunks to temporary files
# each_slice yields arrays of rows; CSV::Table#[] does not accept a start/length pair
chunk_files = rows.each_slice(chunk_size).each_with_index.map do |chunk, i|
file = "/tmp/chunk_#{i}.csv"
CSV.open(file, 'w') do |csv|
csv << rows.headers
chunk.each { |row| csv << row }
end
file
end
# Process chunks in parallel
pids = chunk_files.map do |file|
fork do
results = CSV.read(file, headers: true).map do |row|
expensive_transformation(row)
end
# Write results to pipe or file
puts results.to_json
end
end
pids.each { |pid| Process.wait(pid) }
end
def expensive_transformation(row)
# CPU-intensive transformation
(1..1000).inject(:*) % 12345
row.to_h.transform_values(&:upcase)
end
The threaded version sees minimal speedup due to the GIL, while the process version achieves near-linear speedup with core count.
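The process version above prints its results instead of returning them to the parent. One way to collect them is a pipe per worker with Marshal for serialization. The helper fork_map below is a hypothetical sketch under that assumption; it requires results that Marshal can serialize and a platform that supports fork.

```ruby
# Hypothetical helper: map a block over items in forked workers and gather the results.
def fork_map(items, workers: 4)
  return [] if items.empty?

  chunk_size = (items.size / workers.to_f).ceil
  jobs = items.each_slice(chunk_size).map do |chunk|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      Marshal.dump(chunk.map { |item| yield item }, writer)
      writer.close
      exit!(0)
    end
    writer.close
    [pid, reader]
  end

  jobs.flat_map do |pid, reader|
    results = Marshal.load(reader) # read before waiting so a full pipe cannot stall the child
    reader.close
    Process.wait(pid)
    results
  end
end

squares = fork_map((1..10).to_a, workers: 3) { |n| n * n }
p squares
# => [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```

Results come back in input order because chunks are read in the order they were created.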
Background Job Processing: A background job system demonstrates how workload characteristics influence process versus thread choice:
# Thread-based worker for I/O-bound jobs
class ThreadedJobWorker
def initialize(thread_count: 10)
@thread_count = thread_count
@queue = Queue.new
@threads = []
end
def start
@thread_count.times do
@threads << Thread.new do
loop do
job = @queue.pop
break if job == :shutdown
execute_job(job)
end
end
end
end
def enqueue(job)
@queue.push(job)
end
def shutdown
@thread_count.times { @queue.push(:shutdown) }
@threads.each(&:join)
end
private
def execute_job(job)
# I/O-bound job: API calls, database queries
case job[:type]
when :api_call
make_api_request(job[:url])
when :email
send_email(job[:recipient], job[:body])
when :database
update_database(job[:query])
end
rescue => e
log_error(job, e)
end
end
require 'json'
# Process-based worker for CPU-bound jobs
class ProcessWorker
def initialize(process_count: 4)
@process_count = process_count
@queue = []
@queue_file = '/tmp/job_queue.json'
end
def start
@process_count.times do |i|
fork do
worker_loop(i)
end
end
# Parent waits for all workers
@process_count.times { Process.wait }
end
def enqueue(job)
@queue << job
File.write(@queue_file, @queue.to_json)
end
private
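def worker_loop(worker_id)
  loop do
    job = nil
    # Claim a job under an exclusive lock so two workers never take the same one
    File.open(@queue_file, File::RDWR | File::CREAT) do |f|
      f.flock(File::LOCK_EX)
      jobs = JSON.parse(f.read) rescue []
      job = jobs.shift
      f.rewind
      f.write(jobs.to_json)
      f.truncate(f.pos)
    end
    break if job.nil?
    execute_cpu_job(job)
  end
end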
def execute_cpu_job(job)
# CPU-bound job: image processing, report generation
case job['type']
when 'image_resize'
resize_image(job['image_path'])
when 'report_generation'
generate_complex_report(job['data'])
when 'video_encoding'
encode_video(job['video_path'])
end
end
end
The threaded worker handles I/O-bound jobs efficiently, while the process worker provides true parallelism for CPU-intensive operations.
Reference
Process Operations
| Operation | Ruby API | Description | Use Case |
|---|---|---|---|
| Create process | Process.fork | Duplicates current process | Parallel execution with isolation |
| Spawn program | Process.spawn | Executes new program in child | Running external commands |
| Wait for child | Process.wait | Blocks until child exits | Synchronizing with child completion |
| Wait with status | Process.wait2 | Returns PID and exit status | Checking child success/failure |
| Detach process | Process.detach | Prevents zombie accumulation | Fire-and-forget child processes |
| Kill process | Process.kill | Sends signal to process | Terminating child processes |
| Current PID | Process.pid | Returns current process ID | Logging and debugging |
| Parent PID | Process.ppid | Returns parent process ID | Process hierarchy tracking |
Thread Operations
| Operation | Ruby API | Description | Use Case |
|---|---|---|---|
| Create thread | Thread.new | Starts new thread | Concurrent execution |
| Wait for thread | Thread#join | Blocks until thread completes | Synchronizing thread completion |
| Get result | Thread#value | Returns thread return value | Collecting computation results |
| Kill thread | Thread#kill | Terminates thread immediately | Canceling operations |
| Current thread | Thread.current | Returns current thread object | Thread-local operations |
| Main thread | Thread.main | Returns main program thread | Identifying main execution context |
| List threads | Thread.list | Returns all living threads | Debugging thread leaks |
| Thread status | Thread#status | Returns run/sleep/aborting/false/nil | Monitoring thread state |
Synchronization Primitives
| Primitive | Ruby API | Purpose | Typical Use |
|---|---|---|---|
| Mutual exclusion | Mutex | Serializes access to shared state | Protecting critical sections |
| Condition variable | ConditionVariable | Coordinates thread waiting | Producer-consumer patterns |
| Thread-safe queue | Queue | Manages work distribution | Job queues and pipelines |
| Sized queue | SizedQueue | Queue with maximum size | Backpressure and flow control |
| Read-write lock | Not in stdlib | Allows multiple readers | Shared read-heavy data |
| Semaphore | Not in stdlib | Limits concurrent access | Resource pool management |
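A counting semaphore can be assembled from the primitives in the table. The class below is a hypothetical sketch built on Mutex and ConditionVariable, with a small helper, max_concurrency, that measures how many threads hold a permit at once.

```ruby
# Hypothetical counting semaphore built from Mutex and ConditionVariable.
class CountingSemaphore
  def initialize(permits)
    @permits = permits
    @mutex = Mutex.new
    @available = ConditionVariable.new
  end

  def acquire
    @mutex.synchronize do
      # Re-check in a loop: another thread may take the permit first.
      @available.wait(@mutex) while @permits.zero?
      @permits -= 1
    end
  end

  def release
    @mutex.synchronize do
      @permits += 1
      @available.signal
    end
  end

  def with_permit
    acquire
    begin
      yield
    ensure
      release
    end
  end
end

# Hypothetical helper: peak number of threads inside the guarded section.
def max_concurrency(permits: 2, threads: 8)
  semaphore = CountingSemaphore.new(permits)
  guard = Mutex.new
  active = 0
  peak = 0
  workers = threads.times.map do
    Thread.new do
      semaphore.with_permit do
        guard.synchronize { active += 1; peak = [peak, active].max }
        sleep 0.01
        guard.synchronize { active -= 1 }
      end
    end
  end
  workers.each(&:join)
  peak
end

puts "peak concurrency: #{max_concurrency}"
```

The peak never exceeds the number of permits, which is the property a resource pool relies on.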
Performance Characteristics
| Metric | Threads | Processes | Impact |
|---|---|---|---|
| Creation time | 10-100 μs | 1-10 ms | Startup latency |
| Context switch | 1-5 μs | 5-20 μs | Throughput under load |
| Memory per unit | 1-2 MB | 40-100 MB | System capacity |
| Communication | Shared memory (ns) | IPC (10-100 μs) | Data transfer overhead |
| Fault isolation | None (shared fate) | Complete | System reliability |
| CPU parallelism | No (GIL) | Yes | CPU-bound performance |
| I/O parallelism | Yes | Yes | I/O-bound performance |
Common Patterns
| Pattern | Implementation | When to Use |
|---|---|---|
| Thread pool | Fixed thread count processing queue | Bounded concurrency for I/O |
| Process pool | Fixed process count with task distribution | CPU parallelism for compute work |
| Fork-join | Fork workers, wait for all | Parallel divide-and-conquer |
| Pipeline | Threads passing data through stages | Multi-stage data transformation |
| Producer-consumer | Queue between producer and consumer threads | Decoupled work generation and execution |
| Worker pool | Pre-forked processes accepting connections | Web server request handling |
Decision Matrix
| Requirement | Recommended Choice | Rationale |
|---|---|---|
| CPU-bound work | Processes | True parallelism without GIL |
| I/O-bound work | Threads | Efficient concurrency with less overhead |
| Strong isolation | Processes | Failure containment |
| Low memory usage | Threads | Shared address space |
| Fast startup | Threads | Minimal creation overhead |
| Simple debugging | Processes | Independent execution |
| Frequent communication | Threads | Shared memory access |
| Horizontal scaling | Processes | Natural distribution model |