CrackedRuby - Process vs Thread

Overview

Processes and threads represent two distinct approaches to concurrent execution in operating systems. A process is an independent program in execution with its own memory space, system resources, and execution context. A thread is a lightweight execution unit within a process that shares the process's memory space and resources with other threads in the same process.

The process model emerged in early operating systems as a way to isolate running programs from each other. Each process receives its own virtual address space, file descriptors, and system resources. The operating system scheduler treats processes as independent entities, switching between them to create the illusion of parallel execution on single-core systems or actual parallelism on multi-core systems.

Threads appeared later to address the overhead of process-based concurrency. Creating a new process requires duplicating memory structures and system resources, which takes time and memory. Threads within the same process share most resources, making thread creation and context switching faster than process creation and switching. This shared-memory model also simplifies communication between concurrent execution units within the same application.

The choice between processes and threads affects application architecture, performance characteristics, fault isolation, and debugging complexity. PostgreSQL and Apache's prefork MPM use process-based concurrency, while Apache's worker and event MPMs combine processes with threads in a hybrid approach. Understanding these differences guides architectural decisions in concurrent systems.

# Process creation in Ruby
pid = fork do
  puts "Child process: #{Process.pid}"
  sleep 2
end
puts "Parent process: #{Process.pid}"
Process.wait(pid)
# => Parent process: 12345
# => Child process: 12346

# Thread creation in Ruby
thread = Thread.new do
  puts "Thread: #{Thread.current.object_id}"
  sleep 2
end
puts "Main thread: #{Thread.main.object_id}"
thread.join
# => Main thread: 70123456789
# => Thread: 70123456790

Key Principles

Memory Isolation: Processes operate in separate virtual address spaces. Each process receives its own copy of memory, including code, data, heap, and stack segments. The operating system's memory management unit enforces this isolation through page tables and memory protection mechanisms. One process cannot directly access another process's memory without explicit inter-process communication mechanisms.

Threads within a process share the same virtual address space. All threads access the same heap memory, global variables, and code segments. Each thread maintains its own stack for local variables and function call frames, but these stacks exist within the shared address space. This shared memory enables fast communication but requires synchronization mechanisms to prevent race conditions.
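
The contrast between shared and isolated memory can be observed directly. The following is a minimal sketch, assuming a Unix-like platform where fork is available; the variable names (counter, child_view) are illustrative only.

```ruby
# Contrast shared thread memory with isolated process memory.
# Assumes a platform that supports fork (Linux, macOS, BSD).
counter = 0

# A thread runs in the same address space, so its write is visible here.
Thread.new { counter += 1 }.join
after_thread = counter

# A forked child works on its own copy of the variable.
reader, writer = IO.pipe
pid = fork do
  reader.close
  counter += 100        # changes only the child's copy
  writer.puts counter   # report the child's view to the parent
  writer.close
end
writer.close
child_view = reader.gets.to_i
reader.close
Process.wait(pid)

puts "After thread: #{after_thread}"  # 1
puts "Child saw:    #{child_view}"    # 101
puts "Parent sees:  #{counter}"       # still 1
```

The child's update never reaches the parent; the only way the parent learns the child's value is through the pipe.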

Resource Ownership: A process owns system resources including file descriptors, network sockets, environment variables, signal handlers, and working directory. When a process creates child processes through forking, the child initially receives copies of these resources. Later changes to the descriptor table, environment, or working directory in either process do not affect the other, although inherited descriptors still refer to the same underlying open files and sockets.

Threads share these resources with all threads in the same process. Opening a file in one thread makes that file descriptor available to all threads. This sharing reduces resource consumption but requires careful coordination when multiple threads access shared resources concurrently.

Creation and Context Switching Cost: Process creation involves allocating a new virtual address space, copying page tables, duplicating file descriptor tables, and initializing process control blocks. On Unix systems, the fork system call uses copy-on-write optimization, where memory pages are only copied when modified. Despite this optimization, process creation remains more expensive than thread creation, typically requiring milliseconds.

Thread creation allocates only a new stack and thread control block. The operating system registers the new thread with the scheduler but does not create a new address space. Thread creation typically completes in microseconds. Context switching between processes requires saving and restoring the entire process state, including memory management structures. Thread context switches only save and restore CPU registers and stack pointers, making them faster than process switches.
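
A rough way to compare creation costs on a given machine is to time both operations. This is a sketch only: the iteration count is an arbitrary choice, fork must be available, and the absolute numbers depend on the operating system, hardware, and how much memory the parent process has mapped.

```ruby
require 'benchmark'

# Rough comparison of creation cost; results vary widely between systems.
iterations = 200

thread_time = Benchmark.realtime do
  iterations.times { Thread.new {}.join }
end

process_time = Benchmark.realtime do
  # exit! skips at_exit handlers so the child terminates as cheaply as possible
  iterations.times { Process.wait(fork { exit!(0) }) }
end

puts format("Thread create+join: %.1f us each", thread_time / iterations * 1_000_000)
puts format("Fork+wait:          %.1f us each", process_time / iterations * 1_000_000)
```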

Fault Isolation: When a process crashes due to a segmentation fault, null pointer dereference, or unhandled exception, only that process terminates. Other processes continue executing unaffected. This isolation protects system stability and prevents cascading failures in multi-process architectures.

A fatal fault in one thread, such as a segmentation fault in native code, typically terminates the entire process and every thread in it, because the shared address space can no longer be trusted. An unhandled Ruby exception is milder: by default it ends only the thread that raised it and is re-raised when another thread calls join or value on that thread. This reduced isolation makes thread-based systems more vulnerable to single points of failure but simplifies error propagation within an application.
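
The difference in failure handling can be sketched as follows, assuming fork is available. The error messages are placeholders, and the child's stderr is silenced purely to keep the demo output short.

```ruby
# A failing child process does not take the parent down.
pid = fork do
  $stderr.reopen(File::NULL, 'w')  # silence the child's error report for the demo
  raise "child failure"            # unhandled exception terminates only the child
end
_, status = Process.wait2(pid)
puts "Child success? #{status.success?}"  # false

# An unhandled exception in a Ruby thread ends that thread; the exception
# is re-raised in whichever thread later calls join or value on it.
worker = Thread.new do
  Thread.current.report_on_exception = false  # suppress the default warning
  raise "thread failure"
end

thread_error = nil
begin
  worker.join
rescue RuntimeError => e
  thread_error = e.message
  puts "Caught from thread: #{thread_error}"
end
```

The parent inspects the child's exit status after the fact, while the thread's failure surfaces as an ordinary exception inside the same process.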

Communication Mechanisms: Inter-process communication requires explicit mechanisms such as pipes, message queues, shared memory segments, or network sockets. These mechanisms involve system calls and data copying between address spaces, adding latency and complexity. The operating system mediates all inter-process communication, enforcing security and isolation boundaries.

Threads communicate through shared memory variables. One thread writes to a variable, and another thread reads it directly without system calls or data copying. This zero-copy communication offers lower latency but requires synchronization primitives like mutexes, semaphores, or condition variables to coordinate access and prevent race conditions.
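
Both communication styles can be shown side by side. This is a minimal sketch assuming fork is available; the :done marker and the values sent are arbitrary choices for illustration.

```ruby
# Threads: hand data over through a thread-safe Queue in shared memory.
queue = Queue.new
producer = Thread.new do
  3.times { |i| queue << i * 10 }
  queue << :done                      # marker telling the consumer to stop
end
from_thread = []
while (item = queue.pop) != :done
  from_thread << item
end
producer.join

# Processes: serialize data and send it through a pipe (a kernel-mediated copy).
reader, writer = IO.pipe
pid = fork do
  reader.close
  3.times { |i| writer.puts(i * 10) }
  writer.close
end
writer.close                          # parent keeps only the read end
from_process = reader.each_line.map(&:to_i)
reader.close
Process.wait(pid)

puts "From thread:  #{from_thread.inspect}"   # [0, 10, 20]
puts "From process: #{from_process.inspect}"  # [0, 10, 20]
```

The thread version passes object references, while the process version has to convert values to text and parse them again on the other side.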

Concurrency Model: Processes provide true parallelism on multi-core systems. The operating system can schedule processes on different CPU cores simultaneously. Each process executes independently without interference from other processes at the instruction level.

Thread parallelism depends on the threading implementation. Native threads scheduled by the operating system can execute in parallel on multiple cores. Green threads or user-space threads managed by a runtime scheduler may execute concurrently on a single core through time-slicing but cannot achieve true parallelism. Ruby's Global Interpreter Lock affects thread parallelism in specific ways discussed in the Ruby Implementation section.

Ruby Implementation

Ruby provides both process and thread creation through standard library APIs. The Process module handles process operations, while the Thread class manages thread creation and synchronization.

Process Operations: Ruby's Process module wraps Unix-style process management. The fork method creates a child process by duplicating the current process. The child process begins execution at the point immediately after the fork call, receiving a nil return value, while the parent receives the child's process ID.

# Basic process forking
child_pid = fork do
  puts "Child executing: PID #{Process.pid}"
  exit 42
end

puts "Parent executing: PID #{Process.pid}, spawned #{child_pid}"
pid, status = Process.wait2(child_pid)
puts "Child #{pid} exited with status #{status.exitstatus}"
# => Parent executing: PID 1000, spawned 1001
# => Child executing: PID 1001
# => Child 1001 exited with status 42

The spawn method provides more control over child process execution, including setting environment variables, redirecting file descriptors, and executing different programs:

# Spawn a new program
pid = Process.spawn(
  {"CUSTOM_VAR" => "value"},
  "ruby", "-e", "puts ENV['CUSTOM_VAR']",
  out: "/tmp/output.log",
  err: "/tmp/error.log"
)
Process.wait(pid)

Thread Management: Ruby's Thread class creates native threads scheduled by the operating system. Each Thread instance represents an independent execution context within the Ruby process:

# Thread creation with parameters
threads = 5.times.map do |i|
  Thread.new(i) do |index|
    sleep rand(0.1..0.5)
    puts "Thread #{index} completed"
    index * 2
  end
end

results = threads.map(&:value)
puts "Results: #{results}"
# => Thread 2 completed
# => Thread 0 completed
# => Thread 4 completed
# => Thread 1 completed
# => Thread 3 completed
# => Results: [0, 2, 4, 6, 8]

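The value method blocks until the thread completes and returns the value of the last expression evaluated in the thread block. The join method also blocks, but it returns the Thread object itself (or nil if an optional timeout expires) rather than the block's result.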
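
A short sketch of how the two methods behave; the sleep durations are arbitrary and chosen only to make the timeout case reliable.

```ruby
worker = Thread.new { 6 * 7 }

joined = worker.join         # blocks; returns the Thread object itself
result = worker.value        # thread already finished; returns 42

puts joined.equal?(worker)   # true
puts result                  # 42

# join accepts a timeout in seconds and returns nil if the thread is still running
slow = Thread.new { sleep 0.5 }
timed_out = slow.join(0.01).nil?
puts timed_out               # true
slow.join                    # wait for the thread to finish before moving on
```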

Global Interpreter Lock: CRuby (MRI), the reference implementation, uses a Global Interpreter Lock (GIL), officially called the Global VM Lock (GVL), that prevents multiple threads from executing Ruby code simultaneously. Only one thread can execute Ruby bytecode at any given moment, even on multi-core systems. This design simplifies interpreter implementation and ensures thread safety for internal data structures. Alternative implementations such as JRuby and TruffleRuby do not have this lock.

The GIL is released during blocking I/O operations, allowing other threads to execute Ruby code while one thread waits for I/O. This makes threads effective for I/O-bound workloads despite the GIL:

require 'net/http'

# I/O-bound work benefits from threads despite GIL
start_time = Time.now
threads = 10.times.map do |i|
  Thread.new do
    uri = URI("https://httpbin.org/delay/1")
    response = Net::HTTP.get(uri)
    response.length
  end
end

results = threads.map(&:value)
elapsed = Time.now - start_time
puts "Fetched 10 URLs in #{elapsed.round(2)} seconds"
# => Fetched 10 URLs in 1.23 seconds (not 10 seconds)

For CPU-bound work, the GIL prevents parallelism. Multiple threads do not improve performance and may degrade it due to lock contention:

# CPU-bound work does not benefit from threads
def calculate_prime(n)
  (2..n).select do |i|
    (2..Math.sqrt(i)).none? { |d| i % d == 0 }
  end.count
end

start_time = Time.now
threads = 4.times.map do
  Thread.new { calculate_prime(50000) }
end
threads.each(&:join)
threaded_time = Time.now - start_time

start_time = Time.now
4.times { calculate_prime(50000) }
sequential_time = Time.now - start_time

puts "Threaded: #{threaded_time.round(2)}s"
puts "Sequential: #{sequential_time.round(2)}s"
# => Threaded: 8.45s
# => Sequential: 8.12s (threads slower due to GIL contention)

Process-Based Parallelism: For CPU-bound parallelism in Ruby, processes provide true parallel execution. Each process runs its own Ruby interpreter without GIL interference:

require 'parallel'

# Using processes for CPU parallelism
start_time = Time.now
results = Parallel.map([50000] * 4, in_processes: 4) do |n|
  calculate_prime(n)
end
process_time = Time.now - start_time

puts "Process-based: #{process_time.round(2)}s"
# => Process-based: 2.15s (actual parallelism on 4 cores)

Thread Synchronization: Ruby provides Mutex for mutual exclusion, ConditionVariable for thread coordination, and Queue for thread-safe data structures:

# Thread-safe counter with Mutex
class Counter
  def initialize
    @count = 0
    @mutex = Mutex.new
  end

  def increment
    @mutex.synchronize do
      current = @count
      sleep 0.001  # Simulate work
      @count = current + 1
    end
  end

  def value
    @mutex.synchronize { @count }
  end
end

counter = Counter.new
threads = 10.times.map do
  Thread.new { 100.times { counter.increment } }
end
threads.each(&:join)

puts "Final count: #{counter.value}"
# => Final count: 1000 (correct with mutex)

Without the mutex, race conditions cause incorrect results:

# Race condition without synchronization
class UnsafeCounter
  def initialize
    @count = 0
  end

  def increment
    current = @count
    sleep 0.001
    @count = current + 1
  end

  def value
    @count
  end
end

counter = UnsafeCounter.new
threads = 10.times.map do
  Thread.new { 100.times { counter.increment } }
end
threads.each(&:join)

puts "Final count: #{counter.value}"
# => Final count varies between runs and falls far below the expected 1000
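
Queue, mentioned above alongside Mutex and ConditionVariable, handles its own locking and suits producer-consumer hand-offs. The following is a minimal sketch; the worker count and the use of nil as a stop marker are illustrative choices rather than a required convention.

```ruby
# Queue handles its own locking, so no explicit Mutex is needed.
jobs = Queue.new
results = Queue.new

workers = 3.times.map do
  Thread.new do
    # pop blocks until an item is available; nil is used here as a stop marker
    while (n = jobs.pop)
      results << n * n
    end
  end
end

(1..10).each { |n| jobs << n }
workers.size.times { jobs << nil }   # one stop marker per worker
workers.each(&:join)

squares = []
squares << results.pop until results.empty?
puts squares.sort.inspect
# => [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```

The results arrive in whatever order the workers finish, which is why the output is sorted before printing.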

Design Considerations

Workload Characteristics: I/O-bound workloads benefit from threads when operations spend significant time waiting for external resources like network responses, disk reads, or database queries. Threads allow the application to handle multiple I/O operations concurrently without blocking. The GIL releases during I/O, enabling effective concurrency.

CPU-bound workloads in Ruby require processes for parallel execution. Operations that perform extensive calculations, data transformations, or algorithmic processing do not benefit from threads due to the GIL. Each process runs independently on separate CPU cores, achieving true parallelism.

Mixed workloads require hybrid approaches. A web application handling API requests might use processes for CPU-intensive request processing while using threads within each process for concurrent database queries.

Fault Tolerance Requirements: Applications requiring strong fault isolation should use processes. Each process failure affects only that process, allowing other processes to continue serving requests. Web servers like Unicorn and Puma (in clustered mode) use process-based concurrency to isolate request failures.

Shared-fate systems where component failures should terminate the entire application can use threads. A background job processor might use threads where any thread failure indicates a serious problem requiring full application restart.

Memory Constraints: Threads consume less memory than processes. A thread requires approximately 1-2 MB for its stack, while each additional process carries its own copy of the application's memory, typically 40-100 MB or more for Ruby applications (copy-on-write sharing after fork reduces this cost but does not eliminate it). Systems with memory constraints or those needing many concurrent execution units favor threads.

Process-based systems trade memory for isolation and parallelism. A web server using 10 processes consumes significantly more memory than one using 10 threads, but provides better CPU utilization for mixed workloads and stronger failure isolation.

Communication Patterns: Applications with frequent inter-unit communication favor threads. Shared memory communication has minimal overhead compared to inter-process communication. A data processing pipeline where stages pass data between steps benefits from thread-based implementation.

Applications with infrequent communication or those requiring strong boundaries between units work well with processes. Message-passing architectures using queues or message brokers suit process-based designs where each process handles requests independently.

Debugging and Development: Thread-based concurrency complicates debugging due to race conditions, deadlocks, and non-deterministic execution order. Reproducing thread-related bugs requires careful instrumentation and understanding of memory models. Development cycles lengthen when dealing with thread safety issues.

Process-based systems offer simpler debugging. Each process executes independently, making bugs reproducible. State inspection examines only one process's memory space. Process crashes generate clear stack traces without affecting debugging tools.

Deployment and Scaling: Horizontal scaling differs between processes and threads. Process-based systems scale by adding more processes across multiple machines. Load balancers distribute work between processes running on different servers. This scaling model integrates naturally with containerized deployments where each container runs independent processes.

Thread-based systems scale vertically within a single machine's resources. Adding more threads increases concurrency up to the limits of available CPU cores and memory. Horizontal scaling requires running multiple thread-using processes across machines, creating a two-level scaling architecture.

Performance Considerations

Startup Latency: Thread creation latency ranges from 10-100 microseconds depending on operating system and hardware. Creating 100 threads adds approximately 1-10 milliseconds to application startup. Process creation latency ranges from 1-10 milliseconds per process. Creating 10 processes adds 10-100 milliseconds to startup.

For applications requiring rapid startup, such as command-line tools or serverless functions, these differences affect user experience. Process-heavy architectures incur noticeable startup delays. Thread-based approaches start faster but may not achieve desired parallelism for CPU-bound work.

Context Switch Overhead: Thread context switches complete in 1-5 microseconds, involving saving and restoring CPU registers and stack pointers. Process context switches require 5-20 microseconds, including TLB flushes and page table updates. Under high concurrency with frequent context switches, these differences compound.

A web server handling 10,000 requests per second with 10 workers experiences approximately 1,000 context switches per worker per second. With threads, this overhead consumes approximately 5 milliseconds per second per worker (0.5% CPU). With processes, it consumes approximately 20 milliseconds per second per worker (2% CPU).
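
The arithmetic behind these percentages can be written out explicitly. This sketch uses the upper-bound switch costs quoted above (5 and 20 microseconds); the helper name switch_overhead is hypothetical.

```ruby
# Estimate CPU time spent on context switches per worker.
# cost_us is the cost of one switch in microseconds.
def switch_overhead(switches_per_second:, cost_us:)
  ms_per_second = switches_per_second * cost_us / 1000.0
  # 1000 ms of CPU time per second equals 100% of one core
  { ms_per_second: ms_per_second, cpu_percent: ms_per_second / 10.0 }
end

thread_cost  = switch_overhead(switches_per_second: 1_000, cost_us: 5)
process_cost = switch_overhead(switches_per_second: 1_000, cost_us: 20)

puts thread_cost   # 5.0 ms per second, 0.5% of one core
puts process_cost  # 20.0 ms per second, 2.0% of one core
```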

Memory Footprint: A typical Ruby process consumes 40-100 MB of memory for application code, loaded gems, and runtime structures. Each additional process duplicates this baseline, though shared library code remains in shared memory. Ten processes consume approximately 400-1000 MB total.

Threads within a process share the base memory. Each thread adds only its stack (1-2 MB) plus thread-local storage. Ten threads in one process consume approximately 50-120 MB total, roughly one-tenth the memory of ten processes.

Memory-intensive applications amplify these differences. A Rails application loading numerous gems and caching data in memory might require 200 MB per process. A process-based deployment with 20 workers consumes 4 GB of memory for the application alone, while a thread-based approach with 20 threads consumes approximately 250 MB.
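
A back-of-the-envelope estimate for the Rails example can be sketched as below. The helper names are hypothetical, the 2 MB stack figure is the upper end of the range given earlier, and copy-on-write sharing between forked processes is ignored, so the process figure is an upper bound.

```ruby
# Memory estimates in MB for a given worker count.
def process_memory_mb(workers:, base_mb:)
  workers * base_mb                 # every process carries the full baseline
end

def thread_memory_mb(workers:, base_mb:, stack_mb:)
  base_mb + workers * stack_mb      # one shared baseline plus a stack per thread
end

process_total = process_memory_mb(workers: 20, base_mb: 200)
thread_total  = thread_memory_mb(workers: 20, base_mb: 200, stack_mb: 2)

puts "20 processes: #{process_total} MB"  # 4000 MB, about 4 GB
puts "20 threads:   #{thread_total} MB"   # 240 MB
```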

Throughput Characteristics: For I/O-bound workloads, threads and processes achieve similar throughput. A web application making database queries spends most time waiting for I/O. Both approaches allow concurrent I/O operations, achieving comparable requests-per-second rates. Threads may show slightly better throughput due to lower context switch overhead.

For CPU-bound workloads, processes achieve higher throughput when work exceeds one core's capacity. Computing image transformations or running complex algorithms benefits from parallel execution across multiple cores. Processes provide linear speedup up to the number of cores, while threads remain constrained by the GIL.

Mixed workloads show nuanced performance characteristics. A web application with both quick database lookups and occasional heavy computation achieves best throughput with a hybrid approach: multiple processes (one per core) each running multiple threads for I/O concurrency.

Scalability Limits: Thread-based systems hit scalability limits around 1,000-10,000 threads per process, depending on available memory and operating system limits. Beyond this point, context switch overhead degrades performance. The operating system's thread scheduler struggles to efficiently manage thousands of threads.

Process-based systems scale to hundreds of processes per machine before system resources become constrained. Process count is typically capped to avoid memory exhaustion rather than because of scheduling overhead. Distributed systems scale process-based architectures horizontally across machines, achieving thousands to millions of concurrent workers.

Resource Contention: Threads sharing memory structures create contention. Multiple threads accessing the same Mutex serialize execution, creating bottlenecks. High lock contention under concurrent load can reduce throughput to levels worse than sequential execution:

# Contention example
mutex = Mutex.new
shared_data = []

start = Time.now  # Start timing before the threads begin running
threads = 20.times.map do
  Thread.new do
    1000.times do
      mutex.synchronize do
        # Short critical section but high contention
        shared_data << Thread.current.object_id
      end
    end
  end
end

threads.each(&:join)
elapsed = Time.now - start
puts "Completed in #{elapsed.round(2)}s with high contention"
# Elapsed time varies by machine; all 20 threads serialize on the single mutex

Processes avoid shared memory contention but face other resource conflicts. Multiple processes writing to the same file require file locking. Multiple processes accessing the same database connection pool contend for limited connections.
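
File locking between processes can be sketched with File#flock. The example assumes a Unix-like system with fork and advisory locking; the counter file name and iteration counts are arbitrary.

```ruby
require 'tmpdir'

# Four processes each add 50 to a counter stored in a file.
# flock(LOCK_EX) makes each read-modify-write cycle exclusive.
path = File.join(Dir.tmpdir, "flock_counter_#{Process.pid}.txt")
File.write(path, '0')

pids = 4.times.map do
  fork do
    50.times do
      File.open(path, File::RDWR) do |f|
        f.flock(File::LOCK_EX)   # blocks until no other process holds the lock
        value = f.read.to_i
        f.rewind
        f.write((value + 1).to_s)
        f.flush
      end                        # closing the file releases the lock
    end
  end
end
pids.each { |pid| Process.wait(pid) }

total = File.read(path).to_i
puts "Total: #{total}"           # 200
File.delete(path)
```

Without the flock call, two processes could read the same value and one increment would be lost, the same failure mode as an unsynchronized thread counter.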

Common Pitfalls

Race Conditions in Shared State: Thread-based systems frequently encounter race conditions where multiple threads access shared state without proper synchronization. The race condition occurs when execution order determines program correctness, and that order is non-deterministic:

# Race condition in shared counter
class TaskQueue
  def initialize
    @queue = []
    @processed = 0
  end

  def add_task(task)
    @queue << task
  end

  def process_task
    if @queue.any?
      task = @queue.shift
      perform_work(task)
      @processed += 1  # Race condition here
    end
  end

  def perform_work(task)
    sleep 0.01  # Simulate work
  end

  def stats
    { queue_size: @queue.size, processed: @processed }
  end
end

queue = TaskQueue.new
100.times { |i| queue.add_task(i) }

threads = 10.times.map do
  Thread.new do
    10.times { queue.process_task }
  end
end
threads.each(&:join)

puts queue.stats
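# => May report fewer than 100 processed, e.g. {queue_size: 0, processed: 87};
#    the result varies by run and by Ruby implementation

The increment operation @processed += 1 is a read-modify-write sequence rather than a single atomic step. Two threads can read the same value, increment it, and both write back the same result, losing one increment. The check-then-act pair @queue.any? followed by @queue.shift has the same weakness: another thread can empty the queue between the two calls.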

Deadlock in Resource Acquisition: Threads acquiring multiple locks in different orders create deadlock potential. Thread A holds Lock 1 and waits for Lock 2 while Thread B holds Lock 2 and waits for Lock 1. Both threads block indefinitely:

# Deadlock scenario
mutex_a = Mutex.new
mutex_b = Mutex.new

thread1 = Thread.new do
  mutex_a.synchronize do
    sleep 0.1
    puts "Thread 1 trying to get B"
    mutex_b.synchronize do
      puts "Thread 1 has both"
    end
  end
end

thread2 = Thread.new do
  mutex_b.synchronize do
    sleep 0.1
    puts "Thread 2 trying to get A"
    mutex_a.synchronize do
      puts "Thread 2 has both"
    end
  end
end

thread1.join
thread2.join
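# => Thread 1 trying to get B
# => Thread 2 trying to get A
# => (neither thread can proceed; once every thread is blocked, CRuby
#     aborts with a fatal "No live threads left. Deadlock?" error)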

Always acquire locks in a consistent order across all threads to prevent deadlock.
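
A version of the scenario with a consistent lock order might look like the following sketch. Both threads take mutex_a before mutex_b; the log queue exists only to collect output for the demo.

```ruby
mutex_a = Mutex.new
mutex_b = Mutex.new
log = Queue.new

# Both threads take mutex_a first, then mutex_b, so neither can hold
# one lock while waiting on a thread that holds the other.
workers = 2.times.map do |i|
  Thread.new do
    mutex_a.synchronize do
      sleep 0.05
      mutex_b.synchronize { log << "Thread #{i + 1} has both" }
    end
  end
end
workers.each(&:join)

messages = []
messages << log.pop until log.empty?
puts messages.sort
```

The second thread simply waits for mutex_a until the first thread has released both locks, so the program always terminates.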

Fork Safety Issues: Forking a multi-threaded process creates subtle problems. The child process inherits only the calling thread. Other threads disappear in the child, but locks and other synchronization primitives keep the state they had at the moment of the fork. If a thread that does not exist in the child held a lock when the fork occurred, that lock can remain held forever in the child, and the data it protected may be left half-updated. Locks inside native libraries and C extensions are the most exposed:

# Fork safety problem
mutex = Mutex.new
data = []

thread = Thread.new do
  loop do
    mutex.synchronize do
      data << Time.now
      sleep 0.1
    end
  end
end

sleep 0.5  # Let thread run

pid = fork do
  # Mutex might be locked from parent thread that no longer exists
  mutex.synchronize do  # May hang if parent thread held lock
    puts "Child accessing data: #{data.size}"
  end
end

Process.wait(pid)
thread.kill

Fork before starting any threads, as preforking servers do, or reinitialize all synchronization primitives in the child immediately after forking. Avoid forking from a process that already has background threads running.
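
One way to combine the two models safely is to fork first and start threads only inside each child. The following is a sketch under that assumption; the process and thread counts are arbitrary, and results travel back to the parent over pipes.

```ruby
# Fork first, then start threads inside each child, so no lock can be
# inherited in a held state from a thread that does not exist in the child.
children = 2.times.map do
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    mutex = Mutex.new   # created after the fork, owned by this child only
    total = 0
    threads = 4.times.map do
      Thread.new { 25.times { mutex.synchronize { total += 1 } } }
    end
    threads.each(&:join)
    writer.puts total
    writer.close
  end
  writer.close
  [pid, reader]
end

totals = children.map do |pid, reader|
  value = reader.gets.to_i
  reader.close
  Process.wait(pid)
  value
end
puts totals.inspect   # [100, 100]
```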

Memory Leaks in Long-Running Threads: Threads holding references to objects prevent garbage collection. A thread maintaining a local variable referencing a large data structure keeps that memory allocated even when no other code uses it:

# Memory leak in thread
def process_data_threaded
  threads = 10.times.map do
    Thread.new do
      large_data = Array.new(1_000_000) { rand }  # 8 MB array
      loop do
        # Thread runs forever, large_data never freed
        process_item(large_data.sample)
        sleep 1
      end
    end
  end
  # Threads never joined, continue holding memory
end

# Leaks 80 MB that remains allocated until process terminates
process_data_threaded

Explicitly join or kill threads when they complete their work. Avoid long-running threads holding large data structures. Use thread pools with finite thread lifetimes.

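Signal Handling Complications: At the operating system level, a signal sent to a multi-threaded process may be delivered to any thread that has not blocked it. CRuby runs trap handlers on the main thread, but the handler can still interrupt the main thread at an arbitrary point, and calling exit from it tears down worker threads that may be holding locks or in the middle of critical operations. Handling signals safely in multi-threaded programs requires keeping handlers minimal and coordinating shutdown with the worker threads: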

# Unsafe signal handling
trap('INT') do
  puts "Interrupted!"
  exit
end

threads = 5.times.map do
  Thread.new do
    loop { perform_work }
  end
end

threads.each(&:join)
# Ctrl-C runs the handler while workers may be mid-operation; exit stops them abruptly

Single-threaded processes or processes with signal-handling threads avoid this complexity. Signals deliver predictably to the process, which can handle them safely.
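
A safer arrangement keeps the trap handler minimal and lets ordinary code perform the shutdown. This sketch assumes a Unix-like system; it uses USR1 sent to the current process as a stand-in for an external signal such as Ctrl-C, and the polling interval is arbitrary.

```ruby
# Keep the trap handler minimal: record the request and let the
# worker threads stop at a point of their own choosing.
shutdown_requested = false
previous_handler = trap('USR1') { shutdown_requested = true }

completed = Queue.new
workers = 3.times.map do |i|
  Thread.new do
    # Each worker finishes its current unit of work before checking the flag
    sleep 0.01 until shutdown_requested
    completed << i
  end
end

Process.kill('USR1', Process.pid)             # simulate an incoming signal
workers.each(&:join)
trap('USR1', previous_handler || 'DEFAULT')   # restore the earlier handler

finished = completed.size
puts "Workers stopped cleanly: #{finished}"   # 3
```

Because the handler only sets a flag, no lock is taken inside the trap context and every worker exits at a safe point.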

Process Zombie Accumulation: Forking processes without waiting for them creates zombie processes. Zombies remain in the process table, consuming process IDs until the parent waits on them. Creating many zombies can exhaust available process IDs:

# Zombie creation
1000.times do
  fork do
    sleep 0.1
    exit
  end
end

# Parent continues without waiting
# 1000 zombie processes accumulate
sleep 10

# With enough unreaped children, the process table or PID limit fills up
# and new process creation fails

Always wait for child processes using Process.wait or Process.detach for fire-and-forget children. Set up signal handlers to reap zombies asynchronously.
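
Two reaping strategies are sketched below, assuming fork is available: Process.detach for fire-and-forget children, and non-blocking polling with Process::WNOHANG. The exit codes and sleep intervals are arbitrary.

```ruby
# Option 1: Process.detach starts a watcher thread that reaps the child.
detached_pid = fork { exit 0 }
watcher = Process.detach(detached_pid)
watcher.join                      # only for the demo; normally nobody waits
detached_status = watcher.value   # Process::Status of the reaped child

# Option 2: poll without blocking using WNOHANG.
polled_pid = fork { sleep 0.1; exit 3 }
polled_status = nil
until polled_status
  # wait2 returns nil while the child is still running
  _, polled_status = Process.wait2(polled_pid, Process::WNOHANG)
  sleep 0.02 unless polled_status
end

puts "Detached child exit status: #{detached_status.exitstatus}"  # 0
puts "Polled child exit status:   #{polled_status.exitstatus}"    # 3
```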

Practical Examples

Web Server Concurrency Models: A web server handling HTTP requests demonstrates practical process versus thread trade-offs. A simple process-per-request model forks a new process for each connection:

require 'socket'

# Process-per-request server
server = TCPServer.new(8080)
puts "Server listening on port 8080"

loop do
  client = server.accept
  
  pid = fork do
    request = client.gets
    puts "Process #{Process.pid} handling request"

    response = "HTTP/1.1 200 OK\r\n"
    response += "Content-Type: text/plain\r\n"
    response += "\r\n"
    response += "Handled by process #{Process.pid}\n"

    client.puts response
    client.close
  end

  client.close          # Parent closes its copy of the socket
  Process.detach(pid)   # Reap the child when it exits to prevent zombies
end

This model provides strong isolation but consumes excessive resources under high load. A thread-per-request model reduces overhead:

# Thread-per-request server
server = TCPServer.new(8080)
puts "Server listening on port 8080"

loop do
  client = server.accept
  
  Thread.new(client) do |conn|
    request = conn.gets
    puts "Thread #{Thread.current.object_id} handling request"
    
    response = "HTTP/1.1 200 OK\r\n"
    response += "Content-Type: text/plain\r\n"
    response += "\r\n"
    response += "Handled by thread #{Thread.current.object_id}\n"
    
    conn.puts response
    conn.close
  end
end

Production servers use hybrid approaches: a fixed pool of processes, each running multiple threads. This balances resource efficiency with fault isolation and CPU parallelism.
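
The hybrid shape can be sketched without any networking: a fixed number of worker processes, each running several threads, reporting back to the parent over pipes. The counts, the simulated I/O delay, and the label strings are illustrative assumptions, not how any particular server implements it.

```ruby
require 'json'

process_count = 2
threads_per_process = 3

workers = process_count.times.map do |p|
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    # Threads inside the child overlap their (simulated) I/O waits
    threads = threads_per_process.times.map do |t|
      Thread.new do
        sleep 0.05                    # stands in for a network or database call
        "process #{p} thread #{t}"
      end
    end
    writer.write(JSON.generate(threads.map(&:value)))
    writer.close
  end
  writer.close
  [pid, reader]
end

handled = workers.flat_map do |pid, reader|
  labels = JSON.parse(reader.read)
  reader.close
  Process.wait(pid)
  labels
end
puts handled
```

Each process contributes CPU parallelism and fault isolation, while the threads inside it keep memory use low for concurrent waits.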

Parallel Data Processing: Processing large datasets benefits from parallel execution. A CSV processing pipeline demonstrates the trade-offs:

require 'csv'

# Sequential processing baseline
def process_csv_sequential(filename)
  results = []
  CSV.foreach(filename, headers: true) do |row|
    results << expensive_transformation(row)
  end
  results
end

# Thread-based parallel processing (limited by GIL)
def process_csv_threaded(filename, thread_count: 4)
  rows = CSV.read(filename, headers: true)
  chunks = rows.each_slice((rows.size / thread_count.to_f).ceil).to_a
  
  threads = chunks.map do |chunk|
    Thread.new do
      chunk.map { |row| expensive_transformation(row) }
    end
  end
  
  threads.flat_map(&:value)
end

# Process-based parallel processing (true parallelism)
require 'json'  # Results travel between processes as JSON

def process_csv_processes(filename, process_count: 4)
  rows = CSV.read(filename, headers: true).map(&:to_h)
  return [] if rows.empty?
  chunk_size = (rows.size / process_count.to_f).ceil

  # Fork one child per chunk; each child sends its results back through a pipe
  workers = rows.each_slice(chunk_size).map do |chunk|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      results = chunk.map { |row| expensive_transformation(row) }
      writer.write(JSON.generate(results))
      writer.close
    end
    writer.close
    [pid, reader]
  end

  # Read each pipe to EOF before waiting so a large result cannot block the child
  workers.flat_map do |pid, reader|
    results = JSON.parse(reader.read)
    reader.close
    Process.wait(pid)
    results
  end
end

def expensive_transformation(row)
  # CPU-intensive transformation
  (1..1000).inject(:*) % 12345
  row.to_h.transform_values(&:upcase)
end

The threaded version sees minimal speedup due to the GIL, while the process version achieves near-linear speedup with core count.

Background Job Processing: A background job system demonstrates how workload characteristics influence process versus thread choice:

# Thread-based worker for I/O-bound jobs
class ThreadedJobWorker
  def initialize(thread_count: 10)
    @thread_count = thread_count
    @queue = Queue.new
    @threads = []
  end

  def start
    @thread_count.times do
      @threads << Thread.new do
        loop do
          job = @queue.pop
          break if job == :shutdown
          execute_job(job)
        end
      end
    end
  end

  def enqueue(job)
    @queue.push(job)
  end

  def shutdown
    @thread_count.times { @queue.push(:shutdown) }
    @threads.each(&:join)
  end

  private

  def execute_job(job)
    # I/O-bound job: API calls, database queries
    case job[:type]
    when :api_call
      make_api_request(job[:url])
    when :email
      send_email(job[:recipient], job[:body])
    when :database
      update_database(job[:query])
    end
  rescue => e
    log_error(job, e)
  end
end

require 'json'

# Process-based worker for CPU-bound jobs
class ProcessWorker
  def initialize(process_count: 4)
    @process_count = process_count
    @queue = []
    @queue_file = '/tmp/job_queue.json'
  end

  def start
    @process_count.times do |i|
      fork do
        worker_loop(i)
      end
    end
    
    # Parent waits for all workers
    @process_count.times { Process.wait }
  end

  def enqueue(job)
    @queue << job
    File.write(@queue_file, @queue.to_json)
  end

  private

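  def worker_loop(worker_id)
    loop do
      job = take_job
      break unless job

      execute_cpu_job(job)
    end
  end

  # Claim one job while holding an exclusive lock on the queue file,
  # so two workers never take the same job
  def take_job
    File.open(@queue_file, File::RDWR | File::CREAT) do |f|
      f.flock(File::LOCK_EX)
      jobs = JSON.parse(f.read) rescue []
      job = jobs.shift
      f.rewind
      f.write(jobs.to_json)
      f.truncate(f.pos)
      job
    end
  end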

  def execute_cpu_job(job)
    # CPU-bound job: image processing, report generation
    case job['type']
    when 'image_resize'
      resize_image(job['image_path'])
    when 'report_generation'
      generate_complex_report(job['data'])
    when 'video_encoding'
      encode_video(job['video_path'])
    end
  end
end

The threaded worker handles I/O-bound jobs efficiently, while the process worker provides true parallelism for CPU-intensive operations.

Reference

Process Operations

Operation	Ruby API	Description	Use Case
Create process	Process.fork	Duplicates current process	Parallel execution with isolation
Spawn program	Process.spawn	Executes new program in child	Running external commands
Wait for child	Process.wait	Blocks until child exits	Synchronizing with child completion
Wait with status	Process.wait2	Returns PID and exit status	Checking child success/failure
Detach process	Process.detach	Prevents zombie accumulation	Fire-and-forget child processes
Kill process	Process.kill	Sends signal to process	Terminating child processes
Current PID	Process.pid	Returns current process ID	Logging and debugging
Parent PID	Process.ppid	Returns parent process ID	Process hierarchy tracking

Thread Operations

Operation	Ruby API	Description	Use Case
Create thread	Thread.new	Starts new thread	Concurrent execution
Wait for thread	Thread#join	Blocks until thread completes	Synchronizing thread completion
Get result	Thread#value	Returns thread return value	Collecting computation results
Kill thread	Thread#kill	Terminates thread immediately	Canceling operations
Current thread	Thread.current	Returns current thread object	Thread-local operations
Main thread	Thread.main	Returns main program thread	Identifying main execution context
List threads	Thread.list	Returns all living threads	Debugging thread leaks
Thread status	Thread#status	Returns run/sleep/aborting/false/nil	Monitoring thread state

Synchronization Primitives

Primitive	Ruby API	Purpose	Typical Use
Mutual exclusion	Mutex	Serializes access to shared state	Protecting critical sections
Condition variable	ConditionVariable	Coordinates thread waiting	Producer-consumer patterns
Thread-safe queue	Queue	Manages work distribution	Job queues and pipelines
Sized queue	SizedQueue	Queue with maximum size	Backpressure and flow control
Read-write lock	Not in stdlib	Allows multiple readers	Shared read-heavy data
Semaphore	Not in stdlib	Limits concurrent access	Resource pool management

Performance Characteristics

Metric	Threads	Processes	Impact
Creation time	10-100 μs	1-10 ms	Startup latency
Context switch	1-5 μs	5-20 μs	Throughput under load
Memory per unit	1-2 MB	40-100 MB	System capacity
Communication	Shared memory (ns)	IPC (10-100 μs)	Data transfer overhead
Fault isolation	None (shared fate)	Complete	System reliability
CPU parallelism	No (GIL)	Yes	CPU-bound performance
I/O parallelism	Yes	Yes	I/O-bound performance

Common Patterns

Pattern	Implementation	When to Use
Thread pool	Fixed thread count processing queue	Bounded concurrency for I/O
Process pool	Fixed process count with task distribution	CPU parallelism for compute work
Fork-join	Fork workers, wait for all	Parallel divide-and-conquer
Pipeline	Threads passing data through stages	Multi-stage data transformation
Producer-consumer	Queue between producer and consumer threads	Decoupled work generation and execution
Worker pool	Pre-forked processes accepting connections	Web server request handling

Decision Matrix

Requirement	Recommended Choice	Rationale
CPU-bound work	Processes	True parallelism without GIL
I/O-bound work	Threads	Efficient concurrency with less overhead
Strong isolation	Processes	Failure containment
Low memory usage	Threads	Shared address space
Fast startup	Threads	Minimal creation overhead
Simple debugging	Processes	Independent execution
Frequent communication	Threads	Shared memory access
Horizontal scaling	Processes	Natural distribution model
