Overview
Context switching represents the mechanism by which an operating system stores and restores the execution state of a process or thread, enabling multiple execution contexts to share processor resources. When the OS scheduler decides to suspend one execution context and resume another, it performs a context switch that involves saving the complete state of the currently running context and loading the previously saved state of the incoming context.
The process occurs transparently to running programs but carries measurable performance costs. Modern operating systems perform thousands of context switches per second across system processes, application threads, and kernel operations. Understanding context switching mechanics proves essential for optimizing concurrent applications, diagnosing performance bottlenecks, and making informed decisions about concurrency models.
Context switching occurs at multiple levels: between processes managed by the OS scheduler, between threads within a single process, and between lighter-weight execution contexts like coroutines or fibers. Each level involves different state preservation requirements and performance characteristics. Process context switches require complete memory mapping changes and kernel involvement, while thread switches within the same process share memory space but still require register and stack state management.
```ruby
# Observing context switching through thread behavior
require 'benchmark'

# Single-threaded execution
single_thread_time = Benchmark.measure do
  10_000.times { Math.sqrt(rand(1000)) }
end

# The same total work split across 4 threads that the scheduler must switch between
multi_thread_time = Benchmark.measure do
  threads = 4.times.map do
    Thread.new { 2_500.times { Math.sqrt(rand(1000)) } }
  end
  threads.each(&:join)
end

puts "Single thread: #{single_thread_time.real}"
puts "Multiple threads: #{multi_thread_time.real}"
# In CRuby the threaded version is usually no faster: the GVL lets only one
# thread run Ruby code at a time, and thread creation and switching add overhead
```
Key Principles
A context represents the complete execution state required to resume a suspended computation. For processes, this includes the program counter (instruction pointer), processor registers, stack pointer, memory mappings, open file descriptors, signal handlers, and process ownership information. For threads, the context includes thread-specific registers, stack pointer, and thread-local storage, but shares the process memory space and file descriptors with other threads in the same process.
The operating system scheduler triggers context switches based on scheduling policies. Preemptive multitasking forces involuntary context switches when a time slice expires or a higher-priority task becomes ready. Voluntary context switches occur when a process blocks on I/O, explicitly yields the processor, or terminates. The scheduler selects the next context to run based on priority, fairness algorithms (such as Linux's Completely Fair Scheduler), or real-time requirements.
Context switch mechanics proceed through distinct phases. First, the kernel intercepts control, triggered by a timer interrupt, system call, or hardware event. Second, the kernel saves the current context state to a process control block or thread control block data structure in kernel memory. Third, the scheduler algorithm selects the next context to execute. Fourth, the kernel loads the saved state from the selected context's control block into processor registers and memory management unit. Finally, control transfers to the restored context, which resumes execution from its saved program counter.
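The five phases can be traced in a toy round-robin scheduler. This is a minimal sketch, not kernel code: the `CPU` and `PCB` structs and `run_round_robin` are hypothetical names, and the end of each time slice stands in for a timer interrupt.

```ruby
# Hypothetical toy CPU with two registers: program counter and accumulator
CPU = Struct.new(:pc, :acc)
# Hypothetical process control block: a task's saved registers plus its length
PCB = Struct.new(:name, :pc, :acc, :limit)

# Each task sums 0...limit, one "instruction" per step
def run_round_robin(pcbs, time_slice)
  cpu = CPU.new(0, 0)
  ready = pcbs.dup
  trace = []
  until ready.empty?
    current = ready.shift                      # Phase 3: scheduler selects the next task
    cpu.pc, cpu.acc = current.pc, current.acc  # Phase 4: load its saved state
    trace << current.name                      # Phase 5: it resumes at its saved pc
    time_slice.times do
      break if cpu.pc >= current.limit
      cpu.acc += cpu.pc
      cpu.pc += 1
    end
    # Phase 1: the slice ends (a timer interrupt) and control returns to the "kernel"
    current.pc, current.acc = cpu.pc, cpu.acc  # Phase 2: save state to the PCB
    ready << current if current.pc < current.limit
  end
  trace
end

a = PCB.new("A", 0, 0, 5)
b = PCB.new("B", 0, 0, 3)
trace = run_round_robin([a, b], 2)
p trace # => ["A", "B", "A", "B", "A"]
p a.acc # => 10
p b.acc # => 3
```

Because each task's registers live in its PCB between slices, both sums come out right even though the tasks interleave on one shared CPU.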
The cost of context switching stems from multiple factors. Direct costs include the CPU cycles required to save and restore register state, update memory management structures, and execute scheduler code. Indirect costs often dominate: the incoming context finds the caches filled with the previous context's data and must refill them from slower levels of the memory hierarchy, a TLB (Translation Lookaside Buffer) flush forces subsequent memory accesses to re-walk the page tables, and the instruction pipeline must be drained and refilled. Process switches add the cost of switching page tables, which also flushes the TLB on processors that lack address-space tags (such as x86 PCID).
Thread context switches within a process cost less than full process switches because threads share the same address space, avoiding page table switches and TLB flushes. However, a thread switch still pollutes the CPU caches: the incoming thread's data and instructions displace those of the outgoing thread. Cooperative multitasking systems like fiber schedulers eliminate involuntary switches, reducing overhead further by switching only at well-defined points where minimal state requires preservation.
```ruby
# Modeling the components of a context in Ruby (a toy analogy: the VM and
# kernel save real register and stack state, not Ruby objects)
class ExecutionContext
  attr_reader :stack, :program_counter, :local_variables

  def initialize
    @stack = []
    @program_counter = 0
    @local_variables = {}
  end

  # Snapshot the state, copying mutable parts so later changes don't leak in
  def save_state
    {
      stack: @stack.dup,
      pc: @program_counter,
      locals: @local_variables.dup
    }
  end

  # Restore from a snapshot; dup again so the snapshot stays reusable
  def restore_state(state)
    @stack = state[:stack].dup
    @program_counter = state[:pc]
    @local_variables = state[:locals].dup
  end
end

# Save, clobber, and restore a context, as switching out and back in would
context1 = ExecutionContext.new
context1.stack.push(1, 2, 3)
context1.local_variables[:result] = 42
saved = context1.save_state
context1.stack.clear
context1.restore_state(saved)
puts context1.stack.inspect            # => [1, 2, 3]
puts context1.local_variables[:result] # => 42
```
Operating systems employ various strategies to minimize context switch overhead. Processors provide hardware support, such as x86's XSAVE family of instructions, for saving and restoring large blocks of register state efficiently. Schedulers choose time-slice lengths long enough to amortize each switch's fixed cost over a useful amount of work. Cache-aware scheduling attempts to keep threads on the same processor core to preserve cache warmth. Voluntary context switches at I/O boundaries tend to cost less in practice than preemptive switches, because the blocked thread had no useful work to do and the switch does not interrupt an active computation.
Implementation Approaches
Process-based concurrency creates separate address spaces for each execution context. When the application forks a new process, the OS gives the child a copy of the parent's memory space (created lazily through copy-on-write) and assigns it a unique process ID. Process switches require complete memory mapping changes, making them expensive but providing strong isolation. Processes communicate through inter-process communication mechanisms like pipes, sockets, or shared memory regions. This approach suits applications requiring fault isolation, security boundaries, or true parallel execution across multiple CPU cores without shared state concerns.
Thread-based concurrency shares a single address space among multiple execution contexts within a process. Threads access the same heap memory, global variables, and open file descriptors while maintaining separate stacks and register sets. Thread creation and context switching cost less than process operations because memory mappings remain unchanged. However, threads require explicit synchronization primitives (mutexes, semaphores, condition variables) to prevent race conditions on shared data. This model fits applications with shared state, frequent inter-thread communication, or fine-grained parallelism needs.
Fiber-based concurrency (cooperative multitasking) implements user-space scheduling where execution contexts explicitly yield control rather than being preempted. Switching between fibers needs no kernel involvement and happens only at designated yield points, with no timer-driven preemption between them. Because the programmer controls when switches occur, code between two yield points runs without interruption from other fibers on the same thread, which often removes the need for locks. Fibers cannot preempt long-running computations and cannot achieve true parallelism on multiple cores, but they enable efficient I/O multiplexing and structured concurrency patterns. This approach excels for I/O-bound applications, coroutine-based control flow, or scenarios where predictable scheduling matters.
Event-driven architectures with callbacks avoid switching between execution contexts inside the application by running a single context that processes events from a queue. The event loop dispatches handlers for I/O completion, timer expiration, or user actions, running each handler to completion before the next. This model eliminates application-level context switch overhead but requires non-blocking operations and complicates control flow with callback chains or promise patterns. Event-driven systems handle thousands of concurrent connections efficiently but struggle with CPU-intensive operations that block the event loop.
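A stripped-down event loop shows the single-context dispatch. This is a minimal sketch handling timers only; `EventLoop` and `set_timeout` are hypothetical names, and a real loop would also wait on sockets (for example with `IO.select`).

```ruby
# Minimal single-threaded event loop (timers only). Every callback runs in
# the same execution context, one after another; nothing here switches
# between threads or fibers.
class EventLoop
  def initialize
    @timers = [] # [due_time, callback] pairs
  end

  def set_timeout(seconds, &callback)
    @timers << [now + seconds, callback]
  end

  def run
    until @timers.empty?
      @timers.sort_by!(&:first)  # Earliest deadline first
      due, callback = @timers.shift
      delay = due - now
      sleep(delay) if delay > 0  # The only place the loop waits
      callback.call              # Runs to completion before the next callback starts
    end
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

order = []
event_loop = EventLoop.new
event_loop.set_timeout(0.1) { order << :second }
event_loop.set_timeout(0.01) do
  order << :first
  event_loop.set_timeout(0) { order << :nested } # Handlers can schedule more work
end
event_loop.run
p order # => [:first, :nested, :second]
```

A handler that loops on CPU work would delay every other timer, which is the event-loop blocking problem described above.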
Hybrid approaches combine multiple models. Many applications use processes for isolation with threads for parallelism within each process. CRuby's threading model adds a Global VM Lock that serializes Ruby bytecode execution while still allowing I/O operations to release the lock, creating a hybrid between true parallel threads and single-threaded event processing. Modern async/await patterns implement cooperative scheduling on top of thread pools, gaining cooperative efficiency for I/O while maintaining parallel execution capability for CPU work.
```ruby
# Process-based approach (fork is unavailable on Windows and JRuby)
pid = fork do
  puts "Child process: #{Process.pid}"
  sleep 1
  exit 0
end
puts "Parent process: #{Process.pid}"
Process.wait(pid)

# Thread-based approach
thread = Thread.new do
  puts "Thread: #{Thread.current.object_id}"
  sleep 1
end
puts "Main: #{Thread.main.object_id}"
thread.join

# Fiber-based approach: prints "Fiber starts", "Main continues", "Fiber resumes"
fiber = Fiber.new do
  puts "Fiber starts"
  Fiber.yield
  puts "Fiber resumes"
end
fiber.resume
puts "Main continues"
fiber.resume
```
Actor model implementations encapsulate state within actors that communicate through message passing. Each actor processes messages sequentially from a mailbox queue, eliminating shared mutable state and race conditions. Context switches occur between actors rather than within them. This model maps naturally to distributed systems where actors run on different machines. The approach trades context switching overhead for message passing overhead while gaining simpler reasoning about concurrent state.
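A minimal actor can be sketched in Ruby with a thread and a `Queue` as its mailbox. `CounterActor` and `send_message` are hypothetical names for illustration, not a library API.

```ruby
# Minimal actor: private state, a mailbox, and one thread that
# processes messages one at a time
class CounterActor
  def initialize
    @mailbox = Queue.new
    @count = 0 # Touched only by the actor's own thread, so no lock needed
    @thread = Thread.new { run }
  end

  # Asynchronous send: enqueue the message and return immediately
  def send_message(message, reply_to = nil)
    @mailbox << [message, reply_to]
  end

  def stop
    send_message(:stop)
    @thread.join
  end

  private

  def run
    loop do
      message, reply_to = @mailbox.pop # Blocks (and switches away) while empty
      case message
      when :increment then @count += 1
      when :get then reply_to << @count
      when :stop then break
      end
    end
  end
end

actor = CounterActor.new
senders = 4.times.map do
  Thread.new { 250.times { actor.send_message(:increment) } }
end
senders.each(&:join)

reply = Queue.new
actor.send_message(:get, reply) # Queued behind all 1,000 increments
result = reply.pop
puts result # => 1000
actor.stop
```

Four threads update the count concurrently, yet no mutex guards `@count`: the mailbox serializes every change through the actor's single thread.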
Ruby Implementation
Ruby implements multiple concurrency models with different context switching characteristics. The core runtime includes native threads managed by the operating system, lightweight fibers for cooperative multitasking, and process forking for complete isolation. CRuby's Global VM Lock (GVL), also called the Global Interpreter Lock (GIL), serializes execution of Ruby bytecode across threads, fundamentally changing the context switching behavior compared to traditional threading models.
In CRuby, each Ruby thread is by default backed by its own OS thread (pthreads on Unix-like systems, native threads on Windows), which the kernel scheduler manages. However, the GVL prevents multiple Ruby threads from executing Ruby code simultaneously. When a thread wants to execute Ruby bytecode, it must acquire the GVL. If another thread holds the lock, the requesting thread blocks, triggering an OS context switch to another runnable thread. The GVL holder releases the lock when its time slice expires or before a blocking operation such as I/O, allowing a waiting Ruby thread to run.
```ruby
# Mutex contention: 10 threads compete for one lock. Separately, the GVL
# decides which of them may run Ruby code at any moment.
start_time = Time.now
mutex = Mutex.new
counter = 0
threads = 10.times.map do
  Thread.new do
    1000.times do
      # A thread preempted while holding the mutex makes the others block
      # here, and each blocked acquisition can cost a context switch
      mutex.synchronize { counter += 1 }
    end
  end
end
threads.each(&:join)
elapsed = Time.now - start_time
puts "Counter: #{counter}" # => Counter: 10000
puts "Time: #{elapsed}s"
```
The GVL design prevents data races within Ruby's internal structures and simplifies C extension development, but limits CPU-bound parallelism. CPU-intensive Ruby code running on multiple threads still executes serially, switching contexts between threads but never running truly parallel. I/O operations release the GVL, permitting other threads to execute during I/O waits, making Ruby threading effective for I/O-bound concurrency despite the lock.
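The effect of releasing the GVL during blocking calls can be observed directly. In this sketch `sleep` stands in for blocking I/O (an assumption that holds for CRuby, which releases the GVL while a thread sleeps).

```ruby
require 'benchmark'

# sleep stands in for blocking I/O: CRuby releases the GVL while a thread
# sleeps, so the four waits overlap instead of running back to back
io_time = Benchmark.realtime do
  4.times.map { Thread.new { sleep 0.1 } }.each(&:join)
end

puts "4 threads x 0.1s wait: #{io_time.round(2)}s" # Roughly 0.1s, not 0.4s
```

Replacing `sleep 0.1` with a CPU-bound loop of similar duration removes the overlap, because the threads then take turns holding the GVL.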
Fibers provide cooperative multitasking where context switches occur only at explicit yield points. Creating a fiber allocates a stack and establishes an execution context, but a fiber doesn't run until explicitly resumed. The Fiber.yield method saves the current fiber's context and returns control to the code that resumed it. The Fiber#resume method switches context back into the fiber. Switching between fibers on the same thread involves neither the OS scheduler nor a GVL hand-off: the thread keeps the GVL across the switch and runs only one of its fibers at a time.
```ruby
# Fiber context switching for a producer-consumer pattern
def produce(items)
  Fiber.new do
    items.each do |item|
      puts "Producing: #{item}"
      Fiber.yield item # Context switch back to the consumer
    end
    nil # The fiber ends; resume returns nil, signaling no more items
  end
end

producer = produce([1, 2, 3, 4, 5])
while (item = producer.resume) # Context switch to the producer
  puts "Consuming: #{item}"
  sleep 0.1 # Simulate work
end
# Output alternates "Producing: n" / "Consuming: n" with explicit switches
```
Process forking creates complete OS-level processes through the fork system call. The child process receives a copy-on-write duplicate of the parent's memory space, file descriptors, and process state. Forked processes run independently with separate address spaces, requiring IPC mechanisms for communication. Ruby's Process.fork triggers full process context switches when the OS scheduler switches between parent and child processes. Each process maintains its own GVL, enabling true parallel execution on multiple cores.
```ruby
# Process-based parallelism with IPC (fork is unavailable on Windows and JRuby)
read_pipe, write_pipe = IO.pipe

pid = fork do
  read_pipe.close # The child only writes
  result = (1..1_000_000).reduce(:+)
  Marshal.dump(result, write_pipe)
  write_pipe.close
end

write_pipe.close # The parent only reads
result = Marshal.load(read_pipe)
read_pipe.close
Process.wait(pid)
puts "Child computed: #{result}" # => Child computed: 500000500000
# The OS may run parent and child on different cores, and it switches
# between the two processes like any others
```
Ruby's Thread#priority attribute is a hint to Ruby's own thread scheduler rather than a setting passed to the OS scheduler, and the documentation notes it may be ignored on some platforms. Higher-priority threads tend to run more often, but lower-priority threads still get time, and the GVL still serializes Ruby bytecode execution.
The Thread.pass method hints to the thread scheduler that it should switch to another thread; whether a switch actually happens depends on the OS and processor. This voluntary yield allows cooperative scheduling patterns within Ruby's preemptive threading model. Calling Thread.pass inside tight loops gives other threads an opportunity to acquire the GVL.
```ruby
# Voluntary context switching with Thread.pass
# (spinning with Thread.pass is for illustration; Queue#pop blocks without spinning)
producer_done = false
queue = []
mutex = Mutex.new

producer = Thread.new do
  5.times do |i|
    mutex.synchronize do
      queue << i
      puts "Produced: #{i}"
    end
    Thread.pass # Hint the scheduler to let the consumer run
  end
  producer_done = true
end

consumer = Thread.new do
  until producer_done && mutex.synchronize { queue.empty? }
    item = mutex.synchronize { queue.shift }
    if item
      puts "Consumed: #{item}"
    else
      Thread.pass # Yield while the queue is empty
    end
  end
end

[producer, consumer].each(&:join)
```
Ruby 3.0 introduced Ractor, still marked experimental, for actor-based parallelism without a single global lock. Each Ractor has its own GVL and at least one thread of its own, so Ruby code in different Ractors can execute in parallel. Ractors communicate through message passing and share only immutable (shareable) objects. Context switches occur both at the OS thread level between Ractors and among threads within each Ractor, as with normal Ruby threading. Ractors trade shared-memory convenience for parallel execution capability.
Performance Considerations
Context switch frequency directly impacts application throughput. Each switch consumes CPU cycles for state preservation, scheduler execution, and state restoration. Applications performing thousands of switches per second spend measurable time in kernel context switching code rather than application logic. High-frequency switching compounds with cache effects, where each context switch invalidates cached data from the previous context, forcing memory accesses to slower cache levels or main memory.
Context switching overhead rises sharply once runnable threads outnumber CPU cores, because the OS must time-slice each core among several threads. The optimal thread count for I/O-bound applications often exceeds core count significantly because blocked threads don't consume CPU. For CPU-bound applications, thread counts matching or slightly exceeding core count minimize switching while keeping cores busy. In CRuby, extra threads for CPU-bound work only add GVL hand-offs and switching overhead, since no two of them run Ruby code in parallel.
```ruby
require 'benchmark'

def compute_intensive_task
  1_000.times { Math.sqrt(rand(10_000)) }
end

# Keep total work fixed and divide it among the threads, so only the
# thread count (and the switching it causes) changes between runs
TOTAL_TASKS = 1_024 # Divisible by every thread count below

[1, 2, 4, 8, 16, 32].each do |thread_count|
  tasks_per_thread = TOTAL_TASKS / thread_count
  time = Benchmark.measure do
    threads = thread_count.times.map do
      Thread.new { tasks_per_thread.times { compute_intensive_task } }
    end
    threads.each(&:join)
  end
  puts "#{thread_count} threads: #{time.real.round(3)}s"
end
# Under CRuby's GVL, adding threads does not shorten this CPU-bound run;
# any growth in time reflects thread creation and switching overhead
```
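The sizing guidance above is often expressed as a widely used rule of thumb: threads = cores × target utilization × (1 + wait time / compute time). The sketch below applies it; `suggested_pool_size` is a hypothetical helper, and in CRuby the CPU-bound figure is best applied to processes (or Ractors), since threads in one process do not run Ruby code in parallel.

```ruby
require 'etc'

# Rule of thumb: threads = cores * target_utilization * (1 + wait_time / compute_time)
# wait_ms and compute_ms describe one task's time blocked vs. running on a CPU
def suggested_pool_size(wait_ms:, compute_ms:, utilization: 1.0, cores: Etc.nprocessors)
  [(cores * utilization * (1 + wait_ms.fdiv(compute_ms))).round, 1].max
end

puts suggested_pool_size(wait_ms: 0, compute_ms: 10, cores: 8)  # => 8 (CPU-bound)
puts suggested_pool_size(wait_ms: 90, compute_ms: 10, cores: 8) # => 80 (mostly waiting)
```

The inputs are estimates; measuring the actual wait/compute ratio and checking switch rates under load gives a better final number.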
Voluntary context switches at I/O operations are usually cheaper in practice than involuntary timer-based preemption. When a thread blocks on I/O, it has no useful work to do, so the switch consumes time that would otherwise be idle. Preemption triggered by a timer interrupt stops a thread in the middle of useful computation, and while other threads run, its cached working set may be evicted and must be reloaded when it resumes. The register save itself is similar in both cases. Ruby's GVL release during I/O operations takes advantage of voluntary switching economics.
Lock contention creates context switch storms. When multiple threads compete for a mutex, the losers block and trigger context switches. The winner runs its critical section, releases the lock, and wakes a waiter; when critical sections are tiny and frequent, threads spend more time handing the lock back and forth than doing useful work. Taking a lock for every small operation multiplies acquisitions and potential hand-offs, while holding one lock across large sections serializes work that could otherwise proceed in parallel. The right granularity balances these competing concerns.
```ruby
require 'benchmark'

# Demonstrating lock contention impact
shared_data = { counter: 0 }
mutex = Mutex.new

# Locking on every increment: 8,000 tiny critical sections
contended_time = Benchmark.measure do
  threads = 8.times.map do
    Thread.new do
      1000.times do
        mutex.synchronize do
          # Tiny critical section: the lock cycles rapidly between threads,
          # and every blocked acquisition can cost a context switch
          shared_data[:counter] += 1
        end
      end
    end
  end
  threads.each(&:join)
end

shared_data[:counter] = 0

# Batched updates: accumulate locally, then lock once per thread
batched_time = Benchmark.measure do
  threads = 8.times.map do
    Thread.new do
      local_sum = 0
      1000.times { local_sum += 1 }
      mutex.synchronize do
        shared_data[:counter] += local_sum # Single lock acquisition per thread
      end
    end
  end
  threads.each(&:join)
end

puts "Contended: #{contended_time.real.round(3)}s"
puts "Batched: #{batched_time.real.round(3)}s"
# The batched version typically finishes faster: it takes the lock
# 8 times instead of 8,000
```
Measuring context switch rates helps identify performance issues. Linux exposes voluntary and involuntary switch counts in the voluntary_ctxt_switches and nonvoluntary_ctxt_switches fields of /proc/[pid]/status. The kernel keeps these counters per thread: /proc/[pid]/status reports the process's main thread, and each thread's own counts appear in /proc/[pid]/task/[tid]/status. High involuntary switch counts indicate CPU-bound threads competing for time slices. High voluntary switch counts suggest I/O waiting or lock contention. Ruby applications can sample these metrics to correlate switch rates with performance degradation.
Affinity settings reduce the indirect cost of context switching by binding threads to specific CPU cores. When a thread stays on one core, that core's caches stay warm with the thread's data and instructions. Linux's taskset command, or a call to sched_setaffinity through Ruby's Fiddle library, can establish CPU affinity. This optimization matters most for CPU-bound threads where cache performance dominates. I/O-bound threads benefit less because they frequently block and surrender CPU time anyway.
```ruby
# Monitoring context switches in Ruby threads (Linux only)
def read_context_switches(status_path = "/proc/#{Process.pid}/status")
  return unless File.exist?(status_path)

  content = File.read(status_path)
  {
    # Anchor at line start so "voluntary" cannot match inside "nonvoluntary"
    voluntary: content[/^voluntary_ctxt_switches:\s+(\d+)/, 1].to_i,
    involuntary: content[/^nonvoluntary_ctxt_switches:\s+(\d+)/, 1].to_i
  }
end

# The counters are per thread, so each worker samples its own status file
# (/proc/thread-self requires Linux 3.17 or later)
start_time = Time.now
deltas = 4.times.map do
  Thread.new do
    before = read_context_switches("/proc/thread-self/status")
    10_000.times { Math.sqrt(rand) } # Perform work
    after = read_context_switches("/proc/thread-self/status")
    if before && after
      {
        voluntary: after[:voluntary] - before[:voluntary],
        involuntary: after[:involuntary] - before[:involuntary]
      }
    end
  end
end.map(&:value).compact
elapsed = Time.now - start_time

unless deltas.empty?
  vol_rate = deltas.sum { |d| d[:voluntary] } / elapsed
  invol_rate = deltas.sum { |d| d[:involuntary] } / elapsed
  puts "Voluntary switches/sec: #{vol_rate.round(2)}"
  puts "Involuntary switches/sec: #{invol_rate.round(2)}"
end
```
Fiber-based architectures minimize context switching costs by keeping switches in user space, without kernel involvement. A fiber switch is typically an order of magnitude or more cheaper than a kernel-mediated thread switch. Applications handling many concurrent I/O operations, like web servers, benefit substantially from fiber architectures. However, a fiber running CPU-bound code cannot be preempted; it must yield explicitly. Long-running computations without yield points monopolize the thread, starving other fibers.
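The cost gap can be estimated with a round-trip microbenchmark. This is a rough sketch: absolute numbers depend on the machine and Ruby version, and the thread side also pays for `Queue` wake-ups and GVL hand-offs, not just the OS switch.

```ruby
require 'benchmark'

rounds = 20_000

# Fiber round trip: resume switches into the fiber, Fiber.yield switches back.
# Both switches happen in user space on the current thread.
fiber = Fiber.new { loop { Fiber.yield } }
fiber_time = Benchmark.realtime { rounds.times { fiber.resume } }

# Thread round trip: a token travels to a partner thread and back through two
# Queues, so every round trip wakes a sleeping thread and hands over the GVL
ping = Queue.new
pong = Queue.new
partner = Thread.new { rounds.times { pong << ping.pop } }
thread_time = Benchmark.realtime do
  rounds.times do
    ping << :token
    pong.pop
  end
end
partner.join

puts "Fiber round trip:  #{(fiber_time / rounds * 1e9).round} ns"
puts "Thread round trip: #{(thread_time / rounds * 1e9).round} ns"
```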
Practical Examples
A web server demonstrates context switching across request handling. Traditional threaded servers create a thread per connection, relying on OS context switching when threads block on socket I/O. Under load with thousands of concurrent connections, excessive threads create context switch overhead. Event-driven or fiber-based servers maintain fewer OS threads, using cooperative switching to multiplex many connections onto each thread.
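```ruby
require 'socket'

# Thread-per-connection server (high context switching): each connection
# gets an OS thread, and the kernel switches threads whenever one blocks
def threaded_server(port)
  server = TCPServer.new(port)
  loop do
    client = server.accept
    Thread.new(client) do |conn|
      conn.gets # Blocks only this thread
      conn.write "HTTP/1.1 200 OK\r\nConnection: close\r\n\r\nReceived"
      conn.close
    end
  end
end

# Fiber-based server (reduced context switching): one OS thread multiplexes
# all connections. A fiber parks itself whenever its socket has no data, and
# an IO.select loop resumes it once the socket becomes readable. (In Ruby 3.0+,
# a Fiber.set_scheduler implementation, such as the async gem's, automates this.)
def fiber_server(port)
  server = TCPServer.new(port)
  waiting = {} # socket => fiber parked until that socket is readable

  # Resume a fiber; if it yields a socket, park it under that socket
  step = lambda do |fiber|
    io = fiber.resume
    waiting[io] = fiber if io
  end

  loop do
    readable, = IO.select([server, *waiting.keys])
    readable.each do |io|
      if io == server
        client = server.accept_nonblock(exception: false)
        next if client == :wait_readable

        handler = Fiber.new do
          # read_nonblock returns :wait_readable instead of blocking the thread
          request = client.read_nonblock(1024, exception: false)
          while request == :wait_readable
            Fiber.yield client # Cooperative switch back to the select loop
            request = client.read_nonblock(1024, exception: false)
          end
          client.write "HTTP/1.1 200 OK\r\nConnection: close\r\n\r\nReceived" if request
          client.close
          nil # Finished: nothing to park
        end
        step.call(handler)
      else
        step.call(waiting.delete(io))
      end
    end
  end
end
```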
Database connection pooling illustrates context switching in resource management. Each thread needing database access must acquire a connection from the pool. When connections are exhausted, threads block and context switch to other threads. Proper pool sizing balances connection overhead against context switch frequency. Too few connections cause excessive blocking and switching. Too many connections waste memory and database resources.
```ruby
class ConnectionPool
  def initialize(size)
    @pool = Queue.new
    @mutex = Mutex.new
    @resource_count = 0
    size.times { @pool << create_connection }
  end

  def with_connection
    conn = @pool.pop # Blocks if the pool is empty, triggering a context switch
    begin
      yield conn
    ensure
      @pool << conn # Returning a connection wakes one waiting thread
    end
  end

  private

  # Placeholder: a real pool would open a database connection here
  def create_connection
    @mutex.synchronize do
      @resource_count += 1
      "Connection-#{@resource_count}"
    end
  end
end

pool = ConnectionPool.new(5)

# Simulate 10 threads competing for 5 connections
threads = 10.times.map do |i|
  Thread.new do
    pool.with_connection do |conn|
      puts "Thread #{i} acquired #{conn}"
      sleep 0.1 # Simulate a query
      # Releasing the connection lets a waiting thread run (a context switch)
    end
  end
end
threads.each(&:join)
```
Producer-consumer patterns showcase context switching for work distribution. Multiple producer threads generate work items while consumer threads process them. When the queue fills, producers block and switch context. When empty, consumers block and switch. Optimal queue sizing and thread counts minimize switching while maintaining throughput.
```ruby
# A bounded queue built from a mutex and two condition variables
class WorkQueue
  def initialize(max_size)
    @items = []
    @max_size = max_size
    @mutex = Mutex.new
    @cond_not_full = ConditionVariable.new
    @cond_not_empty = ConditionVariable.new
  end

  def push(item)
    @mutex.synchronize do
      while @items.size >= @max_size
        @cond_not_full.wait(@mutex) # Producer sleeps: context switch here
      end
      @items << item
      @cond_not_empty.signal
    end
  end

  def pop
    @mutex.synchronize do
      while @items.empty?
        @cond_not_empty.wait(@mutex) # Consumer sleeps: context switch here
      end
      item = @items.shift
      @cond_not_full.signal
      item
    end
  end
end

work_queue = WorkQueue.new(10)

producers = 3.times.map do |i|
  Thread.new do
    10.times do |j|
      work_queue.push("Item-#{i}-#{j}")
      puts "Produced: Item-#{i}-#{j}"
    end
  end
end

consumers = 2.times.map do
  Thread.new do
    loop do
      item = work_queue.pop
      break if item.nil? # nil is the shutdown sentinel
      puts "Consumed: #{item}"
      sleep 0.01 # Simulate processing
    end
  end
end

producers.each(&:join)
consumers.size.times { work_queue.push(nil) } # One sentinel per consumer
consumers.each(&:join)
```
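Ruby's standard library already provides this bounded behavior as `SizedQueue`, and closing the queue can replace the `nil` sentinels. A sketch of the same pipeline under that approach:

```ruby
queue = SizedQueue.new(10) # push blocks while 10 items are waiting

producers = 3.times.map do |i|
  Thread.new { 10.times { |j| queue << "Item-#{i}-#{j}" } }
end

consumed = Queue.new
consumers = 2.times.map do
  Thread.new do
    while (item = queue.pop) # nil once the queue is closed and drained
      consumed << item
    end
  end
end

producers.each(&:join)
queue.close # Blocked consumers wake; pop returns nil after the remaining items
consumers.each(&:join)
puts consumed.size # => 30
```

The blocking and wake-ups happen inside `SizedQueue`, so producers and consumers still sleep (and switch) at the same points as in the hand-written version.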
Parallel computation with result aggregation demonstrates context switching in CPU-bound scenarios. Forked processes execute computations truly in parallel, with context switches between processes managed by the OS scheduler. Process-based parallelism avoids GVL limitations but requires IPC for result collection.
```ruby
require 'benchmark'

# Sum each range in its own forked child (fork is unavailable on Windows and JRuby)
def parallel_sum_processes(ranges)
  pipes = ranges.map { IO.pipe }

  pids = ranges.each_with_index.map do |range, i|
    fork do
      # Keep only this child's write end open
      pipes.each_with_index do |(r, w), j|
        r.close
        w.close unless j == i
      end
      sum = range.reduce(:+)
      Marshal.dump(sum, pipes[i][1])
      pipes[i][1].close
    end
  end

  # The parent only reads, so it closes every write end
  pipes.each { |_r, w| w.close }
  results = pipes.map do |r, _w|
    result = Marshal.load(r)
    r.close
    result
  end
  pids.each { |pid| Process.wait(pid) }
  results.sum
end

ranges = [
  (1..1_000_000),
  (1_000_001..2_000_000),
  (2_000_001..3_000_000),
  (3_000_001..4_000_000)
]

time = Benchmark.measure do
  total = parallel_sum_processes(ranges)
  puts "Sum: #{total}" # => Sum: 8000002000000
end
puts "Parallel processes: #{time.real.round(3)}s"
# True parallelism: the OS can schedule the children on separate cores
```
Common Pitfalls
Spawning excessive threads creates context switch thrashing. Applications that create a thread for every operation can end up with far more threads than CPU cores, and the OS then spends more time switching contexts than executing application code. Each switch involves kernel overhead and cache pollution, plus TLB flushes when the switch crosses a process boundary. The solution is thread pooling: a fixed number of threads process work items from a queue, keeping the thread count proportional to the number of CPU cores.
```ruby
require 'etc'

# Hypothetical stand-in for the real per-item work
def process(item)
  item * 2
end

# Problematic: one thread per item
def process_items_bad(items)
  threads = items.map do |item|
    Thread.new { process(item) }
  end
  threads.each(&:join)
end

# Better: a fixed pool of workers draining a shared queue
def process_items_good(items, workers: Etc.nprocessors)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # pop returns nil once the queue is drained

  pool = workers.times.map do
    Thread.new do
      while (item = queue.pop)
        process(item)
      end
    end
  end
  pool.each(&:join)
end
```
Holding locks across blocking operations multiplies contention. When a thread acquires a lock and then blocks on I/O or sleeps, every other thread that needs the lock stays blocked for the entire wait, so operations that could have overlapped run one after another, and each hand-off adds a context switch. Releasing locks before blocking operations lets other threads make progress in the meantime.
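The difference shows up clearly when the blocking call is simulated. In this sketch `slow_io` is a hypothetical stand-in for a network or disk call, implemented with `sleep`.

```ruby
require 'benchmark'

# Hypothetical stand-in for a blocking call such as a network request
def slow_io
  sleep 0.05
  :response
end

cache = {}
mutex = Mutex.new

# Lock held across the I/O: each thread waits out every earlier thread's I/O
held = Benchmark.realtime do
  4.times.map do |i|
    Thread.new { mutex.synchronize { cache[i] = slow_io } }
  end.each(&:join)
end

# I/O done outside the lock: the waits overlap; the lock guards only the write
released = Benchmark.realtime do
  4.times.map do |i|
    Thread.new do
      value = slow_io
      mutex.synchronize { cache[i] = value }
    end
  end.each(&:join)
end

puts "Lock held across I/O:  #{held.round(2)}s"     # Roughly 0.2s (4 x 0.05s)
puts "Lock released for I/O: #{released.round(2)}s" # Roughly 0.05s
```

The shared `cache` is still only written under the mutex; moving the slow call out of the critical section changes timing, not safety.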
Misunderstanding Ruby's GVL leads to false parallelism expectations. Developers create multiple threads for CPU-bound work, expecting parallel execution, but the GVL serializes execution. The threads take turns holding the lock, adding hand-off and context switch overhead with no parallel speedup. CPU-bound parallelism requires forked processes or Ractor-based approaches that provide separate GVLs.
```ruby
# This won't parallelize CPU-bound work in CRuby
threads = 8.times.map do
  Thread.new do
    result = 0
    1_000_000.times { result += Math.sqrt(rand) }
    result
  end
end
results = threads.map(&:value)
# Only one thread ran Ruby code at any moment because of the GVL, so this takes
# about as long as running the eight loops sequentially, plus hand-off overhead
```
Neglecting voluntary yield points in fibers causes starvation. Fibers require explicit yields for scheduling. A fiber with CPU-intensive code that never yields monopolizes execution, preventing other fibers from running until it completes or yields. Regular yield points or time-based yielding prevent starvation.
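Count-based yield points can be sketched with two CPU-bound fibers and a round-robin loop. `cooperative_worker` is a hypothetical helper; without its `Fiber.yield`, the first `resume` would run fiber `:a` to completion before `:b` started.

```ruby
# Hypothetical helper: a CPU-bound fiber that yields every `chunk` iterations
def cooperative_worker(name, iterations, chunk, log)
  Fiber.new do
    iterations.times do |i|
      # (one unit of CPU work would go here)
      next unless ((i + 1) % chunk).zero?

      log << name
      Fiber.yield # Voluntary switch point; without it this fiber runs to completion
    end
    :done
  end
end

log = []
fibers = [
  cooperative_worker(:a, 3_000, 1_000, log),
  cooperative_worker(:b, 3_000, 1_000, log)
]
# Round-robin: resume each fiber in turn until every one reports :done
fibers.reject! { |fiber| fiber.resume == :done } until fibers.empty?
p log # => [:a, :b, :a, :b, :a, :b]
```

A time-based variant would check `Process.clock_gettime(Process::CLOCK_MONOTONIC)` and yield once a time budget is spent, which adapts better when iterations vary in cost.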
Spinning on lock acquisition wastes CPU and increases switching. Busy-wait loops that repeatedly check lock availability burn CPU cycles and force unnecessary context switches as the OS attempts to schedule the spinning thread. Using blocking lock primitives with condition variables allows threads to sleep when locks are unavailable, eliminating spin overhead and reducing context switches.
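```ruby
# Bad: busy waiting on a flag
@lock = false
thread1 = Thread.new do
  while @lock
    # Spinning consumes CPU and forces extra context switches
    Thread.pass
  end
  # Also incorrect: the check above and the set below are not atomic,
  # so two threads can both see false and enter the critical section
  @lock = true
  # Critical section
  @lock = false
end

# Good: blocking primitive
mutex = Mutex.new
thread2 = Thread.new do
  mutex.synchronize do
    # A waiting thread sleeps instead of spinning; the scheduler skips it
    # until the lock is released
  end
end

[thread1, thread2].each(&:join)
```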
Ignoring context switch costs in benchmark results misleads optimization efforts. Microbenchmarks that time operations in tight loops may show different performance than production code with realistic context switching. Production applications interleave I/O, synchronization, and computation, incurring real context switch overhead. Benchmarks should reflect realistic concurrency patterns to measure true performance including switching costs.
Reference
Context Switch Types
| Type | Scope | State Saved | Cost | Use Case |
|---|---|---|---|---|
| Process | Entire process | Full CPU state, memory mappings, file descriptors | High | Isolation, true parallelism |
| Thread | Single thread | Registers, stack pointer, thread-local storage | Medium | Shared memory concurrency |
| Fiber | Execution point | Registers and stack pointer (each fiber keeps its own stack) | Low | Cooperative multitasking |
| Function call | Local scope | Return address, local variables | Minimal | Normal execution flow |
Ruby Concurrency Models
| Model | Creation | Scheduling | GVL | Parallelism | IPC Method |
|---|---|---|---|---|---|
| Thread | Thread.new | OS preemptive | Shared | No for Ruby code (I/O waits overlap) | Shared memory |
| Process | Process.fork | OS preemptive | Per-process | Yes | Pipes, sockets |
| Fiber | Fiber.new | User cooperative | Held by host thread | No | Values passed via resume/yield |
| Ractor | Ractor.new | OS preemptive | Per-ractor | Yes | Message passing |
Context Switch Triggers
| Trigger | Type | Frequency | Control |
|---|---|---|---|
| Timer interrupt | Involuntary | Every time slice | OS scheduler |
| I/O system call | Voluntary | Per I/O operation | Application |
| Sleep call | Voluntary | Explicit | Application |
| Lock contention | Voluntary | Per blocked acquisition | Application |
| Thread.pass | Voluntary | Explicit | Application |
| Fiber.yield | Voluntary | Explicit | Application |
Performance Characteristics
| Operation | Approximate Cost (order of magnitude, hardware-dependent) | Primary Overhead | Mitigation Strategy |
|---|---|---|---|
| Process switch | 1-10 microseconds | Memory mapping, TLB flush | Use threads when isolation unnecessary |
| Thread switch | 100-1000 nanoseconds | Register save/restore, cache pollution | Pool threads, batch work |
| Fiber switch | 10-100 nanoseconds | Stack frame save | Prefer for I/O multiplexing |
| Mutex lock (uncontended) | 10-50 nanoseconds | Atomic operation | Keep critical sections short |
| Mutex lock (contended) | Context switch cost | Blocking, context switch | Reduce lock scope, use lock-free structures |
Monitoring Commands
| Tool | Purpose | Example Output |
|---|---|---|
| /proc/[pid]/status | Per-thread switch counts (main thread; other threads under /proc/[pid]/task/[tid]/status) | voluntary_ctxt_switches: 150 |
| vmstat | System-wide context switches | cs: 25000 (per second) |
| pidstat -w | Process switch rates | cswch/s: 50 (voluntary) |
| perf stat | Hardware counter events | context-switches: 15,000 |
Ruby Thread States
| State | Description | Can Switch | Holding GVL |
|---|---|---|---|
| Runnable | Ready to execute | Yes | No |
| Running | Executing bytecode | No | Yes |
| Sleeping | Blocked on sleep | Yes | No |
| Waiting | Blocked on I/O or lock | Yes | No |
| Dead | Finished execution | N/A | No |
Context Preservation Scope
| Component | Process Switch | Thread Switch | Fiber Switch |
|---|---|---|---|
| Program counter | Yes | Yes | Yes |
| CPU registers | Yes | Yes | Minimal |
| Stack pointer | Yes | Yes | Yes |
| Stack contents | Yes | Yes | Yes |
| Heap memory | Yes (copy-on-write) | Shared | Shared |
| File descriptors | Yes | Shared | Shared |
| Signal handlers | Yes | Shared | Shared |
| Memory mappings | Yes | Shared | Shared |
| Thread-local storage | N/A | Yes | No (though Ruby's Thread#[] values are fiber-local) |
Optimization Guidelines
| Scenario | Recommendation | Reasoning |
|---|---|---|
| I/O-bound with Ruby | Use threads; CRuby releases the GVL during blocking I/O | Threads efficient for I/O multiplexing |
| CPU-bound with Ruby | Use processes or Ractors | Avoid GVL serialization |
| Many concurrent connections | Use fibers or event loop | Minimize context switch overhead |
| Shared mutable state | Use threads with mutexes | Avoid IPC overhead |
| Fault isolation needed | Use processes | Separate address spaces |
| Predictable scheduling | Use fibers | Explicit control over switches |