CrackedRuby - Context Switching

Overview

Context switching is the mechanism by which an operating system saves and restores the execution state of a process or thread so that multiple execution contexts can share a processor. When the OS scheduler suspends one execution context and resumes another, it saves the state of the currently running context and loads the previously saved state of the incoming one.

The process occurs transparently to running programs but carries measurable performance costs. Modern operating systems perform thousands of context switches per second across system processes, application threads, and kernel operations. Understanding context switching mechanics proves essential for optimizing concurrent applications, diagnosing performance bottlenecks, and making informed decisions about concurrency models.

Context switching occurs at multiple levels: between processes managed by the OS scheduler, between threads within a single process, and between lighter-weight execution contexts like coroutines or fibers. Each level involves different state preservation requirements and performance characteristics. Process context switches require complete memory mapping changes and kernel involvement, while thread switches within the same process share memory space but still require register and stack state management.

# Observing context switching through thread behavior
require 'benchmark'

# Single-threaded execution
single_thread_time = Benchmark.measure do
  10_000.times { Math.sqrt(rand(1000)) }
end

# Multi-threaded execution that forces context switches
multi_thread_time = Benchmark.measure do
  threads = 4.times.map do
    Thread.new { 2_500.times { Math.sqrt(rand(1000)) } }
  end
  threads.each(&:join)
end

puts "Single thread: #{single_thread_time.real}"
puts "Multiple threads: #{multi_thread_time.real}"
# The multi-threaded run is often slower: thread creation, GVL handoffs, and
# context switches add overhead, and pure Ruby code gains no parallel speedup

Key Principles

A context represents the complete execution state required to resume a suspended computation. For processes, this includes the program counter (instruction pointer), processor registers, stack pointer, memory mappings, open file descriptors, signal handlers, and process ownership information. For threads, the context includes thread-specific registers, stack pointer, and thread-local storage, but shares the process memory space and file descriptors with other threads in the same process.

The operating system scheduler triggers context switches based on scheduling policies. Preemptive multitasking forces involuntary context switches when a time slice expires or a higher-priority task becomes ready. Voluntary context switches occur when a process blocks on I/O, explicitly yields the processor, or terminates. The scheduler selects the next context to run based on priority, fairness algorithms (like Completely Fair Scheduler in Linux), or real-time requirements.

Context switch mechanics proceed through distinct phases. First, the kernel intercepts control, triggered by a timer interrupt, system call, or hardware event. Second, the kernel saves the current context state to a process control block or thread control block data structure in kernel memory. Third, the scheduler algorithm selects the next context to execute. Fourth, the kernel loads the saved state from the selected context's control block into processor registers and memory management unit. Finally, control transfers to the restored context, which resumes execution from its saved program counter.
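The phases can be imitated in plain Ruby. What follows is a toy sketch, not how a kernel is written: ControlBlock, ToyScheduler, and the slice and limit parameters are hypothetical names invented for illustration, and a simple counter stands in for real register state.

```ruby
# Hypothetical control block: the saved state of one toy task.
ControlBlock = Struct.new(:name, :program_counter, :registers)

class ToyScheduler
  def initialize(tasks)
    @ready = tasks.dup # round-robin ready queue
  end

  # Runs every task `slice` steps at a time until each has executed `limit` steps.
  # Returns a log of [task name, program counter] pairs, one per switch-out.
  def run(slice:, limit:)
    log = []
    until @ready.empty?
      block = @ready.shift                  # select the next context
      pc = block.program_counter            # load its saved state
      registers = block.registers.dup

      [slice, limit - pc].min.times do      # run until the "time slice" expires
        registers[:acc] += 1
        pc += 1
      end

      block.program_counter = pc            # save state back to the control block
      block.registers = registers
      log << [block.name, pc]
      @ready << block if pc < limit         # unfinished tasks rejoin the queue
    end
    log
  end
end

tasks = [
  ControlBlock.new("A", 0, { acc: 0 }),
  ControlBlock.new("B", 0, { acc: 0 })
]
log = ToyScheduler.new(tasks).run(slice: 2, limit: 3)
log.each { |name, pc| puts "#{name} switched out at step #{pc}" }
```

Each task resumes exactly where its saved program counter left off, which is the property a real context switch has to guarantee.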

The cost of context switching stems from multiple factors. Direct costs include the CPU cycles required to save and restore register state, update memory management structures, and execute scheduler code. Indirect costs are often larger: the incoming context evicts the outgoing context's data from the CPU caches, so later accesses fall through to slower levels of the memory hierarchy; TLB (Translation Lookaside Buffer) entries may be flushed, forcing fresh page table walks; and the processor pipeline must refill. Process switches add the cost of changing page tables, which on hardware without tagged TLB entries also flushes the TLB.

Thread context switches within a process cost less than full process switches because threads share the same address space, avoiding TLB flushes and page table updates. However, thread switches still pollute the CPU caches: the incoming thread's data and instructions displace the outgoing thread's. Cooperative multitasking systems like fiber schedulers eliminate involuntary switches, reducing overhead further by switching only at well-defined points where little state needs to be preserved.

# Demonstrating the context components in Ruby
class ExecutionContext
  attr_reader :stack, :program_counter, :local_variables
  
  def initialize
    @stack = []
    @program_counter = 0
    @local_variables = {}
  end
  
  def save_state
    {
      stack: @stack.dup,
      pc: @program_counter,
      locals: @local_variables.dup
    }
  end
  
  def restore_state(state)
    # Copy on restore so the saved snapshot is not aliased by later mutations
    @stack = state[:stack].dup
    @program_counter = state[:pc]
    @local_variables = state[:locals].dup
  end
end

# Simplified fiber-like behavior showing state preservation
context1 = ExecutionContext.new
context1.stack.push(1, 2, 3)
context1.local_variables[:result] = 42

saved = context1.save_state
context1.stack.clear

context1.restore_state(saved)
puts context1.stack.inspect  # => [1, 2, 3]
puts context1.local_variables[:result]  # => 42

Operating systems employ several strategies to reduce context switch overhead. Modern processors provide instructions for saving and restoring register state quickly. Schedulers try to amortize fixed costs, for example by letting a thread run for a minimum time slice before preempting it. Cache-aware scheduling tries to keep a thread on the same processor core to preserve cache warmth. Voluntary switches at I/O boundaries are often cheaper in practice than preemptive ones, because the blocking thread has reached a natural stopping point.

Implementation Approaches

Process-based concurrency creates separate address spaces for each execution context. When the application forks a new process, the OS duplicates the parent's memory space and assigns a unique process ID. Process switches require complete memory mapping changes, making them expensive but providing strong isolation. Processes communicate through inter-process communication mechanisms like pipes, sockets, or shared memory regions. This approach suits applications requiring fault isolation, security boundaries, or true parallel execution across multiple CPU cores without shared state concerns.

Thread-based concurrency shares a single address space among multiple execution contexts within a process. Threads access the same heap memory, global variables, and open file descriptors while maintaining separate stacks and register sets. Thread creation and context switching cost less than process operations because memory mappings remain unchanged. However, threads require explicit synchronization primitives (mutexes, semaphores, condition variables) to prevent race conditions on shared data. This model fits applications with shared state, frequent inter-thread communication, or fine-grained parallelism needs.

Fiber-based concurrency (cooperative multitasking) implements user-space scheduling where execution contexts explicitly yield control rather than being preempted. Fiber switches avoid the kernel scheduler and timer-driven preemption, happening only at designated yield points. The programmer controls when context switches occur, so code between yield points cannot be interrupted by another fiber on the same thread and often needs no locks. Fibers cannot preempt long-running computations and do not by themselves run in parallel on multiple cores, but they enable efficient I/O multiplexing and structured concurrency patterns. This approach excels for I/O-bound applications, coroutine-based control flow, or scenarios where predictable scheduling matters.

Event-driven architectures with callbacks avoid context switching entirely by maintaining a single execution context that processes events from a queue. The event loop dispatches handlers for I/O completion, timer expiration, or user actions without switching contexts. This model eliminates context switch overhead but requires non-blocking operations and complicates control flow with callback chains or promise patterns. Event-driven systems handle thousands of concurrent connections efficiently but struggle with CPU-intensive operations that block the event loop.
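A minimal sketch of such a loop, assuming pipes as stand-ins for client sockets. run_event_loop and the :first and :second labels are hypothetical names chosen for this example; IO.select and read_nonblock are standard Ruby.

```ruby
# Single-threaded event loop: waits until any source is readable, then
# handles whatever data is available without ever blocking on one source.
def run_event_loop(sources)
  received = Hash.new { |hash, key| hash[key] = +"" }
  open_ios = sources.keys

  until open_ios.empty?
    ready, = IO.select(open_ios)            # sleep until at least one IO is readable
    ready.each do |io|
      chunk = io.read_nonblock(1024, exception: false)
      case chunk
      when :wait_readable
        next                                # spurious wakeup, try again later
      when nil                              # EOF: the writer closed its end
        open_ios.delete(io)
        io.close
      else
        received[sources[io]] << chunk      # the "handler" for this event
      end
    end
  end
  received
end

reader_a, writer_a = IO.pipe
reader_b, writer_b = IO.pipe
writer_a.write("alpha")
writer_a.close
writer_b.write("beta")
writer_b.close

result = run_event_loop({ reader_a => :first, reader_b => :second })
puts result.inspect
```

Only one execution context exists here, so no handler is ever switched out in the middle of its work. The trade-off described above also shows: a slow handler would delay every other source.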

Hybrid approaches combine multiple models. Many applications use processes for isolation with threads for parallelism within each process. Ruby's threading model adds a Global VM Lock that serializes Ruby bytecode execution while still allowing I/O operations to release the lock, creating a hybrid between true parallel threads and single-threaded event processing. Modern async/await patterns implement cooperative scheduling on top of thread pools, gaining cooperative efficiency for I/O while maintaining parallel execution capability for CPU work.

# Process-based approach
pid = fork do
  puts "Child process: #{Process.pid}"
  sleep 1
  exit 0
end
puts "Parent process: #{Process.pid}"
Process.wait(pid)

# Thread-based approach
thread = Thread.new do
  puts "Thread: #{Thread.current.object_id}"
  sleep 1
end
puts "Main: #{Thread.main.object_id}"
thread.join

# Fiber-based approach
fiber = Fiber.new do
  puts "Fiber starts"
  Fiber.yield
  puts "Fiber resumes"
end
fiber.resume
puts "Main continues"
fiber.resume

Actor model implementations encapsulate state within actors that communicate through message passing. Each actor processes messages sequentially from a mailbox queue, eliminating shared mutable state and race conditions. Context switches occur between actors rather than within them. This model maps naturally to distributed systems where actors run on different machines. The approach trades context switching overhead for message passing overhead while gaining simpler reasoning about concurrent state.
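The mailbox idea can be sketched with a Thread and a Queue. This is an illustration under assumptions, not a production actor library: CounterActor and its tell, ask_count, and stop methods are hypothetical names, and the message shapes are invented for the example.

```ruby
# A tiny actor: private state, one mailbox, messages handled one at a time.
class CounterActor
  def initialize
    @mailbox = Queue.new
    @count = 0                      # touched only by the actor's own thread
    @thread = Thread.new { run }
  end

  # Fire-and-forget message send.
  def tell(message)
    @mailbox << message
    self
  end

  # Request/reply: the answer comes back on a private reply queue.
  def ask_count
    reply = Queue.new
    @mailbox << [:get, reply]
    reply.pop
  end

  def stop
    @mailbox << :stop
    @thread.join
  end

  private

  def run
    loop do
      case (message = @mailbox.pop) # blocks while the mailbox is empty
      when :increment then @count += 1
      when :stop then break
      when Array
        kind, reply = message
        reply << @count if kind == :get
      end
    end
  end
end

actor = CounterActor.new
senders = 4.times.map { Thread.new { 100.times { actor.tell(:increment) } } }
senders.each(&:join)
total = actor.ask_count
actor.stop
puts "Total: #{total}"
```

No mutex guards @count because only the actor's thread ever reads or writes it. The senders contend on the mailbox queue instead, which is the message passing overhead the paragraph refers to.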

Ruby Implementation

Ruby implements multiple concurrency models with different context switching characteristics. The core runtime includes native threads managed by the operating system, lightweight fibers for cooperative multitasking, and process forking for complete isolation. Ruby's Global VM Lock (GVL), also called the Global Interpreter Lock (GIL), serializes execution of Ruby bytecode across threads, fundamentally changing the context switching behavior compared to traditional threading models.

Native threads in Ruby create actual OS threads through the pthread library on Unix systems or Windows threads on Windows. Each Ruby thread corresponds to an OS-level thread that the kernel scheduler manages. However, the GVL prevents multiple Ruby threads from executing Ruby code simultaneously. When a thread wants to execute Ruby bytecode, it must acquire the GVL. If another thread holds the lock, the requesting thread blocks, triggering an OS context switch to another runnable thread. The GVL holder releases the lock periodically or when performing I/O operations, allowing context switches to other waiting Ruby threads.

require 'thread'

# Mutex contention under the GVL: ten threads compete for one lock
start_time = Time.now
mutex = Mutex.new
counter = 0

threads = 10.times.map do |i|
  Thread.new do
    1000.times do
      mutex.synchronize { counter += 1 }
      # Each synchronize point can trigger context switch
    end
  end
end

threads.each(&:join)
elapsed = Time.now - start_time

puts "Counter: #{counter}"
puts "Time: #{elapsed}s"
puts "Context switches occurred during mutex contention"

The GVL design prevents data races within Ruby's internal structures and simplifies C extension development, but limits CPU-bound parallelism. CPU-intensive Ruby code running on multiple threads still executes serially, switching contexts between threads but never running truly parallel. I/O operations release the GVL, permitting other threads to execute during I/O waits, making Ruby threading effective for I/O-bound concurrency despite the lock.
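The effect of releasing the GVL while blocked can be observed with a small timing sketch. Here sleep stands in for a blocking I/O call (an assumption made to keep the example self-contained), and elapsed_seconds is a hypothetical helper.

```ruby
# Measures the wall-clock duration of a block using a monotonic clock.
def elapsed_seconds
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Four blocking waits, one after another
sequential = elapsed_seconds { 4.times { sleep 0.1 } }

# The same four waits on separate threads: each sleeping thread gives up
# the GVL, so the waits overlap instead of queuing behind each other
threaded = elapsed_seconds do
  4.times.map { Thread.new { sleep 0.1 } }.each(&:join)
end

puts "Sequential waits: #{sequential.round(2)}s"
puts "Threaded waits:   #{threaded.round(2)}s"
```

Replacing the sleep with a CPU-bound loop would remove most of the difference, since the threads would then take turns holding the lock.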

Fibers provide cooperative multitasking where context switches occur only at explicit yield points. Creating a fiber allocates a stack and establishes an execution context, but fibers don't run until explicitly resumed. The Fiber.yield method saves the current fiber's context and returns control to the caller. The Fiber#resume method switches context back to the fiber. Switching between fibers involves neither the OS scheduler nor a GVL handoff, because a fiber runs on the thread that resumes it and only one fiber per thread executes at a time.

# Fiber context switching for producer-consumer pattern
def produce(items)
  Fiber.new do
    items.each do |item|
      puts "Producing: #{item}"
      Fiber.yield item  # Context switch to consumer
    end
    nil
  end
end

producer = produce([1, 2, 3, 4, 5])

while (item = producer.resume)  # Context switch to producer
  puts "Consuming: #{item}"
  sleep 0.1  # Simulate work
end

# Output shows alternating producer/consumer with explicit switches

Process forking creates complete OS-level processes through the fork system call. The child process receives a copy-on-write duplicate of the parent's memory space, file descriptors, and process state. Forked processes run independently with separate address spaces, requiring IPC mechanisms for communication. Ruby's Process.fork triggers full process context switches when the OS scheduler switches between parent and child processes. Each process maintains its own GVL, enabling true parallel execution on multiple cores.

# Process-based parallelism with IPC
read_pipe, write_pipe = IO.pipe

pid = fork do
  read_pipe.close
  result = (1..1_000_000).reduce(:+)
  Marshal.dump(result, write_pipe)
  write_pipe.close
end

write_pipe.close
result = Marshal.load(read_pipe)
read_pipe.close
Process.wait(pid)

puts "Child computed: #{result}"
# Each process runs on potentially different cores
# Context switching happens between processes

Ruby's Thread#priority attribute is a scheduling hint. Higher-priority threads are selected to run more often than lower-priority ones, but the Ruby documentation describes the value as only a hint to the thread scheduler, and it may be ignored on some platforms. Priority changes how often a thread is chosen after a switch; it does not lift the GVL, so Ruby bytecode still executes one thread at a time.

The Thread.pass method hints to the thread scheduler that the current thread is willing to give up the processor. Whether a switch actually happens depends on the platform and on which other threads are runnable. This voluntary yield allows cooperative scheduling patterns within Ruby's preemptive threading model. Calling Thread.pass inside tight loops can reduce contention by giving other threads opportunities to acquire the GVL.

# Voluntary context switching with Thread.pass
producer_done = false
queue = []
mutex = Mutex.new

producer = Thread.new do
  5.times do |i|
    mutex.synchronize do
      queue << i
      puts "Produced: #{i}"
    end
    Thread.pass  # Explicit yield to let consumer run
  end
  producer_done = true
end

consumer = Thread.new do
  until producer_done && queue.empty?
    item = mutex.synchronize { queue.shift }
    if item
      puts "Consumed: #{item}"
    else
      Thread.pass  # Yield if queue empty
    end
  end
end

[producer, consumer].each(&:join)

Ruby 3.0 introduced Ractor, an experimental actor-style abstraction for parallelism without GVL constraints. Each Ractor runs on its own thread with its own lock, so Ruby code in different Ractors can execute in parallel. Ractors communicate through message passing and cannot share mutable objects. Context switches occur both at the OS thread level between Ractors and among the ordinary Ruby threads inside each Ractor. Ractors trade shared memory convenience for parallel execution capability.
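A hedged sketch of CPU-bound work split across Ractors. It assumes Ruby 3.0 or later; because the method for collecting a Ractor's result differs between Ruby releases, the hypothetical helper ractor_result checks which one is available. parallel_sum_ractors is likewise an illustrative name.

```ruby
# Fetches the block's return value from a finished Ractor.
# Newer Rubies expose Ractor#value; older 3.x releases use Ractor#take.
def ractor_result(ractor)
  ractor.respond_to?(:value) ? ractor.value : ractor.take
end

def parallel_sum_ractors(ranges)
  ractors = ranges.map do |range|
    # The block cannot see outer local variables, so the range is passed in
    Ractor.new(range) { |r| r.sum }
  end
  ractors.sum { |ractor| ractor_result(ractor) }
end

ranges = [
  (1..1_000_000),
  (1_000_001..2_000_000),
  (2_000_001..3_000_000),
  (3_000_001..4_000_000)
]

puts "Sum: #{parallel_sum_ractors(ranges)}"
```

Ruby prints a warning that Ractor is experimental when the first one is created. Passing the range as an argument rather than closing over it is what the no-shared-mutable-state rule looks like in practice.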

Performance Considerations

Context switch frequency directly impacts application throughput. Each switch consumes CPU cycles for state preservation, scheduler execution, and state restoration. Applications performing thousands of switches per second spend measurable time in kernel context switching code rather than application logic. High-frequency switching compounds with cache effects, where each context switch invalidates cached data from the previous context, forcing memory accesses to slower cache levels or main memory.

Thread count affects context switching overhead non-linearly. With more threads than CPU cores, the OS must time-slice cores among threads, increasing switch frequency. The optimal thread count for I/O-bound applications often exceeds core count significantly because blocked threads don't consume CPU. For CPU-bound applications, thread counts matching or slightly exceeding core count minimize switching while maintaining core utilization. In Ruby, excessive threads also add GVL contention: execution stays serialized while switch overhead grows, with no parallelism gain for CPU-bound code.

require 'benchmark'

def compute_intensive_task
  1_000.times { Math.sqrt(rand(10000)) }
end

# Benchmark context switching overhead with varying thread counts.
# Total work is held constant so that only the thread count changes.
TOTAL_TASKS = 3_200

[1, 2, 4, 8, 16, 32].each do |thread_count|
  tasks_per_thread = TOTAL_TASKS / thread_count
  time = Benchmark.measure do
    threads = thread_count.times.map do
      Thread.new { tasks_per_thread.times { compute_intensive_task } }
    end
    threads.each(&:join)
  end

  puts "#{thread_count} threads: #{time.real.round(3)}s"
end

# Under the GVL, extra threads give no speedup for this CPU-bound work;
# any increase in elapsed time reflects thread creation and switching overhead

Voluntary context switches at I/O operations tend to cost less than involuntary timer-based preemption. A thread that blocks on I/O has already entered the kernel through a system call and stops at a point where it has no further work to do, so little of its cached working set is needed in the near term. A preemptive switch triggered by a timer interrupt lands at an arbitrary instruction, typically while the thread is still actively using its cached data. Ruby's GVL release during I/O operations takes advantage of this difference.

Lock contention creates context switch storms. When multiple threads compete for a mutex, losers block and trigger context switches. The winner executes its critical section briefly, releases the lock, and another thread acquires it, often immediately blocking again. This pattern generates excessive switching with minimal productive work. Fine-grained locking multiplies the number of lock acquisitions, while coarse-grained locking holds locks longer and reduces parallelism. The optimal lock granularity balances these competing concerns.

require 'benchmark'

# Demonstrating lock contention impact
shared_data = { counter: 0 }
mutex = Mutex.new

# One lock acquisition per increment (high contention)
contended_time = Benchmark.measure do
  threads = 8.times.map do
    Thread.new do
      1000.times do
        mutex.synchronize do
          shared_data[:counter] += 1
          # Tiny critical section causes rapid lock cycling
          # Many context switches as threads compete
        end
      end
    end
  end
  threads.each(&:join)
end

shared_data[:counter] = 0

# Batched updates reduce lock acquisitions
batched_time = Benchmark.measure do
  threads = 8.times.map do
    Thread.new do
      local_sum = 0
      1000.times { local_sum += 1 }
      mutex.synchronize do
        shared_data[:counter] += local_sum
        # Single lock acquisition per thread
      end
    end
  end
  threads.each(&:join)
end

puts "Contended: #{contended_time.real.round(3)}s"
puts "Batched: #{batched_time.real.round(3)}s"
# The batched version usually runs faster, since each thread takes the lock once

Measuring context switch rates helps identify performance issues. Linux exposes context switch counters in /proc/[pid]/status as voluntary_ctxt_switches and nonvoluntary_ctxt_switches. In a multithreaded process that file covers the main thread; each thread's own counters appear under /proc/[pid]/task/[tid]/status. High involuntary counts indicate CPU-bound threads competing for time slices. High voluntary counts suggest I/O waiting or lock contention. Ruby applications can sample these metrics to correlate switch rates with performance degradation.

Affinity settings reduce context switch overhead by binding threads to specific CPU cores. When a thread remains on one core, the CPU caches stay warm with the thread's data and instructions. On Linux, the taskset command sets CPU affinity for a process, and a Ruby program can call sched_setaffinity through the Fiddle library. This optimization matters most for CPU-bound threads where cache performance dominates. I/O-bound threads benefit less because they frequently block and surrender CPU time anyway.

# Monitoring context switches in Ruby processes (Linux only)
# /proc/[pid]/status covers the main thread; worker threads have their own
# counters under /proc/[pid]/task/[tid]/status
def read_context_switches(pid = Process.pid)
  status_path = "/proc/#{pid}/status"
  return unless File.exist?(status_path)

  content = File.read(status_path)
  voluntary = content[/^voluntary_ctxt_switches:\s+(\d+)/, 1].to_i
  involuntary = content[/^nonvoluntary_ctxt_switches:\s+(\d+)/, 1].to_i
  
  { voluntary: voluntary, involuntary: involuntary }
end

# Sample at intervals to measure rate
start_switches = read_context_switches
start_time = Time.now

# Perform work
threads = 4.times.map do
  Thread.new { 10_000.times { Math.sqrt(rand) } }
end
threads.each(&:join)

end_switches = read_context_switches
elapsed = Time.now - start_time

if start_switches && end_switches
  vol_rate = (end_switches[:voluntary] - start_switches[:voluntary]) / elapsed
  invol_rate = (end_switches[:involuntary] - start_switches[:involuntary]) / elapsed
  
  puts "Voluntary switches/sec: #{vol_rate.round(2)}"
  puts "Involuntary switches/sec: #{invol_rate.round(2)}"
end

Fiber-based architectures reduce context switching costs by keeping the switch in user space. A fiber switch saves and restores a small amount of state without entering the kernel, so it typically costs far less than a kernel-mediated thread switch. Applications handling many concurrent I/O operations, like web servers, benefit substantially from fiber architectures. However, fibers cannot preempt CPU-bound operations, requiring explicit yields. Long-running computations without yield points monopolize the thread, starving other fibers.
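The gap between the two kinds of switch can be estimated with a rough measurement. This is a sketch under assumptions: a Queue handoff between two threads is used as a proxy for a thread switch (it also includes queue and GVL overhead), and seconds_for, fiber_round_trips, and thread_round_trips are hypothetical helper names. Absolute numbers vary by machine and Ruby version.

```ruby
# Wall-clock duration of a block, using a monotonic clock.
def seconds_for
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# One round trip = resume into the fiber plus its yield back.
def fiber_round_trips(count)
  fiber = Fiber.new { loop { Fiber.yield } }
  seconds_for { count.times { fiber.resume } }
end

# One round trip = hand a token to a worker thread and wait for the echo.
def thread_round_trips(count)
  ping = Queue.new
  pong = Queue.new
  worker = Thread.new do
    pong << true while ping.pop   # echo until a nil arrives
  end
  elapsed = seconds_for do
    count.times do
      ping << true
      pong.pop
    end
  end
  ping << nil                     # tell the worker to finish
  worker.join
  elapsed
end

count = 20_000
fiber_time = fiber_round_trips(count)
thread_time = thread_round_trips(count)
puts "Fiber resume/yield: #{(fiber_time / count * 1e9).round} ns per round trip"
puts "Thread handoff:     #{(thread_time / count * 1e9).round} ns per round trip"
```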

Practical Examples

A web server demonstrates context switching across request handling. Traditional threaded servers create a thread per connection, relying on OS context switching when threads block on socket I/O. Under load with thousands of concurrent connections, excessive threads create context switch overhead. Event-driven or fiber-based servers maintain fewer OS threads, using cooperative switching to multiplex many connections onto each thread.

require 'socket'
require 'fiber'

# Thread-per-connection server (high context switching)
def threaded_server(port)
  server = TCPServer.new(port)
  
  loop do
    client = server.accept
    Thread.new(client) do |conn|
      # OS context switch on each accept and read
      request = conn.gets
      conn.puts "HTTP/1.1 200 OK\r\n\r\nReceived"
      conn.close
    end
  end
end

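# Fiber-per-connection server
# Caveat: without a fiber scheduler (Fiber.set_scheduler, Ruby 3.0+), accept and
# gets still block the whole thread, so this version serves one connection at a
# time. A scheduler is what turns each blocking call into a fiber switch.
def fiber_server(port)
  server = TCPServer.new(port)

  # Accept loop
  acceptor = Fiber.new do
    loop do
      client = server.accept
      handler = Fiber.new do
        request = client.gets
        client.puts "HTTP/1.1 200 OK\r\n\r\nReceived"
        client.close
      end
      handler.resume
      Fiber.yield
    end
  end

  loop { acceptor.resume }
end

# With a scheduler installed, the fiber version multiplexes many connections
# onto one thread and so needs far fewer OS context switches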

Database connection pooling illustrates context switching in resource management. Each thread needing database access must acquire a connection from the pool. When connections are exhausted, threads block and context switch to other threads. Proper pool sizing balances connection overhead against context switch frequency. Too few connections cause excessive blocking and switching. Too many connections waste memory and database resources.

require 'thread'

class ConnectionPool
  def initialize(size)
    @pool = Queue.new
    @mutex = Mutex.new
    @resource_count = 0
    @size = size
    
    size.times do
      @pool << create_connection
    end
  end
  
  def with_connection
    conn = @pool.pop  # Blocks if pool empty, triggers context switch
    begin
      yield conn
    ensure
      @pool << conn
    end
  end
  
  private
  
  def create_connection
    @mutex.synchronize do
      @resource_count += 1
      "Connection-#{@resource_count}"
    end
  end
end

pool = ConnectionPool.new(5)

# Simulate 10 threads competing for 5 connections
threads = 10.times.map do |i|
  Thread.new do
    pool.with_connection do |conn|
      puts "Thread #{i} acquired #{conn}"
      sleep 0.1  # Simulate query
      # Context switches to waiting threads when connection released
    end
  end
end

threads.each(&:join)

Producer-consumer patterns showcase context switching for work distribution. Multiple producer threads generate work items while consumer threads process them. When the queue fills, producers block and switch context. When empty, consumers block and switch. Optimal queue sizing and thread counts minimize switching while maintaining throughput.

require 'thread'

class WorkQueue
  def initialize(max_size)
    @queue = Queue.new
    @max_size = max_size
    @mutex = Mutex.new
    @cond_not_full = ConditionVariable.new
    @cond_not_empty = ConditionVariable.new
  end
  
  def push(item)
    @mutex.synchronize do
      while @queue.size >= @max_size
        @cond_not_full.wait(@mutex)  # Context switch here
      end
      @queue << item
      @cond_not_empty.signal
    end
  end
  
  def pop
    @mutex.synchronize do
      while @queue.empty?
        @cond_not_empty.wait(@mutex)  # Context switch here
      end
      item = @queue.pop
      @cond_not_full.signal
      item
    end
  end
end

work_queue = WorkQueue.new(10)

producers = 3.times.map do |i|
  Thread.new do
    10.times do |j|
      work_queue.push("Item-#{i}-#{j}")
      puts "Produced: Item-#{i}-#{j}"
    end
  end
end

consumers = 2.times.map do
  Thread.new do
    loop do
      item = work_queue.pop
      break if item.nil?
      puts "Consumed: #{item}"
      sleep 0.01  # Simulate processing
    end
  end
end

producers.each(&:join)
consumers.size.times { work_queue.push(nil) }
consumers.each(&:join)

Parallel computation with result aggregation demonstrates context switching in CPU-bound scenarios. Forked processes execute computations truly in parallel, with context switches between processes managed by the OS scheduler. Process-based parallelism avoids GVL limitations but requires IPC for result collection.

require 'benchmark'

def parallel_sum_processes(ranges)
  pipes = ranges.map { IO.pipe }

  pids = ranges.each_with_index.map do |range, i|
    fork do
      # The child only writes to its own pipe; close every other inherited end
      pipes.each_with_index do |(reader, writer), j|
        reader.close
        writer.close unless j == i
      end

      Marshal.dump(range.reduce(:+), pipes[i][1])
      pipes[i][1].close
    end
  end

  # The parent only reads
  pipes.each { |_reader, writer| writer.close }

  results = pipes.map do |reader, _writer|
    result = Marshal.load(reader)
    reader.close
    result
  end

  pids.each { |pid| Process.wait(pid) }
  results.reduce(:+)
end

ranges = [
  (1..1_000_000),
  (1_000_001..2_000_000),
  (2_000_001..3_000_000),
  (3_000_001..4_000_000)
]

time = Benchmark.measure do
  total = parallel_sum_processes(ranges)
  puts "Sum: #{total}"
end

puts "Parallel processes: #{time.real.round(3)}s"
# True parallelism with process context switching

Common Pitfalls

Spawning excessive threads creates context switch thrashing. Applications that create a thread for every operation end up with far more threads than CPU cores, and the OS can spend more time switching contexts than executing application code. Each switch involves kernel overhead and cache pollution. The solution is thread pooling, where a fixed number of threads process work items from a queue, keeping the thread count bounded: near the core count for CPU-bound work, higher for I/O-bound work.

# Problematic: thread per item
def process_items_bad(items)
  threads = items.map do |item|
    Thread.new { process(item) }
  end
  threads.each(&:join)
end

# Better: thread pool
# (process is a placeholder for the application's own per-item work)
require 'thread'

def process_items_good(items)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close  # after close, pop returns nil once the queue is drained

  workers = 4.times.map do
    Thread.new do
      while (item = queue.pop)
        process(item)
      end
    end
  end

  workers.each(&:join)
end

Holding locks across blocking operations multiplies contention. When a thread acquires a lock and then blocks on I/O or sleeps, every thread waiting for that lock stays blocked for the full duration of the wait, and each wakeup that fails to obtain the lock is a context switch that accomplishes nothing. Releasing locks before blocking operations allows other threads to make progress without unnecessary switching.
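A small sketch of the difference, with sleep standing in for a blocking I/O call (an assumption to keep the example self-contained). fetch_holding_lock, fetch_then_lock, and timed_run are hypothetical names.

```ruby
# Anti-pattern: the blocking call happens while the lock is held,
# so the four "requests" run strictly one after another.
def fetch_holding_lock(mutex, results, id)
  mutex.synchronize do
    sleep 0.05
    results << id
  end
end

# Better: block first, then take the lock only for the shared update.
def fetch_then_lock(mutex, results, id)
  sleep 0.05
  mutex.synchronize { results << id }
end

# Runs four threads through the given worker; returns [result count, seconds].
def timed_run(worker)
  mutex = Mutex.new
  results = []
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  4.times.map { |id| Thread.new { send(worker, mutex, results, id) } }.each(&:join)
  [results.size, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

slow_count, slow_time = timed_run(:fetch_holding_lock)
fast_count, fast_time = timed_run(:fetch_then_lock)

puts "Lock held across wait: #{slow_count} results in #{slow_time.round(2)}s"
puts "Lock taken after wait: #{fast_count} results in #{fast_time.round(2)}s"
```

Both versions produce the same results. The second keeps the critical section down to the one operation that actually touches shared state.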

Misunderstanding Ruby's GVL leads to false parallelism expectations. Developers create multiple threads for CPU-bound work, expecting parallel execution, but the GVL serializes it. The threads take turns holding the single lock, so the program pays for thread switches and lock handoffs without gaining any parallel speedup. CPU-bound parallelism requires forked processes or Ractors, each of which has its own lock.

# This won't parallelize CPU-bound work
threads = 8.times.map do
  Thread.new do
    result = 0
    1_000_000.times { result += Math.sqrt(rand) }
    result
  end
end

results = threads.map(&:value)
# All threads ran serially due to GVL
# High context switching overhead without parallelism benefit

Neglecting voluntary yield points in fibers causes starvation. Fibers require explicit yields for scheduling. A fiber with CPU-intensive code that never yields monopolizes execution, and other fibers starve until it completes or yields. Regular yield points or time-based yielding prevent starvation.
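One way to add regular yield points is to break a long computation into chunks. The sketch below is illustrative only: chunked_task and run_round_robin are hypothetical helpers, and the "work" is left as a placeholder comment.

```ruby
# Builds a fiber that performs total_steps units of work and yields
# control after every yield_every steps so other fibers can run.
def chunked_task(name, total_steps, yield_every)
  Fiber.new do
    total_steps.times do |step|
      # ... one unit of CPU work would go here ...
      done_so_far = step + 1
      Fiber.yield(name) if (done_so_far % yield_every).zero? && done_so_far < total_steps
    end
    :done
  end
end

# Resumes each pending fiber in turn until all have finished.
# Returns the order in which the fibers were given the thread.
def run_round_robin(tasks)
  trace = []
  pending = tasks.dup
  until pending.empty?
    pending.reject! do |name, fiber|
      trace << name
      fiber.resume == :done       # drop the fiber once it has finished
    end
  end
  trace
end

tasks = {
  a: chunked_task(:a, 4, 2),      # long task: yields halfway through
  b: chunked_task(:b, 2, 2)       # short task: finishes in one slice
}
trace = run_round_robin(tasks)
puts trace.inspect                # => [:a, :b, :a]
```

Because the long task yields after two steps, the short task finishes before the long one resumes. With yield_every set higher than total_steps, task a would run to completion before b ever started.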

Spinning on lock acquisition wastes CPU and increases switching. A busy-wait loop that repeatedly checks lock availability burns its time slices doing no useful work, and it competes for the processor (and, in Ruby, for the GVL) with the very thread that must run to release the lock. Blocking primitives such as Mutex and ConditionVariable put waiting threads to sleep until the lock is available, eliminating spin overhead and reducing context switches.

# Bad: busy waiting (also racy: the check and the set are not one atomic step)
@lock = false

thread1 = Thread.new do
  while @lock
    # Spinning consumes CPU and forces context switches
    Thread.pass
  end
  @lock = true
  # Critical section
  @lock = false
end

# Good: blocking primitive
mutex = Mutex.new

thread2 = Thread.new do
  mutex.synchronize do
    # Blocks without spinning
    # A waiting thread sleeps until the lock is released
  end
end

Ignoring context switch costs in benchmark results misleads optimization efforts. Microbenchmarks that time operations in tight loops may show different performance than production code with realistic context switching. Production applications interleave I/O, synchronization, and computation, incurring real context switch overhead. Benchmarks should reflect realistic concurrency patterns to measure true performance including switching costs.

Reference

Context Switch Types

Type	Scope	State Saved	Cost	Use Case
Process	Entire process	Full CPU state, memory mappings, file descriptors	High	Isolation, true parallelism
Thread	Single thread	Registers, stack pointer, thread-local storage	Medium	Shared memory concurrency
Fiber	Single fiber	Saved registers and stack pointer (user space)	Low	Cooperative multitasking
Function call	Local scope	Return address, local variables	Minimal	Normal execution flow

Ruby Concurrency Models

Model	Creation	Scheduling	GVL	Parallelism	IPC Method
Thread	Thread.new	OS preemptive	Shared	No (I/O only)	Shared memory
Process	Process.fork	OS preemptive	Per-process	Yes	Pipes, sockets
Fiber	Fiber.new	User cooperative	N/A	No	Direct calls
Ractor	Ractor.new	OS preemptive	Per-ractor	Yes	Message passing

Context Switch Triggers

Trigger	Type	Frequency	Control
Timer interrupt	Involuntary	Every time slice	OS scheduler
I/O system call	Voluntary	Per I/O operation	Application
Sleep call	Voluntary	Explicit	Application
Lock contention	Voluntary	Per blocked acquisition	Application
Thread.pass	Voluntary	Explicit	Application
Fiber.yield	Voluntary	Explicit	Application

Performance Characteristics

Operation	Approximate Cost	Primary Overhead	Mitigation Strategy
Process switch	1-10 microseconds	Memory mapping, TLB flush	Use threads when isolation unnecessary
Thread switch	100-1000 nanoseconds	Register save/restore, cache pollution	Pool threads, batch work
Fiber switch	10-100 nanoseconds	Register save/restore in user space	Prefer for I/O multiplexing
Mutex lock (uncontended)	10-50 nanoseconds	Atomic operation	Keep critical sections short
Mutex lock (contended)	Context switch cost	Blocking, context switch	Reduce lock scope, use lock-free structures

Monitoring Commands

Tool	Purpose	Example Output
/proc/[pid]/status	Switch counts for the process's main thread	voluntary_ctxt_switches: 150
vmstat	System-wide context switches	cs: 25000 (per second)
pidstat -w	Process switch rates	cswch/s: 50 (voluntary)
perf stat	Hardware counter events	context-switches: 15,000

Ruby Thread States

State	Description	Can Switch	Holding GVL
Runnable	Ready to execute	Yes	No
Running	Executing bytecode	No	Yes
Sleeping	Blocked on sleep	Yes	No
Waiting	Blocked on I/O or lock	Yes	No
Dead	Finished execution	N/A	No
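The states in this table are conceptual. What Ruby itself reports is narrower: Thread#status returns "run", "sleep", "aborting", false (finished normally), or nil (terminated by an exception). A short sketch, with hypothetical variable names:

```ruby
sleeper = Thread.new { sleep }              # sleeps until killed
Thread.pass until sleeper.status == "sleep" # wait for it to block

finished = Thread.new { :ok }
finished.join

failed = Thread.new do
  Thread.current.report_on_exception = false # keep the example quiet
  raise "boom"
end
begin
  failed.join
rescue RuntimeError
  # join re-raises the thread's exception in the caller
end

statuses = {
  current:  Thread.current.status, # "run"
  sleeper:  sleeper.status,        # "sleep"
  finished: finished.status,       # false
  failed:   failed.status          # nil
}
puts statuses.inspect

sleeper.kill
sleeper.join
```

Note that "sleep" covers both the Sleeping and Waiting rows: Ruby does not distinguish a thread blocked in sleep from one blocked on I/O or a lock.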

Context Preservation Scope

Component	Process Switch	Thread Switch	Fiber Switch
Program counter	Yes	Yes	Yes
CPU registers	Yes	Yes	Minimal
Stack pointer	Yes	Yes	Yes
Stack contents	Yes	Yes	Yes
Heap memory	Yes (separate address space)	Shared	Shared
File descriptors	Yes	Shared	Shared
Signal handlers	Yes	Shared	Shared
Memory mappings	Yes	Shared	Shared
Thread-local storage	N/A	Yes	No
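The thread-local storage row has a Ruby-specific wrinkle worth checking in code: values stored with Thread#[] are local to the current fiber, while Thread#thread_variable_set stores values that every fiber on the thread can see. The key name :request_id below is a hypothetical example.

```ruby
Thread.current[:request_id] = "outer"                     # fiber-local
Thread.current.thread_variable_set(:request_id, "outer")  # thread-local

# A new fiber runs on the same thread but starts with empty fiber-locals
seen_in_fiber = Fiber.new do
  [
    Thread.current[:request_id],
    Thread.current.thread_variable_get(:request_id)
  ]
end.resume

puts seen_in_fiber.inspect                 # => [nil, "outer"]
puts Thread.current[:request_id].inspect   # => "outer"
```

So a fiber switch does swap the Thread#[] storage, which matters for libraries that keep per-request state there.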

Optimization Guidelines

Scenario	Recommendation	Reasoning
I/O-bound with Ruby	Use threads, release GVL during I/O	Threads efficient for I/O multiplexing
CPU-bound with Ruby	Use processes or Ractors	Avoid GVL serialization
Many concurrent connections	Use fibers or event loop	Minimize context switch overhead
Shared mutable state	Use threads with mutexes	Avoid IPC overhead
Fault isolation needed	Use processes	Separate address spaces
Predictable scheduling	Use fibers	Explicit control over switches
