CrackedRuby - I/O Scheduling

Overview

I/O scheduling operates at the operating system level, managing the queue of read and write requests to storage devices. When multiple processes issue I/O operations, the OS cannot simply service them in the order received. Physical disk characteristics—particularly the mechanical movement of read/write heads—create opportunities for significant optimization.

The scheduler sits between the file system and the block device driver. Applications issue I/O requests through system calls, the file system translates these into block-level operations, and the I/O scheduler reorders these operations before passing them to the device driver. This reordering can reduce total service time by 50% or more compared to naive first-come-first-served ordering.

Consider a scenario where three processes request disk blocks at cylinders 10, 100, and 15, with the disk head currently at cylinder 50. Servicing these in order requires head movement of 40 + 90 + 85 = 215 cylinders. An intelligent scheduler might reorder this to 10, 15, 100, requiring only 40 + 5 + 85 = 130 cylinders of movement—a 40% improvement.

I/O scheduling became critical with the rise of multiprogramming systems in the 1960s. Early time-sharing systems experienced severe performance degradation when multiple users accessed disk simultaneously. The introduction of scheduling algorithms transformed disk I/O from a sequential bottleneck into a resource that could support dozens of concurrent users.

Modern systems face different challenges. Solid-state drives eliminated mechanical seek time, rendering traditional scheduling algorithms obsolete for SSDs. However, HDDs remain prevalent in data centers for bulk storage, making I/O scheduling relevant for database servers, file servers, and any application dealing with large datasets on spinning disks.

# I/O request pattern that benefits from scheduling
File.open('large_dataset.bin', 'rb') do |file|
  # Random access pattern - scheduler optimizes physical disk access
  offsets = [1000000, 50000, 900000, 100000]
  offsets.each do |offset|
    file.seek(offset)
    data = file.read(4096)
    # Process data
  end
end

Key Principles

I/O scheduling algorithms optimize disk access by understanding physical disk geometry. A hard disk contains platters spinning at constant angular velocity, typically 5400-15000 RPM. Read/write heads move radially across the platter surface on an actuator arm. Data organizes into concentric tracks, with each track divided into sectors.

Seek Time represents the duration required to move the disk head from its current track to the target track. This mechanical operation dominates I/O latency on HDDs, typically ranging from 3-15 milliseconds. Seek time increases roughly linearly with distance, though not perfectly due to acceleration and deceleration phases.

Rotational Latency occurs after the head reaches the correct track. The head must wait for the target sector to rotate under it. On a 7200 RPM disk, average rotational latency is 4.2 milliseconds (half a rotation). This component remains constant regardless of request ordering.

Transfer Time represents the actual data read or write duration once the head positions correctly. For a 4KB block on a modern HDD, transfer time typically measures 0.1-0.3 milliseconds—negligible compared to seek and rotational delays.

The scheduler's primary objective minimizes total seek time across all pending requests. Secondary goals include fairness (preventing request starvation), bounded response time (avoiding indefinite delays), and throughput maximization. These goals often conflict. A scheduler optimizing purely for seek time might indefinitely postpone requests to distant tracks if requests keep arriving for nearby tracks.

Request merging forms another critical principle. When multiple requests target adjacent or overlapping disk blocks, the scheduler merges them into a single operation. This eliminates redundant head movements and reduces the number of operations the device driver must process.

# Sequential access benefits from request merging
File.open('logfile.txt', 'a') do |file|
  1000.times do |i|
    # These writes may be merged by the OS buffer cache
    # and I/O scheduler into fewer disk operations
    file.puts "Log entry #{i}"
  end
  # Flush explicitly if timing matters
  file.flush
end

The request queue maintains pending I/O operations. When an application issues a read or write, the kernel adds this request to the queue rather than immediately dispatching it to the device. The scheduler periodically examines the queue, selects the next request according to its algorithm, and submits it to the device driver. Queue depth affects scheduling effectiveness—deeper queues provide more optimization opportunities but increase latency for individual requests.

Starvation prevention mechanisms ensure that no request waits indefinitely. Pure seek-time optimization algorithms can starve requests to distant disk regions. Schedulers implement timeouts, aging mechanisms, or fairness quotas to guarantee service within bounded time.

Modern schedulers distinguish between read and write operations. Reads block the requesting process, directly impacting application response time. Writes can proceed asynchronously, with the calling process continuing immediately after data reaches the buffer cache. This asymmetry leads schedulers to prioritize reads over writes, improving perceived system responsiveness at the cost of write throughput.

# Read operations block, write operations return immediately
start = Time.now
File.write('test.txt', 'data' * 1000000)
write_time = Time.now - start
# Write returns quickly - data still in cache

start = Time.now
data = File.read('test.txt')
read_time = Time.now - start
# Read might wait for physical disk access

puts "Write: #{write_time}s, Read: #{read_time}s"
# Typically write_time < read_time for large files

Implementation Approaches

First-Come, First-Served (FCFS) represents the simplest scheduling algorithm. Requests service in arrival order without reordering. This guarantees fairness and prevents starvation, but provides no seek optimization. If requests arrive for cylinders 50, 10, 90, 20, the head moves from current position to 50, then 10 (40 cylinders back), then 90 (80 forward), then 20 (70 back)—extensive back-and-forth movement.

FCFS performs acceptably under light load when few requests compete. With only one or two outstanding operations, sophisticated scheduling provides minimal benefit. However, under moderate or heavy load, FCFS performs poorly. The lack of optimization means worst-case and average-case performance converge, with throughput often 50-70% lower than optimizing schedulers.

Shortest Seek Time First (SSTF) selects the pending request closest to the current head position. If the head rests at cylinder 50 with pending requests at cylinders 20, 55, and 90, SSTF chooses cylinder 55 (distance 5), then either 20 or 90 depending on subsequent arrivals. This greedy approach minimizes seek distance for each individual operation.

SSTF dramatically improves throughput over FCFS, typically achieving 2-3x better performance under load. However, it suffers from starvation problems. If requests continuously arrive for cylinders near the current head position, requests for distant cylinders may wait indefinitely. A request for cylinder 10 might never service if requests keep arriving for cylinders 40-60 while the head services that region.

# Simulating SSTF behavior
class SSTFScheduler
  def initialize(start_position)
    @current_position = start_position
    @queue = []
  end
  
  def add_request(cylinder)
    @queue << cylinder
  end
  
  def schedule_next
    return nil if @queue.empty?
    
    # Find closest cylinder to current position
    next_request = @queue.min_by { |cyl| (cyl - @current_position).abs }
    @queue.delete_at(@queue.index(next_request))
    
    distance = (next_request - @current_position).abs
    @current_position = next_request
    
    { cylinder: next_request, seek_distance: distance }
  end
end

scheduler = SSTFScheduler.new(50)
[20, 90, 55, 30, 85].each { |cyl| scheduler.add_request(cyl) }

total_distance = 0
5.times do
  result = scheduler.schedule_next
  puts "Service cylinder #{result[:cylinder]}, seek: #{result[:seek_distance]}"
  total_distance += result[:seek_distance]
end
puts "Total seek distance: #{total_distance}"
# Output: 55, 85, 90, 30, 20 - total distance much less than FCFS

SCAN (Elevator Algorithm) moves the disk head in one direction, servicing all requests in that direction until reaching the disk end, then reverses direction. This mimics elevator behavior. Starting at cylinder 50 moving upward with requests at 20, 55, 90, 30, SCAN services 55 and 90, reaches the top, then reverses to service 30 and 20.

SCAN provides better fairness than SSTF while maintaining good throughput. Requests wait at most two full disk traversals. The algorithm prevents starvation since every request eventually falls in the head's current direction of travel. However, SCAN exhibits non-uniform wait times. Requests just behind the head's current direction wait nearly two full traversals, while requests in the current direction service almost immediately.

The algorithm also wastes movement. If the head moves to cylinder 199 (highest cylinder) but the furthest request sits at cylinder 180, the head traverses 19 cylinders unnecessarily. This overhead reduces as disk utilization increases, since requests arrive continuously during traversal.

C-SCAN (Circular SCAN) addresses SCAN's wait time non-uniformity. The head moves in one direction servicing requests, but upon reaching the end, it immediately returns to the beginning without servicing requests on the return trip. This treats the disk as a circular structure, providing more uniform wait times across all cylinder positions.

C-SCAN simplifies timing analysis. Maximum wait time equals one full disk traversal plus the time to service intervening requests. This predictability benefits real-time systems or applications requiring bounded I/O latency. However, C-SCAN wastes the return trip, reducing throughput compared to SCAN under heavy load.

# Simulating SCAN behavior
class SCANScheduler
  def initialize(start_position, max_cylinder = 199)
    @current_position = start_position
    @max_cylinder = max_cylinder
    @direction = :up  # :up or :down
    @queue = []
  end
  
  def add_request(cylinder)
    @queue << cylinder unless @queue.include?(cylinder)
  end
  
  def schedule_next
    return nil if @queue.empty?
    
    # Get requests in current direction
    if @direction == :up
      candidates = @queue.select { |cyl| cyl >= @current_position }.sort
      if candidates.empty?
        @direction = :down
        candidates = @queue.select { |cyl| cyl < @current_position }.sort.reverse
      end
    else
      candidates = @queue.select { |cyl| cyl <= @current_position }.sort.reverse
      if candidates.empty?
        @direction = :up
        candidates = @queue.select { |cyl| cyl > @current_position }.sort
      end
    end
    
    next_request = candidates.first
    @queue.delete(next_request)
    
    distance = (next_request - @current_position).abs
    @current_position = next_request
    
    { cylinder: next_request, seek_distance: distance, direction: @direction }
  end
end

scheduler = SCANScheduler.new(50)
requests = [20, 90, 55, 30, 85, 15, 75]
requests.each { |cyl| scheduler.add_request(cyl) }

puts "SCAN scheduling (starting at 50, direction up):"
requests.length.times do
  result = scheduler.schedule_next
  puts "  #{result[:cylinder]} (seek: #{result[:seek_distance]}, #{result[:direction]})"
end
# Services: 55, 75, 85, 90, then reverses: 30, 20, 15

LOOK and C-LOOK optimize SCAN and C-SCAN by eliminating unnecessary traversal to disk ends. LOOK reverses direction at the last request in the current direction rather than at the physical disk boundary. If the furthest request sits at cylinder 180, LOOK reverses there instead of continuing to cylinder 199.

This simple modification reduces average seek time by 5-15% depending on request distribution. The algorithm remains fair while eliminating wasted movement. C-LOOK applies the same optimization to C-SCAN, returning to the lowest pending request rather than cylinder 0.

Modern Linux systems implement variants of these classical algorithms adapted for contemporary hardware. The Completely Fair Queuing (CFQ) scheduler maintains separate queues per process, allocating disk time slices to each queue in round-robin fashion. Within each queue, CFQ applies LOOK-style ordering. This provides fairness across processes while maintaining good overall throughput.

The Deadline scheduler prioritizes requests approaching their timeout deadline. Each request receives a read or write deadline (typically 500ms for reads, 5s for writes). The scheduler normally uses LOOK ordering, but preempts this to service requests nearing their deadline. This prevents starvation while maintaining good throughput.

NOOP (No Operation) scheduling performs minimal reordering—only request merging. This algorithm targets SSDs where seek time disappears. Traditional scheduling algorithms provide no benefit and may harm performance by delaying requests. SSDs prefer receiving requests in submission order to optimize their internal parallelism and wear leveling algorithms.

Ruby Implementation

Ruby applications interact with I/O schedulers through standard system calls wrapped by Ruby's I/O classes. When Ruby code calls File.read, File.write, or socket operations, these eventually invoke POSIX functions like read(), write(), pread(), or pwrite(). The kernel routes these through the page cache and I/O scheduler before reaching the block device.

Ruby provides no direct control over I/O scheduling algorithms—these operate transparently at the kernel level. However, Ruby code influences scheduler behavior through access patterns, buffer sizes, and I/O operation ordering. Understanding scheduler behavior allows writing Ruby applications that cooperate with the scheduler for better performance.

The Ruby runtime maintains internal buffers for I/O operations. When opening a file with buffering enabled (default for text mode), Ruby accumulates writes in memory before issuing actual system calls. This reduces the number of I/O requests reaching the scheduler, allowing better merging and reordering opportunities.

# Buffered I/O - Ruby accumulates writes before syscalls
File.open('output.txt', 'w') do |file|
  10000.times do |i|
    file.puts "Line #{i}"
    # Data accumulates in Ruby buffer
  end
  # File close triggers flush, issuing buffered writes to OS
end

# Unbuffered I/O - each write becomes a syscall
File.open('output.txt', 'w') do |file|
  file.sync = true  # Disable Ruby buffering
  10000.times do |i|
    file.puts "Line #{i}"
    # Each puts triggers immediate system call
  end
end
# Unbuffered version may be 10-50x slower due to syscall overhead

The sync attribute controls Ruby's internal buffering. Setting file.sync = true disables buffering, causing each write operation to immediately invoke the underlying system call. This guarantees data reaches the OS buffer cache promptly but dramatically increases overhead. The OS still buffers writes unless explicitly flushed with fsync.

For fine-grained control, Ruby provides flush and fsync methods. The flush method writes buffered data to the OS but doesn't guarantee disk persistence—data remains in the page cache. The fsync method forces the OS to write cached data to physical storage, blocking until the I/O scheduler completes all pending operations for this file.

require 'benchmark'

# Demonstrate impact of sync and fsync
Benchmark.bm(20) do |bm|
  bm.report('buffered write:') do
    File.open('test1.dat', 'w') do |f|
      10000.times { f.write('x' * 100) }
    end
  end
  
  bm.report('buffered + flush:') do
    File.open('test2.dat', 'w') do |f|
      10000.times do
        f.write('x' * 100)
        f.flush  # Push to OS cache
      end
    end
  end
  
  bm.report('buffered + fsync:') do
    File.open('test3.dat', 'w') do |f|
      10000.times do
        f.write('x' * 100)
        f.flush
        f.fsync  # Force to disk - involves I/O scheduler
      end
    end
  end
end
# fsync version dramatically slower due to actual disk I/O

Ruby's IO#advise method provides hints to the kernel about expected access patterns. These hints influence page cache behavior and, indirectly, I/O scheduling. The :sequential hint tells the kernel to expect sequential access, triggering aggressive read-ahead and immediate page reclaim. The :random hint disables read-ahead since it would waste I/O bandwidth.

# Provide access pattern hints to the kernel
File.open('large_dataset.bin', 'rb') do |file|
  file.advise(:sequential)  # Optimize for sequential reading
  
  # Kernel now performs aggressive read-ahead
  # I/O scheduler sees many sequential requests it can merge
  data = file.read
end

File.open('database.idx', 'rb') do |file|
  file.advise(:random)  # Disable read-ahead for random access
  
  # Read index entries at random positions
  indices = [1000, 5000, 2000, 8000]
  indices.each do |pos|
    file.seek(pos)
    entry = file.read(256)
  end
end

Ruby's non-blocking I/O support through io/nonblock allows applications to issue multiple I/O operations without waiting for completion. This increases the number of concurrent requests visible to the I/O scheduler, providing more optimization opportunities. The scheduler performs better with deeper queues since it can select from more requests when choosing the next operation.

require 'io/nonblock'

# Non-blocking I/O increases scheduler queue depth
files = (0...10).map { |i| File.open("data#{i}.bin", 'rb') }
files.each { |f| f.nonblock = true }

# Issue multiple read requests without blocking
pending_operations = []
files.each do |file|
  begin
    data = file.read_nonblock(4096)
    # Process data
  rescue IO::WaitReadable
    # I/O pending - scheduler will handle it
    pending_operations << file
  end
end

# Wait for pending operations
IO.select(pending_operations) # Block until at least one completes

For applications requiring high-performance I/O with many concurrent operations, gems like nio4r provide event-driven I/O multiplexing. These libraries use select(), poll(), or epoll() to monitor multiple file descriptors, issuing I/O operations asynchronously. This pattern gives the I/O scheduler maximum flexibility since many requests remain outstanding simultaneously.

require 'nio'

# Event-driven I/O with nio4r
selector = NIO::Selector.new

files = (0...20).map { |i| File.open("data#{i}.bin", 'rb') }
files.each do |file|
  monitor = selector.register(file, :r)
  monitor.value = file
end

# Process I/O as it completes
files.length.times do
  selector.select do |monitor|
    file = monitor.value
    data = file.read_nonblock(4096) rescue nil
    if data
      # Process data
    else
      selector.deregister(file)
      file.close
    end
  end
end
# Scheduler optimizes order of these concurrent operations

Database access patterns significantly impact I/O scheduler performance. ORMs like ActiveRecord generate SQL queries that translate to disk I/O. Sequential table scans produce sequential disk access beneficial to any scheduler. Index lookups generate random I/O, challenging for traditional schedulers but acceptable on SSDs.

require 'active_record'

# Sequential scan - scheduler-friendly access pattern
User.where('created_at > ?', 1.year.ago).find_each do |user|
  # Process users sequentially
  # Reads proceed in physical order on disk
end

# Multiple separate queries - may cause random I/O
user_ids = [1, 500, 25, 750, 100]
users = user_ids.map { |id| User.find(id) }
# Each find() might hit different disk regions
# Scheduler must optimize these disparate requests

# Batch loading reduces I/O operations
users = User.where(id: user_ids).to_a
# Single query reduces I/O request count
# Scheduler handles one operation instead of five

Performance Considerations

I/O scheduler performance depends heavily on workload characteristics. Sequential access patterns allow aggressive request merging and read-ahead, achieving transfer rates approaching hardware limits. Random access patterns prevent merging and may cause the head to traverse the entire disk repeatedly, reducing throughput by 100x or more on HDDs.

The ratio of read to write operations affects scheduler behavior. Most schedulers prioritize reads since they block application progress. Writes proceed asynchronously through the page cache, allowing applications to continue without waiting. Under read-heavy workloads, the scheduler may delay writes indefinitely if reads continue arriving. This improves interactive response time but can cause write starvation.

Queue depth dramatically impacts scheduling effectiveness. With only one or two outstanding requests, the scheduler has minimal reordering opportunities. As queue depth increases to 32-128 requests, the scheduler selects from many candidates, minimizing seek distance. Beyond 128-256 requests, gains plateau while per-request latency increases due to queuing delay.

require 'benchmark'

# Demonstrate impact of access patterns on performance
def sequential_access(file, size)
  # Read file sequentially
  file.rewind
  chunks = size / 4096
  chunks.times { file.read(4096) }
end

def random_access(file, size)
  # Read file in random order
  chunks = size / 4096
  offsets = (0...chunks).map { |i| i * 4096 }.shuffle
  offsets.each do |offset|
    file.seek(offset)
    file.read(4096)
  end
end

# Create test file
size = 100 * 1024 * 1024  # 100MB
File.open('test.dat', 'wb') { |f| f.write('x' * size) }

File.open('test.dat', 'rb') do |file|
  Benchmark.bm(20) do |bm|
    bm.report('Sequential read:') { sequential_access(file, size) }
    bm.report('Random read:') { random_access(file, size) }
  end
end
# On HDD: Random read 50-100x slower than sequential
# On SSD: Random read 1-5x slower (or comparable) to sequential

Request size influences performance through interaction with both hardware and schedulers. Extremely small requests (512 bytes or less) incur overhead from syscall and scheduling costs disproportionate to data transferred. Very large requests (megabytes) may be split by the kernel into multiple block-level operations, reducing scheduler flexibility. Optimal request sizes typically range from 4KB to 128KB depending on workload.

The Linux CFQ scheduler adjusts behavior based on process I/O patterns. Processes issuing many requests receive longer time slices, while interactive processes with occasional I/O receive immediate service. This heuristic distinguishes between batch processing (throughput-oriented) and interactive applications (latency-sensitive), applying different scheduling strategies to each.

Storage device characteristics fundamentally alter scheduling trade-offs. HDDs benefit greatly from traditional scheduling algorithms. A SCAN-based scheduler might improve HDD throughput by 3-5x compared to FCFS under random workloads. SSDs show minimal benefit from classical algorithms since seek time disappears. SSD performance depends more on internal parallelism, with the NOOP scheduler often outperforming CFQ or Deadline.

# Detect storage device type and adjust I/O strategy
def detect_storage_type(path)
  # Simplistic detection - real systems check /sys/block/*/queue/rotational
  stat = File.stat(path)
  device = stat.dev
  
  # On Linux, check if device is rotational
  rotational_file = "/sys/dev/block/#{device >> 8}:#{device & 0xff}/queue/rotational"
  if File.exist?(rotational_file)
    File.read(rotational_file).strip == "1" ? :hdd : :ssd
  else
    :unknown
  end
end

def optimize_for_storage(file, storage_type)
  case storage_type
  when :hdd
    # Sequential access and larger requests for HDDs
    file.advise(:sequential)
    chunk_size = 128 * 1024  # 128KB chunks
  when :ssd
    # SSDs handle random access well
    file.advise(:random)
    chunk_size = 4096  # Smaller chunks OK
  else
    # Conservative default
    chunk_size = 16 * 1024
  end
  
  chunk_size
end

Write-back caching complicates performance analysis. When an application writes data, the write typically returns after data reaches the page cache, not physical storage. Actual disk I/O occurs later when the kernel's flusher threads write dirty pages to disk. This delayed write batches many small writes into fewer larger operations, improving throughput but making timing measurements misleading.

Database systems often use direct I/O to bypass page cache, giving applications control over caching decisions. Ruby applications can open files with the O_DIRECT flag (platform-specific) to enable direct I/O. This forces each operation to interact directly with the I/O scheduler, making scheduler behavior more predictable but requiring larger, aligned requests.

Concurrency level affects I/O performance through scheduler queue depth. A single-threaded application issues one request at a time, limiting scheduler optimization. Multi-threaded applications or async I/O increase queue depth proportionally to the number of concurrent operations. On HDDs, 4-8 concurrent operations often provide optimal throughput. SSDs benefit from higher concurrency, sometimes requiring 32+ concurrent operations to saturate available parallelism.

Practical Examples

Understanding scheduler behavior through practical examples demonstrates the performance impact of different access patterns and scheduling algorithms.

Example 1: Sequential vs Random Access Performance

# Create a 1GB test file
FILE_SIZE = 1024 * 1024 * 1024
BLOCK_SIZE = 4096
BLOCK_COUNT = FILE_SIZE / BLOCK_SIZE

File.open('testfile.bin', 'wb') do |file|
  BLOCK_COUNT.times { file.write('x' * BLOCK_SIZE) }
end

# Measure sequential read performance
def measure_sequential
  start = Time.now
  File.open('testfile.bin', 'rb') do |file|
    file.advise(:sequential)
    while data = file.read(BLOCK_SIZE)
      # Process block
    end
  end
  elapsed = Time.now - start
  throughput = FILE_SIZE / elapsed / (1024 * 1024)
  { time: elapsed, throughput_mb_s: throughput }
end

# Measure random read performance
def measure_random
  offsets = (0...1000).map { rand(BLOCK_COUNT) * BLOCK_SIZE }
  
  start = Time.now
  File.open('testfile.bin', 'rb') do |file|
    file.advise(:random)
    offsets.each do |offset|
      file.seek(offset)
      file.read(BLOCK_SIZE)
    end
  end
  elapsed = Time.now - start
  throughput = (1000 * BLOCK_SIZE) / elapsed / (1024 * 1024)
  { time: elapsed, throughput_mb_s: throughput }
end

seq = measure_sequential
rand = measure_random

puts "Sequential: #{seq[:throughput_mb_s].round(2)} MB/s"
puts "Random: #{rand[:throughput_mb_s].round(2)} MB/s"
puts "Performance ratio: #{(seq[:throughput_mb_s] / rand[:throughput_mb_s]).round(1)}x"
# On HDD: Ratio typically 50-100x
# On SSD: Ratio typically 1-3x

This example illustrates how scheduler effectiveness varies with access patterns. Sequential access allows the scheduler to merge adjacent requests and maximize transfer rates. Random access forces the scheduler to choose between many distant requests, resulting in extensive head movement on HDDs.

Example 2: Impact of Request Batching

# Database-like access pattern - individual queries vs batch loading
class DataStore
  def initialize(filename)
    @file = File.open(filename, 'rb')
    @record_size = 256
  end
  
  def read_record(id)
    offset = id * @record_size
    @file.seek(offset)
    @file.read(@record_size)
  end
  
  def read_records_batch(ids)
    # Sort IDs to create more sequential access pattern
    ids.sort.map { |id| read_record(id) }
  end
  
  def close
    @file.close
  end
end

# Create test datastore
record_count = 100000
File.open('datastore.bin', 'wb') do |f|
  record_count.times { |i| f.write(i.to_s.rjust(256, '0')) }
end

store = DataStore.new('datastore.bin')

# Random individual queries - poor for scheduler
random_ids = (0...1000).map { rand(record_count) }
start = Time.now
random_ids.each { |id| store.read_record(id) }
individual_time = Time.now - start

# Batch query with sorting - helps scheduler
start = Time.now
store.read_records_batch(random_ids)
batch_time = Time.now - start

store.close

puts "Individual queries: #{individual_time.round(3)}s"
puts "Batch with sorting: #{batch_time.round(3)}s"
puts "Speedup: #{(individual_time / batch_time).round(2)}x"
# Sorting IDs reduces seek distance, improving scheduler efficiency

Sorting requests by physical location helps any scheduler algorithm. Even with random logical access patterns, sorting creates more sequential physical access. This technique applies to database queries, file processing, and any scenario with multiple small operations.

Example 3: Write Buffering and Flush Behavior

# Demonstrate write buffering impact on performance
require 'benchmark'

WRITE_COUNT = 10000
DATA_SIZE = 1024  # 1KB per write

Benchmark.bm(25) do |bm|
  # Fully buffered writes
  bm.report('Buffered (default):') do
    File.open('buffered.dat', 'wb') do |f|
      WRITE_COUNT.times { f.write('x' * DATA_SIZE) }
    end
  end
  
  # Flush after each write (force to page cache)
  bm.report('Flush per write:') do
    File.open('flushed.dat', 'wb') do |f|
      WRITE_COUNT.times do
        f.write('x' * DATA_SIZE)
        f.flush
      end
    end
  end
  
  # fsync after each write (force to disk)
  bm.report('Fsync per write:') do
    File.open('synced.dat', 'wb') do |f|
      WRITE_COUNT.times do
        f.write('x' * DATA_SIZE)
        f.flush
        f.fsync
      end
    end
  end
  
  # Batch writes, then fsync once
  bm.report('Batch then fsync:') do
    File.open('batch.dat', 'wb') do |f|
      WRITE_COUNT.times { f.write('x' * DATA_SIZE) }
      f.flush
      f.fsync
    end
  end
end

# Results typically show:
# - Buffered: fast (data stays in memory)
# - Flush: moderate (syscalls but page cache absorbs writes)
# - Fsync per write: very slow (each write waits for disk)
# - Batch then fsync: moderate (many writes, one sync wait)

This example demonstrates the cost of forcing data to storage versus allowing buffering. Applications requiring durability must fsync, but batching many operations before syncing dramatically improves performance by allowing the I/O scheduler to optimize multiple pending writes.

Example 4: Concurrent I/O and Queue Depth

require 'thread'

# Process multiple files concurrently to increase scheduler queue depth
def process_files_sequential(filenames)
  filenames.each do |filename|
    File.open(filename, 'rb') do |f|
      while data = f.read(4096)
        # Process data
        sleep(0.0001)  # Simulate processing time
      end
    end
  end
end

def process_files_concurrent(filenames, thread_count = 4)
  queue = Queue.new
  filenames.each { |f| queue << f }
  
  threads = thread_count.times.map do
    Thread.new do
      while filename = queue.pop(true) rescue nil
        File.open(filename, 'rb') do |f|
          while data = f.read(4096)
            sleep(0.0001)
          end
        end
      end
    end
  end
  
  threads.each(&:join)
end

# Create test files
20.times do |i|
  File.open("testfile_#{i}.bin", 'wb') do |f|
    f.write('x' * (1024 * 1024 * 10))  # 10MB each
  end
end

filenames = (0...20).map { |i| "testfile_#{i}.bin" }

# Compare sequential vs concurrent processing
require 'benchmark'
Benchmark.bm(20) do |bm|
  bm.report('Sequential:') { process_files_sequential(filenames) }
  bm.report('Concurrent (4 threads):') { process_files_concurrent(filenames, 4) }
end

# Concurrent version provides deeper scheduler queue
# Scheduler can optimize across multiple active files
# Performance gain depends on storage type (HDD benefits more)

Concurrent I/O increases the number of outstanding requests visible to the scheduler. With multiple threads or async I/O, the scheduler sees requests across different files and can optimize accordingly. This technique particularly benefits HDDs where the scheduler can minimize head movement across diverse requests.

Common Pitfalls

Assuming Sequential File Processing Equals Sequential Disk Access

File system layout determines physical disk locations of file data. A file's logical structure may span non-contiguous disk regions due to fragmentation. Processing a file sequentially in application code doesn't guarantee sequential disk access. The scheduler receives requests for scattered disk locations, limiting optimization opportunities.

# Sequential file reading doesn't guarantee sequential disk access
File.open('potentially_fragmented.log', 'rb') do |file|
  file.advise(:sequential)  # Hint may not help if file fragmented
  
  # Logical sequential access
  while line = file.gets
    process(line)
  end
end

# File may occupy non-contiguous blocks on disk:
# Blocks: [150, 151, 152, 890, 891, 1024, 1025, ...]
# Scheduler sees jumps between distant cylinders

Mitigation involves defragmentation (filesystems like XFS do this automatically) or ensuring files are written in large contiguous operations. Applications writing log files or database files should write large chunks rather than many small appends to increase likelihood of contiguous allocation.

Excessive fsync Causing Performance Collapse

Calling fsync after every write operation forces synchronous disk I/O, bypassing both Ruby's buffers and the page cache. This prevents the I/O scheduler from optimizing across multiple operations, reducing throughput by 100x or more. Each fsync call blocks until the scheduler completes all pending writes for the file.

# Catastrophic performance - fsync per record
def write_records_slow(records)
  File.open('output.dat', 'wb') do |file|
    records.each do |record|
      file.write(record)
      file.flush
      file.fsync  # Blocks until disk completes write
    end
  end
end

# Much better - batch then fsync
def write_records_fast(records)
  File.open('output.dat', 'wb') do |file|
    records.each { |record| file.write(record) }
    file.flush
    file.fsync  # Single fsync after all writes
  end
end

# write_records_fast is 50-200x faster depending on storage

Applications requiring durability should batch multiple operations then fsync periodically (e.g., every 100 records or every 100ms). This provides reasonable durability guarantees while allowing the scheduler to optimize batched writes.

Ignoring Storage Device Type

Applying HDD-optimized access patterns to SSDs wastes development effort and may reduce performance. SSDs handle random I/O efficiently, making complex sorting or sequential access patterns unnecessary. Conversely, treating HDDs like SSDs by issuing random I/O creates severe performance problems.

# Detecting storage type allows appropriate optimization
def optimize_data_processing(file, data_indices)
  device_type = detect_storage_type(file.path)
  
  case device_type
  when :hdd
    # Sort indices to create sequential access pattern
    sorted_indices = data_indices.sort
    sorted_indices.each do |idx|
      file.seek(idx * RECORD_SIZE)
      process(file.read(RECORD_SIZE))
    end
    
  when :ssd
    # Random access fine on SSD - no need to sort
    data_indices.each do |idx|
      file.seek(idx * RECORD_SIZE)
      process(file.read(RECORD_SIZE))
    end
  end
end

Modern applications should detect storage characteristics and adjust behavior accordingly. Cloud environments complicate this since VMs may not know underlying storage types. When in doubt, conservative strategies (larger requests, some batching) work reasonably across storage types.

Small Random Writes in Database Applications

ORMs and database libraries often generate many small queries that translate to random disk I/O. Updating 100 different records via 100 separate UPDATE statements causes the database to write to 100 different disk locations. The I/O scheduler struggles to optimize these scattered writes.

# Poor pattern - many small updates
users_to_update.each do |user|
  user.update(last_seen: Time.now)
  # Each update likely writes to different disk location
  # Scheduler sees 100 unrelated write requests
end

# Better - batch update in single transaction
User.transaction do
  users_to_update.each do |user|
    user.last_seen = Time.now
  end
  users_to_update.each(&:save)
end
# Database can optimize write order
# Scheduler receives more organized write pattern

# Best - single SQL statement
User.where(id: users_to_update.map(&:id))
    .update_all(last_seen: Time.now)
# Minimal disk I/O, scheduler handles efficiently

Batch operations allow databases to optimize write patterns and give the I/O scheduler larger, more organized sets of requests to optimize. This applies to INSERT, UPDATE, and DELETE operations.

Read-Ahead Defeating on Repeated File Access

Opening and closing a file between reads resets kernel read-ahead state. The I/O scheduler and page cache work together to predict future access and prefetch data. Repeatedly opening the same file prevents this optimization, forcing repeated I/O for data that could remain cached.

# Inefficient - repeated open/close
def process_log_entries(log_file, entry_ids)
  entry_ids.each do |entry_id|
    File.open(log_file, 'rb') do |f|
      f.seek(entry_id * ENTRY_SIZE)
      entry = f.read(ENTRY_SIZE)
      process(entry)
    end
    # File closed, read-ahead state lost
  end
end

# Efficient - keep file open
def process_log_entries_optimized(log_file, entry_ids)
  File.open(log_file, 'rb') do |f|
    entry_ids.each do |entry_id|
      f.seek(entry_id * ENTRY_SIZE)
      entry = f.read(ENTRY_SIZE)
      process(entry)
    end
  end
  # Read-ahead remains active, page cache effective
end

Keep files open across multiple operations when possible. The kernel maintains per-file state including read-ahead information and page cache associations. Closing and reopening discards this state, reducing both cache effectiveness and scheduler optimization.

Reference

I/O Scheduling Algorithm Comparison

Algorithm	Seek Optimization	Fairness	Starvation Risk	Best Use Case
FCFS	None	Perfect	None	Light load, sequential access
SSTF	Excellent	Poor	High	High throughput, short-lived jobs
SCAN	Good	Good	None	General purpose, mixed workload
C-SCAN	Good	Better	None	Real-time requirements
LOOK	Very Good	Good	None	Modern general purpose
C-LOOK	Very Good	Better	None	Modern real-time
CFQ	Good	Excellent	Very Low	Interactive multi-process systems
Deadline	Good	Good	None	Latency-sensitive workloads
NOOP	None	Perfect	None	SSDs, RAID arrays

Performance Characteristics by Storage Type

Metric	HDD	SSD
Sequential Read	100-200 MB/s	500-3500 MB/s
Random Read IOPS	75-150	10,000-500,000
Sequential Write	100-200 MB/s	400-3000 MB/s
Random Write IOPS	75-150	10,000-300,000
Seek Time	3-15 ms	<0.1 ms
Queue Depth Benefit	High	Moderate
Scheduler Impact	Critical	Minimal

Ruby I/O Methods and Scheduler Interaction

Method	Buffer Behavior	Scheduler Impact	Use Case
File.read	Single read syscall	Minimal queue depth	Small files
File.read with block	Multiple read syscalls	Better queue depth	Large files
File.write	Ruby buffered	Batched to OS	General writing
file.sync = true	Unbuffered per write	Each write visible to scheduler	Low-latency logging
file.flush	Forces syscalls	Writes reach page cache	Visibility without durability
file.fsync	Forces disk write	Scheduler must complete operation	Durability required
file.advise	Kernel hint	Affects read-ahead scheduling	Large file processing
IO.select	Async operation	Increases queue depth	High-concurrency I/O

Key Timing Components

Component	Typical Range (HDD)	Typical Range (SSD)
Seek Time	3-15 ms	<0.1 ms
Rotational Latency	2-8 ms	None
Transfer Time (4KB)	0.1-0.3 ms	0.01-0.05 ms
Syscall Overhead	0.001-0.01 ms	0.001-0.01 ms
Total Latency (Random)	5-23 ms	0.05-0.2 ms

Optimization Guidelines by Workload

Workload Pattern	HDD Strategy	SSD Strategy
Sequential Read	Large reads, read-ahead enabled	Moderate reads, read-ahead optional
Random Read	Sort requests, batch reads	Direct access, concurrent reads
Sequential Write	Large buffered writes, periodic fsync	Moderate writes, less frequent fsync
Random Write	Batch and sort writes, delayed fsync	Direct writes, reasonable fsync frequency
Mixed Read/Write	Separate read and write phases	Concurrent operations acceptable
Database Queries	Batch queries, sort by key	Batch queries for other reasons
Log Writing	Buffer heavily, fsync per batch	Moderate buffering, fsync per batch

Linux I/O Scheduler Selection

Scheduler	Sysfs Path	Configuration
View Current	/sys/block/sda/queue/scheduler	cat to view, shows active in brackets
Change Temporarily	/sys/block/sda/queue/scheduler	echo cfq > file
Change Permanently	/etc/default/grub	GRUB_CMDLINE_LINUX elevator=deadline
Per-Process Priority	ionice command	ionice -c 2 -n 0 command
Check Rotational	/sys/block/sda/queue/rotational	1 for HDD, 0 for SSD

Ruby I/O Tuning Parameters

Parameter	Setting Method	Impact on Scheduler
Buffer Size	File.open with buffer size arg	Larger buffers reduce syscall frequency
Sync Mode	file.sync = true/false	Disabling buffering increases queue depth
Advise Sequential	file.advise(:sequential)	Triggers read-ahead, more sequential requests
Advise Random	file.advise(:random)	Disables read-ahead, reduces wasted I/O
Non-blocking	require 'io/nonblock', file.nonblock = true	Allows concurrent I/O operations
Direct I/O	Platform-specific flags	Bypasses page cache, direct scheduler interaction

I/O Scheduling