CrackedRuby

Overview

File system performance encompasses the speed and efficiency of operations that read from and write to persistent storage. These operations form a critical bottleneck in many applications because disk access operates orders of magnitude slower than memory access. A typical hard disk seek time ranges from 5-15 milliseconds, while RAM access completes in nanoseconds. Even modern SSDs, despite eliminating mechanical seek time, still require microseconds for access—thousands of times slower than RAM.

The performance characteristics of file operations affect applications across domains. Database systems rely on optimized file I/O for transaction logs and data files. Web servers must efficiently serve static assets. Data processing pipelines depend on high-throughput file operations to handle large datasets. Build systems perform thousands of file reads during compilation. Backup utilities require maximum throughput for large file transfers.

File system performance optimization operates at multiple levels: hardware characteristics (HDD vs SSD vs NVMe), operating system I/O schedulers and caching, file system implementations (ext4, XFS, APFS), application-level buffering strategies, and access pattern optimization. Each level presents opportunities for performance improvements.

Ruby applications interact with file systems through the File and IO classes, which provide abstractions over operating system I/O calls. The Ruby interpreter adds its own buffering layer on top of OS buffering, creating multiple caching levels. Understanding these layers helps developers write performant file I/O code.

Key Principles

Buffering reduces system call overhead by batching small operations into larger chunks. Each system call involves a transition between user space and kernel space, which incurs overhead. Reading a 1KB file one byte at a time with unbuffered reads takes about 1,000 system calls, while a buffered read fetches the whole file in one call, cutting the call count by three orders of magnitude. Ruby's IO class maintains internal buffers that default to 8KB, reducing the number of underlying read/write system calls.
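
A minimal sketch of this effect, assuming a scratch file in the system temp directory (the file name, the 64KB size, and both helper names are illustrative). It reads the same file once with one-byte sysread calls, which bypass Ruby's buffer and issue one system call each, and once with one-byte read calls that are served from the buffer. Timings vary by machine, so none are quoted here.

```ruby
require 'benchmark'
require 'tmpdir'

# sysread bypasses Ruby's buffer: every call is one read system call
def count_unbuffered(path)
  bytes = 0
  File.open(path, 'rb') do |file|
    begin
      loop do
        file.sysread(1)
        bytes += 1
      end
    rescue EOFError
      # sysread raises EOFError at end of file
    end
  end
  bytes
end

# read(1) is served from Ruby's internal buffer between refills
def count_buffered(path)
  bytes = 0
  File.open(path, 'rb') do |file|
    bytes += 1 while file.read(1)
  end
  bytes
end

# Illustrative scratch file; name and size are assumptions for this sketch
path = File.join(Dir.tmpdir, "buffering_demo_#{Process.pid}.bin")
File.binwrite(path, 'x' * 65_536)

unbuffered_time = Benchmark.realtime { count_unbuffered(path) }
buffered_time = Benchmark.realtime { count_buffered(path) }
puts format('sysread(1): %.4fs, read(1): %.4fs', unbuffered_time, buffered_time)

File.delete(path)
```

Running the script under strace -c should show the difference in read call counts directly.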

Sequential access patterns outperform random access because storage devices optimize for sequential operations. Hard drives eliminate seek time when reading consecutive blocks. SSDs achieve higher throughput with sequential reads through internal parallelism. The operating system's read-ahead mechanism pre-fetches data for sequential patterns but provides no benefit for random access. Applications that process files linearly achieve significantly better performance than those jumping between file positions.

Operating system page cache serves as a transparent performance layer by keeping recently accessed file data in RAM. The kernel maintains this cache using available memory, evicting old pages when memory pressure increases. When an application reads a file, the OS serves subsequent reads from cache without touching disk. Write operations go to the page cache first, with the OS flushing dirty pages to disk asynchronously. This mechanism makes repeatedly accessed files perform at RAM speed.

I/O scheduling algorithms in the kernel optimize disk access patterns by reordering and merging requests. The elevator algorithm services requests in disk sector order, minimizing seek time. Deadline schedulers prevent request starvation by setting time limits. CFQ (Completely Fair Queuing) allocated bandwidth fairly across processes on older Linux kernels; it was removed in Linux 5.0, where BFQ and mq-deadline fill similar roles. These schedulers operate transparently but affect application performance, especially under concurrent I/O load.

File system metadata operations carry distinct performance costs separate from data operations. Creating, deleting, or stat-ing files requires reading or updating directory entries, inodes, and file system metadata structures. Updates often require synchronous disk writes for consistency. Applications creating thousands of temporary files experience metadata bottlenecks even with small file sizes. Modern file systems like XFS reduce this cost by journaling metadata changes and batching them into fewer log writes.

Direct I/O bypasses the page cache for applications that manage their own caching. Database systems often use direct I/O to prevent double-buffering, where data sits once in the page cache and again in the database's buffer pool. Direct I/O requires memory buffers, file offsets, and transfer sizes to be aligned, typically to the device's logical block size, but provides predictable performance without cache eviction effects. The trade-off involves losing the kernel's sophisticated caching algorithms.

Memory-mapped files enable treating file contents as memory by mapping file regions into the process address space. The OS handles paging data between RAM and disk automatically. This approach eliminates explicit read/write calls and buffering, relying instead on page faults and the page cache. Memory mapping excels for random access patterns and file modifications but introduces complexity around synchronization and error handling.

Write-ahead logging decouples durability from the cost of updating primary data structures. Applications append each change to a fast sequential log and sync it, acknowledge the operation, and apply the change to the primary data structures later. The log provides crash recovery. This pattern appears in databases (PostgreSQL's WAL), file systems (ext4's journal), and application-level implementations. The trade-off involves recovery complexity and storage overhead for the log.
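
The pattern can be sketched in a few lines. TinyWalStore below is a hypothetical in-memory key-value store invented for illustration; it is not how PostgreSQL or ext4 implement their logs. Each change is appended to the log and synced before the in-memory hash is updated, and a new instance rebuilds its state by replaying the log.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical key-value store used only to illustrate the pattern
class TinyWalStore
  attr_reader :data

  def initialize(log_path)
    @log_path = log_path
    @data = {}
    replay                            # Rebuild state from the log on startup
    @log = File.open(@log_path, 'a')
  end

  def set(key, value)
    @log.puts(JSON.generate([key, value]))  # 1. Sequential append
    @log.fsync                              # 2. Make the entry durable
    @data[key] = value                      # 3. Apply to the primary structure
  end

  def close
    @log.close
  end

  private

  # Re-apply every logged change in order
  def replay
    return unless File.exist?(@log_path)

    File.foreach(@log_path) do |line|
      key, value = JSON.parse(line)
      @data[key] = value
    end
  end
end

log_path = File.join(Dir.tmpdir, "tiny_wal_#{Process.pid}.log")
store = TinyWalStore.new(log_path)
store.set('balance', 100)
store.set('balance', 75)
store.close

# A fresh instance stands in for a restart after a crash
recovered = TinyWalStore.new(log_path)
puts recovered.data.inspect
recovered.close
File.delete(log_path)
```

A production log would also need checksums to detect a torn final entry and a way to truncate the log after a checkpoint; the sketch omits both.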

Ruby Implementation

Ruby provides file system access through the File class, which inherits from IO. The IO class implements buffering internally, maintaining read and write buffers that reduce system call frequency. The internal buffer defaults to 8KB; applications control I/O granularity through the size passed to read and through the sync setting rather than by resizing that buffer.

# Standard buffered reading
File.open('data.txt', 'r') do |file|
  file.each_line do |line|
    process(line)
  end
end
# Ruby maintains 8KB buffer, reads ahead from disk

The each_line method reads data into internal buffers and returns lines incrementally. This approach balances memory usage with I/O efficiency. For files smaller than the buffer size, Ruby performs a single read system call.

# Reading entire file into memory
content = File.read('data.txt')
# Single read operation for small files
# Multiple buffered reads for large files

# Binary reading with explicit buffer size
File.open('large.bin', 'rb') do |file|
  while chunk = file.read(65536)
    process_binary(chunk)
  end
end
# Reads 64KB chunks, good for large binary files

The read method with a size argument controls buffer granularity. Larger chunks reduce system call overhead but increase memory usage. Binary mode ('rb') disables text encoding conversion and newline translation, improving performance for non-text data.

# Buffered writing
File.open('output.txt', 'w') do |file|
  1000.times do |i|
    file.puts "Line #{i}"
  end
end
# Writes buffered in memory, flushed automatically at close

# Explicit flushing for durability
File.open('critical.log', 'a') do |file|
  file.puts log_entry
  file.flush  # Write Ruby's buffer to the OS
  file.fsync  # Ask the OS to write its cached data to disk
end

The flush method writes Ruby's buffer to the operating system, while fsync ensures the OS flushes data to physical storage. Database applications and logging systems use fsync for durability guarantees, accepting the performance cost. Each fsync call blocks until the disk completes the write operation.
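
The difference between the two layers can be observed from inside a single process. This is a small sketch, assuming CRuby's default buffered mode for regular files; the helper name and scratch path are illustrative. It compares the file size reported by the OS before and after flush.

```ruby
require 'tmpdir'

# Returns the file size as seen by the OS before and after flush
def sizes_around_flush(path)
  File.open(path, 'w') do |file|
    file.write('hello')        # Held in Ruby's write buffer
    before = File.size(path)   # The OS has not received the bytes yet
    file.flush                 # Hand the buffer to the OS (page cache)
    after = File.size(path)
    file.fsync                 # Ask the OS to persist the data to the device
    [before, after]
  end
end

# Illustrative scratch path; the name is an assumption for this sketch
path = File.join(Dir.tmpdir, "flush_demo_#{Process.pid}.log")
before, after = sizes_around_flush(path)
puts "size before flush: #{before}, after flush: #{after}"
File.delete(path)
```

The size only changes after flush because the bytes stay in the process until then. Whether they have reached the physical device is not visible from user space, which is why fsync is the call that matters for durability.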

# Memory-mapped files using external gem
require 'mmap'

mmap = Mmap.new('data.dat', 'r')
# Access file as byte array
byte_at_1000 = mmap[1000]
# OS handles paging automatically

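mmap.munmap  # Release the read-only mapping before remapping

mmap = Mmap.new('shared.dat', 'rw')
mmap[0, 8] = 'new data'  # Replacement length matches the 8-byte range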
# Modifies file through memory operations
mmap.munmap

The mmap gem provides memory mapping functionality. Random access to mapped files performs efficiently because the OS handles caching through its page management. Multiple processes can map the same file for shared memory communication.

# Directory iteration with low memory overhead
Dir.foreach('/var/log') do |filename|
  next if filename == '.' || filename == '..'
  path = File.join('/var/log', filename)
  stat = File.stat(path)
  puts "#{filename}: #{stat.size} bytes"
end
# Efficient iteration, one entry at a time

Directory operations involve metadata reads. The stat system call retrieves file information including size, permissions, and timestamps. Calling stat on thousands of files becomes a bottleneck. Modern file systems cache metadata, but the cache has limits.
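
One way to keep metadata reads down is to take a single File::Stat per entry and read every attribute from it, instead of calling File.file?, File.size, and File.mtime separately (each of which stats the path again). A minimal sketch, with summarize_directory as a hypothetical helper name:

```ruby
require 'tmpdir'

# Summarize regular files in a directory with one lstat call per entry.
# File::Stat carries type, size, and mtime together.
def summarize_directory(dir)
  Dir.each_child(dir).filter_map do |name|
    stat = File.lstat(File.join(dir, name))
    next unless stat.file?

    { name: name, size: stat.size, mtime: stat.mtime }
  end
end

# Demonstration on a throwaway directory
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'a.log'), 'x' * 10)
  Dir.mkdir(File.join(dir, 'nested'))
  summarize_directory(dir).each do |entry|
    puts "#{entry[:name]}: #{entry[:size]} bytes"
  end
end
```

The sketch uses lstat, so symbolic links are reported as links rather than followed; File.stat would follow them. Dir.each_child and filter_map require Ruby 2.7 or later.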

# Optimized batch file reading
def read_files_efficiently(pattern)
  Dir.glob(pattern).each do |path|
    # Skip files of 1 MB or more to avoid loading large files into memory
    next unless File.size(path) < 1_000_000
    
    content = File.read(path)
    process(content)
  end
end

# Sequential processing optimizes OS read-ahead
files = Dir.glob('logs/*.log').sort
files.each do |path|
  File.open(path, 'r') do |file|
    file.each_line do |line|
      analyze(line)
    end
  end
end

Processing files in a predictable order can improve cache locality when that order matches the on-disk layout. The OS read-ahead mechanism works best with predictable access patterns. Checking file size before reading prevents loading unexpectedly large files into memory.
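
Ruby also exposes IO#advise, which passes an access-pattern hint to the kernel (posix_fadvise on platforms that support it). The sketch below is illustrative: the hint is advisory, it may be a no-op on some platforms, and any speedup should be measured rather than assumed. The helper name is hypothetical.

```ruby
require 'tmpdir'

# Count lines after hinting that the file will be read front to back
def count_lines_sequentially(path)
  File.open(path, 'r') do |file|
    # Offset 0 with length 0 covers the whole file
    file.advise(:sequential, 0, 0)
    file.each_line.count
  end
end

# Illustrative scratch file for the demonstration
path = File.join(Dir.tmpdir, "advise_demo_#{Process.pid}.log")
File.write(path, "first\nsecond\nthird\n")
puts count_lines_sequentially(path)
File.delete(path)
```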

# Lazy enumeration for large files
File.open('huge.log', 'r') do |file|
  file.each_line.lazy.grep(/ERROR/).first(10).each do |line|
    puts line
  end
end
# Stops reading after finding 10 matches
# Does not load entire file into memory

The lazy enumerator prevents materializing the entire file in memory. This pattern excels for early-exit scenarios where processing stops after finding specific data. Without lazy, each_line.grep(/ERROR/) scans the whole file before first(10) runs; a plain each_line loop stops early only with an explicit break.

Performance Considerations

Sequential vs random access demonstrates dramatic performance differences. Sequential reads on hard drives achieve 100-200 MB/s, while random reads drop to 1-2 MB/s due to seek time overhead. SSDs reduce this gap but still favor sequential access—500-550 MB/s sequential versus 300-400 MB/s random. Applications should structure data for sequential access when performance matters.

# Poor: Random access pattern
offsets = [1000, 500, 5000, 2000, 100]
File.open('data.bin', 'rb') do |file|
  offsets.each do |offset|
    file.seek(offset)
    data = file.read(100)
    process(data)
  end
end
# Seeks jump back and forth across the file

# Better: Sequential access after sorting
File.open('data.bin', 'rb') do |file|
  offsets.sort.each do |offset|
    file.seek(offset)
    data = file.read(100)
    process(data)
  end
end
# Sorted seeks move forward only, minimizing disk head movement
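
A related option is IO#pread (Ruby 2.5 and later), which reads at an absolute offset in one call and leaves the file position untouched. The following is a sketch only; read_records_at is a hypothetical helper and the scratch file exists just for the demonstration.

```ruby
require 'tmpdir'

# Read fixed-length slices at the given offsets, visiting them in ascending order
def read_records_at(path, offsets, length)
  File.open(path, 'rb') do |file|
    offsets.sort.to_h do |offset|
      # pread combines seek and read; it raises EOFError when the offset
      # is at or beyond the end of the file
      [offset, file.pread(length, offset)]
    end
  end
end

# Illustrative scratch file: the alphabet repeated four times
path = File.join(Dir.tmpdir, "pread_demo_#{Process.pid}.bin")
File.binwrite(path, ('a'..'z').to_a.join * 4)
p read_records_at(path, [52, 0, 26], 3)
File.delete(path)
```

Because pread does not share a file position, the same File object can be used from several threads without coordinating seeks.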

Buffer size tuning balances memory usage against system call overhead. Small buffers increase system call frequency. Large buffers waste memory and reduce cache effectiveness. The optimal size depends on file size distribution and available memory. For reading multiple small files, keep buffers small. For processing large files sequentially, use larger buffers.

# Measuring impact of buffer size
require 'benchmark'

def read_with_buffer(path, buffer_size)
  File.open(path, 'rb') do |file|
    while chunk = file.read(buffer_size)
      # Processing
    end
  end
end

Benchmark.bm do |x|
  x.report('4KB:')  { read_with_buffer('large.bin', 4096) }
  x.report('64KB:') { read_with_buffer('large.bin', 65536) }
  x.report('1MB:')  { read_with_buffer('large.bin', 1048576) }
end
# Results show diminishing returns above 64KB-256KB

Write buffering and durability create a trade-off between throughput and data safety. Buffered writes batch small operations but risk data loss on crashes. The fsync system call forces physical writes but reduces throughput significantly. Applications must choose based on durability requirements.

# High-throughput logging without durability guarantees
logger = File.open('app.log', 'a')
logger.sync = false  # Disable auto-flushing
10000.times { logger.puts "Log entry" }
logger.close
# Fast but data loss risk on crash

# Durable logging with performance cost
logger = File.open('critical.log', 'a')
10000.times do
  logger.puts "Transaction committed"
  logger.fsync  # Ensure disk write
end
logger.close
# Slow but guaranteed durability

Operating system caching makes the second read of a file dramatically faster than the first. Applications processing the same files repeatedly benefit enormously from the page cache. The cache operates system-wide, so other processes reading the same files contribute to cache warming.

require 'benchmark'

path = 'test.bin'
File.write(path, 'x' * 10_000_000)

# The write above leaves the data in the page cache, so drop the cache
# first to make the first read cold (Linux only, requires root)
system('sync; echo 3 > /proc/sys/vm/drop_caches')

Benchmark.bm do |x|
  x.report('Cold read:') { File.read(path) }
  x.report('Cached read:') { File.read(path) }
end
# The cached read is typically far faster; the ratio depends on the device

Metadata operation costs dominate when working with many small files. Creating, deleting, or stat-ing files involves filesystem metadata updates. These operations often require synchronous disk writes for consistency. Applications creating thousands of temporary files should batch operations or use alternative designs.

# Poor: Creating many small files
10000.times do |i|
  File.write("temp/file_#{i}.txt", "data")
end
# Each write requires directory update

# Better: Single file holding all records
File.open('combined.dat', 'wb') do |file|
  10000.times do |i|
    file.write("data for record #{i}\n")
  end
end
# Single file creation, minimal metadata overhead
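
To read individual records back from a combined file, the writer can keep an index of where each record starts. The sketch below assumes the index is held in memory as a hash of [offset, length] pairs; both helper names are hypothetical, and a real design would also persist the index.

```ruby
require 'tmpdir'

# Write all payloads into one file and return { key => [offset, length] }
def write_combined(path, records)
  index = {}
  offset = 0
  File.open(path, 'wb') do |file|
    records.each do |key, payload|
      index[key] = [offset, payload.bytesize]
      file.write(payload)
      # Track the offset manually; calling file.pos would flush the buffer
      offset += payload.bytesize
    end
  end
  index
end

# Fetch one record using the index
def read_record(path, index, key)
  offset, length = index.fetch(key)
  File.open(path, 'rb') do |file|
    file.seek(offset)
    file.read(length)
  end
end

# Illustrative scratch file and records
path = File.join(Dir.tmpdir, "combined_demo_#{Process.pid}.dat")
index = write_combined(path, 'first' => 'alpha', 'second' => 'bravo!')
puts read_record(path, index, 'second')
File.delete(path)
```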

Concurrent I/O from multiple threads or processes can improve throughput on modern storage, especially SSDs with high I/O parallelism. However, contention for the same files creates bottlenecks. Operating system locking and cache thrashing reduce benefits. Partitioning data across files enables parallel processing without contention.

# Parallel file processing
require 'parallel'

files = Dir.glob('data/*.txt')
results = Parallel.map(files, in_threads: 4) do |path|
  File.read(path).scan(/pattern/).count
end
# Multiple threads reading different files
# OS can parallelize I/O operations

# Avoiding contention on shared files
threads = 4.times.map do |i|
  Thread.new do
    File.open("output_#{i}.log", 'w') do |file|
      # Each thread writes to separate file
      process_and_log(file, i)
    end
  end
end
threads.each(&:join)

Practical Examples

Large log file analysis requires streaming processing to handle files larger than available RAM. Reading the entire file with File.read causes memory exhaustion. Line-by-line iteration with proper buffering maintains constant memory usage regardless of file size.

def analyze_large_log(path)
  error_count = 0
  warning_count = 0
  
  File.open(path, 'r') do |file|
    file.each_line do |line|
      error_count += 1 if line.include?('ERROR')
      warning_count += 1 if line.include?('WARN')
    end
  end
  
  { errors: error_count, warnings: warning_count }
end

# Processing 10GB log file with constant memory usage
stats = analyze_large_log('/var/log/application.log')
puts "Found #{stats[:errors]} errors, #{stats[:warnings]} warnings"

Batch file export generates multiple output files from processed data. Naive implementations open and close files repeatedly, incurring overhead. Keeping files open during batch processing and buffering writes improves throughput.

# Inefficient: Opening files repeatedly
records.each do |record|
  File.open("output/#{record.category}.csv", 'a') do |file|
    file.puts record.to_csv
  end
end
# File open/close overhead on every record

# Efficient: Keep files open, buffer writes
output_files = {}

begin
  records.each do |record|
    category = record.category
    # Append mode matches the example above ('w' would truncate existing files)
    output_files[category] ||= File.open("output/#{category}.csv", 'a')
    output_files[category].puts record.to_csv
  end
ensure
  output_files.each_value(&:close)
end
# Opens each file once, buffers automatically

Configuration file reloading monitors files for changes and reloads when modified. Checking file modification time on every operation creates excessive stat calls. Caching mtime and checking periodically reduces overhead.

require 'yaml'

class ConfigLoader
  def initialize(path)
    @path = path
    @mtime = nil
    @config = nil
    @check_interval = 5  # seconds
    @last_check = Time.now - 10
  end
  
  def load
    now = Time.now
    
    # Throttle stat calls
    if now - @last_check > @check_interval
      current_mtime = File.mtime(@path)
      
      if @mtime.nil? || current_mtime > @mtime
        @config = YAML.load_file(@path)
        @mtime = current_mtime
      end
      
      @last_check = now
    end
    
    @config
  end
end

# Usage avoids excessive filesystem checks
config = ConfigLoader.new('app.yml')
loop do
  settings = config.load
  # Uses cached config between check intervals
  process_with_settings(settings)
  sleep 0.1
end

Binary file parsing reads structured binary data efficiently by reading fixed-size chunks and unpacking. This approach minimizes system calls and leverages buffering.

def parse_binary_records(path)
  records = []
  record_size = 64  # bytes per record
  
  File.open(path, 'rb') do |file|
    while chunk = file.read(record_size)
      break if chunk.size < record_size
      
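      # Unpack big-endian uint32, uint32, and float64 from the first 16 bytes.
      # 'G' is the big-endian double; '>' is only valid on integer directives.
      id, timestamp, value = chunk.unpack('L>L>G')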
      records << { id: id, time: timestamp, value: value }
    end
  end
  
  records
end

# Processes binary file with minimal overhead
data = parse_binary_records('sensor_data.bin')
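
A round trip makes the record layout easier to check. The sketch below assumes a simplified 16-byte record (two big-endian 32-bit unsigned integers followed by a big-endian double) rather than the 64-byte record above; the constant and method names are hypothetical. It yields records one at a time instead of collecting them in an array.

```ruby
require 'tmpdir'

RECORD_FORMAT = 'L>L>G'  # uint32, uint32, float64, all big-endian
RECORD_SIZE = 16         # 4 + 4 + 8 bytes

# Pack each [id, timestamp, value] triple into a fixed-size record
def write_sensor_records(path, records)
  File.open(path, 'wb') do |file|
    records.each do |id, timestamp, value|
      file.write([id, timestamp, value].pack(RECORD_FORMAT))
    end
  end
end

# Yield one unpacked record at a time to keep memory use constant
def each_sensor_record(path)
  return enum_for(:each_sensor_record, path) unless block_given?

  File.open(path, 'rb') do |file|
    while (chunk = file.read(RECORD_SIZE))
      break if chunk.bytesize < RECORD_SIZE  # Ignore a truncated tail

      yield chunk.unpack(RECORD_FORMAT)
    end
  end
end

# Illustrative scratch file and sample values
path = File.join(Dir.tmpdir, "sensor_demo_#{Process.pid}.bin")
write_sensor_records(path, [[1, 1_700_000_000, 21.5], [2, 1_700_000_060, -3.25]])
each_sensor_record(path) { |id, time, value| puts "#{id} #{time} #{value}" }
File.delete(path)
```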

Directory tree traversal walks large directory hierarchies efficiently by avoiding redundant stat calls and processing files in batches.

require 'find'

def find_large_files(root, min_size)
  large_files = []

  Find.find(root) do |path|
    begin
      # One stat call per entry provides both the type and the size
      stat = File.stat(path)
      next unless stat.file?

      large_files << [path, stat.size] if stat.size > min_size
    rescue Errno::EACCES, Errno::ENOENT
      # Skip unreadable entries and broken symlinks
    end
  end

  large_files.sort_by { |_, size| -size }
end

# Scan filesystem once, collect results
files = find_large_files('/home/user', 100_000_000)
files.first(10).each do |path, size|
  puts "#{path}: #{size / 1_000_000}MB"
end

Common Pitfalls

Reading entire files into memory causes out-of-memory errors for large files. Developers often use File.read without considering file size. The solution involves streaming processing or checking size before loading.

# Problematic pattern
def process_file(path)
  content = File.read(path)  # Crashes on 5GB file
  content.lines.each { |line| analyze(line) }
end

# Correct approach
def process_file_safe(path)
  File.open(path, 'r') do |file|
    file.each_line { |line| analyze(line) }
  end
end

Ignoring buffer flushing leads to data loss when processes terminate abnormally. Developers assume writes complete immediately, but buffering delays actual disk writes. Critical data requires explicit flushing.

# Data loss risk
file = File.open('transactions.log', 'a')
file.puts "Transaction: #{tx_id}"
# Process crashes here - transaction not on disk

# Safe pattern
file = File.open('transactions.log', 'a')
file.puts "Transaction: #{tx_id}"
file.flush  # Ensure write to OS
file.fsync  # Ensure write to disk

Excessive fsync calls destroy write performance. Developers sometimes fsync after every write operation, reducing throughput by 100x. Batching writes before syncing balances durability and performance.

# Performance disaster
1000.times do |i|
  File.open('log.txt', 'a') do |file|
    file.puts "Entry #{i}"
    file.fsync  # 1000 fsync calls
  end
end

# Reasonable approach
File.open('log.txt', 'a') do |file|
  1000.times do |i|
    file.puts "Entry #{i}"
  end
  file.fsync  # Single fsync
end
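
When losing the last few entries is unacceptable but one sync per entry is too slow, a middle ground is to sync once per batch. This is a sketch under that assumption; append_in_batches and its batch_size parameter are hypothetical names. Up to one batch of entries can still be lost in a crash.

```ruby
require 'tmpdir'

# Append entries, syncing once per batch; returns the number of fsync calls
def append_in_batches(path, entries, batch_size: 100)
  syncs = 0
  File.open(path, 'a') do |file|
    entries.each_slice(batch_size) do |batch|
      batch.each { |entry| file.puts(entry) }
      file.fsync  # One durable sync per batch, not per entry
      syncs += 1
    end
  end
  syncs
end

# Illustrative scratch file: 1000 entries in batches of 100
path = File.join(Dir.tmpdir, "batch_demo_#{Process.pid}.log")
entries = Array.new(1000) { |i| "Entry #{i}" }
puts "fsync calls: #{append_in_batches(path, entries)}"
File.delete(path)
```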

Not handling EAGAIN errors causes failures in non-blocking I/O. Non-blocking reads on pipes, sockets, and device files fail with EAGAIN when no data is available; regular files are always treated as ready. In Ruby, read_nonblock surfaces this as IO::WaitReadable, while plain read waits internally. Applications must retry or use select/poll mechanisms.

# Missing error handling
file = File.open('/dev/input/event0', File::RDONLY | File::NONBLOCK)
data = file.read_nonblock(100)  # Raises IO::EAGAINWaitReadable if no data

# Proper handling
file = File.open('/dev/input/event0', File::RDONLY | File::NONBLOCK)
begin
  data = file.read_nonblock(100)
rescue IO::WaitReadable
  IO.select([file])  # Block until the descriptor is readable
  retry
end

Memory mapping excessive file regions exhausts virtual address space on 32-bit systems. Developers map entire large files without considering address space limits. The solution involves mapping smaller windows or using 64-bit systems.

Forgetting to close files exhausts file descriptors. Each process has a file descriptor limit (typically 1024). Applications processing many files must close them properly or use block form.

# Leaks file descriptors
1000.times do |i|
  file = File.open("data_#{i}.txt", 'r')
  process(file.read)
  # Missing file.close
end
# Can eventually raise "Too many open files" (Errno::EMFILE);
# garbage collection may close some handles, but not predictably

# Correct pattern
1000.times do |i|
  File.open("data_#{i}.txt", 'r') do |file|
    process(file.read)
  end
  # Automatic close at block end
end
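
The actual limit can be checked at run time instead of assumed. A small sketch for Unix-like systems; open_file_count is a hypothetical helper that counts File objects Ruby still considers open, which is not the same as counting every descriptor in the process (sockets and pipes are excluded).

```ruby
require 'tmpdir'

# Soft limit is enforced; hard limit is the ceiling the soft limit can be raised to
soft_limit, hard_limit = Process.getrlimit(Process::RLIMIT_NOFILE)
puts "Open file limit: soft=#{soft_limit}, hard=#{hard_limit}"

# Count File objects that have not been closed yet
def open_file_count
  ObjectSpace.each_object(File).count { |file| !file.closed? }
end

# Illustrative scratch file held open while counting
path = File.join(Dir.tmpdir, "fd_demo_#{Process.pid}.txt")
File.write(path, 'data')
leaked = File.open(path, 'r')
puts "Open File objects: #{open_file_count}"
leaked.close
File.delete(path)
```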

Assuming atomic writes leads to partial file reads. The file system does not guarantee that a write spanning multiple blocks (typically 4KB each) becomes visible to readers all at once, and File.write truncates the file before writing. Readers may see empty or partially written data during concurrent access. Atomic file replacement requires write-then-rename patterns.

# Non-atomic update
File.write('config.json', new_config.to_json)
# Readers may see partial JSON during write

# Atomic update
temp_file = 'config.json.tmp'
File.write(temp_file, new_config.to_json)
File.rename(temp_file, 'config.json')
# Readers see either old or new file, never partial
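
A slightly fuller version of the write-then-rename pattern syncs the temporary file before renaming it, so a crash cannot leave a renamed but empty file. This is a sketch with a hypothetical helper name; it assumes a POSIX file system and keeps the temporary file in the same directory so the rename never crosses file systems.

```ruby
require 'tmpdir'

# Replace a file's contents so readers see the old or the new version only
def atomic_write(path, content)
  # Same directory keeps the rename on a single file system
  temp_path = "#{path}.#{Process.pid}.tmp"
  File.open(temp_path, 'wb') do |file|
    file.write(content)
    file.fsync  # Data reaches the device before the rename
  end
  File.rename(temp_path, path)  # Atomic replacement on POSIX file systems
ensure
  # Clean up the temp file if the write or rename failed
  File.delete(temp_path) if temp_path && File.exist?(temp_path)
end

# Illustrative scratch file
path = File.join(Dir.tmpdir, "atomic_demo_#{Process.pid}.json")
File.write(path, '{"version":1}')
atomic_write(path, '{"version":2}')
puts File.read(path)
File.delete(path)
```

For full durability of the rename itself, the containing directory would also need to be synced; the sketch leaves that step out.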

Tools & Ecosystem

strace traces system calls on Linux, revealing actual I/O patterns and helping identify performance problems. The tool shows every read, write, open, and close call with timing information.

# Trace Ruby script I/O
strace -e trace=open,openat,read,write,close -T ruby script.rb
# Modern systems open files through openat rather than open

# Count system calls
strace -c ruby script.rb
# Shows calls/time breakdown by syscall type

# Filter specific file operations
strace -e trace=file ruby script.rb 2>&1 | grep config.yml
# trace=file covers every syscall that takes a file name (open, stat, and variants)

iotop monitors per-process I/O bandwidth on Linux, showing which processes consume disk I/O. This tool helps identify I/O bottlenecks in production systems.

# Real-time I/O monitoring
iotop -o  # Show only processes doing I/O

# Batch mode for logging
iotop -b -n 10 > io_usage.txt

ruby-prof profiles Ruby code including I/O operations. The flat and graph profiles show time spent in file operations.

require 'ruby-prof'

RubyProf.start
# Code to profile
process_files('data/*.txt')
result = RubyProf.stop

printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT)
# Shows time in File.read, File.open, etc.

Benchmark gem measures execution time for comparing different file I/O approaches. The module provides convenient methods for A/B testing performance optimizations.

require 'benchmark'

Benchmark.bmbm do |x|
  x.report('read all:') do
    File.read('large.txt')
  end
  
  x.report('read chunks:') do
    File.open('large.txt', 'r') do |f|
      while chunk = f.read(65536)
        # Process
      end
    end
  end
end

iostat reports system-wide I/O statistics including throughput, IOPS, and latency. The tool helps understand storage device performance and bottlenecks.

# Monitor I/O statistics
iostat -x 1  # Extended stats every second

# Focus on specific device
iostat -x sda 2

inotify (Linux) and FSEvents (macOS) provide efficient file system change notification. These APIs avoid polling by getting event notifications from the kernel. Ruby gems like listen and rb-inotify wrap these mechanisms.

require 'listen'

listener = Listen.to('config/') do |modified, added, removed|
  puts "Modified: #{modified}"
  puts "Added: #{added}"
  puts "Removed: #{removed}"
  reload_configuration
end

listener.start
sleep

mmap gem enables memory-mapped file access in Ruby. Memory mapping provides efficient random access and shared memory between processes.

require 'mmap'

# Read-only mapping
map = Mmap.new('data.bin', 'r')
byte = map[1000]  # Random access
map.munmap

# Read-write mapping with shared access
map = Mmap.new('shared.dat', 'rw', Mmap::MAP_SHARED)
map[0, 4] = [1234].pack('L')  # Write integer
map.msync  # Force sync to disk

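Disk cache monitoring tools show how much memory the page cache holds and how much block I/O reaches the device, which helps evaluate caching effectiveness. Commands like vmstat and free report these figures; neither reports a hit rate directly.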

# View cache memory usage
free -h
# Shows buffers and cache size

# Monitor block I/O that reaches the device
vmstat 1
# Watch 'bi' (blocks in) and 'bo' (blocks out); low 'bi' during reads suggests cache hits

fio (Flexible I/O Tester) benchmarks storage devices with configurable access patterns. The tool generates realistic workloads for capacity planning.

# Random read test
fio --name=random-read --ioengine=libaio --rw=randread \
    --bs=4k --numjobs=4 --size=1g --runtime=60

# Sequential write test
fio --name=seq-write --ioengine=libaio --rw=write \
    --bs=1m --size=10g --runtime=60

Reference

File System Performance Characteristics

Storage Type | Sequential Read | Random Read | Sequential Write | Random Write | Latency
HDD (7200 RPM) | 100-200 MB/s | 1-2 MB/s | 100-160 MB/s | 1-2 MB/s | 5-15 ms
SATA SSD | 500-550 MB/s | 300-400 MB/s | 450-520 MB/s | 250-350 MB/s | 50-150 μs
NVMe SSD | 2000-3500 MB/s | 1500-3000 MB/s | 1500-3000 MB/s | 1000-2500 MB/s | 10-50 μs
RAM | 20000+ MB/s | 20000+ MB/s | 15000+ MB/s | 15000+ MB/s | <100 ns

Ruby File I/O Methods

Method | Operation | Buffering | Use Case
File.read | Read entire file | Automatic | Small files only
File.open with block | Stream access | 8KB default | Large files
IO#read(size) | Read fixed bytes | Custom size | Binary files
IO#each_line | Line iteration | Automatic | Text processing
IO#flush | Write buffer to OS | N/A | Before fsync
IO#fsync | Force disk write | N/A | Durability needs
File.write | Write entire content | Automatic (not atomic) | Small files

System Call Performance

System Call | Cost (μs) | Operation | Optimization
read (cached) | 1-2 | Read from page cache | Automatic
read (uncached) | 5000-15000 | Read from HDD | Use SSD
write (buffered) | 1-2 | Write to page cache | Default behavior
fsync | 5000-50000 | Force physical write | Batch updates
stat | 1-5 | Get file metadata | Cache results
open/close | 5-20 | File handle creation | Keep files open

Optimal Buffer Sizes

File Size | Buffer Size | Rationale
< 100 KB | 8-16 KB | Minimize memory, default works
100 KB - 10 MB | 64-256 KB | Balance memory and syscalls
10 MB - 1 GB | 256 KB - 2 MB | Reduce syscall overhead
> 1 GB | 1-4 MB | Maximum throughput
Binary formats | 64 KB - 1 MB | Match record/block size
Network files | 16-64 KB | Network latency dominates

File Access Pattern Optimization

Pattern | Good Practice | Poor Practice
Sequential read | Read full buffer size | Read single bytes
Random read | Sort offsets before seeking | Seek in arbitrary order
Sequential write | Buffer writes, flush at end | Flush after each write
Random write | Memory map or buffered | Direct write per update
Many small files | Combine into single file | Create thousands of files
Repeated access | Keep file open | Open and close repeatedly

Durability vs Performance Trade-offs

Approach | Throughput | Durability | Use Case
Buffered writes | High | Low | Logs, temporary data
Flush after batch | Medium | Medium | Application data
fsync per operation | Very low | High | Financial transactions
Journaled filesystem | Medium | High | System files
Write-ahead log | Medium-high | High | Databases
Async replication | High | Medium | Distributed systems

Common Performance Bottlenecks

Bottleneck | Symptom | Solution
Small reads | High syscall overhead | Increase buffer size
Excessive fsync | Low write throughput | Batch before syncing
Random access | High latency | Sort access pattern
Metadata operations | Slow file creation | Reduce file count
Memory exhaustion | Process crash | Stream processing
Cache misses | Inconsistent performance | Sequential access
File descriptor leak | Open file errors | Use block form

Profiling Commands

Command | Purpose | Example Output
strace -c | Count system calls | calls/time by syscall
strace -T | Time each syscall | read(3, ...) = 8192 <0.001>
time | Overall execution time | real 0m2.5s user 0m1.2s sys 0m0.8s
iostat -x | Device I/O stats | await, util%, throughput
iotop | Process I/O bandwidth | Per-process MB/s read/write
vmstat | Block I/O activity | Blocks in/out per second