CrackedRuby

Overview

Buffering strategies define how data moves between different processing stages by introducing intermediate storage that accumulates data before transferring it. Rather than processing each individual byte or character immediately, buffering collects data in memory and transfers it in larger chunks. This approach reduces the number of system calls, minimizes context switches, and improves overall throughput.

The core problem buffering solves relates to the performance gap between different system layers. CPU operations execute in nanoseconds, memory access takes tens of nanoseconds, but disk I/O requires milliseconds and network operations can take even longer. Without buffering, a program writing individual bytes to disk would make thousands of system calls, each incurring overhead that dwarfs the actual data transfer time.

Operating systems implement buffering at multiple levels. The kernel maintains buffers for file I/O, network sockets manage send and receive buffers, and applications add their own buffering layers. Each layer trades memory consumption for improved performance by batching operations.

Buffering affects correctness in addition to performance. When data sits in a buffer, it exists in an intermediate state where neither the source nor destination reflects the current reality. A log message written to a buffered stream may not appear in the file until the buffer flushes. A database transaction committed in application code may not reach persistent storage if the application crashes before buffer synchronization.
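The visibility gap described above can be observed directly. The following is a minimal sketch, assuming a write small enough to stay in Ruby's internal buffer; the helper name and the temporary directory are illustrative choices, not part of any API.

```ruby
require 'tmpdir'

# Returns the file contents seen by a second reader before and after a flush.
def visibility_before_and_after_flush
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'app.log')
    File.open(path, 'w') do |f|
      f.write("log entry\n")    # Small write: stays in Ruby's buffer
      before = File.read(path)  # A separate reader sees nothing yet
      f.flush                   # Hand the buffered bytes to the OS
      after = File.read(path)
      [before, after]
    end
  end
end

before, after = visibility_before_and_after_flush
puts "before flush: #{before.inspect}"
puts "after flush:  #{after.inspect}"
```

The second read succeeds only because flush moved the data into the operating system's view of the file; durability on physical storage is a separate question.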

# Without buffering - each character triggers a write system call
File.open('output.txt', 'w') do |f|
  f.sync = true  # Disable buffering
  10_000.times { f.write('x') }
end

# With buffering - writes accumulate and flush in chunks
File.open('output.txt', 'w') do |f|
  10_000.times { f.write('x') }
end  # Buffer flushes on close

The performance difference between these approaches can exceed 100x for small writes. The first example makes 10,000 write system calls while the second batches them into a handful of larger transfers.

Key Principles

Buffering operates on three fundamental strategies that differ in when data moves from the buffer to the destination: unbuffered, line-buffered, and block-buffered. Each strategy balances responsiveness against efficiency based on the expected data patterns and performance requirements.

Unbuffered I/O transfers each operation immediately to the underlying system without accumulation. Every write call directly invokes a system call to transfer data. This approach minimizes latency and eliminates the risk of data loss from unflushed application buffers, but pays maximum overhead for each operation. Unbuffered I/O appears in scenarios requiring immediate visibility of data, such as error streams or time-critical logging.

Line-buffered I/O accumulates data until encountering a newline character, then flushes the entire buffer. This strategy matches human-readable text output where logical units correspond to lines. Standard output typically uses line buffering when connected to a terminal, allowing each line to appear promptly while reducing system call overhead. Line buffering breaks down with binary data or text without regular newlines, potentially accumulating large amounts of data before flushing.
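Line buffering can be sketched as a small wrapper that forwards data only once a complete line is available. The LineBuffer class below is a hypothetical illustration, not how Ruby implements it internally, and it uses a StringIO as a stand-in destination.

```ruby
require 'stringio'

# Line-buffered writer: forwards data only once a complete line is available.
class LineBuffer
  def initialize(destination)
    @destination = destination
    @pending = String.new
  end

  def write(data)
    @pending << data
    last_newline = @pending.rindex("\n")
    return unless last_newline

    # Forward everything up to and including the last newline
    @destination.write(@pending.slice!(0..last_newline))
  end

  def flush
    return if @pending.empty?
    @destination.write(@pending)
    @pending.clear
  end
end

out = StringIO.new
buffer = LineBuffer.new(out)
buffer.write("partial")        # No newline yet: nothing is forwarded
buffer.write(" line\nnext")    # Forwards "partial line\n", keeps "next"
buffer.flush                   # Forwards the remaining "next"
```

Text without newlines stays in the pending string indefinitely, which is the breakdown case mentioned above.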

Block-buffered I/O accumulates data until the buffer reaches capacity, regardless of content. The system flushes when the buffer fills or when explicitly requested. This approach maximizes throughput by minimizing system calls and matches well with block-oriented storage devices. File I/O defaults to block buffering because files rarely require immediate visibility of partial writes.

The buffer lifecycle follows a consistent pattern across strategies. Data enters the buffer through write operations. The buffer accumulates data based on its strategy. A flush trigger occurs based on strategy rules, buffer capacity, or explicit requests. The flush operation transfers buffered data to the destination. Error handling must account for failures at both the buffer and destination levels.

Buffer sizing represents a critical trade-off. Larger buffers reduce system call frequency and improve throughput but increase memory consumption and latency. Smaller buffers respond faster and use less memory but sacrifice throughput. The optimal size depends on the data access pattern, underlying device characteristics, and memory constraints.

# Buffer size affects performance characteristics
# (StringIO stands in for a destination here; it has no flush threshold of its own)
require 'stringio'

small_buffer = StringIO.new
large_buffer = StringIO.new

# Many small writes benefit most from a larger buffer
1000.times do
  small_buffer.write("data")
end

# A single large write sees diminishing returns from buffer size
large_data = "x" * 1_000_000
large_buffer.write(large_data)

Buffer coherency becomes critical in multi-layered systems. Application buffers, language runtime buffers, operating system buffers, and device buffers all maintain independent state. Data written to an application buffer exists only in memory until flushed through each layer to persistent storage. A system crash at any point can lose unflushed data.

Synchronization strategies determine when data becomes durable. Synchronous writes block until data reaches persistent storage, guaranteeing durability at the cost of performance. Asynchronous writes return immediately after copying to a buffer, providing better performance but risking data loss. The choice depends on whether the application can tolerate losing recent writes.
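One way to request synchronous writes is to open the file with the File::SYNC flag, which corresponds to O_SYNC on platforms that define it. The sketch below assumes such a platform; the helper name and file names are illustrative.

```ruby
require 'tmpdir'

# Writes one record synchronously and one with default asynchronous behavior.
def write_sync_and_async(dir)
  sync_path = File.join(dir, 'sync.log')
  async_path = File.join(dir, 'async.log')

  # With File::SYNC, each write blocks until the OS reports the data as stored
  File.open(sync_path, File::WRONLY | File::CREAT | File::TRUNC | File::SYNC) do |f|
    f.sync = true          # Also skip Ruby's own buffer
    f.write("committed\n")
  end

  # Default mode: write returns once the data is buffered
  File.open(async_path, 'w') do |f|
    f.write("maybe\n")
  end

  [File.read(sync_path), File.read(async_path)]
end

Dir.mktmpdir { |dir| p write_sync_and_async(dir) }
```

Both files end up with their contents in normal operation. The difference shows only after a crash or power loss, which is why the choice depends on how much recent data the application can afford to lose.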

Ruby Implementation

Ruby's IO class hierarchy provides buffering control through synchronization flags, explicit flush operations, and buffer size configuration. The sync attribute determines whether writes bypass buffering, while flush forces pending data to the operating system. Understanding these mechanisms enables precise control over data visibility and performance trade-offs.

Standard streams demonstrate different default buffering strategies. $stdout uses line buffering when connected to a terminal, automatically flushing on newlines. $stderr operates unbuffered, ensuring error messages appear immediately. Files use block buffering by default with buffer sizes typically 8KB or 16KB depending on the system.

# Examining default buffering behavior
puts "$stdout sync: #{$stdout.sync}"  # => false (buffered)
puts "$stderr sync: #{$stderr.sync}"  # => true (unbuffered)

file = File.open('test.txt', 'w')
puts "File sync: #{file.sync}"  # => false (buffered)
file.close

The sync= method controls buffering at the stream level. Setting sync=true disables buffering, causing each write to immediately invoke a system call. Setting sync=false enables buffering with automatic flush based on the buffer capacity. This binary control applies uniformly to all operations on the stream.

# Controlling synchronization
File.open('output.txt', 'w') do |f|
  f.sync = true  # Disable buffering
  f.write("immediate")  # Passed straight to the operating system
  
  f.sync = false  # Enable buffering
  f.write("buffered")  # Accumulates in buffer
  f.flush  # Explicit flush makes it visible before close
end

Explicit flushing provides finer control than global sync settings. The flush method transfers buffered data to the operating system without waiting for buffer capacity or stream closure. This approach combines buffering benefits for most operations with immediate visibility when required.

# Selective flushing for important messages
log = File.open('app.log', 'a')

1000.times do |i|
  log.write("Processing item #{i}\n")
  
  # Flush critical messages immediately
  if i % 100 == 0
    log.write("Checkpoint: #{i} items processed\n")
    log.flush
  end
end

log.close

StringIO provides memory-backed buffering for building strings incrementally without immediate output. Data accumulates in memory until retrieval through string or rewind and read operations. This approach avoids intermediate string concatenation overhead while maintaining the IO interface.

# Building formatted output efficiently
require 'stringio'

buffer = StringIO.new

buffer.puts "Header Section"
buffer.puts "=" * 40

data = [1, 2, 3, 4, 5]
data.each do |item|
  buffer.puts "Item: #{item}"
end

buffer.puts "=" * 40
output = buffer.string
buffer.close
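The two retrieval paths, string versus rewind and read, behave slightly differently. A short sketch, using only the standard StringIO interface:

```ruby
require 'stringio'

buffer = StringIO.new
buffer.write("first line\n")
buffer.write("second line\n")

whole = buffer.string   # Full contents, independent of the current position

buffer.rewind           # Move the position back to the start
first = buffer.gets     # Read one line through the IO interface
rest = buffer.read      # Read everything after the current position
```

Here string returns everything written so far, while gets and read consume from the current position, so rest holds only the second line.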

Buffer sizing in Ruby remains opaque to application code. The runtime manages its write buffer internally, and a write too large for that buffer may be passed to the operating system directly. Applications cannot directly configure buffer size but can influence behavior through write patterns.

# Small frequent writes trigger automatic buffering
File.open('output.txt', 'w') do |f|
  10_000.times do |i|
    f.write("Line #{i}\n")
  end
  # Buffer automatically manages accumulation
end

# Large single writes may bypass buffering
File.open('output.txt', 'w') do |f|
  large_data = "x" * 10_000_000
  f.write(large_data)  # May write directly
end

File position tracking interacts with buffering in non-obvious ways. The pos method reports the logical position reflecting buffered writes, not the actual file position on disk. Reading from a buffered stream advances position through the buffer before triggering new reads from storage.

# Position reflects buffered state
File.open('test.txt', 'w+') do |f|
  f.write("hello")
  puts f.pos  # => 5 (logical position)
  # Data may still be in buffer, not on disk
  
  f.flush
  # Now the file as seen by the OS matches the logical position
end

The fsync method provides stronger durability guarantees than flush. While flush transfers data to operating system buffers, fsync blocks until data reaches physical storage. This distinction matters for maintaining consistency across power failures or system crashes.

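# Ensuring durability for critical data
transaction_data = 'debit:42'  # Placeholder payload for illustration
File.open('transactions.log', 'a') do |f|
  f.write("TRANSACTION: #{transaction_data}\n")
  f.flush   # Send to OS buffer
  f.fsync   # Force to disk (fsync also flushes Ruby's buffer first)
end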

Buffered reading operates independently from writing. The read method fills an internal buffer with data from storage, then returns portions without additional I/O until the buffer empties. The readpartial method returns whatever data is immediately available, taking it from the buffer first and otherwise from a single read, instead of waiting for the full requested length.

# Buffered reading behavior
File.open('large_file.txt', 'r') do |f|
  # First read fills internal buffer
  chunk1 = f.read(100)  # May read more than 100 bytes into buffer
  
  # Subsequent small reads use buffer
  chunk2 = f.read(100)  # Likely served from buffer
  
  # Large read may bypass buffer
  remaining = f.read    # Reads rest of file
end
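The readpartial behavior is easiest to see on a pipe, where data arrives in pieces. A small sketch using IO.pipe:

```ruby
reader, writer = IO.pipe

writer.write("hello")
writer.flush

# readpartial returns as soon as any data is available, up to the given limit
available = reader.readpartial(1024)

writer.close

# At end of stream, read returns an empty string while readpartial raises
at_eof = reader.read
eof_raised = begin
  reader.readpartial(1024)
  false
rescue EOFError
  true
end
reader.close
```

A plain read(1024) on the open pipe would instead block until 1024 bytes arrived or the writer closed.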

Implementation Approaches

Buffering strategies span multiple architectural patterns that differ in control granularity, performance characteristics, and complexity. Selection depends on data access patterns, latency requirements, and resource constraints. Each approach represents different trade-offs between throughput, responsiveness, and implementation complexity.

Fixed-size buffering allocates a predetermined buffer capacity and flushes when full. This approach provides predictable memory usage and consistent flush intervals for regular data patterns. Buffer size tuning directly affects performance, requiring benchmarking to identify optimal values for specific workloads.

class FixedBuffer
  def initialize(size, destination)
    @buffer = String.new(capacity: size)
    @size = size
    @destination = destination
  end
  
  def write(data)
    @buffer << data
    flush if @buffer.bytesize >= @size
  end
  
  def flush
    return if @buffer.empty?
    @destination.write(@buffer)
    @buffer.clear
  end
  
  def close
    flush
    @destination.close
  end
end

# Usage
output = File.open('data.bin', 'wb')
buffer = FixedBuffer.new(8192, output)
buffer.write("data" * 1000)
buffer.close

Adaptive buffering dynamically adjusts buffer size based on write patterns and available memory. Small writes trigger buffer growth while sustained large writes may bypass buffering entirely. This approach optimizes for variable workloads but adds complexity in buffer management and memory allocation.
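A rough sketch of the idea follows. The AdaptiveBuffer class, its growth rule (double the threshold when at least 16 small writes filled the buffer), and its parameter names are assumptions chosen for illustration, not an established algorithm.

```ruby
require 'stringio'

# Hypothetical adaptive buffer: the flush threshold doubles when flushes are
# triggered by many small writes, and large writes bypass the buffer.
class AdaptiveBuffer
  attr_reader :threshold

  def initialize(destination, initial: 1024, max: 64 * 1024)
    @destination = destination
    @threshold = initial
    @max = max
    @buffer = String.new
    @writes_since_flush = 0
  end

  def write(data)
    if data.bytesize >= @threshold
      flush                     # Preserve ordering before bypassing
      @destination.write(data)
      return
    end

    @buffer << data
    @writes_since_flush += 1
    return if @buffer.bytesize < @threshold

    # Many small writes filled the buffer: grow the threshold for next time
    grow = @writes_since_flush >= 16
    flush
    @threshold = [@threshold * 2, @max].min if grow
  end

  def flush
    @destination.write(@buffer) unless @buffer.empty?
    @buffer.clear
    @writes_since_flush = 0
  end
end

out = StringIO.new
adaptive = AdaptiveBuffer.new(out, initial: 64, max: 256)
100.times { adaptive.write("abcd") }  # Small writes: threshold grows to the cap
adaptive.write("x" * 1000)            # Large write: bypasses the buffer
adaptive.flush
```

A production version would also need a rule for shrinking the threshold and some awareness of available memory, which this sketch leaves out.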

Time-based buffering flushes after a specified interval regardless of buffer occupancy. This strategy bounds latency while maintaining buffering benefits. Applications with real-time requirements combine time-based and size-based triggers to guarantee both throughput and responsiveness.

require 'stringio'

class TimedBuffer
  def initialize(interval, destination)
    @buffer = StringIO.new
    @interval = interval
    @destination = destination
    @mutex = Mutex.new  # Guards the buffer against the timer thread
    @last_flush = Time.now
    start_flush_timer
  end
  
  def write(data)
    @mutex.synchronize do
      @buffer.write(data)
      flush_locked if Time.now - @last_flush >= @interval
    end
  end
  
  def flush
    @mutex.synchronize { flush_locked }
  end
  
  private
  
  # Caller must hold @mutex
  def flush_locked
    return if @buffer.size.zero?
    @destination.write(@buffer.string)
    @buffer = StringIO.new
    @last_flush = Time.now
  end
  
  def start_flush_timer
    Thread.new do
      loop do
        sleep @interval
        flush
      end
    end
  end
end

Ring buffer strategies use fixed-size circular buffers where writes overwrite the oldest data when full. This approach provides bounded memory usage for continuous data streams where recent data matters more than historical data. Ring buffers appear in logging systems that maintain recent entries without unbounded growth.
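A minimal ring buffer can be built on a fixed-size array and a wrapping index. The RingBuffer class below is an illustrative sketch that stores whole entries, such as log lines, instead of raw bytes.

```ruby
# Fixed-capacity ring buffer: once full, each push overwrites the oldest entry.
class RingBuffer
  def initialize(capacity)
    @slots = Array.new(capacity)
    @capacity = capacity
    @next = 0   # Index of the slot the next push writes to
    @count = 0  # Number of valid entries (at most capacity)
  end

  def push(entry)
    @slots[@next] = entry
    @next = (@next + 1) % @capacity
    @count += 1 if @count < @capacity
    self
  end

  # Entries from oldest to newest
  def to_a
    start = (@next - @count) % @capacity
    Array.new(@count) { |i| @slots[(start + i) % @capacity] }
  end
end

recent = RingBuffer.new(3)
%w[a b c d e].each { |line| recent.push(line) }
p recent.to_a  # => ["c", "d", "e"]
```

Memory use stays constant no matter how many entries are pushed, at the cost of silently discarding the oldest ones.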

Double buffering maintains two buffers, writing to one while flushing the other. This technique maximizes throughput by parallelizing buffer accumulation with I/O operations. The complexity increases with synchronization requirements between the writer and flusher threads.

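class DoubleBuffer
  def initialize(size, destination)
    @front_buffer = String.new(capacity: size)
    @back_buffer = String.new(capacity: size)
    @size = size
    @destination = destination
    @mutex = Mutex.new
    @flusher = nil  # Thread writing the current back buffer, if any
  end
  
  def write(data)
    @mutex.synchronize do
      @front_buffer << data
      
      if @front_buffer.bytesize >= @size
        swap_buffers
        flush_async
      end
    end
  end
  
  # Flushes whatever remains and closes the destination
  def close
    @mutex.synchronize do
      swap_buffers
      flush_async
      @flusher.join
    end
    @destination.close
  end
  
  private
  
  def swap_buffers
    @flusher&.join      # Wait until the previous back buffer has been written
    @back_buffer.clear  # Safe to reuse now
    @front_buffer, @back_buffer = @back_buffer, @front_buffer
  end
  
  # At most one flush is in flight, so output order is preserved
  def flush_async
    buffer_to_flush = @back_buffer
    @flusher = Thread.new { @destination.write(buffer_to_flush) }
  end
end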

Hierarchical buffering implements multiple buffer layers with different characteristics at each level. An application might use small per-thread buffers that feed into a larger shared buffer before writing to storage. This structure matches well with concurrent systems where coordination overhead dominates single-buffer approaches.

Write-combining buffers detect sequential writes to adjacent locations and merge them into single operations. This optimization reduces overhead for scattered small writes that would otherwise fragment I/O operations. The strategy requires tracking write addresses and detecting merge opportunities.
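The merge step can be sketched with pending writes stored as offset and data pairs. The WriteCombiner class below is hypothetical, and it assumes writes never overlap; handling overlaps would need an extra rule for which data wins.

```ruby
# Hypothetical write-combining buffer: pending writes are kept as
# [offset, data] pairs and adjacent ones are merged before being issued.
class WriteCombiner
  def initialize
    @pending = []
  end

  def write(offset, data)
    @pending << [offset, data.dup]
  end

  # Returns the merged [offset, data] operations in offset order.
  def combined
    merged = []
    ordered = @pending.sort_by.with_index { |(offset, _), i| [offset, i] }
    ordered.each do |offset, data|
      last = merged.last
      if last && last[0] + last[1].bytesize == offset
        last[1] << data               # Starts exactly where the previous write ends
      else
        merged << [offset, data.dup]  # Gap before this write: start a new operation
      end
    end
    merged
  end
end

combiner = WriteCombiner.new
combiner.write(0, "abcd")
combiner.write(8, "ijkl")
combiner.write(4, "efgh")
combiner.write(100, "zz")
p combiner.combined  # => [[0, "abcdefghijkl"], [100, "zz"]]
```

Four scattered writes collapse into two operations, one for the contiguous run and one for the isolated write.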

Performance Considerations

Buffering's performance impact varies dramatically based on write size, frequency, and the underlying storage characteristics. Small synchronous writes represent the worst-case scenario where system call overhead dominates execution time. A 4-byte write might take 1-10 microseconds of system call overhead plus 50-100 nanoseconds for the actual data transfer, so overhead accounts for roughly 90-99% of the total time.
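The overhead share follows from dividing the fixed cost by the total time. A quick calculation using the figures quoted above (the helper name is illustrative):

```ruby
# Fraction of total time spent on fixed per-call overhead.
def overhead_fraction(overhead_ns, transfer_ns)
  overhead_ns.to_f / (overhead_ns + transfer_ns)
end

low_end  = overhead_fraction(1_000, 100)   # 1 us overhead, 100 ns transfer
high_end = overhead_fraction(10_000, 50)   # 10 us overhead, 50 ns transfer

puts format('overhead share: %.1f%% to %.1f%%', low_end * 100, high_end * 100)
```

With these inputs the share ranges from about 90.9% to about 99.5%, which is why batching small writes pays off so quickly.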

Block device characteristics influence optimal buffer sizes. Traditional hard drives perform best with large sequential writes that amortize seek time across many bytes. Solid state drives reduce random access penalties but still benefit from larger transfers that reduce command overhead. Network protocols introduce additional considerations where buffer size must account for packet overhead and TCP window management.

require 'benchmark'

# Comparing buffering strategies
def benchmark_buffering
  iterations = 100_000
  
  # Unbuffered writes
  unbuffered_time = Benchmark.measure do
    File.open('unbuffered.txt', 'w') do |f|
      f.sync = true
      iterations.times { f.write('x') }
    end
  end
  
  # Buffered writes
  buffered_time = Benchmark.measure do
    File.open('buffered.txt', 'w') do |f|
      iterations.times { f.write('x') }
    end
  end
  
  # Bulk write
  bulk_time = Benchmark.measure do
    File.open('bulk.txt', 'w') do |f|
      f.write('x' * iterations)
    end
  end
  
  puts "Unbuffered: #{unbuffered_time.real}s"
  puts "Buffered: #{buffered_time.real}s"
  puts "Bulk: #{bulk_time.real}s"
end

Memory bandwidth limitations affect buffering at scale. Modern systems achieve 20-100 GB/s memory bandwidth depending on configuration. Buffer copies consume this bandwidth, making zero-copy techniques valuable for high-throughput applications. The sendfile system call exemplifies zero-copy I/O by transferring data directly from one file descriptor to another inside the kernel, without copying it through user-space buffers.
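In Ruby, the closest built-in tool is IO.copy_stream, which copies between two streams without the application building an intermediate string. Whether it ends up using sendfile or a similar kernel facility depends on the platform and Ruby build, so the sketch below should be read as avoiding Ruby-level copies, not as a zero-copy guarantee. The helper name and file names are illustrative.

```ruby
require 'tmpdir'

# Copies src to dst without reading the contents into a Ruby string.
def copy_without_ruby_buffers(src_path, dst_path)
  File.open(src_path, 'rb') do |src|
    File.open(dst_path, 'wb') do |dst|
      IO.copy_stream(src, dst)  # Returns the number of bytes copied
    end
  end
end

Dir.mktmpdir do |dir|
  src = File.join(dir, 'source.bin')
  dst = File.join(dir, 'copy.bin')
  File.binwrite(src, 'x' * 100_000)
  copied = copy_without_ruby_buffers(src, dst)
  puts "copied #{copied} bytes"
end
```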

CPU cache effects become significant with buffer sizes exceeding cache capacity. L1 cache typically holds 32-64 KB while L2 provides 256 KB to 1 MB. Buffers fitting in L2 cache enable efficient read-modify-write operations. Larger buffers incur cache misses that add 50-200 nanoseconds per access, accumulating into measurable overhead for high-frequency operations.

# Cache-friendly buffer sizing
class CacheAwareBuffer
  # Size chosen to fit in typical L2 cache
  OPTIMAL_SIZE = 256 * 1024
  
  def initialize(destination)
    @buffer = String.new(capacity: OPTIMAL_SIZE)
    @destination = destination
  end
  
  def write(data)
    if data.bytesize > OPTIMAL_SIZE
      # Bypass buffer for very large writes
      flush
      @destination.write(data)
    else
      @buffer << data
      flush if @buffer.bytesize >= OPTIMAL_SIZE
    end
  end
  
  def flush
    return if @buffer.empty?
    @destination.write(@buffer)
    @buffer.clear
  end
end

Flush frequency creates a direct performance trade-off. Frequent flushing reduces latency and memory usage but increases system call overhead. Infrequent flushing maximizes throughput but risks data loss and increases latency for time-sensitive operations. Applications must balance these factors based on requirements.
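The trade-off can be made concrete by counting how many writes reach the destination for different flush thresholds. CountingSink and downstream_writes below are illustrative helpers, and the record size and counts are arbitrary.

```ruby
# Destination that counts how many write calls reach it.
class CountingSink
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def write(data)
    @calls += 1
    data.bytesize
  end
end

# Writes `count` records of `record_size` bytes, flushing whenever at least
# `threshold` bytes are pending. Returns the number of downstream writes.
def downstream_writes(threshold, count: 10_000, record_size: 16)
  sink = CountingSink.new
  pending = String.new
  record = 'r' * record_size

  count.times do
    pending << record
    next if pending.bytesize < threshold
    sink.write(pending)
    pending.clear
  end
  sink.write(pending) unless pending.empty?
  sink.calls
end

[16, 1024, 65_536].each do |threshold|
  puts "threshold #{threshold}: #{downstream_writes(threshold)} writes"
end
```

For 10,000 records of 16 bytes, the three thresholds produce 10,000, 157, and 3 downstream writes. The largest threshold also keeps up to 64 KB unwritten at any moment, which is the latency and data-loss side of the trade.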

Thread contention around shared buffers creates scalability bottlenecks. A single buffer accessed by multiple threads requires synchronization that serializes operations. Per-thread buffering eliminates contention but increases memory consumption and complicates buffer management. The optimal approach depends on write frequency and thread count.

# Per-thread buffering to avoid contention
class ThreadLocalBuffer
  BUFFER_SIZE = 8192
  
  def initialize(destination)
    @destination = destination
    @thread_buffers = {}
    @mutex = Mutex.new
  end
  
  def write(data)
    buffer = thread_local_buffer
    buffer << data  # No lock needed: only the owning thread appends
    flush_if_needed(buffer)
  end
  
  # Call once writer threads have finished; it touches every thread's buffer
  def flush_all
    @mutex.synchronize do
      @thread_buffers.each_value do |buffer|
        write_out(buffer)
      end
    end
  end
  
  private
  
  # Brief lock for the lookup only; the hash is shared between threads
  def thread_local_buffer
    @mutex.synchronize do
      @thread_buffers[Thread.current] ||= String.new(capacity: BUFFER_SIZE)
    end
  end
  
  def flush_if_needed(buffer)
    return if buffer.bytesize < BUFFER_SIZE
    @mutex.synchronize { write_out(buffer) }
  end
  
  # Caller must hold @mutex (Ruby's Mutex is not reentrant)
  def write_out(buffer)
    return if buffer.empty?
    @destination.write(buffer)
    buffer.clear
  end
end

Write amplification occurs when buffering layers cascade, each adding copies. Application buffer to runtime buffer to kernel buffer to device buffer results in three data copies before reaching storage. Zero-copy techniques and direct I/O minimize amplification by reducing intermediate layers.

Measurement methodology affects performance analysis. Micro-benchmarks using small files in page cache show different characteristics than production workloads accessing large datasets on mechanical drives. Benchmark conditions must match production scenarios including file sizes, access patterns, and system load.

Practical Examples

Log aggregation systems demonstrate buffering at scale where individual log messages arrive continuously from multiple sources. Unbuffered writes would create excessive I/O overhead while pure block buffering risks losing recent messages on crashes. A hybrid approach that buffers messages and flushes them periodically balances performance and durability.

require 'stringio'

class LogAggregator
  BUFFER_SIZE = 64 * 1024
  FLUSH_INTERVAL = 5  # seconds
  
  def initialize(log_file)
    @buffer = StringIO.new
    @log_file = File.open(log_file, 'a')
    @mutex = Mutex.new
    @last_flush = Time.now
    start_background_flush
  end
  
  def write_log(level, message)
    timestamp = Time.now.strftime('%Y-%m-%d %H:%M:%S')
    entry = "[#{timestamp}] #{level}: #{message}\n"
    
    @mutex.synchronize do
      @buffer.write(entry)
      
      # Flush on buffer size or critical messages
      if @buffer.size >= BUFFER_SIZE || level == 'ERROR' || level == 'FATAL'
        flush
      end
    end
  end
  
  def flush
    return if @buffer.size.zero?
    
    @log_file.write(@buffer.string)
    @log_file.flush
    @buffer.reopen("")
    @last_flush = Time.now
  end
  
  def close
    @mutex.synchronize do
      @flush_thread.kill  # Stop the timer while it cannot be mid-flush
      flush
    end
    @log_file.close
  end
  
  private
  
  def start_background_flush
    @flush_thread = Thread.new do
      loop do
        sleep FLUSH_INTERVAL
        @mutex.synchronize { flush }
      end
    end
  end
end

# Usage
logger = LogAggregator.new('application.log')
logger.write_log('INFO', 'Application started')
logger.write_log('DEBUG', 'Processing request')
logger.write_log('ERROR', 'Database connection failed')
logger.close

Network response buffering accumulates HTTP response data before sending, reducing packet fragmentation and TCP overhead. Small responses fit entirely in a single buffer flush while large responses stream through multiple buffer cycles.

class BufferedHTTPResponse
  def initialize(socket)
    @socket = socket
    @buffer = String.new(capacity: 16384)
    @headers_sent = false
  end
  
  def write_header(status, headers)
    @buffer << "HTTP/1.1 #{status}\r\n"
    headers.each do |key, value|
      @buffer << "#{key}: #{value}\r\n"
    end
    @buffer << "\r\n"
    @headers_sent = true
  end
  
  def write_body(data)
    raise "Headers not sent" unless @headers_sent
    
    @buffer << data
    
    # Flush when buffer fills or at response end
    flush if @buffer.bytesize >= 16384
  end
  
  def flush
    return if @buffer.empty?
    @socket.write(@buffer)
    @buffer.clear
  end
  
  def close
    flush
    @socket.close
  end
end

# Usage
require 'socket'
server = TCPServer.new(8080)
client = server.accept

response = BufferedHTTPResponse.new(client)
response.write_header('200 OK', {'Content-Type' => 'text/html'})
response.write_body('<html><body>')
response.write_body('<h1>Hello World</h1>')
response.write_body('</body></html>')
response.close

Database batch insertion benefits from buffering by accumulating rows before executing INSERT statements. Individual inserts require separate transactions and index updates while batched inserts amortize overhead across multiple rows.

class BatchInserter
  BATCH_SIZE = 1000
  
  def initialize(db_connection, table)
    @connection = db_connection
    @table = table
    @buffer = []
  end
  
  def insert(record)
    @buffer << record
    flush if @buffer.size >= BATCH_SIZE
  end
  
  def flush
    return if @buffer.empty?
    
    columns = @buffer.first.keys.join(', ')
    placeholders = @buffer.map { |_| "(#{Array.new(@buffer.first.size, '?').join(', ')})" }.join(', ')
    
    sql = "INSERT INTO #{@table} (#{columns}) VALUES #{placeholders}"
    values = @buffer.flat_map(&:values)
    
    @connection.execute(sql, values)
    @buffer.clear
  end
  
  def close
    flush
  end
end

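# Usage (db is assumed to be an open connection whose execute method
# accepts an SQL string plus an array of bind values)
inserter = BatchInserter.new(db, 'users')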
10_000.times do |i|
  inserter.insert({id: i, name: "user#{i}", email: "user#{i}@example.com"})
end
inserter.close

CSV generation for large datasets demonstrates streaming through buffers to avoid loading entire datasets into memory. Each row writes to a buffer that periodically flushes to the output file.

require 'csv'

class BufferedCSVWriter
  def initialize(filename)
    @file = File.open(filename, 'w')
    @buffer = StringIO.new
    @csv = CSV.new(@buffer)
  end
  
  def write_row(row)
    @csv << row
    
    # Flush buffer when it reaches 1MB
    if @buffer.size >= 1_048_576
      @file.write(@buffer.string)
      @buffer.reopen("")
    end
  end
  
  def close
    # Write remaining buffered data
    @file.write(@buffer.string) unless @buffer.size.zero?
    @file.close
  end
end

# Generate large CSV efficiently
writer = BufferedCSVWriter.new('large_dataset.csv')
writer.write_row(['id', 'name', 'value'])

1_000_000.times do |i|
  writer.write_row([i, "record_#{i}", rand(1000)])
end

writer.close

Common Pitfalls

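Relying on program termination to flush buffers risks losing data silently. Ruby normally flushes open files at a clean exit, but buffered writes never reach storage if the process is killed, crashes, or exits via exit!, and until the flush happens other readers see stale contents. Applications should explicitly flush or close streams to ensure data persistence.

# RISKY - data lost if the process dies before the file is flushed
def write_logs_wrong
  log = File.open('app.log', 'w')
  log.write("Important message")
  # File never closed; the buffer is flushed only at a clean exit
end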

# CORRECT - explicit close flushes buffer
def write_logs_correct
  File.open('app.log', 'w') do |log|
    log.write("Important message")
  end  # Block ensures close
end

Buffer overflow in fixed-size buffers occurs when accumulating data without bounds checking. The buffer grows beyond allocated capacity, triggering reallocation or crashes. Implementations must either enforce size limits or use dynamic allocation.

# Preventing buffer overflow
class BoundedBuffer
  def initialize(max_size, destination)
    @buffer = String.new
    @max_size = max_size
    @destination = destination
  end
  
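  def write(data)
    # Flush first if the new data would not fit
    flush if @buffer.bytesize + data.bytesize > @max_size
    
    if data.bytesize > @max_size
      @destination.write(data)  # Larger than the whole buffer: write through
    else
      @buffer << data
    end
  end
  
  def flush
    return if @buffer.empty?
    @destination.write(@buffer)
    @buffer.clear
  end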
end

Mixing buffered and unbuffered I/O on the same stream creates ordering issues. Data written through different paths may appear out of order in the output. Ruby's IO streams maintain a single buffer but external system calls bypass it entirely.

# WRONG - mixed I/O creates ordering problems
file = File.open('output.txt', 'w')
file.write("buffered")
file.syswrite("unbuffered")  # Bypasses buffer
file.write("more buffered")
file.close
# File may contain: "unbufferedbufferedmore buffered"

Assuming flush guarantees durability leads to data loss during system failures. The flush method transfers data to operating system buffers but does not force writes to physical storage. Power failures or kernel crashes can lose data in OS buffers.

# WRONG - flush alone insufficient for durability
File.open('critical.dat', 'w') do |f|
  f.write(important_data)
  f.flush  # Only guarantees data reaches OS
end

# CORRECT - fsync ensures physical storage
File.open('critical.dat', 'w') do |f|
  f.write(important_data)
  f.flush
  f.fsync  # Blocks until data on disk
end

Buffer bloat in network applications occurs when write buffers grow faster than network transmission speed. The buffer accumulates pending data, increasing memory usage and latency. Backpressure mechanisms must limit buffer growth.

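class BackpressureBuffer
  MAX_BUFFER_SIZE = 1_048_576  # 1MB
  
  def initialize(socket)
    @socket = socket
    @buffer = String.new
    @mutex = Mutex.new
    @space_available = ConditionVariable.new
    @data_available = ConditionVariable.new
    @sender = Thread.new { send_loop }
  end
  
  def write(data)
    @mutex.synchronize do
      # Block the producer while the buffer is over its limit
      @space_available.wait(@mutex) while @buffer.bytesize > MAX_BUFFER_SIZE
      
      @buffer << data
      @data_available.signal
    end
  end
  
  private
  
  # A single sender thread keeps output in order; producers can refill the
  # buffer only up to the limit while the socket write is in progress
  def send_loop
    loop do
      data_to_send = @mutex.synchronize do
        @data_available.wait(@mutex) while @buffer.empty?
        chunk = @buffer.dup
        @buffer.clear
        @space_available.broadcast
        chunk
      end
      @socket.write(data_to_send)
    end
  end
end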

Thread safety violations occur when multiple threads access shared buffers without synchronization. Concurrent writes can corrupt buffer state, resulting in garbled output or crashes.

# WRONG - unsynchronized access
class UnsafeBuffer
  def initialize
    @buffer = String.new
  end
  
  def write(data)
    @buffer << data  # Not thread-safe
  end
end

# CORRECT - synchronized access
class SafeBuffer
  def initialize
    @buffer = String.new
    @mutex = Mutex.new
  end
  
  def write(data)
    @mutex.synchronize do
      @buffer << data
    end
  end
end

Neglecting error handling during flush operations leaves I/O failures unmanaged. Disk full, permission errors, or network failures during a flush raise exceptions in Ruby; unless the application rescues them, the buffered data is lost and the failure surfaces far from the write that caused it.

# WRONG - no error handling around the write and flush
def write_with_no_error_handling(data)
  File.open('output.txt', 'w') do |f|
    f.write(data)
    f.flush  # A failure here raises and nothing recovers from it
  end
end

# CORRECT - handling flush errors
def write_with_error_handling(data)
  File.open('output.txt', 'w') do |f|
    f.write(data)
    f.flush
  end
rescue IOError, SystemCallError => e
  warn "Failed to flush data: #{e.message}"
  # Implement retry or alternative handling
end

Reference

Buffering Strategy Comparison

Strategy | Flush Trigger | Latency | Throughput | Use Case
Unbuffered | Every write | Minimum | Low | Real-time output, error streams
Line-buffered | Newline character | Low | Medium | Interactive terminals, log output
Block-buffered | Buffer full | Higher | Maximum | File I/O, bulk data processing
Time-based | Fixed interval | Bounded | Medium-High | Streaming, continuous data
Adaptive | Dynamic conditions | Variable | High | Variable workloads, mixed patterns

Ruby IO Buffer Control

Method | Purpose | Behavior
sync= | Enable/disable buffering | true disables buffer, false enables
sync | Query buffer state | Returns current sync mode
flush | Force buffer write | Transfers to OS buffer
fsync | Synchronize to storage | Blocks until physical write
rewind | Reset position | Returns to start of the stream
close | Flush and close | Guarantees final flush
syswrite | Unbuffered write | Bypasses internal buffer
sysread | Unbuffered read | Bypasses internal buffer

Buffer Size Guidelines

I/O Type | Typical Size | Consideration
Small writes | 4-8 KB | Balance overhead vs memory
File I/O | 64-256 KB | Match filesystem block size
Network I/O | 16-64 KB | Consider MTU and window size
Database bulk | 1-10 MB | Balance transaction size
Memory buffer | 256 KB | Fit in L2 cache
Log aggregation | 64-128 KB | Balance flush frequency

Performance Characteristics

Operation | System Call Overhead | Data Transfer | Total Time
Single byte unbuffered | 1-10 μs | 50 ns | ~1-10 μs
4KB buffered write | 1-10 μs | 5 μs | ~6-15 μs
64KB buffered write | 1-10 μs | 80 μs | ~81-90 μs
1MB bulk write | 1-10 μs | 1.25 ms | ~1.26 ms

Flush Patterns

Pattern | Implementation | Trade-off
Eager flush | Flush after each write | Maximum durability, minimum throughput
Lazy flush | Flush on close or buffer full | Maximum throughput, data loss risk
Threshold flush | Flush at size limit | Balanced approach
Periodic flush | Flush on timer | Bounded latency
Conditional flush | Flush on important data | Selective durability
Hybrid flush | Combine multiple triggers | Optimizes multiple goals

Error Scenarios

Error | Cause | Detection | Recovery
Buffer overflow | Insufficient capacity | Size check or exception | Flush and resize
Flush failure | Disk full or permission | IOError exception | Retry or alternate storage
Partial write | Interrupted system call | Short write count | Resume from offset
Sync failure | Storage device error | SystemCallError | Mark data as lost
Corruption | Concurrent access | Garbled output | Add synchronization
Memory exhaustion | Unbounded growth | Allocation failure | Implement backpressure
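The "resume from offset" recovery for partial writes can be sketched as a loop around a primitive that reports how many bytes it accepted. IO#write already retries until everything is written, so the loop matters mainly for lower-level calls such as syswrite. ShortWriter below is a hypothetical stand-in that simulates short writes; write_fully is an illustrative helper.

```ruby
# Writes all of `data` using a primitive that may accept fewer bytes than
# requested, resuming from the offset of the first unwritten byte.
def write_fully(io, data)
  offset = 0
  while offset < data.bytesize
    written = io.syswrite(data.byteslice(offset, data.bytesize - offset))
    offset += written
  end
  offset
end

# Hypothetical destination that accepts at most `limit` bytes per call.
class ShortWriter
  attr_reader :received, :calls

  def initialize(limit)
    @limit = limit
    @received = String.new
    @calls = 0
  end

  def syswrite(data)
    chunk = data.byteslice(0, @limit)
    @received << chunk
    @calls += 1
    chunk.bytesize  # Short count: the caller must resume from here
  end
end

sink = ShortWriter.new(3)
total = write_fully(sink, "0123456789")
puts "wrote #{total} bytes in #{sink.calls} calls"
```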

Thread Safety Patterns

Approach | Description | Overhead
Global lock | Single mutex for buffer | High contention
Per-thread buffers | Separate buffer per thread | Memory overhead
Lock-free queue | Atomic operations | Complex implementation
Batch synchronization | Lock during flush only | Balanced approach
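A queue-based variant of these patterns hands data to a single consumer thread that owns the destination. The sketch below uses Ruby's thread-safe Queue, which is not claimed to be lock-free; QueuedWriter is a hypothetical name, and a StringIO stands in for the destination.

```ruby
require 'stringio'

# Producers push onto a thread-safe Queue; one consumer thread owns the
# destination, so the destination itself needs no lock.
class QueuedWriter
  def initialize(destination)
    @destination = destination
    @queue = Queue.new
    @consumer = Thread.new { drain }
  end

  def write(data)
    @queue << data
  end

  # Closes the queue and waits for the consumer to drain it.
  def close
    @queue.close
    @consumer.join
  end

  private

  def drain
    # Queue#pop returns nil once the queue is closed and empty
    while (chunk = @queue.pop)
      @destination.write(chunk)
    end
  end
end

out = StringIO.new
writer = QueuedWriter.new(out)
threads = 4.times.map do |t|
  Thread.new { 250.times { |i| writer.write("t#{t}-#{i}\n") } }
end
threads.each(&:join)
writer.close
puts "lines written: #{out.string.lines.size}"
```

Entries from different threads interleave, but each entry arrives intact and entries from the same thread keep their order.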

Ruby IO Classes

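Class | Buffering | Use Case
File | Block buffered | File operations
StringIO | Memory backed | String building
STDOUT | Line buffered | Terminal output
STDERR | Unbuffered | Error messages
Socket | Unbuffered in Ruby by default (sync is true), kernel buffered | Network communication
Tempfile | Block buffered | Temporary storage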
Tempfile Block buffered Temporary storage