Overview
Buffering strategies define how data moves between different processing stages by introducing intermediate storage that accumulates data before transferring it. Rather than processing each individual byte or character immediately, buffering collects data in memory and transfers it in larger chunks. This approach reduces the number of system calls, minimizes context switches, and improves overall throughput.
The core problem buffering solves relates to the performance gap between different system layers. CPU operations execute in nanoseconds, memory access takes tens of nanoseconds, but disk I/O requires milliseconds and network operations can take even longer. Without buffering, a program writing individual bytes to disk would make thousands of system calls, each incurring overhead that dwarfs the actual data transfer time.
Operating systems implement buffering at multiple levels. The kernel maintains buffers for file I/O, network sockets manage send and receive buffers, and applications add their own buffering layers. Each layer trades memory consumption for improved performance by batching operations.
Buffering affects correctness in addition to performance. When data sits in a buffer, it exists in an intermediate state where neither the source nor destination reflects the current reality. A log message written to a buffered stream may not appear in the file until the buffer flushes. A database transaction committed in application code may not reach persistent storage if the application crashes before buffer synchronization.
# Without buffering - each character triggers a write system call
File.open('output.txt', 'w') do |f|
f.sync = true # Disable buffering
10_000.times { f.write('x') }
end
# With buffering - writes accumulate and flush in chunks
File.open('output.txt', 'w') do |f|
10_000.times { f.write('x') }
end # Buffer flushes on close
The performance difference between these approaches can exceed 100x for small writes. The first example makes 10,000 write system calls while the second batches them into a handful of larger transfers.
Key Principles
Buffering operates on three fundamental strategies that differ in when data moves from the buffer to the destination: unbuffered, line-buffered, and block-buffered. Each strategy balances responsiveness against efficiency based on the expected data patterns and performance requirements.
Unbuffered I/O transfers each operation immediately to the underlying system without accumulation. Every write call directly invokes a system call to transfer data. This approach minimizes latency and eliminates the risk of losing data that sits in an unflushed application buffer, but pays maximum overhead for each operation. Unbuffered I/O appears in scenarios requiring immediate visibility of data, such as error streams or time-critical logging.
Line-buffered I/O accumulates data until encountering a newline character, then flushes the entire buffer. This strategy matches human-readable text output where logical units correspond to lines. Standard output typically uses line buffering when connected to a terminal, allowing each line to appear promptly while reducing system call overhead. Line buffering breaks down with binary data or text without regular newlines, potentially accumulating large amounts of data before flushing.
Block-buffered I/O accumulates data until the buffer reaches capacity, regardless of content. The system flushes when the buffer fills or when explicitly requested. This approach maximizes throughput by minimizing system calls and matches well with block-oriented storage devices. File I/O defaults to block buffering because files rarely require immediate visibility of partial writes.
The buffer lifecycle follows a consistent pattern across strategies. Data enters the buffer through write operations. The buffer accumulates data based on its strategy. A flush trigger occurs based on strategy rules, buffer capacity, or explicit requests. The flush operation transfers buffered data to the destination. Error handling must account for failures at both the buffer and destination levels.
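The lifecycle above can be sketched for the line-buffered case. The following is a minimal illustration, not how Ruby's own IO layer is implemented: `LineBufferedWriter` and `RecordingSink` are hypothetical names, and the sink exists only to make each flush visible.

```ruby
# Hypothetical wrapper: passes complete lines on, holds back a partial line.
class LineBufferedWriter
  def initialize(destination)
    @destination = destination
    @buffer = String.new
  end

  def write(data)
    @buffer << data
    # Everything up to the last newline consists of complete lines
    last_newline = @buffer.rindex("\n")
    return unless last_newline

    # slice! removes the complete lines from the buffer and returns them
    @destination.write(@buffer.slice!(0..last_newline))
  end

  def flush
    return if @buffer.empty?

    @destination.write(@buffer)
    @buffer.clear
  end
end

# Hypothetical destination that records each write it receives
class RecordingSink
  attr_reader :writes

  def initialize
    @writes = []
  end

  def write(data)
    @writes << data.dup
  end
end

sink = RecordingSink.new
writer = LineBufferedWriter.new(sink)
writer.write("partial ")      # No newline yet: nothing reaches the sink
writer.write("line\nnext")    # Newline seen: the complete line is flushed
writer.flush                  # Explicit flush pushes the trailing fragment
```

The sink receives two writes: the completed line and, after the explicit flush, the trailing fragment. This mirrors the accumulate, trigger, transfer sequence described above.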
Buffer sizing represents a critical trade-off. Larger buffers reduce system call frequency and improve throughput but increase memory consumption and latency. Smaller buffers respond faster and use less memory but sacrifice throughput. The optimal size depends on the data access pattern, underlying device characteristics, and memory constraints.
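# Write size determines how much buffering helps
# (StringIO stands in for a buffered stream here; it has no flush threshold of its own)
require 'stringio'
small_buffer = StringIO.new
large_buffer = StringIO.new
# Many small writes gain the most from a larger buffer: 1,000 calls, 4,000 bytes in total
1000.times do
  small_buffer.write("data")
end
# One large write is already a single transfer, so buffer size matters little
large_data = "x" * 1_000_000
large_buffer.write(large_data)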
Buffer coherency becomes critical in multi-layered systems. Application buffers, language runtime buffers, operating system buffers, and device buffers all maintain independent state. Data written to an application buffer exists only in memory until flushed through each layer to persistent storage. A system crash at any point can lose unflushed data.
Synchronization strategies determine when data becomes durable. Synchronous writes block until data reaches persistent storage, guaranteeing durability at the cost of performance. Asynchronous writes return immediately after copying to a buffer, providing better performance but risking data loss. The choice depends on whether the application can tolerate losing recent writes.
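As a rough illustration of the two modes, the sketch below writes one line through the default buffered path and one through a file opened with `File::SYNC` (the `O_SYNC` open flag). The temporary file name is made up for the example, and the `defined?` guard is there because the constant is only available on platforms that support the flag.

```ruby
require 'tmpdir'

path = File.join(Dir.tmpdir, "sync_demo_#{Process.pid}.log")

# Asynchronous (default): write returns once the data has been buffered
File.open(path, 'w') { |f| f.write("fast, but not yet durable\n") }

# Synchronous: O_SYNC asks the OS to complete each write to storage before returning
if defined?(File::SYNC)
  File.open(path, File::WRONLY | File::APPEND | File::SYNC) do |f|
    f.sync = true                  # Skip Ruby's own buffer as well
    f.write("slow, but durable\n")
  end
end

contents = File.read(path)
File.delete(path)
```

Both lines end up in the file; the difference is what has been promised by the time each `write` returns.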
Ruby Implementation
Ruby's IO class hierarchy provides buffering control through the sync attribute, explicit flush operations, and the unbuffered syswrite and sysread methods. The sync attribute determines whether writes bypass Ruby's buffer, while flush forces pending data to the operating system. Understanding these mechanisms enables precise control over data visibility and performance trade-offs.
Standard streams demonstrate different default buffering strategies. $stdout uses line buffering when connected to a terminal, automatically flushing on newlines. $stderr operates unbuffered, ensuring error messages appear immediately. Files use block buffering by default with buffer sizes typically 8KB or 16KB depending on the system.
# Examining default buffering behavior
puts "$stdout sync: #{$stdout.sync}" # => false (buffered)
puts "$stderr sync: #{$stderr.sync}" # => true (unbuffered)
file = File.open('test.txt', 'w')
puts "File sync: #{file.sync}" # => false (buffered)
file.close
The sync= method controls buffering at the stream level. Setting sync=true disables buffering, causing each write to immediately invoke a system call. Setting sync=false enables buffering with automatic flush based on the buffer capacity. This binary control applies uniformly to all operations on the stream.
# Controlling synchronization
File.open('output.txt', 'w') do |f|
  f.sync = true # Disable Ruby-level buffering
  f.write("immediate") # Passed to the OS right away (not necessarily on disk yet)
  f.sync = false # Enable buffering
  f.write("buffered") # Accumulates in buffer
  f.flush # Push to the OS now instead of waiting for a full buffer or close
end
Explicit flushing provides finer control than global sync settings. The flush method transfers buffered data to the operating system without waiting for buffer capacity or stream closure. This approach combines buffering benefits for most operations with immediate visibility when required.
# Selective flushing for important messages
log = File.open('app.log', 'a')
1000.times do |i|
log.write("Processing item #{i}\n")
# Flush critical messages immediately
if i % 100 == 0
log.write("Checkpoint: #{i} items processed\n")
log.flush
end
end
log.close
StringIO provides memory-backed buffering for building strings incrementally without immediate output. Data accumulates in memory until retrieval through string or rewind and read operations. This approach avoids intermediate string concatenation overhead while maintaining the IO interface.
# Building formatted output efficiently
buffer = StringIO.new
buffer.puts "Header Section"
buffer.puts "=" * 40
data = [1, 2, 3, 4, 5]
data.each do |item|
buffer.puts "Item: #{item}"
end
buffer.puts "=" * 40
output = buffer.string
buffer.close
Buffer sizing in Ruby remains opaque to application code. The runtime manages an internal buffer whose size is an implementation detail, and a write larger than that buffer may be passed straight to the operating system instead of being accumulated. Applications cannot directly configure the buffer size but can influence behavior through write patterns.
# Small frequent writes trigger automatic buffering
File.open('output.txt', 'w') do |f|
10_000.times do |i|
f.write("Line #{i}\n")
end
# Buffer automatically manages accumulation
end
# Large single writes may bypass buffering
File.open('output.txt', 'w') do |f|
large_data = "x" * 10_000_000
f.write(large_data) # May write directly
end
File position tracking interacts with buffering in non-obvious ways. The pos method reports the logical position reflecting buffered writes, not the actual file position on disk. Reading from a buffered stream advances position through the buffer before triggering new reads from storage.
# Position reflects buffered state
File.open('test.txt', 'w+') do |f|
f.write("hello")
puts f.pos # => 5 (logical position)
# Data may still be in buffer, not on disk
f.flush
# Now position and disk state match
end
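The gap between what the program has written and what the operating system can see can be observed by comparing the file size before and after a flush. This is a small sketch with a made-up temporary file name. The size measured before the flush is usually 0 for a write this small, but that depends on the runtime's buffer size, so the code does not rely on it.

```ruby
require 'tmpdir'

path = File.join(Dir.tmpdir, "flush_demo_#{Process.pid}.txt")
size_before = size_after = nil

File.open(path, 'w') do |f|
  f.write("hello")
  size_before = File.size(path)  # Often 0: the bytes may still sit in Ruby's buffer
  f.flush
  size_after = File.size(path)   # After flush the OS sees all 5 bytes
end

File.delete(path)
```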
The fsync method provides stronger durability guarantees than flush. While flush transfers data to operating system buffers, fsync blocks until data reaches physical storage. This distinction matters for maintaining consistency across power failures or system crashes.
# Ensuring durability for critical data
File.open('transactions.log', 'a') do |f|
f.write("TRANSACTION: #{transaction_data}\n")
f.flush # Send to OS buffer
f.fsync # Force to disk
end
Buffered reading operates independently from writing. The read method fills an internal buffer with data from storage, then returns portions without additional I/O until the buffer empties. The readpartial method returns whatever data is immediately available, taking it from the internal buffer first and blocking only when nothing is available at all. To bypass the buffer entirely, use sysread.
# Buffered reading behavior
File.open('large_file.txt', 'r') do |f|
# First read fills internal buffer
chunk1 = f.read(100) # May read more than 100 bytes into buffer
# Subsequent small reads use buffer
chunk2 = f.read(100) # Likely served from buffer
# Large read may bypass buffer
remaining = f.read # Reads rest of file
end
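The difference between waiting for a full read and taking what is available is easiest to see on a pipe, where data arrives in pieces. The sketch below uses `IO.pipe` from the standard library; the payload strings are arbitrary.

```ruby
reader, writer = IO.pipe

writer.write("hello")            # Only 5 bytes are available so far
writer.flush

# readpartial returns what is available, up to the requested maximum
available = reader.readpartial(100)

writer.write("world")
writer.close                     # EOF lets read(100) return with fewer than 100 bytes

# read(100) waits for 100 bytes or EOF, whichever comes first
rest = reader.read(100)
reader.close
```

Had the writer stayed open, `reader.read(100)` would have kept blocking, while `readpartial` returned as soon as the first five bytes arrived.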
Implementation Approaches
Buffering strategies span multiple architectural patterns that differ in control granularity, performance characteristics, and complexity. Selection depends on data access patterns, latency requirements, and resource constraints. Each approach represents different trade-offs between throughput, responsiveness, and implementation complexity.
Fixed-size buffering allocates a predetermined buffer capacity and flushes when full. This approach provides predictable memory usage and consistent flush intervals for regular data patterns. Buffer size tuning directly affects performance, requiring benchmarking to identify optimal values for specific workloads.
class FixedBuffer
def initialize(size, destination)
@buffer = String.new(capacity: size)
@size = size
@destination = destination
end
def write(data)
@buffer << data
flush if @buffer.bytesize >= @size
end
def flush
return if @buffer.empty?
@destination.write(@buffer)
@buffer.clear
end
def close
flush
@destination.close
end
end
# Usage
output = File.open('data.bin', 'wb')
buffer = FixedBuffer.new(8192, output)
buffer.write("data" * 1000)
buffer.close
Adaptive buffering dynamically adjusts buffer size based on write patterns and available memory. Small writes trigger buffer growth while sustained large writes may bypass buffering entirely. This approach optimizes for variable workloads but adds complexity in buffer management and memory allocation.
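One way the adaptive idea could look in code is sketched below. `AdaptiveBuffer`, its size limits, and the doubling rule are all assumptions chosen for illustration; a production design would also need a policy for shrinking and for memory pressure.

```ruby
require 'stringio'

# Hypothetical adaptive buffer: grows when small writes keep filling it,
# and lets large writes bypass it.
class AdaptiveBuffer
  MIN_SIZE = 4 * 1024
  MAX_SIZE = 256 * 1024

  attr_reader :capacity

  def initialize(destination)
    @destination = destination
    @capacity = MIN_SIZE
    @buffer = String.new
  end

  def write(data)
    if data.bytesize >= @capacity
      # Large write: flush pending data first to preserve ordering, then bypass
      flush
      @destination.write(data)
      return
    end

    @buffer << data
    return if @buffer.bytesize < @capacity

    flush
    # The buffer filled up from small writes, so double it up to the ceiling
    @capacity = [@capacity * 2, MAX_SIZE].min
  end

  def flush
    return if @buffer.empty?

    @destination.write(@buffer)
    @buffer.clear
  end
end

out = StringIO.new
buffer = AdaptiveBuffer.new(out)
5000.times { buffer.write("x") }   # Fills the 4 KB buffer once, which triggers growth
buffer.write("y" * 10_000)         # Larger than the new capacity: written directly
buffer.flush
```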
Time-based buffering flushes after a specified interval regardless of buffer occupancy. This strategy bounds latency while maintaining buffering benefits. Applications with real-time requirements combine time-based and size-based triggers to guarantee both throughput and responsiveness.
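require 'stringio'
class TimedBuffer
  def initialize(interval, destination)
    @buffer = StringIO.new
    @interval = interval
    @destination = destination
    @mutex = Mutex.new # Guards the buffer against the timer thread
    @last_flush = Time.now
    start_flush_timer
  end
  def write(data)
    @mutex.synchronize do
      @buffer.write(data)
      flush_locked if Time.now - @last_flush >= @interval
    end
  end
  def flush
    @mutex.synchronize { flush_locked }
  end
  private
  # Caller must hold @mutex
  def flush_locked
    return if @buffer.size.zero?
    @destination.write(@buffer.string)
    @buffer = StringIO.new
    @last_flush = Time.now
  end
  def start_flush_timer
    # Background timer bounds latency even when no further writes arrive
    Thread.new do
      loop do
        sleep @interval
        flush
      end
    end
  end
end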
Ring buffer strategies use fixed-size circular buffers where writes overwrite the oldest data when full. This approach provides bounded memory usage for continuous data streams where recent data matters more than historical data. Ring buffers appear in logging systems that maintain recent entries without unbounded growth.
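A minimal ring buffer can be built on a fixed-size array and a wrapping index. `RingBuffer` below is a hypothetical single-threaded sketch; a logging system would typically add locking around `write` and `to_a`.

```ruby
# Hypothetical fixed-capacity ring buffer that keeps only the newest entries
class RingBuffer
  def initialize(capacity)
    @slots = Array.new(capacity)
    @capacity = capacity
    @next_index = 0   # Slot the next write will use
    @count = 0        # Number of valid entries, never more than capacity
  end

  def write(entry)
    @slots[@next_index] = entry                  # Overwrites the oldest entry once full
    @next_index = (@next_index + 1) % @capacity
    @count = [@count + 1, @capacity].min
  end

  # Entries ordered from oldest to newest
  def to_a
    start = (@next_index - @count) % @capacity
    Array.new(@count) { |i| @slots[(start + i) % @capacity] }
  end
end

recent = RingBuffer.new(3)
%w[a b c d e].each { |entry| recent.write(entry) }
entries = recent.to_a  # => ["c", "d", "e"]
```

Memory use stays constant regardless of how many entries are written, which is the property the paragraph above describes.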
Double buffering maintains two buffers, writing to one while flushing the other. This technique maximizes throughput by parallelizing buffer accumulation with I/O operations. The complexity increases with synchronization requirements between the writer and flusher threads.
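class DoubleBuffer
  def initialize(size, destination)
    @front_buffer = String.new(capacity: size)
    @size = size
    @destination = destination
    @mutex = Mutex.new
    # Holds at most one full buffer; writers block if the flusher falls behind
    @pending = SizedQueue.new(1)
    # A single flusher thread keeps flushed buffers in write order
    @flusher = Thread.new do
      while (buffer = @pending.pop)
        @destination.write(buffer)
      end
    end
  end
  def write(data)
    @mutex.synchronize do
      @front_buffer << data
      swap_buffers if @front_buffer.bytesize >= @size
    end
  end
  def close
    @mutex.synchronize { swap_buffers unless @front_buffer.empty? }
    @pending.push(nil) # Sentinel: stop the flusher once pending data is written
    @flusher.join
  end
  private
  # Hand the full buffer to the flusher and continue writing into a fresh one
  def swap_buffers
    @pending.push(@front_buffer)
    @front_buffer = String.new(capacity: @size)
  end
end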
Hierarchical buffering implements multiple buffer layers with different characteristics at each level. An application might use small per-thread buffers that feed into a larger shared buffer before writing to storage. This structure matches well with concurrent systems where coordination overhead dominates single-buffer approaches.
Write-combining buffers detect sequential writes to adjacent locations and merge them into single operations. This optimization reduces overhead for scattered small writes that would otherwise fragment I/O operations. The strategy requires tracking write addresses and detecting merge opportunities.
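The merge check can be sketched as follows. `WriteCombiner` and `write_at` are hypothetical names, and the sketch tracks only a single pending run, so it merges a write only when it starts exactly where the previous one ended. It uses `IO#pwrite`, which writes at an explicit offset and needs a platform that supports positional writes.

```ruby
require 'tempfile'

# Hypothetical write combiner: merges writes to adjacent offsets into one pwrite
class WriteCombiner
  def initialize(file)
    @file = file
    @pending_offset = nil
    @pending = String.new
  end

  def write_at(offset, data)
    adjacent = @pending_offset && offset == @pending_offset + @pending.bytesize
    unless adjacent
      flush                    # Not contiguous: emit the current run first
      @pending_offset = offset
    end
    @pending << data
  end

  def flush
    return if @pending.empty?

    @file.pwrite(@pending, @pending_offset)
    @pending.clear
    @pending_offset = nil
  end
end

contents = Tempfile.create('combine') do |file|
  combiner = WriteCombiner.new(file)
  combiner.write_at(0, "abc")
  combiner.write_at(3, "def")    # Adjacent: merged with the previous write
  combiner.write_at(10, "xyz")   # Gap: the merged run is flushed first
  combiner.flush
  File.binread(file.path)
end
```

Three logical writes become two positional writes here, one for the merged run at offset 0 and one at offset 10.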
Performance Considerations
Buffering's performance impact varies dramatically based on write size, frequency, and the underlying storage characteristics. Small synchronous writes represent the worst-case scenario, where system call overhead dominates execution time. A 4-byte write might take 1-10 microseconds of system call overhead plus 50-100 nanoseconds for the actual data transfer, so overhead accounts for roughly 90% to more than 99% of the total time.
Block device characteristics influence optimal buffer sizes. Traditional hard drives perform best with large sequential writes that amortize seek time across many bytes. Solid state drives reduce random access penalties but still benefit from larger transfers that reduce command overhead. Network protocols introduce additional considerations where buffer size must account for packet overhead and TCP window management.
require 'benchmark'
# Comparing buffering strategies
def benchmark_buffering
iterations = 100_000
# Unbuffered writes
unbuffered_time = Benchmark.measure do
File.open('unbuffered.txt', 'w') do |f|
f.sync = true
iterations.times { f.write('x') }
end
end
# Buffered writes
buffered_time = Benchmark.measure do
File.open('buffered.txt', 'w') do |f|
iterations.times { f.write('x') }
end
end
# Bulk write
bulk_time = Benchmark.measure do
File.open('bulk.txt', 'w') do |f|
f.write('x' * iterations)
end
end
puts "Unbuffered: #{unbuffered_time.real}s"
puts "Buffered: #{buffered_time.real}s"
puts "Bulk: #{bulk_time.real}s"
end
Memory bandwidth limitations affect buffering at scale. Modern systems achieve 20-100 GB/s memory bandwidth depending on configuration. Buffer copies consume this bandwidth, making zero-copy techniques valuable for high-throughput applications. The sendfile system call exemplifies zero-copy I/O by transferring data from one file descriptor to another inside the kernel, without copying it through a user-space buffer.
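Ruby offers a related facility in `IO.copy_stream`, which hands the whole copy to the runtime instead of looping over read and write in Ruby code. Whether the runtime uses a kernel-side copy such as sendfile underneath depends on the platform and Ruby version, so treat that as an assumption and not a guarantee. The file names below are made up for the example.

```ruby
require 'tmpdir'

source_path = File.join(Dir.tmpdir, "copy_src_#{Process.pid}.bin")
target_path = File.join(Dir.tmpdir, "copy_dst_#{Process.pid}.bin")
File.binwrite(source_path, "x" * 100_000)

# copy_stream accepts paths or IO objects and returns the number of bytes copied
copied = IO.copy_stream(source_path, target_path)

identical = File.binread(source_path) == File.binread(target_path)
File.delete(source_path, target_path)
```

The application never allocates a buffer of its own here, which is the main practical benefit even where a true zero-copy path is not available.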
CPU cache effects become significant with buffer sizes exceeding cache capacity. L1 cache typically holds 32-64 KB while L2 provides 256 KB to 1 MB. Buffers fitting in L2 cache enable efficient read-modify-write operations. Larger buffers incur cache misses that add 50-200 nanoseconds per access, accumulating into measurable overhead for high-frequency operations.
# Cache-friendly buffer sizing
class CacheAwareBuffer
# Size chosen to fit in typical L2 cache
OPTIMAL_SIZE = 256 * 1024
def initialize(destination)
@buffer = String.new(capacity: OPTIMAL_SIZE)
@destination = destination
end
def write(data)
if data.bytesize > OPTIMAL_SIZE
# Bypass buffer for very large writes
flush
@destination.write(data)
else
@buffer << data
flush if @buffer.bytesize >= OPTIMAL_SIZE
end
end
def flush
return if @buffer.empty?
@destination.write(@buffer)
@buffer.clear
end
end
Flush frequency creates a direct performance trade-off. Frequent flushing reduces latency and memory usage but increases system call overhead. Infrequent flushing maximizes throughput but risks data loss and increases latency for time-sensitive operations. Applications must balance these factors based on requirements.
Thread contention around shared buffers creates scalability bottlenecks. A single buffer accessed by multiple threads requires synchronization that serializes operations. Per-thread buffering eliminates contention but increases memory consumption and complicates buffer management. The optimal approach depends on write frequency and thread count.
# Per-thread buffering to avoid contention
class ThreadLocalBuffer
  FLUSH_THRESHOLD = 8192
  def initialize(destination)
    @destination = destination
    @thread_buffers = {}
    @mutex = Mutex.new # Guards the buffer registry and the destination
  end
  def write(data)
    buffer = thread_local_buffer
    buffer << data # No lock needed: only the owning thread appends
    flush_buffer(buffer) if buffer.bytesize >= FLUSH_THRESHOLD
  end
  # Call once writer threads have finished; it touches every thread's buffer
  def flush_all
    buffers = @mutex.synchronize { @thread_buffers.values }
    buffers.each { |buffer| flush_buffer(buffer) }
  end
  private
  def thread_local_buffer
    # The registry lookup takes the lock briefly; appends do not
    @mutex.synchronize do
      @thread_buffers[Thread.current] ||= String.new(capacity: FLUSH_THRESHOLD)
    end
  end
  def flush_buffer(buffer)
    return if buffer.empty?
    @mutex.synchronize { @destination.write(buffer) }
    buffer.clear
  end
end
Write amplification occurs when buffering layers cascade, each adding copies. Application buffer to runtime buffer to kernel buffer to device buffer results in three data copies before reaching storage. Zero-copy techniques and direct I/O minimize amplification by reducing intermediate layers.
Measurement methodology affects performance analysis. Micro-benchmarks using small files in page cache show different characteristics than production workloads accessing large datasets on mechanical drives. Benchmark conditions must match production scenarios including file sizes, access patterns, and system load.
Practical Examples
Log aggregation systems demonstrate buffering at scale, where individual log messages arrive continuously from multiple sources. Unbuffered writes would create excessive I/O overhead, while pure block buffering risks losing recent messages on crashes. A hybrid approach that buffers messages and flushes them periodically balances performance and durability.
class LogAggregator
BUFFER_SIZE = 64 * 1024
FLUSH_INTERVAL = 5 # seconds
def initialize(log_file)
@buffer = StringIO.new
@log_file = File.open(log_file, 'a')
@mutex = Mutex.new
@last_flush = Time.now
start_background_flush
end
def write_log(level, message)
timestamp = Time.now.strftime('%Y-%m-%d %H:%M:%S')
entry = "[#{timestamp}] #{level}: #{message}\n"
@mutex.synchronize do
@buffer.write(entry)
# Flush on buffer size or critical messages
if @buffer.size >= BUFFER_SIZE || level == 'ERROR' || level == 'FATAL'
flush
end
end
end
def flush
return if @buffer.size.zero?
@log_file.write(@buffer.string)
@log_file.flush
@buffer.reopen("")
@last_flush = Time.now
end
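def close
@mutex.synchronize do
# While we hold the lock the timer thread cannot be mid-flush, so stopping it is safe
@flush_thread.kill
flush
end
@log_file.close
end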
private
def start_background_flush
@flush_thread = Thread.new do
loop do
sleep FLUSH_INTERVAL
@mutex.synchronize { flush }
end
end
end
end
# Usage
logger = LogAggregator.new('application.log')
logger.write_log('INFO', 'Application started')
logger.write_log('DEBUG', 'Processing request')
logger.write_log('ERROR', 'Database connection failed')
logger.close
Network response buffering accumulates HTTP response data before sending, reducing packet fragmentation and TCP overhead. Small responses fit entirely in a single buffer flush while large responses stream through multiple buffer cycles.
class BufferedHTTPResponse
def initialize(socket)
@socket = socket
@buffer = String.new(capacity: 16384)
@headers_sent = false
end
def write_header(status, headers)
@buffer << "HTTP/1.1 #{status}\r\n"
headers.each do |key, value|
@buffer << "#{key}: #{value}\r\n"
end
@buffer << "\r\n"
@headers_sent = true
end
def write_body(data)
raise "Headers not sent" unless @headers_sent
@buffer << data
# Flush when buffer fills or at response end
flush if @buffer.bytesize >= 16384
end
def flush
return if @buffer.empty?
@socket.write(@buffer)
@buffer.clear
end
def close
flush
@socket.close
end
end
# Usage
require 'socket'
server = TCPServer.new(8080)
client = server.accept
response = BufferedHTTPResponse.new(client)
response.write_header('200 OK', {'Content-Type' => 'text/html'})
response.write_body('<html><body>')
response.write_body('<h1>Hello World</h1>')
response.write_body('</body></html>')
response.close
Database batch insertion benefits from buffering by accumulating rows before executing INSERT statements. Individual inserts require separate transactions and index updates while batched inserts amortize overhead across multiple rows.
class BatchInserter
BATCH_SIZE = 1000
def initialize(db_connection, table)
@connection = db_connection
@table = table
@buffer = []
end
def insert(record)
@buffer << record
flush if @buffer.size >= BATCH_SIZE
end
def flush
return if @buffer.empty?
columns = @buffer.first.keys.join(', ')
placeholders = @buffer.map { |_| "(#{Array.new(@buffer.first.size, '?').join(', ')})" }.join(', ')
sql = "INSERT INTO #{@table} (#{columns}) VALUES #{placeholders}"
values = @buffer.flat_map(&:values)
@connection.execute(sql, values)
@buffer.clear
end
def close
flush
end
end
# Usage
inserter = BatchInserter.new(db, 'users')
10_000.times do |i|
inserter.insert({id: i, name: "user#{i}", email: "user#{i}@example.com"})
end
inserter.close
CSV generation for large datasets demonstrates streaming through buffers to avoid loading entire datasets into memory. Each row writes to a buffer that periodically flushes to the output file.
require 'csv'
class BufferedCSVWriter
def initialize(filename)
@file = File.open(filename, 'w')
@buffer = StringIO.new
@csv = CSV.new(@buffer)
end
def write_row(row)
@csv << row
# Flush buffer when it reaches 1MB
if @buffer.size >= 1_048_576
@file.write(@buffer.string)
@buffer.reopen("")
end
end
def close
# Write remaining buffered data
@file.write(@buffer.string) unless @buffer.size.zero?
@file.close
end
end
# Generate large CSV efficiently
writer = BufferedCSVWriter.new('large_dataset.csv')
writer.write_row(['id', 'name', 'value'])
1_000_000.times do |i|
writer.write_row([i, "record_#{i}", rand(1000)])
end
writer.close
Common Pitfalls
Failing to flush buffers before the process terminates can lose data silently. Ruby normally flushes open files during an orderly exit, but buffered writes that are still in memory when the process is killed, crashes, or calls exit! never reach storage. Applications should explicitly flush or close streams instead of relying on exit-time cleanup.
# WRONG - file never closed; buffered data depends on exit-time cleanup
def write_logs_wrong
log = File.open('app.log', 'w')
log.write("Important message")
# If the process is killed here, the buffered message is lost
end
# CORRECT - explicit close flushes buffer
def write_logs_correct
File.open('app.log', 'w') do |log|
log.write("Important message")
end # Block ensures close
end
Buffer overflow in nominally fixed-size buffers occurs when data accumulates without bounds checking. In Ruby the backing String reallocates and keeps growing instead of overrunning its memory, so the failure mode is runaway memory use, not corruption. Implementations must enforce a size limit and flush before exceeding it.
# Preventing buffer overflow
class BoundedBuffer
  def initialize(max_size, destination)
    @buffer = String.new
    @max_size = max_size
    @destination = destination
  end
  def write(data)
    # Flush first if the new data would push the buffer past its limit
    flush if @buffer.bytesize + data.bytesize > @max_size
    if data.bytesize > @max_size
      # Oversized write: send it directly so the buffer never exceeds max_size
      @destination.write(data)
    else
      @buffer << data
    end
  end
  def flush
    return if @buffer.empty?
    @destination.write(@buffer)
    @buffer.clear
  end
end
Mixing buffered and unbuffered I/O on the same stream creates ordering issues. Data written through different paths may appear out of order in the output. Ruby's IO streams maintain a single write buffer, but syswrite bypasses it entirely.
# WRONG - mixed I/O creates ordering problems
file = File.open('output.txt', 'w')
file.write("buffered")
file.syswrite("unbuffered") # Bypasses buffer; Ruby warns about syswrite on a buffered stream
file.write("more buffered")
file.close
# File typically contains: "unbufferedbufferedmore buffered"
Assuming flush guarantees durability leads to data loss during system failures. The flush method transfers data to operating system buffers but does not force writes to physical storage. Power failures or kernel crashes can lose data in OS buffers.
# WRONG - flush alone insufficient for durability
File.open('critical.dat', 'w') do |f|
f.write(important_data)
f.flush # Only guarantees data reaches OS
end
# CORRECT - fsync ensures physical storage
File.open('critical.dat', 'w') do |f|
f.write(important_data)
f.flush
f.fsync # Blocks until data on disk
end
Buffer bloat in network applications occurs when write buffers grow faster than network transmission speed. The buffer accumulates pending data, increasing memory usage and latency. Backpressure mechanisms must limit buffer growth.
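class BackpressureBuffer
  MAX_BUFFER_SIZE = 1_048_576 # 1MB
  def initialize(socket)
    @socket = socket
    @buffer = String.new
    @mutex = Mutex.new
    @space_available = ConditionVariable.new
    @data_available = ConditionVariable.new
    @closed = false
    # A single sender thread keeps data in order
    @sender = Thread.new { send_loop }
  end
  def write(data)
    @mutex.synchronize do
      # Block the producer while the buffer is at its limit
      @space_available.wait(@mutex) while @buffer.bytesize >= MAX_BUFFER_SIZE
      @buffer << data
      @data_available.signal
    end
  end
  def close
    @mutex.synchronize do
      @closed = true
      @data_available.signal
    end
    @sender.join # Returns once all pending data has been sent
  end
  private
  def send_loop
    while (chunk = next_chunk)
      @socket.write(chunk)
    end
  end
  # Takes everything pending; returns nil once closed and drained
  def next_chunk
    @mutex.synchronize do
      @data_available.wait(@mutex) while @buffer.empty? && !@closed
      return nil if @buffer.empty?
      chunk = @buffer
      @buffer = String.new
      @space_available.broadcast # Space is released when the chunk is taken, before it is sent
      chunk
    end
  end
end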
Thread safety violations occur when multiple threads access shared buffers without synchronization. Concurrent writes can corrupt buffer state, resulting in garbled output or crashes.
# WRONG - unsynchronized access
class UnsafeBuffer
def initialize
@buffer = String.new
end
def write(data)
@buffer << data # Not thread-safe
end
end
# CORRECT - synchronized access
class SafeBuffer
def initialize
@buffer = String.new
@mutex = Mutex.new
end
def write(data)
@mutex.synchronize do
@buffer << data
end
end
end
Neglecting error handling during flush operations masks I/O failures. Ruby reports a full disk, permission errors, or network failures during a flush as exceptions. Data is lost without any useful notification when the application leaves them unhandled, or when the failure happens in an implicit flush at close.
# WRONG - no handling for flush errors
def write_with_no_error_handling(data)
  File.open('output.txt', 'w') do |f|
    f.write(data)
    f.flush # A failure here propagates with no retry or fallback
  end
end
# CORRECT - handling flush errors
def write_with_error_handling(data)
  File.open('output.txt', 'w') do |f|
    f.write(data)
    f.flush
  end
rescue IOError, SystemCallError => e
  # Also covers errors raised by the implicit flush when the block closes the file
  warn "Failed to flush data: #{e.message}"
  # Implement retry or alternative handling
end
Reference
Buffering Strategy Comparison
| Strategy | Flush Trigger | Latency | Throughput | Use Case |
|---|---|---|---|---|
| Unbuffered | Every write | Minimum | Low | Real-time output, error streams |
| Line-buffered | Newline character | Low | Medium | Interactive terminals, log output |
| Block-buffered | Buffer full | Higher | Maximum | File I/O, bulk data processing |
| Time-based | Fixed interval | Bounded | Medium-High | Streaming, continuous data |
| Adaptive | Dynamic conditions | Variable | High | Variable workloads, mixed patterns |
Ruby IO Buffer Control
| Method | Purpose | Behavior |
|---|---|---|
| sync= | Enable/disable buffering | true disables buffer, false enables |
| sync | Query buffer state | Returns current sync mode |
| flush | Force buffer write | Transfers to OS buffer |
| fsync | Synchronize to storage | Blocks until physical write |
| rewind | Reset position | Seeks to start of stream |
| close | Flush and close | Guarantees final flush |
| syswrite | Unbuffered write | Bypasses internal buffer |
| sysread | Unbuffered read | Bypasses internal buffer |
Buffer Size Guidelines
| I/O Type | Typical Size | Consideration |
|---|---|---|
| Small writes | 4-8 KB | Balance overhead vs memory |
| File I/O | 64-256 KB | Match filesystem block size |
| Network I/O | 16-64 KB | Consider MTU and window size |
| Database bulk | 1-10 MB | Balance transaction size |
| Memory buffer | 256 KB | Fit in L2 cache |
| Log aggregation | 64-128 KB | Balance flush frequency |
Performance Characteristics
| Operation | System Call Overhead | Data Transfer | Total Time |
|---|---|---|---|
| Single byte unbuffered | 1-10 μs | 50 ns | ~1-10 μs |
| 4KB buffered write | 1-10 μs | 5 μs | ~6-15 μs |
| 64KB buffered write | 1-10 μs | 80 μs | ~81-90 μs |
| 1MB bulk write | 1-10 μs | 1.25 ms | ~1.26 ms |
Flush Patterns
| Pattern | Implementation | Trade-off |
|---|---|---|
| Eager flush | Flush after each write | Maximum durability, minimum throughput |
| Lazy flush | Flush on close or buffer full | Maximum throughput, data loss risk |
| Threshold flush | Flush at size limit | Balanced approach |
| Periodic flush | Flush on timer | Bounded latency |
| Conditional flush | Flush on important data | Selective durability |
| Hybrid flush | Combine multiple triggers | Optimizes multiple goals |
Error Scenarios
| Error | Cause | Detection | Recovery |
|---|---|---|---|
| Buffer overflow | Insufficient capacity | Size check or exception | Flush and resize |
| Flush failure | Disk full or permission | SystemCallError (e.g. Errno::ENOSPC, Errno::EACCES) | Retry or alternate storage |
| Partial write | Interrupted system call | Short write count | Resume from offset |
| Sync failure | Storage device error | SystemCallError | Mark data as lost |
| Corruption | Concurrent access | Garbled output | Add synchronization |
| Memory exhaustion | Unbounded growth | Allocation failure | Implement backpressure |
Thread Safety Patterns
| Approach | Description | Overhead |
|---|---|---|
| Global lock | Single mutex for buffer | High contention |
| Per-thread buffers | Separate buffer per thread | Memory overhead |
| Lock-free queue | Atomic operations | Complex implementation |
| Batch synchronization | Lock during flush only | Balanced approach |
Ruby IO Classes
| Class | Buffering | Use Case |
|---|---|---|
| File | Block buffered | File operations |
| StringIO | Memory backed | String building |
| STDOUT | Line buffered | Terminal output |
| STDERR | Unbuffered | Error messages |
| Socket | Writes unbuffered by default (sync is true) | Network communication |
| Tempfile | Block buffered | Temporary storage |