IO Class Methods

Ruby IO class methods provide direct file and stream operations without creating IO instances, handling system-level input/output operations, file manipulation, and stream processing.


Overview

Ruby's IO class methods offer direct access to file system operations and stream handling without requiring explicit IO object instantiation. These class-level methods interact directly with the operating system's file descriptors, providing efficient mechanisms for reading, writing, and manipulating files and streams.

The IO class methods fall into several categories: file reading operations (IO.read, IO.readlines, IO.foreach), file writing operations (IO.write, IO.binwrite), and advanced operations (IO.copy_stream, IO.pipe, IO.select). File tests such as File.exist? and File.size are defined on the File subclass rather than on IO itself. These methods handle encoding conversion, buffering strategies, and system-level error reporting.

Ruby implements these operations as direct calls to the underlying operating system APIs, making them suitable for high-performance file processing and system integration tasks. The methods accept various options for encoding specification, offset positioning, and length limiting.

# Direct file reading without IO object creation
content = IO.read('/etc/hosts')
# => "127.0.0.1\tlocalhost\n..."

# Conditional file operations
lines = IO.readlines('data.txt') if File.exist?('data.txt')
# => ["first line\n", "second line\n"]

# Binary data handling
IO.binwrite('output.dat', "\x00\x01\x02\x03")
# => 4
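IO.pipe and IO.select, mentioned above but not demonstrated, can be sketched together in a single process; this is a minimal example of a pipe plus a bounded readiness wait:

```ruby
# IO.pipe returns a connected [reader, writer] pair;
# IO.select blocks until an IO object is ready (or the timeout expires)
reader, writer = IO.pipe

writer.write('ping')
writer.close  # closing signals end-of-stream to the reader

ready = IO.select([reader], nil, nil, 1)  # wait up to 1 second
if ready
  readable, _writable, _errored = ready
  puts readable.first.read  # prints "ping"
end
reader.close
```

In real code the writer typically lives in another thread or a child process; IO.select is what lets one loop monitor several descriptors at once.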

Basic Usage

IO class methods handle the most common file operations through direct class method calls. The IO.read method provides complete file reading with optional encoding, offset, and length parameters. This method loads entire file contents into memory, making it suitable for smaller files and configuration data.

# Complete file reading
require 'json'

config = IO.read('config.json')
parsed_config = JSON.parse(config)

# Partial file reading with length and offset
header = IO.read('binary_file.dat', 512, 0)  # First 512 bytes

# IO.read rejects negative offsets, so compute the footer position explicitly
footer_offset = File.size('binary_file.dat') - 256
footer = IO.read('binary_file.dat', 256, footer_offset)  # Last 256 bytes

# Encoding-specific reading
utf8_content = IO.read('unicode.txt', encoding: 'UTF-8')

The IO.readlines method splits file content into an array of lines using the record separator (by default "\n"). Line terminators are preserved unless removed with the chomp: option.

# Line-by-line file processing
log_lines = IO.readlines('/var/log/app.log')
error_lines = log_lines.select { |line| line.include?('ERROR') }

# Custom line separator handling
records = IO.readlines('data.csv', chomp: true)  # Remove line endings
paragraphs = IO.readlines('document.txt', "\n\n")  # Paragraph separator
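IO.foreach, listed among the reading operations above, streams lines one at a time without building an array; a small self-contained sketch ('demo_lines.txt' is an illustrative throwaway file):

```ruby
# IO.foreach yields each line in turn instead of loading the whole file
IO.write('demo_lines.txt', "alpha\nbeta\ngamma\n")

count = 0
IO.foreach('demo_lines.txt', chomp: true) do |line|
  count += 1
  puts line  # each line arrives without its trailing newline
end
puts count   # prints 3

File.delete('demo_lines.txt')
```

Called without a block, IO.foreach returns an Enumerator, so it composes with map, select, and lazy chains.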

File writing operations use IO.write for text content and IO.binwrite for binary data. These methods create files if they don't exist and truncate existing files by default.

# Text file writing with encoding
IO.write('output.txt', "Hello, World!\n", encoding: 'UTF-8')

# Binary file writing
binary_data = [0xFF, 0xD8, 0xFF, 0xE0].pack('C*')
IO.binwrite('image_header.bin', binary_data)

# Append mode writing
IO.write('log.txt', "New entry\n", mode: 'a')

Stream copying operations handle efficient data transfer between files, network sockets, and other IO objects without loading complete content into memory.

# File-to-file copying
bytes_copied = IO.copy_stream('source.txt', 'destination.txt')
puts "Copied #{bytes_copied} bytes"

# Partial stream copying with offset
IO.copy_stream('large_file.dat', 'extract.dat', 1024, 512)  # Copy 1KB from offset 512

Error Handling & Debugging

IO class methods raise specific exception types that indicate different failure modes. Understanding these exceptions enables proper error recovery and user feedback mechanisms.

File access errors generate Errno::ENOENT for missing files, Errno::EACCES for permission issues, and Errno::EISDIR when attempting file operations on directories. These system-level exceptions include errno codes and descriptive messages.

# Comprehensive file reading with error handling
def safe_read_file(filename)
  IO.read(filename)
rescue Errno::ENOENT
  puts "File not found: #{filename}"
  nil
rescue Errno::EACCES
  puts "Permission denied: #{filename}"
  nil
rescue Errno::EISDIR
  puts "Cannot read directory as file: #{filename}"
  nil
rescue SystemCallError => e
  puts "System error reading #{filename}: #{e.message}"
  nil
end

# Usage with fallback behavior
content = safe_read_file('config.txt') || safe_read_file('default_config.txt')

Encoding errors occur when file content doesn't match the expected encoding. Ruby raises Encoding::InvalidByteSequenceError for malformed byte sequences and Encoding::UndefinedConversionError for characters that cannot be represented in the target encoding. Note that these are raised during transcoding (for example by String#encode, or when both external and internal encodings are set); a plain IO.read with a mismatched encoding instead returns a string whose valid_encoding? is false.

# Encoding fallback: IO.read does not validate bytes, so test the result
# with String#valid_encoding? instead of rescuing
def read_with_encoding_fallback(filename)
  encodings = ['UTF-8', 'ISO-8859-1', 'ASCII-8BIT']

  encodings.each do |encoding|
    content = IO.read(filename, encoding: encoding)
    return content if content.valid_encoding?
  end

  # Explicit last resort: binary mode (rarely reached, since every byte
  # sequence is valid ISO-8859-1)
  IO.read(filename, encoding: 'ASCII-8BIT')
rescue StandardError => e
  puts "Failed to read #{filename} with any encoding: #{e.message}"
  nil
end

Disk space and file system errors manifest as Errno::ENOSPC for insufficient space and Errno::EROFS for read-only file systems. These conditions require different recovery strategies.

# Write operation with disk space monitoring
def safe_write_file(filename, content)
  begin
    IO.write(filename, content)
  rescue Errno::ENOSPC
    # Check available space and clean temporary files
    available_space = `df #{File.dirname(filename)}`.split("\n")[1].split[3].to_i
    puts "Insufficient disk space. Available: #{available_space}KB"
    false
  rescue Errno::EROFS
    puts "Cannot write to read-only file system"
    false
  rescue SystemCallError => e
    puts "Write failed: #{e.message} (errno: #{e.errno})"
    false
  end
end

# Validation before write operations
def validate_write_operation(filename, content)
  directory = File.dirname(filename)
  
  unless File.directory?(directory)
    puts "Target directory does not exist: #{directory}"
    return false
  end
  
  unless File.writable?(directory)
    puts "No write permission for directory: #{directory}"
    return false
  end
  
  # Estimate required space (content size + metadata overhead)
  required_space = content.bytesize + 4096  # Add filesystem metadata overhead
  available_space = `df #{directory}`.split("\n")[1].split[3].to_i * 1024
  
  if required_space > available_space
    puts "Insufficient space. Required: #{required_space}, Available: #{available_space}"
    return false
  end
  
  true
end

Performance & Memory

IO class methods exhibit different performance characteristics based on file size, system resources, and access patterns. Understanding these patterns enables optimization for specific use cases.

Reading strategies impact memory usage significantly. The IO.read method loads complete file content into memory, while IO.foreach processes files line-by-line with minimal memory overhead. Large file processing requires streaming approaches to avoid memory exhaustion.

# Memory-efficient large file processing
def process_large_file(filename)
  line_count = 0
  error_count = 0
  
  IO.foreach(filename) do |line|
    line_count += 1
    error_count += 1 if line.include?('ERROR')
    
    # Process line immediately, don't accumulate
    if line_count % 10000 == 0
      puts "Processed #{line_count} lines, #{error_count} errors found"
    end
  end
  
  { total_lines: line_count, errors: error_count }
end

# Memory comparison: whole file vs streaming
def compare_memory_usage(filename)
  # High memory usage - loads complete file
  start_memory = `ps -o rss= -p #{$$}`.to_i
  content = IO.read(filename)
  after_read_memory = `ps -o rss= -p #{$$}`.to_i
  
  puts "Memory after IO.read: #{after_read_memory - start_memory}KB increase"
  
  # Low memory usage - streaming processing
  start_memory = after_read_memory
  line_count = 0
  IO.foreach(filename) { |line| line_count += 1 }
  after_foreach_memory = `ps -o rss= -p #{$$}`.to_i
  
  puts "Memory after IO.foreach: #{after_foreach_memory - start_memory}KB increase"
  puts "Lines processed: #{line_count}"
end

Binary operations typically outperform text operations due to reduced encoding overhead. The IO.binread and IO.binwrite methods bypass encoding conversion, providing maximum throughput for binary data.

# Performance comparison: text vs binary operations
require 'benchmark'

def performance_comparison(filename, data)
  Benchmark.bm(15) do |x|
    x.report("Text write:") { IO.write(filename, data) }
    x.report("Binary write:") { IO.binwrite(filename, data) }
    
    x.report("Text read:") { IO.read(filename) }
    x.report("Binary read:") { IO.binread(filename) }
  end
end

# Stream copying: IO.copy_stream manages its own buffering internally,
# so no manual buffer size is needed
def optimized_copy_stream(source, destination)
  File.open(source, 'rb') do |src|
    File.open(destination, 'wb') do |dst|
      IO.copy_stream(src, dst)
    end
  end
end

# Batch file operations for better performance
def batch_file_operations(file_list)
  results = {}
  
  # Batch existence checks
  existing_files = file_list.select { |f| File.exist?(f) }

  # Batch size calculations
  file_sizes = existing_files.each_with_object({}) do |filename, sizes|
    sizes[filename] = File.size(filename)
  end
  
  # Process files sorted by size for memory optimization
  existing_files.sort_by { |f| file_sizes[f] }.each do |filename|
    if file_sizes[filename] < 1_000_000  # Files under 1MB
      results[filename] = IO.read(filename)
    else
      # Stream process large files
      results[filename] = process_large_file(filename)
    end
  end
  
  results
end

Buffer size optimization affects I/O performance significantly. Ruby's default buffer sizes work well for most cases, but specific workloads benefit from tuning.

# Buffer size testing via chunked writes (File.open has no buffer option,
# so write the data in slices of each candidate size instead)
def test_buffer_sizes(filename, data)
  buffer_sizes = [4096, 8192, 16384, 32768, 65536]

  buffer_sizes.each do |size|
    time = Benchmark.realtime do
      File.open(filename, 'wb') do |f|
        offset = 0
        while offset < data.bytesize
          f.write(data.byteslice(offset, size))
          offset += size
        end
      end
    end
    puts "Buffer size #{size}: #{time.round(4)}s"
  end
end

Production Patterns

Production environments require robust file handling patterns that account for concurrent access, system failures, and resource constraints. IO class methods integrate with monitoring, logging, and deployment workflows.

Configuration file management uses atomic write operations to prevent corruption during updates. This pattern ensures configuration consistency across application restarts and deployments.

# Atomic configuration updates
require 'json'

class ConfigManager
  def self.update_config(filename, new_config)
    temp_filename = "#{filename}.tmp.#{Process.pid}"
    
    begin
      # Write to temporary file first
      IO.write(temp_filename, new_config.to_json)
      
      # Verify written content round-trips (JSON.parse returns string keys)
      verification = JSON.parse(IO.read(temp_filename))
      raise "Configuration verification failed" unless verification == JSON.parse(new_config.to_json)
      
      # Atomic rename operation
      File.rename(temp_filename, filename)
      
      puts "Configuration updated successfully"
      true
    rescue StandardError => e
      # Clean up temporary file
      File.unlink(temp_filename) if File.exist?(temp_filename)
      puts "Configuration update failed: #{e.message}"
      false
    end
  end
  
  def self.load_config_with_fallback(primary_config, fallback_config)
    if File.exist?(primary_config)
      JSON.parse(IO.read(primary_config))
    elsif File.exist?(fallback_config)
      puts "Using fallback configuration: #{fallback_config}"
      JSON.parse(IO.read(fallback_config))
    else
      raise "No configuration file found"
    end
  end
end

Log file rotation and monitoring patterns handle growing log files without interrupting application operation. These patterns integrate with system log rotation tools and monitoring systems.

# Production log management
require 'time'  # for Time.strptime

class LogManager
  MAX_LOG_SIZE = 100 * 1024 * 1024  # 100MB
  MAX_LOG_FILES = 10
  
  def self.write_log_entry(log_file, entry)
    timestamp = Time.now.strftime('%Y-%m-%d %H:%M:%S')
    log_line = "[#{timestamp}] #{entry}\n"
    
    # Check if rotation is needed
    if File.exist?(log_file) && File.size(log_file) > MAX_LOG_SIZE
      rotate_log_file(log_file)
    end
    
    IO.write(log_file, log_line, mode: 'a')
  end
  
  def self.rotate_log_file(log_file)
    # Move existing log files
    (MAX_LOG_FILES - 1).downto(1) do |i|
      old_file = "#{log_file}.#{i}"
      new_file = "#{log_file}.#{i + 1}"
      
      if File.exist?(old_file)
        File.rename(old_file, new_file)
      end
    end
    
    # Move current log file
    File.rename(log_file, "#{log_file}.1") if File.exist?(log_file)
  end
  
  def self.analyze_recent_logs(log_file, hours_back = 24)
    return [] unless File.exist?(log_file)
    
    cutoff_time = Time.now - (hours_back * 3600)
    recent_entries = []
    
    IO.foreach(log_file) do |line|
      if line =~ /^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]/
        entry_time = Time.strptime($1, '%Y-%m-%d %H:%M:%S')
        recent_entries << line if entry_time >= cutoff_time
      end
    end
    
    recent_entries
  end
end

Data export and backup operations require efficient file handling patterns that minimize resource usage and provide progress feedback for long-running operations.

# Production data export patterns
class DataExporter
  def self.export_to_csv(data_source, output_file, batch_size = 1000)
    total_records = data_source.count
    processed_records = 0
    
    File.open(output_file, 'w') do |file|
      # Write CSV header
      file.write("id,name,created_at,status\n")
      
      data_source.find_in_batches(batch_size: batch_size) do |batch|
        csv_lines = batch.map do |record|
          "#{record.id},\"#{record.name}\",#{record.created_at},#{record.status}"
        end
        
        file.write(csv_lines.join("\n") + "\n")
        processed_records += batch.size
        
        # Progress reporting
        progress = (processed_records.to_f / total_records * 100).round(1)
        puts "Export progress: #{progress}% (#{processed_records}/#{total_records})"
      end
    end
    
    puts "Export completed: #{output_file} (#{File.size(output_file)} bytes)"
  end
  
  def self.verify_export(original_count, export_file)
    return false unless File.exist?(export_file)
    
    line_count = 0
    IO.foreach(export_file) { |line| line_count += 1 }
    
    # Subtract header line from count
    exported_count = line_count - 1
    
    if exported_count == original_count
      puts "Export verification successful: #{exported_count} records"
      true
    else
      puts "Export verification failed: expected #{original_count}, found #{exported_count}"
      false
    end
  end
end

Common Pitfalls

IO class methods exhibit behavior that frequently causes issues in production applications. Understanding these pitfalls prevents common errors and performance problems.

File encoding assumptions cause data corruption when files contain different character encodings than expected. Ruby's default encoding behavior may not match file content, leading to garbled text or encoding exceptions.

# Encoding detection and handling
def robust_file_reading(filename)
  # Attempt to detect encoding from a byte-order mark
  first_bytes = IO.binread(filename, 4) || ''

  encoding = if first_bytes.start_with?("\xEF\xBB\xBF".b)   # UTF-8 BOM
    'UTF-8'
  elsif first_bytes.start_with?("\xFF\xFE".b)               # UTF-16 LE BOM
    'UTF-16LE'
  elsif first_bytes.start_with?("\xFE\xFF".b)               # UTF-16 BE BOM
    'UTF-16BE'
  else
    # Fallback: sample the file and test whether it is valid UTF-8
    sample = IO.binread(filename, 1024) || ''
    sample.force_encoding('UTF-8').valid_encoding? ? 'UTF-8' : Encoding.default_external.to_s
  end

  content = IO.read(filename, encoding: encoding)
  return content if content.valid_encoding?

  # Last resort: read as binary and replace invalid byte sequences
  IO.binread(filename).force_encoding('UTF-8').scrub
end

# Common mistake: assuming UTF-8 everywhere
def demonstrate_encoding_pitfall
  # IO.read does not validate bytes, so a mismatched encoding surfaces
  # later as an invalid string rather than an immediate exception
  content = IO.read('mixed_encoding.txt', encoding: 'UTF-8')

  unless content.valid_encoding?
    puts "UTF-8 assumption failed - file contains different encoding"
    # Correct approach: read as binary, then scrub invalid sequences
    binary_content = IO.binread('mixed_encoding.txt')
    content = binary_content.force_encoding('UTF-8').scrub('?')
  end

  content
end

File system assumptions about path separators, case sensitivity, and filename restrictions cause cross-platform compatibility issues. Code that works on Unix systems may fail on Windows and vice versa.

# Cross-platform file operations
def platform_safe_file_ops(base_path, filename)
  # Replace characters that are invalid in Windows filenames
  safe_filename = filename.gsub(/[<>:"|?*]/, '_')
  full_path = File.join(base_path, safe_filename)
  
  # Handle case sensitivity differences
  if File.exist?(full_path)
    IO.read(full_path)
  else
    # Case-insensitive search on case-sensitive systems
    directory = File.dirname(full_path)
    target_name = File.basename(full_path).downcase
    
    if File.directory?(directory)
      matching_file = Dir.entries(directory).find do |entry|
        entry.downcase == target_name
      end
      
      if matching_file
        actual_path = File.join(directory, matching_file)
        IO.read(actual_path)
      else
        raise Errno::ENOENT, "File not found: #{full_path}"
      end
    end
  end
end

# Handling long paths and special characters
def safe_path_handling(path)
  # Windows path length limitation
  if RUBY_PLATFORM =~ /mswin|mingw|cygwin/
    if path.length > 260
      puts "Warning: Path exceeds Windows MAX_PATH limit"
      return false
    end
  end
  
  # Check for problematic characters
  problematic_chars = /[^\w\s\-\.\/\\]/
  if path.match(problematic_chars)
    puts "Warning: Path contains special characters: #{path}"
  end
  
  true
end

Memory exhaustion occurs when processing large files with methods that load complete content into memory. This mistake is common when upgrading from small test files to production data volumes.

# Demonstrating memory pitfalls
def memory_pitfall_example
  # WRONG: Will fail with large files
  def process_large_log_wrong(filename)
    all_lines = IO.readlines(filename)  # Loads entire file into memory
    error_lines = all_lines.select { |line| line.include?('ERROR') }
    error_lines.each { |line| puts line }
  end
  
  # CORRECT: Streaming approach
  def process_large_log_correct(filename)
    IO.foreach(filename) do |line|
      puts line if line.include?('ERROR')  # Process immediately
    end
  end
  
  # Memory usage comparison
  filename = 'large_log.txt'
  
  puts "Wrong approach (high memory):"
  memory_before = `ps -o rss= -p #{$$}`.to_i
  process_large_log_wrong(filename) if File.size(filename) < 10_000_000  # Safety check
  memory_after = `ps -o rss= -p #{$$}`.to_i
  puts "Memory used: #{memory_after - memory_before}KB"
  
  puts "Correct approach (low memory):"
  memory_before = memory_after
  process_large_log_correct(filename)
  memory_after = `ps -o rss= -p #{$$}`.to_i
  puts "Memory used: #{memory_after - memory_before}KB"
end

Race conditions occur in concurrent file access scenarios. Multiple processes writing to the same file simultaneously can cause data corruption or loss.

# File locking for concurrent access
def concurrent_safe_append(filename, content)
  File.open(filename, 'a') do |file|
    begin
      file.flock(File::LOCK_EX)  # Exclusive lock
      file.write("#{Time.now}: #{content}\n")
      file.flush  # Ensure immediate write
    ensure
      file.flock(File::LOCK_UN)  # Release lock
    end
  end
end

# Atomic file replacement pattern
def atomic_file_update(filename, new_content)
  temp_file = "#{filename}.tmp.#{Process.pid}.#{Thread.current.object_id}"
  
  begin
    IO.write(temp_file, new_content)
    File.rename(temp_file, filename)  # Atomic operation on same filesystem
  rescue StandardError => e
    File.unlink(temp_file) if File.exist?(temp_file)
    raise e
  end
end

# Directory creation race condition
def safe_directory_creation(directory_path)
  begin
    Dir.mkdir(directory_path) unless Dir.exist?(directory_path)
  rescue Errno::EEXIST
    # Another process created the directory - this is fine
    puts "Directory already exists: #{directory_path}"
  end
end

Reference

Core Reading Methods

Method | Parameters | Returns | Description
IO.read(name, length = nil, offset = 0, **opts) | name (String), length (Integer), offset (Integer), options (Hash) | String | Reads file content into memory
IO.binread(name, length = nil, offset = 0) | name (String), length (Integer), offset (Integer) | String | Reads binary data without encoding conversion
IO.readlines(name, sep = $/, **opts) | name (String), separator (String), options (Hash) | Array<String> | Reads file lines into an array
IO.foreach(name, sep = $/, **opts) { block } | name (String), separator (String), options (Hash), block | nil (Enumerator without block) | Iterates through file lines

Core Writing Methods

Method | Parameters | Returns | Description
IO.write(name, string, offset = nil, **opts) | name (String), string (String), offset (Integer), options (Hash) | Integer | Writes text content to file (truncates unless offset or an append mode is given)
IO.binwrite(name, string, offset = nil) | name (String), string (String), offset (Integer) | Integer | Writes binary data to file

File Information Methods

These predicates are defined on File (a subclass of IO); plain IO does not respond to them.

Method | Parameters | Returns | Description
File.exist?(name) | name (String) | Boolean | Tests file existence
File.size(name) | name (String) | Integer | Returns file size in bytes
File.empty?(name) | name (String) | Boolean | Tests whether file is empty (Ruby 2.4+)
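A quick check of the table above; note the calls go through File, since IO itself does not define these predicates ('demo_info.txt' is an illustrative throwaway file):

```ruby
IO.write('demo_info.txt', 'hello')

puts File.exist?('demo_info.txt')  # prints true
puts File.size('demo_info.txt')    # prints 5
puts File.empty?('demo_info.txt')  # prints false

File.delete('demo_info.txt')
puts File.exist?('demo_info.txt')  # prints false
```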

Stream Operations

Method | Parameters | Returns | Description
IO.copy_stream(src, dst, copy_length = nil, src_offset = 0) | src (IO/String), dst (IO/String), copy_length (Integer), src_offset (Integer) | Integer | Copies data between streams
IO.pipe(**opts) | options (Hash) | Array<IO> | Creates connected read/write pipe
IO.select(read_array, write_array = nil, error_array = nil, timeout = nil) | arrays of IO objects, timeout (Numeric) | Array or nil | Monitors multiple IO objects for readiness

Common Options Hash Keys

Option | Type | Default | Description
:encoding | String/Encoding | Encoding.default_external | Character encoding for text operations
:mode | String | 'r' (reads) / 'w' (writes) | File access mode ('r', 'w', 'a', etc.)
:chomp | Boolean | false | Remove line endings in readlines/foreach
:binmode | Boolean | false | Use binary mode

Note: length and offset are positional arguments to IO.read, IO.binread, and IO.write, not option hash keys.

Exception Hierarchy

Exception Class | Condition | Recovery Strategy
Errno::ENOENT | File not found | Check file path, create file, use fallback
Errno::EACCES | Permission denied | Check file permissions, run with elevated privileges
Errno::EISDIR | Target is directory | Use directory-specific operations
Errno::ENOSPC | No disk space | Clean temporary files, alert administrators
Errno::EROFS | Read-only filesystem | Use temporary location, alert user
Encoding::InvalidByteSequenceError | Invalid byte sequence during transcoding | Try different encoding, use replacement characters
Encoding::UndefinedConversionError | Character conversion failed | Use replacement characters, change target encoding

File Mode Strings

Mode | Description | File Position | Truncates | Creates
'r' | Read only | Beginning | No | No
'w' | Write only | Beginning | Yes | Yes
'a' | Write only | End | No | Yes
'r+' | Read/Write | Beginning | No | No
'w+' | Read/Write | Beginning | Yes | Yes
'a+' | Read/Write | End | No | Yes
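The truncate/append distinction in the table can be verified directly: IO.write uses 'w' semantics by default and 'a' when passed mode: 'a' ('demo_modes.txt' is an illustrative throwaway file):

```ruby
IO.write('demo_modes.txt', "first\n")              # 'w': create or truncate
IO.write('demo_modes.txt', "second\n", mode: 'a')  # 'a': append

puts IO.read('demo_modes.txt')  # prints "first" then "second"

IO.write('demo_modes.txt', "only\n")               # default 'w' truncates again
puts IO.read('demo_modes.txt')  # prints "only"

File.delete('demo_modes.txt')
```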

Encoding Names Reference

Encoding | Aliases | Description
'UTF-8' | 'CP65001' | Unicode, 8-bit variable width
'ASCII-8BIT' | 'BINARY' | Binary data, no encoding semantics
'ISO-8859-1' | 'ISO8859-1' | Western European (Latin-1)
'UTF-16LE' / 'UTF-16BE' | — | Unicode, 16-bit (plain 'UTF-16' is a dummy encoding absent a BOM)
'UTF-32LE' / 'UTF-32BE' | — | Unicode, 32-bit
'Shift_JIS' | — | Japanese encoding ('SJIS' is an alias of Windows-31J)

Performance Characteristics

Operation | Memory Usage | CPU Usage | Disk I/O | Best For
IO.read | High | Low | Single pass | Small files
IO.foreach | Low | Medium | Streaming | Large files
IO.readlines | High | Low | Single pass | Line processing
IO.binread | High | Low | Single pass | Binary data
IO.copy_stream | Low | Low | Streaming | File copying