
File Reading and Writing

Comprehensive guide to file input/output operations, file system interaction, and data persistence in Ruby applications.


Overview

Ruby provides comprehensive file reading and writing capabilities through the File class and IO class hierarchy. The File class inherits from IO and adds file system-specific functionality including path manipulation, file metadata access, and directory operations. Ruby's file operations support various access modes, encoding specifications, and both blocking and non-blocking I/O patterns.

The core file operations revolve around opening file handles, reading or writing data, and properly closing resources. Ruby automatically handles many low-level details like buffer management and system calls while providing granular control when needed.

# Basic file reading
content = File.read('data.txt')
# => "file contents as string"

# Basic file writing
File.write('output.txt', 'new content')
# => 11 (bytes written)

# Working with file handles
File.open('config.json', 'r') do |file|
  JSON.parse(file.read)
end

File operations in Ruby support multiple encodings and can convert between encodings during I/O. The default external encoding comes from the system locale, but encodings can be specified explicitly for both reading and writing operations.
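
For example, a source and target encoding can be given together when opening or reading a file. A minimal sketch; the filenames below are placeholders:

# Read Latin-1 bytes and transcode them to UTF-8 while reading
text = File.read('legacy.txt', encoding: 'ISO-8859-1:UTF-8')
text.encoding
# => #<Encoding:UTF-8>

# Transcode back to ISO-8859-1 while writing
File.open('legacy_copy.txt', 'w:ISO-8859-1') { |file| file.write(text) }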

Ruby's file I/O integrates with the broader I/O class hierarchy, meaning file objects respond to standard I/O methods like #read, #write, #gets, and #puts. This consistency allows file objects to be used interchangeably with other I/O objects in many contexts.
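
Because File shares the IO interface, a method written against IO works equally well with files, pipes, sockets, or StringIO. A minimal sketch; find_errors is an illustrative helper, not part of the standard library:

require 'stringio'

# Accepts any object that responds to #each_line
def find_errors(io)
  io.each_line.grep(/ERROR/)
end

find_errors(StringIO.new("ok\nERROR: disk full\n"))
# => ["ERROR: disk full\n"]

File.open('server.log', 'r') { |file| find_errors(file) }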

Basic Usage

File reading operations offer multiple approaches depending on data size and processing requirements. The File.read class method loads entire file contents into memory as a string, while File.open with a block provides streaming access for larger files.

# Read entire file content
full_content = File.read('large_dataset.csv')

# Read with encoding specification  
utf8_content = File.read('international.txt', encoding: 'UTF-8')

# Read specific number of bytes
partial_content = File.read('binary_file.dat', 1024)

# Stream reading with file handle
File.open('server.log', 'r') do |file|
  file.each_line do |line|
    puts line if line.include?('ERROR')
  end
end

File writing operations similarly support both convenience methods and handle-based approaches. The File.write method creates or overwrites files, while File.open with write modes provides fine-grained control over file operations.

# Write string to file (creates or overwrites)
File.write('results.txt', analysis_results)

# Append to existing file
File.write('application.log', log_entry, mode: 'a')

# Write with specific encoding
File.write('report.txt', report_data, encoding: 'UTF-8')

# Streaming write operations
File.open('export.csv', 'w') do |file|
  csv = CSV.new(file)
  records.each { |record| csv << record }
end

File access modes control read/write permissions and file positioning behavior. Ruby supports standard POSIX file modes with additional Ruby-specific enhancements for encoding and newline handling.

# Read-only access
File.open('readonly.txt', 'r') { |f| f.read }

# Write access (truncates existing content)  
File.open('output.txt', 'w') { |f| f.puts 'new content' }

# Append mode (positions at end of file)
File.open('log.txt', 'a') { |f| f.puts Time.now }

# Read-write access without truncation
File.open('database.txt', 'r+') do |file|
  file.seek(100)  # Move to byte position 100
  file.write('updated data')
end

Binary file operations require explicit mode specification to prevent encoding transformations and newline conversions that can corrupt binary data.

# Binary read mode
image_data = File.read('photo.jpg', mode: 'rb')

# Binary write operations
File.open('backup.dat', 'wb') do |file|
  file.write(compressed_data)
  file.write(checksum_bytes)
end

# Copy binary files
File.open('source.bin', 'rb') do |source|
  File.open('destination.bin', 'wb') do |dest|
    dest.write(source.read(8192)) until source.eof?
  end
end

Error Handling & Debugging

File operations generate specific exception types that require targeted error handling strategies. The most common exceptions include Errno::ENOENT for missing files, Errno::EACCES for permission issues, and Errno::ENOSPC for insufficient disk space.

def safe_file_read(filename)
  File.read(filename)
rescue Errno::ENOENT
  logger.warn "File not found: #{filename}"
  nil
rescue Errno::EACCES
  logger.error "Permission denied: #{filename}"
  raise SecurityError, "Cannot access #{filename}"
rescue Errno::EIO
  logger.error "I/O error reading #{filename}"
  retry_with_backoff
rescue SystemCallError => e
  logger.error "System error: #{e.message}"
  raise
end

Encoding-related errors occur when file content cannot be represented in or converted to the expected encoding. They surface as Encoding::InvalidByteSequenceError or Encoding::UndefinedConversionError exceptions, typically during transcoding (for example when an internal encoding is set or #encode is called) rather than during a plain read, which simply produces a string that fails valid_encoding?.

def robust_file_read(filename)
  content = File.read(filename, encoding: 'UTF-8')
  return content if content.valid_encoding?

  # Content is not valid UTF-8; attempt a common fallback encoding
  File.read(filename, encoding: 'ISO-8859-1').encode('UTF-8')
rescue Encoding::InvalidByteSequenceError, Encoding::UndefinedConversionError
  # Conversion failed; read as binary to preserve the original bytes
  File.read(filename, mode: 'rb')
end

File handle leaks represent a critical resource management issue in Ruby applications. When file handles aren't properly closed, applications exhaust system file descriptor limits, causing subsequent file operations to fail.

# Problematic: file handle may leak on exception
def unsafe_file_processing(filename)
  file = File.open(filename, 'r')
  process_data(file.read)  # May raise exception
  file.close  # Never reached if exception occurs
end

# Safe: ensures file closure regardless of exceptions  
def safe_file_processing(filename)
  File.open(filename, 'r') do |file|
    process_data(file.read)
  end  # Block ensures file.close is called
rescue StandardError => e
  logger.error "Processing failed for #{filename}: #{e.message}"
  raise
end

Debug file I/O issues by examining file system state, permissions, and encoding mismatches. Ruby provides introspection methods for diagnosing file operation problems.

def debug_file_issues(filename)
  unless File.exist?(filename)
    puts "File does not exist: #{filename}"
    return
  end

  stat = File.stat(filename)
  puts "File size: #{stat.size} bytes"
  puts "Permissions: #{sprintf('%o', stat.mode)}"  
  puts "Owner: #{stat.uid}, Group: #{stat.gid}"
  puts "Readable: #{File.readable?(filename)}"
  puts "Writable: #{File.writable?(filename)}"

  # Check encoding of first few bytes
  sample = File.read(filename, 100, mode: 'rb')
  puts "First bytes (hex): #{sample.unpack('H*').first[0..20]}"
  
  # Attempt encoding detection
  sample.force_encoding('UTF-8')
  puts "Valid UTF-8: #{sample.valid_encoding?}"
rescue SystemCallError => e
  puts "System error: #{e.message}"
end

Performance & Memory

Large file operations require memory-efficient approaches to prevent application memory exhaustion. Streaming reads process files in chunks rather than loading entire contents into memory, making them suitable for files larger than available RAM.

# Memory-efficient line processing
def process_large_log(filename)
  File.open(filename, 'r') do |file|
    file.each_line do |line|
      process_log_entry(line.chomp)
    end
  end
end

# Chunked binary file processing  
def process_large_binary(filename, chunk_size: 8192)
  File.open(filename, 'rb') do |file|
    while chunk = file.read(chunk_size)
      process_binary_chunk(chunk)
    end
  end
end

# Memory usage comparison
memory_before = `ps -o rss= -p #{Process.pid}`.to_i
File.read('large_file.txt')  # Loads entire file
memory_after = `ps -o rss= -p #{Process.pid}`.to_i
puts "Memory increase: #{memory_after - memory_before} KB"

Buffer size optimization affects I/O performance significantly. Ruby's default buffer sizes work well for typical scenarios, but applications can tune buffer sizes for specific workloads and storage systems.

# Default buffer size demonstration
def benchmark_buffer_sizes(filename)
  [1024, 4096, 8192, 16384, 65536].each do |buffer_size|
    start_time = Time.now
    
    File.open(filename, 'rb') do |file|
      while chunk = file.read(buffer_size)
        # Simulate processing
      end
    end
    
    elapsed = Time.now - start_time
    puts "Buffer size #{buffer_size}: #{elapsed.round(3)}s"
  end
end

# Optimal buffer size varies by storage type
SSD_BUFFER_SIZE = 16384
SPINNING_DISK_BUFFER_SIZE = 65536
NETWORK_BUFFER_SIZE = 8192

File system cache behavior impacts repeated file access patterns. Ruby applications can leverage cache warming and avoid cache pollution through strategic I/O patterns.

# Cache-friendly sequential access
def cache_efficient_read(filename)
  File.open(filename, 'rb') do |file|
    buffer = String.new(capacity: 65536)
    while file.read(65536, buffer)
      process_buffer(buffer)
    end
  end
end

# Avoid cache pollution with large files
def process_huge_file(filename, chunk_size: 1024 * 1024)
  # Read in large chunks to minimize system calls, but not so large
  # as to evict other cached data, then split each chunk into lines
  File.open(filename, 'rb') do |file|
    leftover = ''
    while chunk = file.read(chunk_size)
      lines = (leftover + chunk).split("\n", -1)
      leftover = lines.pop
      lines.each { |line| process_line(line) }
    end
    process_line(leftover) unless leftover.empty?
  end
end

Memory mapping provides high-performance access to large files by mapping file contents directly into process memory space. Ruby doesn't provide built-in memory mapping, but the technique can be simulated for read-heavy workloads.

# Simulate memory-mapped behavior for read-heavy access
class FileMap
  def initialize(filename)
    @filename = filename
    @cache = {}
    @file_size = File.size(filename)
    @page_size = 4096
  end
  
  def read_page(offset)
    page_num = offset / @page_size
    @cache[page_num] ||= begin
      File.open(@filename, 'rb') do |file|
        file.seek(page_num * @page_size)
        file.read(@page_size)
      end
    end
  end
  
  def read(offset, length)
    result = String.new
    while length > 0
      page_data = read_page(offset)
      page_offset = offset % @page_size
      available = [@page_size - page_offset, length].min
      result << page_data[page_offset, available]
      offset += available
      length -= available
    end
    result
  end
end
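
A brief usage sketch of the FileMap class above, assuming the (hypothetical) data file exists and is large enough for the requested ranges:

map = FileMap.new('large_index.dat')

header = map.read(0, 64)          # first 64 bytes, loads page 0
record = map.read(40_960, 256)    # later reads reuse cached pages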

Production Patterns

Log file management requires rotation, compression, and concurrent access handling. Production applications must balance log detail with storage efficiency and processing performance.

class ProductionLogger
  def initialize(base_filename, max_size: 100 * 1024 * 1024)
    @base_filename = base_filename
    @max_size = max_size
    @current_file = nil
    @mutex = Mutex.new
  end

  def write(message)
    @mutex.synchronize do
      ensure_current_file
      @current_file.puts "#{Time.now.iso8601} #{message}"
      @current_file.flush
      rotate_if_needed
    end
  end

  private

  def current_filename
    "#{@base_filename}.#{Date.today.strftime('%Y%m%d')}"
  end

  def ensure_current_file
    filename = current_filename
    if @current_file.nil? || @current_file.path != filename
      @current_file&.close
      @current_file = File.open(filename, 'a')
    end
  end

  def rotate_if_needed
    if File.size(@current_file.path) > @max_size
      timestamp = Time.now.strftime('%Y%m%d_%H%M%S')
      archive_name = "#{@current_file.path}.#{timestamp}.gz"
      
      @current_file.close
      system('gzip', '-c', @current_file.path, out: archive_name)
      File.truncate(@current_file.path, 0)
      @current_file = File.open(@current_file.path, 'a')
    end
  end
end
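
A usage sketch for the logger above. The path is illustrative, and Date.today and Time#iso8601 assume require 'date' and require 'time':

require 'date'
require 'time'

logger = ProductionLogger.new('/var/log/myapp/app', max_size: 50 * 1024 * 1024)
logger.write('request_id=abc123 status=200 duration=0.045')
# Appends to a dated file such as /var/log/myapp/app.20250101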

Configuration file handling in production requires error resilience, validation, and hot reloading capabilities. Applications must handle corrupted files gracefully and provide sensible defaults.

class ConfigManager
  def initialize(config_path)
    @config_path = config_path
    @config_mtime = nil
    @cached_config = nil
    @reload_mutex = Mutex.new
  end

  def get(key, default = nil)
    config = current_config
    config.dig(*key.to_s.split('.')) || default
  end

  def reload_if_changed
    return @cached_config unless config_changed?
    
    @reload_mutex.synchronize do
      return @cached_config unless config_changed?
      load_config
    end
  end

  private

  def current_config
    reload_if_changed || {}
  end

  def config_changed?
    return true if @cached_config.nil?
    
    current_mtime = File.mtime(@config_path)
    current_mtime > @config_mtime
  rescue Errno::ENOENT
    false
  end

  def load_config
    @cached_config = YAML.safe_load(File.read(@config_path))
    @config_mtime = File.mtime(@config_path)
    validate_config(@cached_config)
    @cached_config
  rescue Psych::SyntaxError => e
    logger.error "Invalid YAML in #{@config_path}: #{e.message}"
    @cached_config || {}
  rescue StandardError => e
    logger.error "Config load error: #{e.message}"  
    @cached_config || {}
  end

  def validate_config(config)
    required_keys = %w[database.host database.port api.timeout]
    required_keys.each do |key|
      unless config.dig(*key.split('.'))
        raise ConfigError, "Missing required config: #{key}"
      end
    end
  end
end
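
A usage sketch for the configuration manager above, assuming a YAML file at config/app.yml containing the required keys:

require 'yaml'

config  = ConfigManager.new('config/app.yml')
db_host = config.get('database.host', 'localhost')
timeout = config.get('api.timeout', 30)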

Data export and backup operations require atomic writes, verification, and cleanup procedures to ensure data integrity in production environments.

class DataExporter
  def initialize(export_directory)
    @export_directory = export_directory
    @temp_suffix = '.tmp'
  end

  def export_dataset(dataset_name, records)
    timestamp = Time.now.strftime('%Y%m%d_%H%M%S')
    final_path = File.join(@export_directory, "#{dataset_name}_#{timestamp}.jsonl")
    temp_path = final_path + @temp_suffix

    begin
      # Atomic write: write to temp file, then rename
      File.open(temp_path, 'w') do |file|
        records.each do |record|
          file.puts JSON.generate(record)
        end
        file.fsync  # Force write to disk
      end

      # Verify file integrity
      verify_export(temp_path, records.count)
      
      # Atomic rename operation
      File.rename(temp_path, final_path)
      
      # Create verification checksum
      create_checksum(final_path)
      
      final_path
    rescue StandardError => e
      cleanup_temp_file(temp_path)
      raise ExportError, "Export failed: #{e.message}"
    end
  end

  private

  def verify_export(file_path, expected_count)
    actual_count = 0
    File.open(file_path, 'r') do |file|
      file.each_line { actual_count += 1 }
    end
    
    unless actual_count == expected_count
      raise ExportError, "Record count mismatch: expected #{expected_count}, got #{actual_count}"
    end
  end

  def create_checksum(file_path)
    checksum = Digest::SHA256.file(file_path).hexdigest
    File.write("#{file_path}.sha256", "#{checksum}  #{File.basename(file_path)}\n")
  end

  def cleanup_temp_file(temp_path)
    File.unlink(temp_path) if File.exist?(temp_path)
  rescue StandardError
    # Log but don't raise - cleanup is best effort
  end
end
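
A usage sketch for the exporter above; the directory and records are placeholders, and the exporter assumes require 'json' and require 'digest' plus an existing export directory:

require 'json'
require 'digest'

exporter = DataExporter.new('/var/exports')
records  = [{ id: 1, name: 'first' }, { id: 2, name: 'second' }]
path = exporter.export_dataset('users', records)
# => a path such as /var/exports/users_<timestamp>.jsonl, plus a .sha256 checksum file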

Common Pitfalls

Encoding mismatches cause data corruption and processing errors when file encoding doesn't match application assumptions. Ruby's encoding handling can be subtle, particularly with files created on different systems.

# Problematic: assumes UTF-8 encoding
def broken_text_processing(filename)
  content = File.read(filename)  # Uses locale default encoding
  content.gsub(/[^\w\s]/, '')    # May fail on non-UTF-8 characters
end

# Correct: explicit encoding handling
def robust_text_processing(filename)
  content = File.read(filename, encoding: 'UTF-8')

  unless content.valid_encoding?
    # File isn't valid UTF-8: reread the raw bytes and try common encodings
    binary_content = File.read(filename, mode: 'rb')

    %w[ISO-8859-1 Windows-1252 UTF-16].each do |encoding|
      test_content = binary_content.dup.force_encoding(encoding)
      if test_content.valid_encoding?
        content = test_content.encode('UTF-8')
        break
      end
    end
  end

  content.gsub(/[^\w\s]/, '')
end

File locking issues occur when multiple processes attempt simultaneous access to the same file. Ruby provides file locking mechanisms, but they require careful implementation to avoid deadlocks.

# Problematic: no coordination between processes
def unsafe_counter_increment(counter_file)
  current = File.read(counter_file).to_i
  File.write(counter_file, (current + 1).to_s)
end

# Correct: file locking prevents race conditions
def safe_counter_increment(counter_file)
  File.open(counter_file, File::RDWR | File::CREAT, 0644) do |file|
    file.flock(File::LOCK_EX)  # Exclusive lock
    
    current = file.read.to_i
    file.rewind
    file.truncate(0)
    file.write((current + 1).to_s)
    file.flush
    
    # Lock automatically released when file closes
  end
rescue Errno::ENOLCK
  # File locking not supported on this filesystem
  fallback_atomic_increment(counter_file)
end

def fallback_atomic_increment(counter_file)
  temp_file = "#{counter_file}.tmp.#{Process.pid}"
  
  begin
    current = File.exist?(counter_file) ? File.read(counter_file).to_i : 0
    File.write(temp_file, (current + 1).to_s)
    File.rename(temp_file, counter_file)  # Atomic on POSIX systems
  ensure
    File.unlink(temp_file) if File.exist?(temp_file)
  end
end

Binary file corruption occurs when text-mode operations are applied to binary data. Newline conversion (on Windows) and encoding transformations performed in text mode can corrupt binary files.

# Problematic: text mode corrupts binary data
def broken_file_copy(source, destination)
  content = File.read(source)        # Text mode may corrupt binary
  File.write(destination, content)   # Corruption persists to destination
end

# Correct: explicit binary mode preserves data integrity
def correct_file_copy(source, destination)
  File.open(source, 'rb') do |input|
    File.open(destination, 'wb') do |output|
      while chunk = input.read(8192)
        output.write(chunk)
      end
    end
  end
end

# Verify binary file integrity
def verify_file_copy(original, copy)
  original_digest = Digest::SHA256.file(original).hexdigest  
  copy_digest = Digest::SHA256.file(copy).hexdigest
  
  unless original_digest == copy_digest
    raise DataCorruption, "File copy verification failed"
  end
end

Temporary file cleanup represents a common resource leak when exception handling doesn't account for cleanup requirements. Ruby provides automatic temporary file management through the Tempfile class.

# Problematic: temp files may not be cleaned up  
def unsafe_temp_processing(data)
  temp_name = "/tmp/process_#{Process.pid}_#{rand(10000)}"
  File.write(temp_name, data)
  
  result = external_processor(temp_name)  # May raise exception
  File.unlink(temp_name)  # Never reached if exception occurs
  result
end

# Better: ensure cleanup with begin/ensure
def manual_temp_cleanup(data)
  temp_name = "/tmp/process_#{Process.pid}_#{rand(10000)}"  
  
  begin
    File.write(temp_name, data)
    external_processor(temp_name)
  ensure
    File.unlink(temp_name) if File.exist?(temp_name)
  end
end

# Best: use Tempfile for automatic cleanup
def automatic_temp_cleanup(data)
  Tempfile.create(['process', '.dat']) do |temp_file|
    temp_file.write(data)
    temp_file.flush  # Ensure data is written
    external_processor(temp_file.path)
  end  # Tempfile automatically cleaned up
end

Permission and ownership issues cause file operations to fail in deployment environments where the application user differs from the file owner. Applications must handle permission errors gracefully.

def handle_permission_issues(filename, content)
  File.write(filename, content)
rescue Errno::EACCES
  # Try to fix permissions if we own the file
  if File.owned?(filename)
    File.chmod(0644, filename)
    retry
  else
    # Create alternative location
    fallback_path = File.join(ENV['HOME'], File.basename(filename))
    File.write(fallback_path, content)
    logger.warn "Wrote to fallback location: #{fallback_path}"
  end
rescue Errno::EROFS
  # Read-only filesystem - find writable location
  writable_path = find_writable_directory
  target = File.join(writable_path, File.basename(filename))
  File.write(target, content)
  logger.warn "Filesystem read-only, wrote to: #{target}"
end

Reference

File Class Methods

Method | Parameters | Returns | Description
File.read(name, **opts) | name (String), options (Hash) | String | Read entire file content into string
File.write(name, data, **opts) | name (String), data (String), options (Hash) | Integer | Write data to file, return bytes written
File.open(name, mode, **opts) | name (String), mode (String), options (Hash) | File or block result | Open file handle or yield to block
File.exist?(path) | path (String) | Boolean | Check if file exists
File.size(path) | path (String) | Integer | Return file size in bytes
File.stat(path) | path (String) | File::Stat | Return file metadata object
File.chmod(mode, path) | mode (Integer), path (String) | Integer | Change file permissions
File.rename(old, new) | old (String), new (String) | 0 | Rename file atomically

File Instance Methods

Method | Parameters | Returns | Description
#read(length=nil, buffer=nil) | length (Integer), buffer (String) | String or nil | Read specified bytes from file
#write(string) | string (String) | Integer | Write string to file, return bytes written
#gets(separator=$/, **opts) | separator (String), options (Hash) | String or nil | Read next line from file
#puts(*objects) | objects (Array) | nil | Write objects as lines to file
#seek(offset, whence=IO::SEEK_SET) | offset (Integer), whence (Integer) | 0 | Move file position pointer
#tell | None | Integer | Return current file position
#eof? | None | Boolean | Check if at end of file
#flush | None | File | Flush write buffer to the operating system
#fsync | None | 0 | Force write to physical storage
#close | None | nil | Close file handle

File Access Modes

Mode | Description | File Pointer | Truncates
'r' | Read only | Beginning | No
'w' | Write only | Beginning | Yes
'a' | Write only | End | No
'r+' | Read/write | Beginning | No
'w+' | Read/write | Beginning | Yes
'a+' | Read/write | End | No

Binary Mode Modifiers

Modifier | Description
'b' | Binary mode (no encoding conversion)
't' | Text mode (default, with encoding conversion)

Common Options Hash Keys

Option | Type | Description
:encoding | String or Encoding | Character encoding for file content
:mode | String | File access mode (alternative to positional parameter)
:perm | Integer | File permissions for newly created files (octal)
:flags | Integer | System-specific file flags
:external_encoding | String or Encoding | Encoding of file content
:internal_encoding | String or Encoding | Encoding for string conversion
:textmode | Boolean | Enable text mode processing
:binmode | Boolean | Enable binary mode processing
:autoclose | Boolean | Automatically close file when garbage collected

File System Constants

Constant | Value | Description
File::RDONLY | Platform-specific | Read-only access flag
File::WRONLY | Platform-specific | Write-only access flag
File::RDWR | Platform-specific | Read-write access flag
File::CREAT | Platform-specific | Create file if it doesn't exist
File::EXCL | Platform-specific | Fail if file already exists
File::TRUNC | Platform-specific | Truncate file to zero length
File::APPEND | Platform-specific | Open for appending
File::NONBLOCK | Platform-specific | Non-blocking I/O mode

File Lock Constants

Constant | Description
File::LOCK_SH | Shared lock (multiple readers)
File::LOCK_EX | Exclusive lock (single writer)
File::LOCK_UN | Unlock file
File::LOCK_NB | Non-blocking lock (don't wait)

Common Exception Types

Exception | Cause
Errno::ENOENT | File or directory not found
Errno::EACCES | Permission denied
Errno::EISDIR | Is a directory (when file expected)
Errno::ENOTDIR | Not a directory (when directory expected)
Errno::ENOSPC | No space left on device
Errno::EIO | Input/output error
Errno::EMFILE | Too many open files
Encoding::InvalidByteSequenceError | Invalid byte sequence for encoding
Encoding::UndefinedConversionError | Character cannot be converted to target encoding