CrackedRuby

File Operations

File operations in Ruby covering reading, writing, path manipulation, and file system interactions.

Overview

Ruby provides comprehensive file operation capabilities through the File class, IO class, and related modules. The File class inherits from IO and adds file-specific functionality for creating, reading, writing, and manipulating files and directories. Ruby's file operations handle various encodings, provide both blocking and non-blocking I/O options, and integrate with the underlying operating system's file system APIs.

The primary classes for file operations are File for file-specific operations, IO for general input/output, Dir for directory operations, and Pathname for object-oriented path manipulation. Ruby also provides the FileUtils module for higher-level file system operations such as copying, moving, and deleting files and directories.

# Basic file reading
content = File.read('data.txt')

# File writing with automatic closure
File.write('output.txt', 'Hello World')

# Working with file handles
File.open('config.rb', 'r') do |file|
  file.each_line { |line| puts line.upcase }
end
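
The snippet above covers only File. As a rough sketch of the Pathname and Dir classes mentioned earlier, the following runs inside a throwaway temporary directory; the names `docs` and `notes.txt` and the helper `pathname_demo` are illustrative assumptions, not part of any API.

```ruby
require 'pathname'
require 'tmpdir'

# Builds a small tree in a temp directory and reports what Pathname and Dir see.
# Everything is removed again when the Dir.mktmpdir block exits.
def pathname_demo
  Dir.mktmpdir do |dir|
    root = Pathname.new(dir)
    docs = root / 'docs'   # Pathname#/ joins path components
    docs.mkpath            # creates missing parent directories, like mkdir -p

    note = docs / 'notes.txt'
    note.write("first line\nsecond line\n")

    {
      basename: note.basename.to_s,
      extname: note.extname,
      lines: note.read.lines(chomp: true),
      children: docs.children.map { |child| child.basename.to_s },
      dir_children: Dir.children(docs.to_s)
    }
  end
end

p pathname_demo
```

Pathname wraps the same File and Dir calls in an object, so the choice between the two styles is mostly a matter of readability.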

File operations in Ruby default to text mode and tag the data they read with the default external encoding (Encoding.default_external, which is UTF-8 on most systems). Binary mode must be requested explicitly. File.join builds paths with the "/" separator, which Ruby accepts on every supported platform, including Windows.
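
To make the text/binary distinction concrete, here is a small sketch that reads the same bytes both ways. The helper name `encoding_demo` and the file name `sample.txt` are assumptions for illustration.

```ruby
require 'tmpdir'

# Writes a short UTF-8 string, then reads it back in text and binary mode.
def encoding_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'sample.txt')
    File.write(path, "caf\u00E9", encoding: 'UTF-8')

    text   = File.read(path, encoding: 'UTF-8') # explicit, not relying on the default
    binary = File.binread(path)                 # binary read, like mode: 'rb'

    {
      text_encoding: text.encoding,
      binary_encoding: binary.encoding,
      text_chars: text.length,     # counted in characters
      binary_bytes: binary.length, # counted in bytes; the accented e takes two
      joined: File.join('config', 'database.yml')
    }
  end
end

p encoding_demo
```

The binary string carries the ASCII-8BIT (BINARY) encoding, which is why its length is reported in bytes rather than characters.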

Basic Usage

File reading operations offer several approaches depending on data size and processing requirements. File.read loads entire files into memory, while File.readlines returns an array of lines. For large files, File.foreach provides line-by-line iteration without loading everything into memory.

# Reading entire file
content = File.read('large_dataset.csv')

# Reading line by line for memory efficiency
File.foreach('large_dataset.csv') do |line|
  process_data_line(line.chomp)
end

# Reading with encoding specification
content = File.read('utf8_file.txt', encoding: 'UTF-8')

# Reading binary files
binary_data = File.read('image.jpg', mode: 'rb')

File writing operations support various modes including append, truncate, and create-if-missing. The File.write method provides a simple interface for common write operations, while File.open with a block offers more control and automatic file closure.

# Simple file writing (overwrites existing)
File.write('log.txt', "Process started: #{Time.now}")

# Appending to existing file
File.write('log.txt', "Process completed: #{Time.now}", mode: 'a')

# Writing with explicit encoding
File.write('unicode.txt', '🚀 Unicode content', encoding: 'UTF-8')

# Structured writing with automatic closure
File.open('report.csv', 'w') do |file|
  file.puts 'Name,Age,City'
  users.each { |user| file.puts "#{user.name},#{user.age},#{user.city}" }
end

Directory operations through the Dir class enable directory traversal, creation, and content listing. Path manipulation methods handle cross-platform compatibility and path resolution.

# Directory listing
Dir.entries('.').reject { |f| f.start_with?('.') }

# Creating directories
Dir.mkdir('temp') unless Dir.exist?('temp')

# Path construction
config_path = File.join(Dir.home, '.myapp', 'config.yml')

# File existence and properties (YAML.load_file needs require 'yaml')
if File.exist?(config_path) && File.readable?(config_path)
  config = YAML.load_file(config_path)
end

Error Handling & Debugging

File operations generate specific exception types that require targeted handling strategies. Errno::ENOENT occurs when files don't exist, Errno::EACCES indicates permission problems, and Errno::ENOSPC signals insufficient disk space. Each error type requires different recovery approaches.

def safe_file_read(filename)
  File.read(filename)
rescue Errno::ENOENT
  logger.warn "File not found: #{filename}"
  nil
rescue Errno::EACCES
  logger.error "Permission denied reading: #{filename}"
  raise SecurityError, "Cannot access #{filename}"
rescue Errno::EISDIR
  logger.error "#{filename} is a directory, not a file"
  raise ArgumentError, "Expected file, got directory"
rescue SystemCallError => e
  logger.error "System error reading #{filename}: #{e.message}"
  raise
end

File handle management requires careful attention to resource cleanup. Unclosed file handles cause resource leaks and can exhaust system limits. Ruby's block-based file operations provide automatic cleanup, but explicit ensure blocks become necessary with manual handle management.

def process_large_file(filename)
  file = nil
  begin
    file = File.open(filename, 'r')
    buffer = String.new(capacity: 8192)
    
    while file.read(8192, buffer)
      yield buffer
    end
  rescue IOError => e
    logger.error "IO error processing #{filename}: #{e.message}"
    raise
  ensure
    file&.close
  end
end

# Debugging file operations with verbose logging
def debug_file_copy(source, destination)
  logger.debug "Copying #{source} to #{destination}"
  logger.debug "Source exists: #{File.exist?(source)}"
  logger.debug "Source size: #{File.size(source)} bytes"
  
  File.open(source, 'rb') do |src|
    File.open(destination, 'wb') do |dest|
      IO.copy_stream(src, dest)
    end
  end
  
  logger.debug "Copy completed. Destination size: #{File.size(destination)} bytes"
rescue StandardError => e
  logger.error "Copy failed: #{e.class} - #{e.message}"
  logger.debug "Backtrace: #{e.backtrace.join("\n")}"
  File.unlink(destination) if File.exist?(destination)
  raise
end

Encoding-related errors occur when file contents don't match the expected character encoding. Reading with an external encoding only tags the bytes, so File.read does not raise on invalid data; the problem surfaces later, typically as an ArgumentError from a string operation. Ruby raises Encoding::InvalidByteSequenceError and Encoding::UndefinedConversionError only when it transcodes, for example with String#encode or when an internal encoding is given. Checking String#valid_encoding? after the read catches bad input early.

def read_with_encoding_fallback(filename)
  content = File.read(filename, encoding: 'UTF-8')
  return content if content.valid_encoding?

  logger.warn "Invalid UTF-8 in #{filename}, replacing invalid bytes"
  content.scrub('?')
end
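
A short sketch of where the two exception classes actually come from. Tagging bytes as UTF-8 never raises; transcoding with String#encode does. The constant `BAD_UTF8` and the helper `transcode_or_error` are hypothetical names for this example.

```ruby
# 0xFF can never appear in valid UTF-8, so this string is tagged UTF-8 but invalid.
BAD_UTF8 = "abc\xFFdef".dup.force_encoding(Encoding::UTF_8)

# Returns the transcoded string, or the exception class if transcoding fails.
def transcode_or_error(str, target)
  str.encode(target)
rescue Encoding::InvalidByteSequenceError, Encoding::UndefinedConversionError => e
  e.class
end

p BAD_UTF8.valid_encoding?                    # invalid bytes are detectable without raising
p transcode_or_error(BAD_UTF8, 'UTF-16LE')    # malformed source bytes
p transcode_or_error("\u2603", 'ISO-8859-1')  # valid character with no target equivalent
p BAD_UTF8.scrub('?')                         # replace invalid bytes instead of raising
```

InvalidByteSequenceError means the source bytes are malformed; UndefinedConversionError means the source is fine but the target encoding has no matching character.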

Performance & Memory

File reading performance depends significantly on buffer sizes and reading patterns. Large buffer sizes reduce system calls but increase memory usage. Ruby's IO.copy_stream provides optimized copying between file handles with minimal memory allocation.

# Memory-efficient large file processing
def process_huge_file(filename)
  File.open(filename, 'rb') do |file|
    buffer = String.new(capacity: 65536) # 64KB buffer
    
    while file.read(65536, buffer)
      process_chunk(buffer)
      # Buffer reuse prevents repeated allocation
    end
  end
end

# Comparing memory usage patterns
def memory_comparison_demo
  # High memory usage - loads entire file
  start_memory = memory_usage
  content = File.read('large_file.txt')
  high_memory = memory_usage - start_memory
  
  # Low memory usage - streaming approach
  start_memory = memory_usage
  File.foreach('large_file.txt') { |line| process_line(line) }
  low_memory = memory_usage - start_memory
  
  puts "Full read: #{high_memory}MB, Streaming: #{low_memory}MB"
end

Write operations benefit from batching and buffer management. Frequent small writes cause excessive system calls, while extremely large writes can cause memory pressure. Accumulating data in a buffer and flushing it once it reaches a size threshold balances the two.

class BufferedWriter
  def initialize(filename, buffer_size: 8192)
    @file = File.open(filename, 'w')
    @buffer = String.new
    @buffer_size = buffer_size
  end
  
  def write(data)
    @buffer << data
    flush_if_needed
  end
  
  def flush
    return if @buffer.empty?
    
    @file.write(@buffer)
    @file.flush
    @buffer.clear
  end
  
  def close
    flush
    @file.close
  end
  
  private
  
  def flush_if_needed
    flush if @buffer.bytesize >= @buffer_size
  end
end

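# Usage with explicit cleanup (BufferedWriter.new ignores a block,
# so close must be called to flush the remaining buffer)
writer = BufferedWriter.new('output.txt')
begin
  1_000_000.times { |i| writer.write("Line #{i}\n") }
ensure
  writer.close
end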

Directory traversal performance varies significantly between recursive approaches. Dir.glob with recursive patterns performs better than manual recursion for simple matching, while Find.find provides more control for complex filtering.

require 'find'
require 'benchmark'

def performance_comparison
  Benchmark.bm(20) do |x|
    x.report('Dir.glob recursive') do
      Dir.glob('**/*.rb').each { |f| File.size(f) }
    end
    
    x.report('Find.find manual') do
      Find.find('.') do |path|
        next unless path.end_with?('.rb')
        File.size(path)
      end
    end
    
    x.report('Enumerator approach') do
      enum = Enumerator.new do |yielder|
        Find.find('.') { |f| yielder << f if f.end_with?('.rb') }
      end
      enum.lazy.each { |f| File.size(f) }
    end
  end
end

Production Patterns

Production file operations require robust error handling, logging, and monitoring capabilities. Applications must handle concurrent access, temporary file cleanup, and graceful degradation when file operations fail.

require 'monitor'
require 'pathname'
require 'tmpdir'

class ProductionFileManager
  include MonitorMixin
  
  def initialize(base_path:, temp_dir: nil, max_retries: 3)
    super()
    @base_path = Pathname.new(base_path)
    @temp_dir = temp_dir || Dir.tmpdir
    @max_retries = max_retries
    @metrics = FileOperationMetrics.new
  end
  
  def atomic_write(filename, content)
    synchronize do
      temp_file = create_temp_file(filename)
      
      begin
        File.write(temp_file, content)
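        # Atomic only when temp_dir and base_path are on the same filesystem;
        # otherwise File.rename raises Errno::EXDEV
        File.rename(temp_file, target_path(filename))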
        @metrics.record_success(:write, filename)
      rescue StandardError => e
        File.unlink(temp_file) if File.exist?(temp_file)
        @metrics.record_failure(:write, filename, e)
        raise
      end
    end
  end
  
  def read_with_retry(filename)
    retries = 0
    
    begin
      content = File.read(target_path(filename))
      @metrics.record_success(:read, filename)
      content
    rescue Errno::ENOENT => e
      @metrics.record_failure(:read, filename, e)
      raise
    rescue StandardError => e
      retries += 1
      if retries <= @max_retries
        sleep(0.1 * retries) # Linear backoff: 0.1s, 0.2s, 0.3s
        retry
      end
      
      @metrics.record_failure(:read, filename, e)
      raise
    end
  end
  
  private
  
  def create_temp_file(filename)
    File.join(@temp_dir, "#{filename}.tmp.#{Process.pid}.#{Time.now.to_f}")
  end
  
  def target_path(filename)
    @base_path.join(filename).to_s
  end
end

Configuration file handling in production environments requires validation, backup strategies, and rollback capabilities. Applications should validate configuration syntax before applying changes and maintain backup copies for rollback scenarios.

require 'fileutils'
require 'pathname'
require 'yaml'

class ConfigurationManager
  def initialize(config_path)
    @config_path = Pathname.new(config_path)
    @backup_path = @config_path.sub_ext('.backup')
    @lock_file = @config_path.sub_ext('.lock')
  end
  
  def update_config(new_config)
    acquire_lock do
      create_backup
      validate_config(new_config)
      write_config(new_config)
      reload_application_config
    end
  rescue ConfigurationError => e
    restore_backup if backup_exists?
    logger.error "Configuration update failed: #{e.message}"
    raise
  end
  
  private
  
  def acquire_lock
    # Keep the lock file in place and rely on flock alone. Creating it with
    # File::EXCL and unlinking it in ensure would delete a lock held by another process.
    File.open(@lock_file, File::CREAT | File::RDWR) do |lock|
      lock.flock(File::LOCK_EX)
      yield
    end
  end
  
  def create_backup
    FileUtils.cp(@config_path, @backup_path) if @config_path.exist?
  end
  
  def validate_config(config)
    YAML.safe_load(config)
  rescue Psych::SyntaxError => e
    raise ConfigurationError, "Invalid YAML syntax: #{e.message}"
  end
end
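
The locking step can also be made non-blocking. The following is a minimal sketch, assuming a hypothetical helper `with_file_lock` that returns `:busy` instead of waiting when another handle already holds the lock. It relies on advisory flock semantics as found on Linux and macOS.

```ruby
require 'tmpdir'

# Yields while holding an exclusive advisory lock on lock_path.
# Returns :busy without yielding if the lock is already held elsewhere.
def with_file_lock(lock_path)
  File.open(lock_path, File::CREAT | File::RDWR) do |lock|
    # flock returns false (rather than blocking) when LOCK_NB is set and the lock is taken
    return :busy unless lock.flock(File::LOCK_EX | File::LOCK_NB)

    yield
  end
  # The lock is released when the block closes the file
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'config.lock')
  result = with_file_lock(path) { :updated }
  puts "lock result: #{result}"
end
```

The lock file is never deleted, which avoids the race where one process removes a file another process has just locked.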

Log file management requires rotation, compression, and cleanup strategies to prevent disk space exhaustion. Production applications should implement size-based and time-based rotation policies.

require 'fileutils'
require 'pathname'

class LogRotator
  def initialize(log_path, max_size: 100 * 1024 * 1024, max_files: 10)
    @log_path = Pathname.new(log_path)
    @max_size = max_size
    @max_files = max_files
  end
  
  def rotate_if_needed
    return unless should_rotate?
    
    rotate_existing_logs
    create_new_log
    cleanup_old_logs
  end
  
  private
  
  def should_rotate?
    @log_path.exist? && @log_path.size > @max_size
  end
  
  def rotate_existing_logs
    (@max_files - 1).downto(1) do |i|
      old_log = @log_path.sub_ext(".#{i}")
      new_log = @log_path.sub_ext(".#{i + 1}")
      
      FileUtils.mv(old_log, new_log) if old_log.exist?
    end
    
    FileUtils.mv(@log_path, @log_path.sub_ext('.1'))
  end
end
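
Before writing a custom rotator, it may be worth checking whether the standard library's Logger is enough, since it supports both policies mentioned above. A brief sketch, with the helper names and the deliberately tiny 1 KiB threshold chosen only for demonstration:

```ruby
require 'logger'
require 'tmpdir'

# Size-based rotation: keep up to 5 log files, rotate once the live file exceeds 1 KiB.
def size_rotating_logger(path)
  Logger.new(path, 5, 1024)
end

# Time-based rotation: 'daily', 'weekly', and 'monthly' are accepted.
def daily_rotating_logger(path)
  Logger.new(path, 'daily')
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'app.log')
  log = size_rotating_logger(path)
  200.times { |i| log.info("entry #{i} " + 'x' * 40) }
  log.close
  puts Dir.children(dir).sort.inspect
end
```

Logger names rotated files by appending a numeric suffix to the full file name, so the original extension is preserved.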

Common Pitfalls

File encoding issues are a frequent source of production bugs in file operations. Ruby's default encoding assumptions don't always match file contents. File.read does not validate the bytes it reads, so invalid data tends to surface later as an ArgumentError ("invalid byte sequence") or as corrupted output during processing.

# Problematic - assumes UTF-8 encoding
def naive_file_read(filename)
  File.read(filename) # Does not raise on invalid bytes; later string operations may
end

# More robust approach: validate, then fall back
def smart_file_read(filename)
  # Try UTF-8 first and verify the bytes really are UTF-8
  content = File.read(filename, encoding: 'UTF-8')
  return content if content.valid_encoding?

  # Try a common legacy encoding, transcoding to UTF-8.
  # Windows-1252 leaves a few byte values undefined, so this can still fail.
  begin
    return File.read(filename, encoding: 'Windows-1252:UTF-8')
  rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
    # Fall through to the binary read
  end

  # Last resort - raw bytes tagged as ASCII-8BIT
  File.binread(filename)
end

Path construction errors cause subtle bugs and security vulnerabilities. Building paths by string concatenation makes it easy to double or drop a separator, and concatenating unvalidated user input creates directory traversal attack vectors.

# Dangerous path construction
def unsafe_path_building(user_input)
  # Security vulnerability - directory traversal attack
  "/uploads/" + user_input # user_input could be "../../../etc/passwd"
end

def brittle_path_building
  # Fragile - separators are easy to double or omit when pieces vary
  "config" + "/" + "database.yml"
end

# Secure and portable path construction
def safe_path_building(user_input, base_dir)
  # Validate input
  raise ArgumentError, "Invalid filename" if user_input.include?('..')
  raise ArgumentError, "Invalid filename" if user_input.include?('/')
  
  # Use File.join for cross-platform compatibility
  File.join(base_dir, user_input)
end

def robust_config_path
  File.join(Dir.home, '.myapp', 'config', 'database.yml')
end
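
Substring checks such as `include?('..')` also reject harmless names like `v1..2.txt`. An alternative sketch resolves the path first and then verifies that it is still inside the base directory. The helper name `resolve_inside` and the `/srv/uploads` base are assumptions for illustration, and the example uses POSIX-style paths.

```ruby
# Returns the absolute path for user_input inside base_dir.
# Raises ArgumentError if the resolved path would land outside base_dir.
def resolve_inside(base_dir, user_input)
  base = File.expand_path(base_dir)
  candidate = File.expand_path(user_input, base) # collapses "..", honors absolute input

  unless candidate.start_with?(base + File::SEPARATOR)
    raise ArgumentError, "Path escapes #{base}: #{user_input.inspect}"
  end

  candidate
end

puts resolve_inside('/srv/uploads', 'reports/../photo.png')

begin
  resolve_inside('/srv/uploads', '../../etc/passwd')
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

File.expand_path works on the string only and does not follow symlinks. Where symlinks inside the base directory are a concern, File.realpath on an existing path is the stricter check.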

File handle leaks occur when files aren't properly closed, eventually exhausting system resources. This problem manifests as "Too many open files" errors in production systems under load.

# Problematic - file handles may leak
def leaky_file_processing(filenames)
  files = filenames.map { |name| File.open(name, 'r') }
  
  files.each do |file|
    process_file_content(file.read)
    # Missing file.close - handles leaked if exception occurs
  end
end

# Safe approach with automatic cleanup
def safe_file_processing(filenames)
  filenames.each do |filename|
    File.open(filename, 'r') do |file|
      process_file_content(file.read)
      # Block ensures automatic cleanup
    end
  end
end

# Manual resource management with proper cleanup
def manual_file_processing(filenames)
  files = []
  
  begin
    # Append one handle at a time so earlier handles are still closed
    # if a later File.open raises
    filenames.each { |name| files << File.open(name, 'r') }
    
    files.each do |file|
      process_file_content(file.read)
    end
  ensure
    files.each(&:close)
  end
end

Atomic write operations prevent partial file corruption during write failures. Applications that directly write to target files risk leaving incomplete or corrupted data when interruptions occur.

# Dangerous - non-atomic write
def unsafe_config_update(filename, config_data)
  File.write(filename, config_data) # Risk of partial writes
end

# Safe - atomic write with temporary file
def atomic_config_update(filename, config_data)
  temp_filename = "#{filename}.tmp.#{Process.pid}"
  
  begin
    File.write(temp_filename, config_data)
    File.rename(temp_filename, filename) # Atomic on POSIX when both paths share a filesystem
  rescue StandardError
    File.unlink(temp_filename) if File.exist?(temp_filename)
    raise
  end
end
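
A variation on the same idea, sketched with the standard library's Tempfile so the temporary name is unique and lives in the target's directory. It also calls fsync before renaming. The helper name `atomic_write` is an assumption; the atomicity claim applies to POSIX rename on a single filesystem.

```ruby
require 'tempfile'
require 'tmpdir'

# Writes data to a temp file next to path, syncs it, then renames it over path.
# Note: the result keeps the temp file's restrictive permissions (0600).
def atomic_write(path, data)
  temp = Tempfile.create(['.atomic-', '.tmp'], File.dirname(path))
  begin
    temp.binmode
    temp.write(data)
    temp.fsync # ask the OS to push the data to disk before the rename
    temp.close
    File.rename(temp.path, path)
  rescue StandardError
    temp.close unless temp.closed?
    File.unlink(temp.path) if File.exist?(temp.path)
    raise
  end
end

Dir.mktmpdir do |dir|
  target = File.join(dir, 'config.yml')
  atomic_write(target, "retries: 3\n")
  puts File.read(target)
end
```

Creating the temp file in the same directory as the target is what keeps the rename on one filesystem.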

Reference

File Class Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| File.read(name, *args) | name (String), length (Integer), offset (Integer), **opts | String | Reads entire file or specified portion |
| File.write(name, string, *args) | name (String), string (String), offset (Integer), **opts | Integer | Writes string to file, returns bytes written |
| File.open(filename, mode='r', **opts) | filename (String), mode (String), **opts | File or result of block | Opens file with specified mode |
| File.exist?(filename) | filename (String) | Boolean | Tests file existence |
| File.size(filename) | filename (String) | Integer | Returns file size in bytes |
| File.directory?(filename) | filename (String) | Boolean | Tests if path is directory |
| File.readable?(filename) | filename (String) | Boolean | Tests if file is readable |
| File.writable?(filename) | filename (String) | Boolean | Tests if file is writable |
| File.executable?(filename) | filename (String) | Boolean | Tests if file is executable |
| File.join(*args) | *args (String) | String | Joins path components with separator |
| File.expand_path(filename, dir=nil) | filename (String), dir (String) | String | Converts relative path to absolute |
| File.basename(filename, suffix='') | filename (String), suffix (String) | String | Returns last component of filename |
| File.dirname(filename) | filename (String) | String | Returns directory portion of filename |
| File.extname(filename) | filename (String) | String | Returns file extension |
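
As a quick illustration of the path helpers in the table, here is a small sketch. The helper name `path_parts` and the sample path are made up for the example.

```ruby
# Splits a path into the pieces returned by the File path helpers.
def path_parts(path)
  {
    dirname: File.dirname(path),
    basename: File.basename(path),
    stem: File.basename(path, '.*'), # '.*' strips whatever extension is present
    extname: File.extname(path)
  }
end

p path_parts('/var/www/app/config/database.yml')
p File.join('a', 'b/', '/c')                  # redundant separators are collapsed
p File.expand_path('../lib', '/var/www/app')  # resolved against the given directory
```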

File Instance Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| #read(length=nil, buffer=nil) | length (Integer), buffer (String) | String or nil | Reads specified number of bytes |
| #write(string) | string (String) | Integer | Writes string, returns bytes written |
| #puts(*args) | *args | nil | Writes objects with newlines |
| #gets(separator=$/, limit=nil) | separator (String), limit (Integer) | String or nil | Reads line with separator |
| #each_line(**opts) | **opts | Enumerator or result of block | Iterates over lines |
| #rewind | None | 0 | Resets file pointer to beginning |
| #seek(offset, whence=IO::SEEK_SET) | offset (Integer), whence (Integer) | 0 | Moves file pointer |
| #pos | None | Integer | Returns current file pointer position |
| #flush | None | File | Flushes buffered data |
| #close | None | nil | Closes file handle |
| #closed? | None | Boolean | Tests if file is closed |
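
The pointer-related methods are easier to follow with a concrete run. A minimal sketch, assuming a ten-byte scratch file and a hypothetical helper `pointer_demo`:

```ruby
require 'tmpdir'

# Walks the file pointer around a ten-byte file and records what each call returns.
def pointer_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'data.bin')
    File.binwrite(path, '0123456789')

    File.open(path, 'rb') do |f|
      first = f.read(4)        # reads four bytes from the start
      after_read = f.pos       # pointer now sits after those bytes
      f.seek(-3, IO::SEEK_END) # three bytes before the end
      tail = f.read            # reads to end of file
      at_eof = f.read(1)       # a sized read at EOF returns nil
      f.rewind
      [first, after_read, tail, at_eof, f.pos, f.gets]
    end
  end
end

p pointer_demo
```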

File Opening Modes

| Mode | Description | Behavior |
|---|---|---|
| 'r' | Read-only | File must exist, pointer at beginning |
| 'w' | Write-only | Truncates existing file or creates new |
| 'a' | Write-only append | Pointer at end, creates if missing |
| 'r+' | Read-write | File must exist, pointer at beginning |
| 'w+' | Read-write | Truncates existing or creates new |
| 'a+' | Read-write append | Pointer at end, creates if missing |
| 'rb' | Binary read | Read-only binary mode |
| 'wb' | Binary write | Write-only binary, truncates/creates |
| 'ab' | Binary append | Write-only binary append |
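
The differences between the modes show up clearly when they are applied to the same file in sequence. A small sketch with made-up file names and a hypothetical helper `mode_demo`:

```ruby
require 'tmpdir'

# Applies several open modes to one file and records the contents after each step.
def mode_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'modes.txt')

    File.open(path, 'w') { |f| f.write('one') }   # creates the file
    File.open(path, 'a') { |f| f.write('two') }   # appends at the end
    after_append = File.read(path)

    File.open(path, 'r+') { |f| f.write('ONE') }  # overwrites in place from the start
    after_rplus = File.read(path)

    File.open(path, 'w') { |f| f.write('x') }     # truncates before writing
    after_truncate = File.read(path)

    missing = begin
      File.open(File.join(dir, 'absent.txt'), 'r') { |f| f.read }
    rescue Errno::ENOENT => e
      e.class                                     # 'r' requires an existing file
    end

    [after_append, after_rplus, after_truncate, missing]
  end
end

p mode_demo
```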

Common File Options

| Option | Type | Default | Description |
|---|---|---|---|
| :encoding | String | Encoding.default_external (usually UTF-8) | Character encoding for text files |
| :mode | String | 'r' | File opening mode |
| :external_encoding | String | System default | Encoding for reading from file |
| :internal_encoding | String | nil | Encoding for string conversion |
| :textmode | Boolean | false | Text mode processing |
| :binmode | Boolean | false | Binary mode processing |
| :autoclose | Boolean | true | Automatic file closure |

FileUtils Module Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| FileUtils.cp(src, dest, **opts) | src (String), dest (String) | nil | Copies file |
| FileUtils.mv(src, dest, **opts) | src (String), dest (String) | nil | Moves/renames file |
| FileUtils.rm(list, **opts) | list (String or Array) | Array | Removes files |
| FileUtils.mkdir_p(list, **opts) | list (String or Array) | Array | Creates directories recursively |
| FileUtils.chmod(mode, list, **opts) | mode (Integer), list (Array) | Array | Changes permissions |
| FileUtils.touch(list, **opts) | list (Array) | Array | Updates timestamps or creates empty files |
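
A short sketch chaining several of the FileUtils calls from the table, run in a temporary directory. The helper name `fileutils_demo` and the file names are illustrative assumptions.

```ruby
require 'fileutils'
require 'tmpdir'

# Creates, copies, moves, and removes files, then reports what is left.
def fileutils_demo
  Dir.mktmpdir do |dir|
    nested = File.join(dir, 'a', 'b', 'c')
    FileUtils.mkdir_p(nested)            # creates all missing parents

    original = File.join(nested, 'report.txt')
    FileUtils.touch(original)            # creates an empty file

    copy = File.join(dir, 'report-copy.txt')
    FileUtils.cp(original, copy)

    moved = File.join(dir, 'report-final.txt')
    FileUtils.mv(copy, moved)
    FileUtils.rm(original)

    {
      nested_dir: File.directory?(nested),
      original: File.exist?(original),
      copy: File.exist?(copy),
      moved: File.exist?(moved)
    }
  end
end

p fileutils_demo
```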

Dir Class Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| Dir.entries(dirname) | dirname (String) | Array | Returns all entries in directory |
| Dir.glob(pattern, flags=0) | pattern (String), flags (Integer) | Array | Returns paths matching pattern |
| Dir.exist?(dirname) | dirname (String) | Boolean | Tests directory existence |
| Dir.mkdir(dirname, mode=0777) | dirname (String), mode (Integer) | 0 | Creates directory |
| Dir.rmdir(dirname) | dirname (String) | 0 | Removes empty directory |
| Dir.pwd | None | String | Returns current working directory |
| Dir.chdir(path=nil) | path (String) | 0 or result of block | Changes working directory |
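
The following sketch exercises a few of the Dir methods together. It also shows that Dir.entries includes "." and "..", which Dir.children omits. The helper name `dir_demo` and the file layout are assumptions for the example.

```ruby
require 'tmpdir'

# Builds a tiny tree and lists it three different ways.
def dir_demo
  Dir.mktmpdir do |dir|
    Dir.mkdir(File.join(dir, 'lib'))
    File.write(File.join(dir, 'lib', 'a.rb'), '')
    File.write(File.join(dir, 'lib', 'b.txt'), '')
    File.write(File.join(dir, 'main.rb'), '')

    # The block form of chdir restores the previous working directory afterwards
    Dir.chdir(dir) do
      {
        ruby_files: Dir.glob('**/*.rb').sort, # recursive match
        entries: Dir.entries('lib').sort,     # includes "." and ".."
        children: Dir.children('lib').sort    # omits "." and ".."
      }
    end
  end
end

p dir_demo
```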

Exception Hierarchy

StandardError
├── SystemCallError
│   ├── Errno::ENOENT (File not found)
│   ├── Errno::EACCES (Permission denied)
│   ├── Errno::EISDIR (Is a directory)
│   ├── Errno::ENOTDIR (Not a directory)
│   ├── Errno::ENOSPC (No space left)
│   └── Errno::EMFILE (Too many open files)
├── IOError
└── EncodingError
    ├── Encoding::InvalidByteSequenceError
    └── Encoding::UndefinedConversionError
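
The hierarchy can be checked directly with class comparisons, which is also a convenient way to group errors in a rescue handler. A minimal sketch; the helper name `file_error_category` and its category symbols are hypothetical.

```ruby
# Maps an exception class to a coarse category based on its ancestors.
def file_error_category(error_class)
  if error_class <= SystemCallError
    :system_call # all Errno::* classes
  elsif error_class <= IOError
    :io
  elsif error_class <= EncodingError
    :encoding
  else
    :other
  end
end

[Errno::ENOENT, Errno::EMFILE, EOFError,
 Encoding::InvalidByteSequenceError, ArgumentError].each do |klass|
  puts "#{klass}: #{file_error_category(klass)}"
end
```

Because every class listed descends from StandardError, a bare `rescue => e` catches all of them; rescuing the intermediate classes gives finer control.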