Overview
Ruby provides comprehensive file operation capabilities through the File class, the IO class, and related modules. The File class inherits from IO and adds file-specific functionality for creating, reading, writing, and manipulating files and directories. Ruby's file operations handle various encodings, provide both blocking and non-blocking I/O options, and integrate with the underlying operating system's file system APIs.
The primary classes for file operations include File for file-specific operations, IO for general input/output, Dir for directory operations, and Pathname for object-oriented path manipulation. Ruby also provides the FileUtils module for higher-level file system operations like copying, moving, and deleting files and directories.
# Basic file reading
content = File.read('data.txt')
# File writing with automatic closure
File.write('output.txt', 'Hello World')
# Working with file handles
File.open('config.rb', 'r') do |file|
  file.each_line { |line| puts line.upcase }
end
File operations in Ruby default to text mode and use Encoding.default_external, which is UTF-8 on most systems. Binary mode operations require explicit specification. File.join builds paths with forward slashes, which Ruby's file APIs accept on every supported platform, including Windows.
Basic Usage
File reading operations offer several approaches depending on data size and processing requirements. File.read loads entire files into memory, while File.readlines returns an array of lines. For large files, File.foreach provides line-by-line iteration without loading everything into memory.
# Reading entire file
content = File.read('large_dataset.csv')
# Reading line by line for memory efficiency
File.foreach('large_dataset.csv') do |line|
  process_data_line(line.chomp)
end
# Reading with encoding specification
content = File.read('utf8_file.txt', encoding: 'UTF-8')
# Reading binary files
binary_data = File.read('image.jpg', mode: 'rb')
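To make the difference between these reading styles concrete, here is a minimal sketch that writes a small temporary file and reads it back three ways. The helper name `read_three_ways` and the sample contents are made up for illustration.

```ruby
require 'tempfile'

# Hypothetical helper: returns three views of the same file.
def read_three_ways(path)
  whole = File.read(path)                    # one String holding the whole file
  lines = File.readlines(path, chomp: true)  # Array of lines, newlines stripped
  count = 0
  File.foreach(path) { |_line| count += 1 }  # streams one line at a time
  { whole: whole, lines: lines, count: count }
end

# Tempfile.create with a block removes the file when the block exits
Tempfile.create(['reading_demo', '.txt']) do |sample|
  sample.write("alpha\nbeta\ngamma\n")
  sample.flush
  result = read_three_ways(sample.path)
  puts result[:whole].bytesize
  puts result[:lines].inspect
  puts result[:count]
end
```

Only the `File.foreach` variant keeps memory use independent of file size; the other two hold the full contents at once.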
File writing operations support various modes including append, truncate, and create-if-missing. The File.write method provides a simple interface for common write operations, while File.open with a block offers more control and automatic file closure.
# Simple file writing (overwrites existing)
File.write('log.txt', "Process started: #{Time.now}")
# Appending to existing file
File.write('log.txt', "Process completed: #{Time.now}", mode: 'a')
# Writing with explicit encoding
File.write('unicode.txt', '🚀 Unicode content', encoding: 'UTF-8')
# Structured writing with automatic closure
File.open('report.csv', 'w') do |file|
  file.puts 'Name,Age,City'
  users.each { |user| file.puts "#{user.name},#{user.age},#{user.city}" }
end
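The truncate and append modes are easy to confuse, so a small self-contained sketch may help. The method name `write_modes_demo` and the file name are illustrative only.

```ruby
require 'tmpdir'

# Hypothetical helper: writes twice in the default (truncating) mode,
# once in append mode, and returns what ends up in the file.
def write_modes_demo(path)
  File.write(path, "first\n")             # creates the file
  File.write(path, "second\n")            # default mode truncates, so "first" is lost
  File.write(path, "third\n", mode: 'a')  # append keeps the existing content
  File.read(path)
end

Dir.mktmpdir do |dir|
  puts write_modes_demo(File.join(dir, 'modes.txt'))
end
```

The final content holds only the second and third lines, which is why log-style writes need `mode: 'a'`.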
Directory operations through the Dir class enable directory traversal, creation, and content listing. Path manipulation methods handle cross-platform compatibility and path resolution.
# Directory listing
Dir.entries('.').reject { |f| f.start_with?('.') }
# Creating directories
Dir.mkdir('temp') unless Dir.exist?('temp')
# Path construction
config_path = File.join(Dir.home, '.myapp', 'config.yml')
# File existence and properties
if File.exist?(config_path) && File.readable?(config_path)
  config = YAML.load_file(config_path)
end
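As a complement, the following sketch combines Pathname, FileUtils.mkdir_p, and Dir.children in a temporary directory. The directory layout and the helper name `list_visible_entries` are assumptions chosen for the example.

```ruby
require 'fileutils'
require 'pathname'
require 'tmpdir'

# Hypothetical helper: builds a nested directory with two files and
# returns the names of the non-hidden entries.
def list_visible_entries(root)
  nested = Pathname.new(root).join('reports', '2024')
  FileUtils.mkdir_p(nested.to_s)          # creates intermediate directories as needed
  File.write(nested.join('a.csv'), 'x')
  File.write(nested.join('.hidden'), 'y')
  # Dir.children omits "." and ".." (Ruby 2.5+); dotfiles still need filtering
  Dir.children(nested.to_s).reject { |name| name.start_with?('.') }.sort
end

Dir.mktmpdir { |dir| puts list_visible_entries(dir).inspect }
```

Unlike Dir.mkdir, FileUtils.mkdir_p does not raise when the directory already exists, so no existence check is needed first.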
Error Handling & Debugging
File operations generate specific exception types that require targeted handling strategies. Errno::ENOENT occurs when files don't exist, Errno::EACCES indicates permission problems, and Errno::ENOSPC signals insufficient disk space. Each error type requires different recovery approaches.
def safe_file_read(filename)
  File.read(filename)
rescue Errno::ENOENT
  logger.warn "File not found: #{filename}"
  nil
rescue Errno::EACCES
  logger.error "Permission denied reading: #{filename}"
  raise SecurityError, "Cannot access #{filename}"
rescue Errno::EISDIR
  logger.error "#{filename} is a directory, not a file"
  raise ArgumentError, "Expected file, got directory"
rescue SystemCallError => e
  logger.error "System error reading #{filename}: #{e.message}"
  raise
end
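The same rescue ordering can be exercised without a logger. The sketch below maps each failure to a symbol; `classify_read` is a hypothetical name, and the `:directory` result assumes a POSIX system, where reading a directory raises Errno::EISDIR.

```ruby
require 'tmpdir'

# Hypothetical helper: maps common failures to symbols so callers can
# pick a recovery strategy.
def classify_read(path)
  File.read(path)
  :ok
rescue Errno::ENOENT
  :missing
rescue Errno::EISDIR
  :directory
rescue SystemCallError => e
  # Every Errno::* class descends from SystemCallError, so this must come last
  e.class.name.to_sym
end

Dir.mktmpdir do |dir|
  puts classify_read(File.join(dir, 'absent.txt'))
  puts classify_read(dir)
end
```

Specific Errno classes have to be listed before SystemCallError, because Ruby picks the first rescue clause that matches.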
File handle management requires careful attention to resource cleanup. Unclosed file handles cause resource leaks and can exhaust system limits. Ruby's block-based file operations provide automatic cleanup, but explicit ensure blocks become necessary with manual handle management.
def process_large_file(filename)
  file = nil
  begin
    file = File.open(filename, 'r')
    buffer = String.new(capacity: 8192)
    while file.read(8192, buffer)
      yield buffer
    end
  rescue IOError => e
    logger.error "IO error processing #{filename}: #{e.message}"
    raise
  ensure
    file&.close
  end
end
# Debugging file operations with verbose logging
def debug_file_copy(source, destination)
  logger.debug "Copying #{source} to #{destination}"
  logger.debug "Source exists: #{File.exist?(source)}"
  logger.debug "Source size: #{File.size(source)} bytes"
  File.open(source, 'rb') do |src|
    File.open(destination, 'wb') do |dest|
      IO.copy_stream(src, dest)
    end
  end
  logger.debug "Copy completed. Destination size: #{File.size(destination)} bytes"
rescue StandardError => e
  logger.error "Copy failed: #{e.class} - #{e.message}"
  # Double quotes are required here: '\n' would join with a literal backslash and n
  logger.debug "Backtrace: #{e.backtrace.join("\n")}"
  File.unlink(destination) if File.exist?(destination)
  raise
end
Encoding-related errors occur when file contents don't match expected character encodings. Ruby raises Encoding::InvalidByteSequenceError and Encoding::UndefinedConversionError when transcoding between encodings fails. Reading with a single encoding only tags the string, so invalid bytes go unnoticed until the string is validated or processed.
def read_with_encoding_fallback(filename)
  content = File.read(filename, encoding: 'UTF-8')
  # File.read tags the bytes as UTF-8 without validating them
  return content if content.valid_encoding?
  logger.warn "Invalid UTF-8 in #{filename}, replacing invalid bytes"
  content.scrub('?')
end
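One detail worth checking in practice is where the encoding exceptions actually come from. The sketch below, built around a hypothetical `encoding_behaviour_demo` helper and a made-up four-byte file, suggests that a plain read with `encoding: 'UTF-8'` does not raise, while transcoding the same bytes does.

```ruby
require 'tempfile'

# Hypothetical helper: returns [valid_utf8?, error class from transcoding, text read as Latin-1].
def encoding_behaviour_demo
  Tempfile.create('latin') do |f|
    f.binmode
    f.write("caf\xE9".b)   # "café" in ISO-8859-1; the 0xE9 byte is not valid UTF-8
    f.flush
    tagged = File.read(f.path, encoding: 'UTF-8')  # tags the bytes, does not validate
    error = begin
      tagged.encode('UTF-16')                       # transcoding has to decode the bytes
      nil
    rescue EncodingError => e
      e.class
    end
    # "external:internal" asks Ruby to transcode while reading
    latin = File.read(f.path, encoding: 'ISO-8859-1:UTF-8')
    [tagged.valid_encoding?, error, latin]
  end
end

p encoding_behaviour_demo
```

Checking `valid_encoding?` right after reading is therefore the reliable way to detect a mismatch before any string processing starts.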
Performance & Memory
File reading performance depends significantly on buffer sizes and reading patterns. Large buffer sizes reduce system calls but increase memory usage. Ruby's IO.copy_stream provides optimized copying between file handles with minimal memory allocation.
# Memory-efficient large file processing
def process_huge_file(filename)
  File.open(filename, 'rb') do |file|
    buffer = String.new(capacity: 65536) # 64KB buffer
    while file.read(65536, buffer)
      process_chunk(buffer)
      # Buffer reuse prevents repeated allocation
    end
  end
end
# Comparing memory usage patterns
def memory_comparison_demo
  # High memory usage - loads entire file
  start_memory = memory_usage
  content = File.read('large_file.txt')
  high_memory = memory_usage - start_memory
  # Low memory usage - streaming approach
  start_memory = memory_usage
  File.foreach('large_file.txt') { |line| process_line(line) }
  low_memory = memory_usage - start_memory
  puts "Full read: #{high_memory}MB, Streaming: #{low_memory}MB"
end
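The buffer-reuse pattern can be observed on a small file. This is a minimal sketch with a hypothetical `chunk_sizes` helper; the 10-byte file and 4-byte chunk size are arbitrary values chosen to keep the example short.

```ruby
require 'tempfile'

# Hypothetical helper: reads a file in fixed-size chunks into one reused
# buffer and returns the size of each chunk.
def chunk_sizes(path, chunk_size)
  sizes = []
  buffer = String.new(capacity: chunk_size)
  File.open(path, 'rb') do |file|
    # read(length, buffer) overwrites buffer in place and returns nil at end of file
    while file.read(chunk_size, buffer)
      sizes << buffer.bytesize
    end
  end
  sizes
end

Tempfile.create('chunks') do |f|
  f.binmode
  f.write('x' * 10)
  f.flush
  p chunk_sizes(f.path, 4)
end
```

Because the buffer is overwritten on every iteration, any code that keeps a chunk beyond the current loop pass needs to `dup` it first.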
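Write operations benefit from batching and buffer management. Frequent small writes that are flushed individually cause excessive system calls, while extremely large writes can cause memory pressure. Ruby's IO objects already buffer writes internally unless sync is enabled, so an application-level buffer is mainly useful for grouping records and controlling when data is flushed.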
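class BufferedWriter
  # Yields a writer and guarantees the buffer is flushed and the file closed.
  # BufferedWriter.new ignores a block, so block usage must go through this method.
  def self.open(filename, **options)
    writer = new(filename, **options)
    yield writer
  ensure
    writer&.close
  end
  def initialize(filename, buffer_size: 8192)
    @file = File.open(filename, 'w')
    @buffer = String.new
    @buffer_size = buffer_size
  end
  def write(data)
    @buffer << data
    flush_if_needed
  end
  def flush
    return if @buffer.empty?
    @file.write(@buffer)
    @file.flush
    @buffer.clear
  end
  def close
    flush
    @file.close
  end
  private
  def flush_if_needed
    flush if @buffer.bytesize >= @buffer_size
  end
end
# Usage with automatic cleanup
BufferedWriter.open('output.txt') do |writer|
  1_000_000.times { |i| writer.write("Line #{i}\n") }
end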
Directory traversal performance varies significantly between recursive approaches. Dir.glob with recursive patterns performs better than manual recursion for simple matching, while Find.find provides more control for complex filtering.
require 'find'
require 'benchmark'
def performance_comparison
  Benchmark.bm(20) do |x|
    x.report('Dir.glob recursive') do
      Dir.glob('**/*.rb').each { |f| File.size(f) }
    end
    x.report('Find.find manual') do
      Find.find('.') do |path|
        next unless path.end_with?('.rb')
        File.size(path)
      end
    end
    x.report('Enumerator approach') do
      enum = Enumerator.new do |yielder|
        Find.find('.') { |f| yielder << f if f.end_with?('.rb') }
      end
      enum.lazy.each { |f| File.size(f) }
    end
  end
end
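The extra control that Find.find offers shows up most clearly with Find.prune, which skips an entire subtree. The sketch below assumes a hypothetical `ruby_files` helper and an arbitrary list of directory names to skip.

```ruby
require 'find'
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: collects .rb files under root, skipping any
# directory whose name appears in skip_dirs.
def ruby_files(root, skip_dirs: %w[vendor node_modules])
  found = []
  Find.find(root) do |path|
    if File.directory?(path) && skip_dirs.include?(File.basename(path))
      Find.prune               # do not descend into this directory
    elsif path.end_with?('.rb')
      found << path
    end
  end
  found.sort
end

Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'app'))
  FileUtils.mkdir_p(File.join(dir, 'vendor'))
  File.write(File.join(dir, 'app', 'a.rb'), '')
  File.write(File.join(dir, 'vendor', 'b.rb'), '')
  puts ruby_files(dir).map { |path| File.basename(path) }.inspect
end
```

A glob pattern would still walk the skipped directories and need filtering afterwards, so pruning can matter when those trees are large.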
Production Patterns
Production file operations require robust error handling, logging, and monitoring capabilities. Applications must handle concurrent access, temporary file cleanup, and graceful degradation when file operations fail.
require 'monitor'
require 'pathname'
class ProductionFileManager
  include MonitorMixin
  def initialize(base_path:, temp_dir: nil, max_retries: 3)
    super()
    @base_path = Pathname.new(base_path)
    # Keep temp files on the same filesystem as base_path: File.rename is only
    # atomic within one filesystem and raises Errno::EXDEV across filesystems
    @temp_dir = temp_dir || @base_path.to_s
    @max_retries = max_retries
    @metrics = FileOperationMetrics.new # metrics collector, not shown here
  end
  def atomic_write(filename, content)
    synchronize do
      temp_file = create_temp_file(filename)
      begin
        File.write(temp_file, content)
        File.rename(temp_file, target_path(filename))
        @metrics.record_success(:write, filename)
      rescue StandardError => e
        File.unlink(temp_file) if File.exist?(temp_file)
        @metrics.record_failure(:write, filename, e)
        raise
      end
    end
  end
  def read_with_retry(filename)
    retries = 0
    begin
      content = File.read(target_path(filename))
      @metrics.record_success(:read, filename)
      content
    rescue Errno::ENOENT => e
      # A missing file will not appear on retry, so fail immediately
      @metrics.record_failure(:read, filename, e)
      raise
    rescue StandardError => e
      retries += 1
      if retries <= @max_retries
        sleep(0.1 * (2**(retries - 1))) # Exponential backoff: 0.1s, 0.2s, 0.4s
        retry
      end
      @metrics.record_failure(:read, filename, e)
      raise
    end
  end
  private
  def create_temp_file(filename)
    File.join(@temp_dir, "#{filename}.tmp.#{Process.pid}.#{Time.now.to_f}")
  end
  def target_path(filename)
    @base_path.join(filename).to_s
  end
end
Configuration file handling in production environments requires validation, backup strategies, and rollback capabilities. Applications should validate configuration syntax before applying changes and maintain backup copies for rollback scenarios.
require 'fileutils'
require 'pathname'
require 'yaml'
class ConfigurationError < StandardError; end
class ConfigurationManager
  def initialize(config_path)
    @config_path = Pathname.new(config_path)
    @backup_path = @config_path.sub_ext('.backup')
    @lock_file = @config_path.sub_ext('.lock')
  end
  def update_config(new_config)
    acquire_lock do
      create_backup
      validate_config(new_config)
      write_config(new_config)
      reload_application_config
    end
  rescue ConfigurationError => e
    restore_backup if backup_exists?
    logger.error "Configuration update failed: #{e.message}"
    raise
  end
  # write_config, reload_application_config, restore_backup, and backup_exists?
  # are application-specific and not shown here
  private
  def acquire_lock
    # File::EXCL raises Errno::EEXIST when another process already holds the lock
    File.open(@lock_file, File::CREAT | File::EXCL | File::WRONLY) do |lock|
      begin
        lock.flock(File::LOCK_EX)
        yield
      ensure
        # Reached only when this process created the lock file, so a failed
        # acquisition never deletes a lock held by someone else
        File.unlink(@lock_file) if File.exist?(@lock_file)
      end
    end
  end
  def create_backup
    FileUtils.cp(@config_path, @backup_path) if @config_path.exist?
  end
  def validate_config(config)
    YAML.safe_load(config)
  rescue Psych::SyntaxError => e
    raise ConfigurationError, "Invalid YAML syntax: #{e.message}"
  end
end
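An alternative to creating and deleting a lock file is to keep the file in place and rely on flock alone. The following is a minimal sketch under that assumption; `with_file_lock` is a hypothetical name, and flock is advisory, so it only protects against other code that takes the same lock.

```ruby
require 'tmpdir'

# Hypothetical helper: runs the block while holding an exclusive advisory
# lock on lock_path. Returns the block's value, or :busy when another
# holder already has the lock.
def with_file_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |lock|
    # LOCK_NB makes flock return false instead of blocking
    return :busy unless lock.flock(File::LOCK_EX | File::LOCK_NB)
    yield
  end # closing the file releases the lock
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'config.lock')
  puts with_file_lock(path) { 'updated' }
end
```

Since the lock file is never unlinked, there is no window in which one process deletes a lock that another process has just acquired.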
Log file management requires rotation, compression, and cleanup strategies to prevent disk space exhaustion. Production applications should implement size-based and time-based rotation policies.
class LogRotator
  def initialize(log_path, max_size: 100 * 1024 * 1024, max_files: 10)
    @log_path = Pathname.new(log_path)
    @max_size = max_size
    @max_files = max_files
  end
  def rotate_if_needed
    return unless should_rotate?
    rotate_existing_logs
    create_new_log
    cleanup_old_logs
  end
  private
  def should_rotate?
    @log_path.exist? && @log_path.size > @max_size
  end
  # Appends the index (app.log -> app.log.1); sub_ext would replace ".log" instead
  def rotated_path(index)
    Pathname.new("#{@log_path}.#{index}")
  end
  def rotate_existing_logs
    (@max_files - 1).downto(1) do |i|
      old_log = rotated_path(i)
      FileUtils.mv(old_log, rotated_path(i + 1)) if old_log.exist?
    end
    FileUtils.mv(@log_path, rotated_path(1))
  end
  # create_new_log and cleanup_old_logs are not shown here
end
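For simple size-based rotation, the standard library's Logger already accepts a file count and a size limit, which may be enough before writing a custom rotator. The sketch below uses deliberately tiny limits and a hypothetical `build_rotating_logger` wrapper so that rotation happens within a few hundred lines.

```ruby
require 'logger'
require 'tmpdir'

# Hypothetical wrapper: Logger.new(logdev, shift_age, shift_size) keeps
# shift_age files in total and rotates once the current file exceeds shift_size bytes.
def build_rotating_logger(path, keep: 3, max_bytes: 1024)
  Logger.new(path, keep, max_bytes)
end

Dir.mktmpdir do |dir|
  logger = build_rotating_logger(File.join(dir, 'app.log'))
  200.times { |i| logger.info("entry #{i}") }
  logger.close
  puts Dir.children(dir).sort.inspect
end
```

Logger names rotated files by appending a numeric suffix to the original path and does not compress them, so compression and time-based cleanup would still need separate handling.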
Common Pitfalls
File encoding issues are a frequent source of production bugs in file operations. Ruby's default encoding assumptions don't always match file contents, leading to Encoding::InvalidByteSequenceError exceptions during transcoding or to corrupted data during processing.
# Problematic - assumes UTF-8 encoding
def naive_file_read(filename)
  File.read(filename) # Succeeds even for non-UTF-8 data; errors surface later
end
# Robust approach with encoding validation
def smart_file_read(filename)
  raw = File.binread(filename)
  # Try UTF-8 first: tag a copy of the bytes and validate them
  utf8 = raw.dup.force_encoding('UTF-8')
  return utf8 if utf8.valid_encoding?
  # Windows-1252 leaves a few byte values undefined, so transcoding can fail
  begin
    return raw.encode('UTF-8', 'Windows-1252')
  rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
    # Fall through to the last resort
  end
  # Last resort - ISO-8859-1 maps every byte, so this always succeeds
  raw.encode('UTF-8', 'ISO-8859-1')
end
Path construction errors cause compatibility issues and security vulnerabilities. Building paths by string concatenation easily produces doubled or missing separators, and concatenating untrusted input creates directory traversal attack vectors.
# Dangerous path construction
def unsafe_path_building(user_input)
  # Security vulnerability - directory traversal attack
  "/uploads/" + user_input # user_input could be "../../../etc/passwd"
end
def brittle_path_building
  # Manual separators are easy to duplicate or omit when components vary
  "config" + "/" + "database.yml"
end
# Secure and portable path construction
def safe_path_building(user_input, base_dir)
  # Validate input
  raise ArgumentError, "Invalid filename" if user_input.include?('..')
  raise ArgumentError, "Invalid filename" if user_input.include?('/') || user_input.include?('\\')
  # File.join collapses duplicate separators between components
  File.join(base_dir, user_input)
end
def robust_config_path
  File.join(Dir.home, '.myapp', 'config', 'database.yml')
end
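When subdirectories must be allowed, rejecting every slash is too strict. A common alternative is to expand the path and check that it still sits under the base directory. The sketch below illustrates that idea; `resolve_inside` is a hypothetical name, and the check does not resolve symbolic links.

```ruby
# Hypothetical helper: resolves user_input under base_dir and rejects
# anything that escapes it.
def resolve_inside(base_dir, user_input)
  base = File.expand_path(base_dir)
  candidate = File.expand_path(user_input, base) # collapses "." and ".." segments
  unless candidate.start_with?(base + File::SEPARATOR)
    raise ArgumentError, "Path escapes #{base}"
  end
  candidate
end

puts resolve_inside('/srv/uploads', 'reports/q1.csv')
begin
  resolve_inside('/srv/uploads', '../../etc/passwd')
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

Appending the separator before comparing prevents a sibling such as `/srv/uploads_old` from passing the prefix check. For files that already exist, File.realpath could additionally be used to account for symlinks.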
File handle leaks occur when files aren't properly closed, eventually exhausting system resources. This problem manifests as "Too many open files" errors in production systems under load.
# Problematic - file handles may leak
def leaky_file_processing(filenames)
  files = filenames.map { |name| File.open(name, 'r') }
  files.each do |file|
    process_file_content(file.read)
    # Missing file.close - handles stay open until garbage collection
  end
end
# Safe approach with automatic cleanup
def safe_file_processing(filenames)
  filenames.each do |filename|
    File.open(filename, 'r') do |file|
      process_file_content(file.read)
      # Block ensures automatic cleanup
    end
  end
end
# Manual resource management with proper cleanup
def manual_file_processing(filenames)
  files = []
  begin
    # Append as we go so handles opened before a failed open are still closed
    filenames.each { |name| files << File.open(name, 'r') }
    files.each do |file|
      process_file_content(file.read)
    end
  ensure
    files.each(&:close)
  end
end
Atomic write operations prevent partial file corruption during write failures. Applications that directly write to target files risk leaving incomplete or corrupted data when interruptions occur.
# Dangerous - non-atomic write
def unsafe_config_update(filename, config_data)
  File.write(filename, config_data) # Risk of partial writes
end
# Safe - atomic write with temporary file
def atomic_config_update(filename, config_data)
  # The temp file sits next to the target, so both are on the same filesystem
  temp_filename = "#{filename}.tmp.#{Process.pid}"
  begin
    File.write(temp_filename, config_data)
    File.rename(temp_filename, filename) # Atomic on POSIX within one filesystem
  rescue StandardError
    File.unlink(temp_filename) if File.exist?(temp_filename)
    raise
  end
end
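A variation on the same pattern lets Tempfile choose a unique name and adds an fsync before the rename. This is a sketch under the assumption that the target directory is writable; `atomic_write` is a hypothetical name, and Tempfile creates the file with restrictive permissions that may need adjusting afterwards.

```ruby
require 'tempfile'
require 'tmpdir'

# Hypothetical helper: writes content to a temp file beside the target,
# syncs it, then renames it into place.
def atomic_write(path, content)
  dir = File.dirname(path)
  # Without a block, Tempfile.create returns a File that is not auto-deleted
  temp = Tempfile.create([File.basename(path), '.tmp'], dir)
  begin
    temp.write(content)
    temp.fsync                    # ask the OS to write file data to disk before the rename
    temp.close
    File.rename(temp.path, path)  # same directory, so same filesystem
  rescue StandardError
    temp.close unless temp.closed?
    File.unlink(temp.path) if File.exist?(temp.path)
    raise
  end
  path
end

Dir.mktmpdir do |dir|
  target = File.join(dir, 'settings.yml')
  atomic_write(target, "mode: safe\n")
  puts File.read(target)
end
```

Readers of the target path see either the old content or the new content, never a partially written file.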
Reference
File Class Methods
Method | Parameters | Returns | Description |
---|---|---|---|
File.read(name, *args) | name (String), length (Integer), offset (Integer), **opts | String | Reads entire file or specified portion |
File.write(name, string, *args) | name (String), string (String), offset (Integer), **opts | Integer | Writes string to file, returns bytes written |
File.open(filename, mode='r', **opts) | filename (String), mode (String), **opts | File or result of block | Opens file with specified mode |
File.exist?(filename) | filename (String) | Boolean | Tests file existence |
File.size(filename) | filename (String) | Integer | Returns file size in bytes |
File.directory?(filename) | filename (String) | Boolean | Tests if path is directory |
File.readable?(filename) | filename (String) | Boolean | Tests if file is readable |
File.writable?(filename) | filename (String) | Boolean | Tests if file is writable |
File.executable?(filename) | filename (String) | Boolean | Tests if file is executable |
File.join(*args) | *args (String) | String | Joins path components with separator |
File.expand_path(filename, dir=nil) | filename (String), dir (String) | String | Converts relative path to absolute |
File.basename(filename, suffix='') | filename (String), suffix (String) | String | Returns last component of filename |
File.dirname(filename) | filename (String) | String | Returns directory portion of filename |
File.extname(filename) | filename (String) | String | Returns file extension |
File Instance Methods
Method | Parameters | Returns | Description |
---|---|---|---|
#read(length=nil, buffer=nil) | length (Integer), buffer (String) | String or nil | Reads specified number of bytes |
#write(string) | string (String) | Integer | Writes string, returns bytes written |
#puts(*args) | *args | nil | Writes objects with newlines |
#gets(separator=$/, limit=nil) | separator (String), limit (Integer) | String or nil | Reads line with separator |
#each_line(**opts) | **opts | Enumerator (without block) or self | Iterates over lines |
#rewind | None | 0 | Resets file pointer to beginning |
#seek(offset, whence=IO::SEEK_SET) | offset (Integer), whence (Integer) | 0 | Moves file pointer |
#pos | None | Integer | Returns current file pointer position |
#flush | None | File | Flushes buffered data |
#close | None | nil | Closes file handle |
#closed? | None | Boolean | Tests if file is closed |
File Opening Modes
Mode | Description | Behavior |
---|---|---|
'r' | Read-only | File must exist, pointer at beginning |
'w' | Write-only | Truncates existing file or creates new |
'a' | Write-only append | Pointer at end, creates if missing |
'r+' | Read-write | File must exist, pointer at beginning |
'w+' | Read-write | Truncates existing or creates new |
'a+' | Read-write append | Pointer at end, creates if missing |
'rb' | Binary read | Read-only binary mode |
'wb' | Binary write | Write-only binary, truncates/creates |
'ab' | Binary append | Write-only binary append |
Common File Options
Option | Type | Default | Description |
---|---|---|---|
:encoding | String | Encoding.default_external (usually UTF-8) | Character encoding for text files |
:mode | String | 'r' | File opening mode |
:external_encoding | String | System default | Encoding for reading from file |
:internal_encoding | String | nil | Encoding for string conversion |
:textmode | Boolean | false | Text mode processing |
:binmode | Boolean | false | Binary mode processing |
:autoclose | Boolean | true | Automatic file closure |
FileUtils Module Methods
Method | Parameters | Returns | Description |
---|---|---|---|
FileUtils.cp(src, dest, **opts) | src (String), dest (String) | nil | Copies file |
FileUtils.mv(src, dest, **opts) | src (String), dest (String) | nil | Moves/renames file |
FileUtils.rm(list, **opts) | list (String or Array) | Array | Removes files |
FileUtils.mkdir_p(list, **opts) | list (String or Array) | Array | Creates directories recursively |
FileUtils.chmod(mode, list, **opts) | mode (Integer), list (Array) | Array | Changes permissions |
FileUtils.touch(list, **opts) | list (Array) | Array | Updates timestamps or creates empty files |
Dir Class Methods
Method | Parameters | Returns | Description |
---|---|---|---|
Dir.entries(dirname) | dirname (String) | Array | Returns all entries in directory |
Dir.glob(pattern, flags=0) | pattern (String), flags (Integer) | Array | Returns paths matching pattern |
Dir.exist?(dirname) | dirname (String) | Boolean | Tests directory existence |
Dir.mkdir(dirname, mode=0777) | dirname (String), mode (Integer) | 0 | Creates directory |
Dir.rmdir(dirname) | dirname (String) | 0 | Removes empty directory |
Dir.pwd | None | String | Returns current working directory |
Dir.chdir(path=nil) | path (String) | 0 or result of block | Changes working directory |
Exception Hierarchy
StandardError
├── SystemCallError
│ ├── Errno::ENOENT (File not found)
│ ├── Errno::EACCES (Permission denied)
│ ├── Errno::EISDIR (Is a directory)
│ ├── Errno::ENOTDIR (Not a directory)
│ ├── Errno::ENOSPC (No space left)
│ └── Errno::EMFILE (Too many open files)
├── IOError
└── EncodingError
    ├── Encoding::InvalidByteSequenceError
    └── Encoding::UndefinedConversionError