Overview
Ruby provides comprehensive file reading and writing capabilities through the `File` class and the `IO` class hierarchy. `File` inherits from `IO` and adds filesystem-specific functionality such as path manipulation and file metadata access; directory operations live in the separate `Dir` class. Ruby's file operations support various access modes, explicit encoding specifications, and both blocking and non-blocking I/O patterns.
The core file operations revolve around opening file handles, reading or writing data, and properly closing resources. Ruby automatically handles many low-level details like buffer management and system calls while providing granular control when needed.
```ruby
require 'json'

# Basic file reading
content = File.read('data.txt')
# => "file contents as string"

# Basic file writing
File.write('output.txt', 'new content')
# => 11 (bytes written)

# Working with file handles
File.open('config.json', 'r') do |file|
  JSON.parse(file.read)
end
```
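Ruby associates every string read from a file with an external encoding and can transcode it to an internal encoding during I/O. It does not detect encodings automatically. The only built-in detection is an optional byte-order-mark check, requested with a mode such as `'r:BOM|UTF-8'`. The default external encoding comes from the system locale (`Encoding.default_external`), but you can specify it explicitly for both reading and writing.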
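The following is a minimal sketch of external versus internal encodings. It uses a temporary file and two illustrative helpers, `read_tagged_utf8` and `read_latin1_as_utf8`, which are not library methods. The same ISO-8859-1 bytes are read twice. The first read uses only an external encoding, which tags the bytes without converting them. The second read uses an `'external:internal'` pair, which transcodes them to UTF-8.

```ruby
require 'tmpdir'

# Tag the bytes as UTF-8 without converting or validating them
def read_tagged_utf8(path)
  File.read(path, encoding: 'UTF-8')
end

# Read the bytes as ISO-8859-1 and transcode them to UTF-8 while reading
def read_latin1_as_utf8(path)
  File.read(path, encoding: 'ISO-8859-1:UTF-8')
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'latin1.txt')
  File.binwrite(path, "caf\xE9".b) # "café" in ISO-8859-1 (é is the byte 0xE9)

  tagged = read_tagged_utf8(path)
  puts tagged.encoding        # UTF-8
  puts tagged.valid_encoding? # false: a lone 0xE9 is not valid UTF-8

  converted = read_latin1_as_utf8(path)
  puts converted              # café
  puts converted.bytesize     # 5: é takes two bytes in UTF-8
end
```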
Ruby's file I/O integrates with the broader `IO` class hierarchy, so file objects respond to standard I/O methods like `#read`, `#write`, `#gets`, and `#puts`. This consistency allows file objects to be used interchangeably with other I/O objects in many contexts.
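Here is a short illustration of that interchangeability, using a hypothetical `count_errors` helper. Because the helper only calls `#each_line`, it accepts an in-memory `StringIO` and a real `File` alike.

```ruby
require 'stringio'
require 'tempfile'

# Counts lines containing "ERROR" in any IO-like object that supports #each_line
def count_errors(io)
  io.each_line.count { |line| line.include?('ERROR') }
end

# Works with an in-memory StringIO...
log = StringIO.new("ok\nERROR disk full\nok\nERROR timeout\n")
puts count_errors(log) # 2

# ...and with a real File, unchanged
Tempfile.create('app-log') do |file|
  file.write("ERROR one\ninfo\n")
  file.rewind
  puts count_errors(file) # 1
end
```

This makes code that accepts "anything with `#each_line`" easy to test without touching the filesystem.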
Basic Usage
File reading operations offer multiple approaches depending on data size and processing requirements. The `File.read` class method loads the entire file into memory as a single string. `File.open` instead returns a handle that can be read incrementally (for example with `each_line`). When given a block, `File.open` closes the handle automatically as the block exits.
```ruby
# Read entire file content
full_content = File.read('large_dataset.csv')

# Read with encoding specification
utf8_content = File.read('international.txt', encoding: 'UTF-8')

# Read only the first 1024 bytes (binary mode: no encoding or newline handling)
partial_content = File.read('binary_file.dat', 1024, mode: 'rb')

# Stream reading with file handle
File.open('server.log', 'r') do |file|
  file.each_line do |line|
    puts line if line.include?('ERROR')
  end
end
```
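Two more line-oriented readers follow the same whole-file versus streaming split:

- `File.readlines` returns an array of every line, so memory grows with file size.
- `File.foreach` yields one line at a time. Called without a block, it returns an Enumerator, so it can stop reading once a match is found.

The sketch below uses a temporary file. `all_lines` and `first_error` are illustrative names.

```ruby
require 'tmpdir'

# Returns all lines at once, without trailing newlines
def all_lines(path)
  File.readlines(path, chomp: true)
end

# Streams lines and stops at the first match
def first_error(path)
  File.foreach(path, chomp: true).find { |line| line.include?('ERROR') }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'server.log')
  File.write(path, "INFO start\nERROR disk full\nINFO stop\n")
  p all_lines(path)   # ["INFO start", "ERROR disk full", "INFO stop"]
  p first_error(path) # "ERROR disk full"
end
```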
File writing operations similarly support both convenience methods and handle-based approaches. The `File.write` method creates or overwrites files, while `File.open` with write modes provides fine-grained control over file operations.
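```ruby
require 'csv'

# analysis_results, log_entry, report_data and records stand in for application data

# Write string to file (creates or overwrites)
File.write('results.txt', analysis_results)

# Append to existing file
File.write('application.log', log_entry, mode: 'a')

# Write with specific encoding
File.write('report.txt', report_data, encoding: 'UTF-8')

# Streaming write operations: CSV.new wraps the open handle
# (it does not take a block; CSV.open would open the file itself)
File.open('export.csv', 'w') do |file|
  csv = CSV.new(file)
  records.each { |record| csv << record }
end
```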
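You can check the byte-count return value and the `mode: 'a'` option directly. This sketch writes a file, appends to it, and reads it back. `write_then_append` is a hypothetical helper and the file name is illustrative.

```ruby
require 'tmpdir'

# Writes `first` (truncating), then appends `second`; returns both byte counts
def write_then_append(path, first, second)
  [File.write(path, first), File.write(path, second, mode: 'a')]
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'application.log')
  p write_then_append(path, "first\n", "second\n") # [6, 7]
  p File.read(path)                                # "first\nsecond\n"

  File.write(path, "reset\n") # default mode 'w' truncates again
  p File.read(path)           # "reset\n"
end
```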
File access modes control read/write permissions and file positioning behavior. Ruby supports standard POSIX file modes with additional Ruby-specific enhancements for encoding and newline handling.
```ruby
# Read-only access
File.open('readonly.txt', 'r') { |f| f.read }

# Write access (truncates existing content)
File.open('output.txt', 'w') { |f| f.puts 'new content' }

# Append mode (positions at end of file)
File.open('log.txt', 'a') { |f| f.puts Time.now }

# Read-write access without truncation
File.open('database.txt', 'r+') do |file|
  file.seek(100) # Move to byte position 100
  file.write('updated data')
end
```
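The positioning rules can be observed on a small file. In this sketch, `content_after` is a hypothetical helper that opens a file in a given mode, optionally seeks, writes, and returns the resulting content. It shows four behaviors:

- `'r+'` overwrites in place.
- `'a'` appends.
- `'w'` truncates before writing.
- In `'a'` mode, writes still go to the end even after a `seek` to the start.

```ruby
require 'tmpdir'

# Returns the file content after opening it with `mode` and writing `text`
def content_after(path, mode, text, seek_to: nil)
  File.open(path, mode) do |f|
    f.seek(seek_to) if seek_to
    f.write(text)
  end
  File.read(path)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'modes.txt')
  File.write(path, 'abcdef')

  p content_after(path, 'r+', 'XY', seek_to: 2) # "abXYef"   (overwrite in place)
  p content_after(path, 'a', '!!')              # "abXYef!!" (append at end)
  p content_after(path, 'w', 'new')             # "new"      (truncated first)
  p content_after(path, 'a', '?', seek_to: 0)   # "new?"     (append ignores seek)
end
```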
Binary file operations require explicit mode specification to prevent encoding transformations and newline conversions that can corrupt binary data.
```ruby
# Binary read mode
image_data = File.read('photo.jpg', mode: 'rb')

# Binary write operations
File.open('backup.dat', 'wb') do |file|
  file.write(compressed_data)
  file.write(checksum_bytes)
end

# Copy binary files in 8 KB chunks
File.open('source.bin', 'rb') do |source|
  File.open('destination.bin', 'wb') do |dest|
    dest.write(source.read(8192)) until source.eof?
  end
end
```
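For plain copies, `IO.copy_stream` performs the chunked read/write loop itself and returns the number of bytes copied. On some platforms it may use kernel copy primitives. Below is a sketch with a hypothetical `binary_copy` helper. It checks that bytes which text mode could disturb (`\r\n`, NUL) survive unchanged.

```ruby
require 'tmpdir'

# Copies src to dst byte-for-byte and returns the number of bytes copied
def binary_copy(src, dst)
  File.open(src, 'rb') do |input|
    File.open(dst, 'wb') do |output|
      IO.copy_stream(input, output)
    end
  end
end

Dir.mktmpdir do |dir|
  src = File.join(dir, 'source.bin')
  dst = File.join(dir, 'destination.bin')
  File.binwrite(src, Random.new(42).bytes(20_000) + "\r\n\x00".b)
  p binary_copy(src, dst)                  # 20003
  p File.binread(src) == File.binread(dst) # true
end
```

`IO.copy_stream` also accepts file names directly, as in `IO.copy_stream('source.bin', 'destination.bin')`.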
Error Handling & Debugging
File operations generate specific exception types that require targeted error handling strategies. The most common exceptions include `Errno::ENOENT` for missing files, `Errno::EACCES` for permission issues, and `Errno::ENOSPC` for insufficient disk space.
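```ruby
require 'logger'

LOGGER = Logger.new($stderr)

def safe_file_read(filename, max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    File.read(filename)
  rescue Errno::ENOENT
    LOGGER.warn "File not found: #{filename}"
    nil
  rescue Errno::EACCES
    LOGGER.error "Permission denied: #{filename}"
    raise
  rescue Errno::EIO
    LOGGER.error "I/O error reading #{filename} (attempt #{attempts})"
    # Retry transient I/O errors with exponential backoff, then give up
    raise if attempts >= max_attempts
    sleep(0.1 * 2**attempts)
    retry
  rescue SystemCallError => e
    # Any other Errno::* error (all are subclasses of SystemCallError)
    LOGGER.error "System error: #{e.message}"
    raise
  end
end
```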
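Every `Errno::*` class is a subclass of `SystemCallError`, which is why a final `rescue SystemCallError` clause catches anything the more specific clauses missed. The check below exercises that hierarchy with a path assumed not to exist. `read_failure` is an illustrative helper.

```ruby
require 'tmpdir'

# Returns [exception class, errno number] for a failed read, or nil on success
def read_failure(path)
  File.read(path)
  nil
rescue SystemCallError => e
  [e.class, e.errno]
end

missing = File.join(Dir.tmpdir, "no-such-file-#{Process.pid}.txt")
klass, errno = read_failure(missing)
p klass                                     # Errno::ENOENT
p klass.ancestors.include?(SystemCallError) # true
p errno == Errno::ENOENT::Errno             # true
```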
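Encoding-related errors occur when file content doesn't match the specified or assumed encoding. Note that `File.read(filename, encoding: 'UTF-8')` only labels the bytes as UTF-8; it does not validate them. Problems surface later, in one of two ways:

- `Encoding::InvalidByteSequenceError` or `Encoding::UndefinedConversionError`, when the data is transcoded (with `String#encode` or an `'external:internal'` encoding pair).
- An `ArgumentError` ("invalid byte sequence in UTF-8"), from regular-expression methods such as `gsub`.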
```ruby
def robust_file_read(filename)
  # File.read only labels the bytes as UTF-8, so check validity explicitly
  content = File.read(filename, encoding: 'UTF-8')
  return content if content.valid_encoding?

  begin
    # Not valid UTF-8: treat the bytes as Windows-1252 and transcode to UTF-8
    File.read(filename, encoding: 'Windows-1252:UTF-8')
  rescue Encoding::UndefinedConversionError
    # A few bytes (e.g. 0x81) are undefined in Windows-1252; ISO-8859-1
    # maps every byte, so this last fallback always succeeds
    File.read(filename, encoding: 'ISO-8859-1:UTF-8')
  end
end
```
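Because a plain UTF-8 read does not validate, checking `String#valid_encoding?` right after reading is a cheap way to find bad bytes early. When dropping data is acceptable, `String#scrub` replaces them. The sketch below uses a deliberately invalid byte and an illustrative helper name, `read_utf8_scrubbed`. It also shows the `ArgumentError` that a regex method raises on the unscrubbed string.

```ruby
require 'tmpdir'

# Reads a file as UTF-8, reports whether the bytes were valid, and returns a
# copy with invalid sequences replaced by U+FFFD
def read_utf8_scrubbed(path)
  text = File.read(path, encoding: 'UTF-8')
  [text.valid_encoding?, text.scrub("\uFFFD")]
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'mixed.txt')
  File.binwrite(path, "ok \xFF end".b)

  valid, clean = read_utf8_scrubbed(path)
  p valid # false
  puts clean

  raw = File.read(path, encoding: 'UTF-8')
  begin
    raw.gsub(/\s+/, ' ')
  rescue ArgumentError => e
    puts e.message # invalid byte sequence in UTF-8
  end
end
```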
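File handle leaks are a critical resource management issue in Ruby applications. When handles aren't closed, the process eventually exhausts its file descriptor limit, and subsequent opens fail with `Errno::EMFILE`. The garbage collector does close unreferenced `File` objects, but only when it happens to run, so code should not rely on it.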
```ruby
# Problematic: file handle may leak on exception
def unsafe_file_processing(filename)
  file = File.open(filename, 'r')
  process_data(file.read) # May raise exception
  file.close              # Never reached if exception occurs
end

# Safe: ensures file closure regardless of exceptions
def safe_file_processing(filename)
  File.open(filename, 'r') do |file|
    process_data(file.read)
  end # Block ensures file.close is called
rescue StandardError => e
  logger.error "Processing failed for #{filename}: #{e.message}"
  raise
end
```
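You can verify the block form's guarantee directly. This sketch raises inside the block, keeps a reference to the handle, and checks that the handle is closed afterwards. `handle_after_failure` is a hypothetical helper.

```ruby
require 'tempfile'

# Opens a file with the block form, raises inside the block, and returns the
# handle so the caller can check whether it was closed
def handle_after_failure(path)
  handle = nil
  begin
    File.open(path, 'r') do |file|
      handle = file
      raise 'simulated processing failure'
    end
  rescue RuntimeError
    # swallowed for the demonstration
  end
  handle
end

Tempfile.create('leak-check') do |tmp|
  p handle_after_failure(tmp.path).closed? # true
end
```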
Debug file I/O issues by examining file system state, permissions, and encoding mismatches. Ruby provides introspection methods for diagnosing file operation problems.
```ruby
def debug_file_issues(filename)
  unless File.exist?(filename)
    puts "File does not exist: #{filename}"
    return
  end

  stat = File.stat(filename)
  puts "File size: #{stat.size} bytes"
  # Mask off the file-type bits to show only the permission bits
  puts "Permissions: #{format('%o', stat.mode & 0o7777)}"
  puts "Owner: #{stat.uid}, Group: #{stat.gid}"
  puts "Readable: #{File.readable?(filename)}"
  puts "Writable: #{File.writable?(filename)}"

  # Inspect the first bytes without any encoding conversion
  sample = File.read(filename, 100, mode: 'rb') || ''
  puts "First bytes (hex): #{sample.unpack1('H*')[0, 32]}"

  # Check whether the sample is valid UTF-8 (a multibyte character cut off
  # at the 100-byte boundary can produce a false negative)
  sample.force_encoding('UTF-8')
  puts "Valid UTF-8: #{sample.valid_encoding?}"
rescue SystemCallError => e
  puts "System error: #{e.message}"
end
```
Performance & Memory
Large file operations require memory-efficient approaches to prevent application memory exhaustion. Streaming reads process files in chunks rather than loading entire contents into memory, making them suitable for files larger than available RAM.
```ruby
# Memory-efficient line processing
def process_large_log(filename)
  File.open(filename, 'r') do |file|
    file.each_line do |line|
      process_log_entry(line.chomp)
    end
  end
end

# Chunked binary file processing
def process_large_binary(filename, chunk_size: 8192)
  File.open(filename, 'rb') do |file|
    while (chunk = file.read(chunk_size))
      process_binary_chunk(chunk)
    end
  end
end

# Memory usage comparison (uses the Unix `ps` command; RSS is reported in KB)
memory_before = `ps -o rss= -p #{Process.pid}`.to_i
File.read('large_file.txt') # Loads entire file
memory_after = `ps -o rss= -p #{Process.pid}`.to_i
puts "Memory increase: #{memory_after - memory_before} KB"
```
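Combining `File.foreach` with `Enumerator::Lazy` keeps memory flat while still allowing a pipeline of transformations. Only the current line, plus the IO buffer, is held at once. The sketch below sums a numeric column. It assumes a simple comma-separated format with the value in the second field and `#` comment lines. `sum_second_column` is an illustrative name.

```ruby
require 'tmpdir'

# Sums the second comma-separated field of every non-comment, non-blank line
def sum_second_column(path)
  File.foreach(path, chomp: true)
      .lazy
      .reject { |line| line.start_with?('#') || line.empty? }
      .map { |line| Integer(line.split(',')[1]) }
      .sum
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'metrics.csv')
  File.write(path, "# name,value\na,10\nb,20\n\nc,12\n")
  p sum_second_column(path) # 42
end
```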
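The chunk size passed to `IO#read` affects throughput. Very small chunks mean more method calls and system calls per megabyte, while beyond a certain size, larger chunks bring diminishing returns. Ruby's internal buffering works well for typical scenarios. When a workload needs tuning, measure candidate sizes on the target storage rather than assuming.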
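```ruby
# Compare read throughput for several chunk sizes
def benchmark_buffer_sizes(filename)
  [1024, 4096, 8192, 16384, 65536].each do |buffer_size|
    # Monotonic clock: unaffected by system time adjustments
    start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    File.open(filename, 'rb') do |file|
      while file.read(buffer_size)
        # Simulate processing
      end
    end
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start_time
    puts "Buffer size #{buffer_size}: #{elapsed.round(3)}s"
  end
  # After the first pass the file is usually in the OS page cache,
  # so the first measurement is not comparable with the rest
end

# Illustrative starting points only; the best size depends on the storage
# and workload, so confirm with benchmark_buffer_sizes
SSD_BUFFER_SIZE = 16384
SPINNING_DISK_BUFFER_SIZE = 65536
NETWORK_BUFFER_SIZE = 8192
```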
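The operating system's page cache keeps recently read file data in memory. A second pass over the same file is therefore usually much faster than the first, and one very large sequential scan can evict data that other work relies on. Ruby cannot manage the page cache directly. It can, however, reduce its own allocations (for example, by reusing a read buffer) and pass access-pattern hints to the kernel with `IO#advise`.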
```ruby
# Sequential access that reuses one buffer instead of allocating per chunk
def cache_efficient_read(filename)
  File.open(filename, 'rb') do |file|
    buffer = String.new(capacity: 65536)
    while file.read(65536, buffer)
      process_buffer(buffer)
    end
  end
end

# Avoid cache pollution with large files
def process_huge_file(filename)
  # File.open has no buffer-size option; instead, give the kernel
  # access-pattern hints (posix_fadvise; a no-op where unsupported)
  File.open(filename, 'rb') do |file|
    file.advise(:sequential)
    file.each_line(chomp: true) do |line|
      process_line(line)
    end
    # Drop this file's pages so one huge scan doesn't evict other cached data
    file.advise(:dontneed)
  end
end
```
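The reused-buffer pattern in `cache_efficient_read` relies on `IO#read(length, outbuf)` filling and returning the same String object on each call. A long loop therefore does not allocate a new string per chunk. The check below confirms that behavior. `read_with_reused_buffer` is an illustrative helper.

```ruby
require 'tempfile'

# Reads the file in fixed-size chunks into one reusable buffer and reports
# how many chunks were read and whether every call returned that same buffer
def read_with_reused_buffer(path, chunk_size)
  buffer = String.new(capacity: chunk_size)
  chunks = 0
  same_object = true
  File.open(path, 'rb') do |file|
    while (data = file.read(chunk_size, buffer))
      same_object &&= data.equal?(buffer)
      chunks += 1
    end
  end
  [chunks, same_object]
end

Tempfile.create('chunks') do |tmp|
  tmp.binmode
  tmp.write('x' * 10_000)
  tmp.flush
  p read_with_reused_buffer(tmp.path, 4096) # [3, true]
end
```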
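Memory mapping provides fast random access to large files by mapping file contents directly into the process address space. Core Ruby had no memory-mapping API before Ruby 3.1, which added the experimental `IO::Buffer.map`; gems also provide it. The class below does not map memory. It approximates the access pattern for read-heavy workloads by caching fixed-size pages read from the file.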
```ruby
# Approximate memory-mapped reads with an in-process page cache
# (unbounded: every page read stays in @cache)
class FileMap
  def initialize(filename)
    @filename = filename
    @cache = {}
    @file_size = File.size(filename)
    @page_size = 4096
  end

  def read_page(offset)
    page_num = offset / @page_size
    @cache[page_num] ||= begin
      File.open(@filename, 'rb') do |file|
        file.seek(page_num * @page_size)
        file.read(@page_size)
      end
    end
  end

  def read(offset, length)
    # Clamp the request to the end of the file so read_page never runs past EOF
    length = [length, @file_size - offset].min
    result = String.new # binary (ASCII-8BIT) buffer
    while length > 0
      page_data = read_page(offset)
      page_offset = offset % @page_size
      available = [@page_size - page_offset, length].min
      result << page_data[page_offset, available]
      offset += available
      length -= available
    end
    result
  end
end
```
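`IO#pread(length, offset)` (Ruby 2.5+, not available on every platform) reads at an absolute offset without depending on or moving the file position. This suits random-access reads from a single open handle. Below is a minimal sketch of a `FileMap`-style reader built on it. The class name `PreadReader` is hypothetical. The offset clamp matters because `pread` raises `EOFError` when asked to read at or past the end of the file.

```ruby
require 'tempfile'

# Random-access reader that keeps one handle open and reads with pread
class PreadReader
  def initialize(path)
    @file = File.open(path, 'rb')
    @size = @file.size
  end

  # Returns up to `length` bytes starting at `offset` ("" past EOF)
  def read(offset, length)
    length = [length, @size - offset].min
    return ''.b if length <= 0

    @file.pread(length, offset)
  end

  def close
    @file.close
  end
end

Tempfile.create('pread') do |tmp|
  tmp.write('0123456789')
  tmp.flush
  reader = PreadReader.new(tmp.path)
  p reader.read(3, 4)  # "3456"
  p reader.read(8, 10) # "89"
  p reader.read(20, 5) # ""
  reader.close
end
```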
Production Patterns
Log file management requires rotation, compression, and concurrent access handling. Production applications must balance log detail with storage efficiency and processing performance.
```ruby
require 'date'
require 'time'
require 'zlib'

class ProductionLogger
  def initialize(base_filename, max_size: 100 * 1024 * 1024)
    @base_filename = base_filename
    @max_size = max_size
    @current_file = nil
    @mutex = Mutex.new # Serializes threads in this process only
  end

  def write(message)
    @mutex.synchronize do
      ensure_current_file
      @current_file.puts "#{Time.now.iso8601} #{message}"
      @current_file.flush
      rotate_if_needed
    end
  end

  private

  # One log file per day, e.g. app.log.20240131
  def current_filename
    "#{@base_filename}.#{Date.today.strftime('%Y%m%d')}"
  end

  def ensure_current_file
    filename = current_filename
    if @current_file.nil? || @current_file.path != filename
      @current_file&.close
      @current_file = File.open(filename, 'a')
    end
  end

  def rotate_if_needed
    path = @current_file.path
    return unless File.size(path) > @max_size

    timestamp = Time.now.strftime('%Y%m%d_%H%M%S')
    archive_name = "#{path}.#{timestamp}.gz"
    @current_file.close
    # Compress with Zlib instead of shelling out to gzip (no shell quoting issues)
    Zlib::GzipWriter.open(archive_name) do |gz|
      File.open(path, 'rb') { |src| IO.copy_stream(src, gz) }
    end
    File.truncate(path, 0)
    @current_file = File.open(path, 'a')
  end
end
```
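For many applications, Ruby's standard `Logger` already covers size-based rotation. `Logger.new(path, shift_age, shift_size)` keeps `shift_age` old files and rotates when the current file exceeds `shift_size` bytes. It does not compress rotated files, so the gzip step above would still be custom. The sketch below uses deliberately tiny limits so that rotation is visible. `rotate_demo` is an illustrative helper.

```ruby
require 'logger'
require 'tmpdir'

# Writes `count` messages through a Logger that rotates at `shift_size` bytes
# and returns the names of the log files present afterwards
def rotate_demo(dir, count:, shift_size:)
  path = File.join(dir, 'app.log')
  logger = Logger.new(path, 3, shift_size) # keep up to 3 rotated files
  count.times { |i| logger.info("message #{i} " + 'x' * 40) }
  logger.close
  Dir.children(dir).sort
end

Dir.mktmpdir do |dir|
  p rotate_demo(dir, count: 50, shift_size: 1024)
end
```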
Configuration file handling in production requires error resilience, validation, and hot reloading capabilities. Applications must handle corrupted files gracefully and provide sensible defaults.
```ruby
require 'logger'
require 'yaml'

class ConfigError < StandardError; end

class ConfigManager
  def initialize(config_path, logger: Logger.new($stderr))
    @config_path = config_path
    @logger = logger
    @config_mtime = nil
    @cached_config = nil
    @reload_mutex = Mutex.new
  end

  # Looks up dotted keys such as "database.host"
  def get(key, default = nil)
    config = current_config
    config.dig(*key.to_s.split('.')) || default
  end

  def reload_if_changed
    return @cached_config unless config_changed?

    @reload_mutex.synchronize do
      # Re-check inside the lock: another thread may have reloaded already
      return @cached_config unless config_changed?

      load_config
    end
  end

  private

  def current_config
    reload_if_changed || {}
  end

  def config_changed?
    return true if @cached_config.nil?

    File.mtime(@config_path) > @config_mtime
  rescue Errno::ENOENT
    false # File removed: keep serving the last good config
  end

  def load_config
    mtime = File.mtime(@config_path)
    config = YAML.safe_load(File.read(@config_path)) || {}
    validate_config(config)
    # Only replace the cache once the new config has parsed and validated
    @cached_config = config
  rescue Psych::SyntaxError => e
    @logger.error "Invalid YAML in #{@config_path}: #{e.message}"
    @cached_config || {}
  rescue StandardError => e
    @logger.error "Config load error: #{e.message}"
    @cached_config || {}
  ensure
    # Record the mtime even on failure so a bad edit isn't re-parsed on
    # every call while the last good config is being served
    @config_mtime = mtime if mtime
  end

  def validate_config(config)
    required_keys = %w[database.host database.port api.timeout]
    required_keys.each do |key|
      unless config.dig(*key.split('.'))
        raise ConfigError, "Missing required config: #{key}"
      end
    end
  end
end
```
Data export and backup operations require atomic writes, verification, and cleanup procedures to ensure data integrity in production environments.
```ruby
require 'digest'
require 'json'

class ExportError < StandardError; end

class DataExporter
  def initialize(export_directory)
    @export_directory = export_directory
    @temp_suffix = '.tmp'
  end

  def export_dataset(dataset_name, records)
    timestamp = Time.now.strftime('%Y%m%d_%H%M%S')
    final_path = File.join(@export_directory, "#{dataset_name}_#{timestamp}.jsonl")
    # Temp file in the same directory, so the rename stays on one filesystem
    temp_path = final_path + @temp_suffix

    begin
      # Atomic write: write to temp file, then rename
      File.open(temp_path, 'w') do |file|
        records.each do |record|
          file.puts JSON.generate(record)
        end
        file.fsync # Flush Ruby's buffer and force the data to disk
      end

      # Verify file integrity
      verify_export(temp_path, records.count)

      # Atomic rename (on POSIX, within a single filesystem)
      File.rename(temp_path, final_path)

      # Create verification checksum
      create_checksum(final_path)
      final_path
    rescue StandardError => e
      cleanup_temp_file(temp_path)
      raise ExportError, "Export failed: #{e.message}"
    end
  end

  private

  def verify_export(file_path, expected_count)
    actual_count = File.foreach(file_path).count
    unless actual_count == expected_count
      raise ExportError, "Record count mismatch: expected #{expected_count}, got #{actual_count}"
    end
  end

  # Same layout as sha256sum output (hash, two spaces, file name)
  def create_checksum(file_path)
    checksum = Digest::SHA256.file(file_path).hexdigest
    File.write("#{file_path}.sha256", "#{checksum}  #{File.basename(file_path)}\n")
  end

  def cleanup_temp_file(temp_path)
    File.unlink(temp_path) if File.exist?(temp_path)
  rescue StandardError
    # Best effort: ignore cleanup failures so the original error is reported
  end
end
```
Common Pitfalls
Encoding mismatches cause data corruption and processing errors when file encoding doesn't match application assumptions. Ruby's encoding handling can be subtle, particularly with files created on different systems.
```ruby
# Problematic: assumes the file's bytes match the locale's default encoding
def broken_text_processing(filename)
  content = File.read(filename) # Tagged with Encoding.default_external
  content.gsub(/[^\w\s]/, '')   # Raises ArgumentError on invalid byte sequences
end

# Correct: explicit encoding handling
def robust_text_processing(filename)
  bytes = File.binread(filename)
  content = bytes.dup.force_encoding('UTF-8')

  unless content.valid_encoding?
    # Not valid UTF-8. Guess Windows-1252, a common legacy encoding, and
    # transcode; bytes it leaves undefined become '?'. Any single-byte guess
    # "succeeds", so this is a heuristic, not detection.
    content = bytes.force_encoding('Windows-1252')
                   .encode('UTF-8', undef: :replace, replace: '?')
  end

  # \w is ASCII-only in Ruby; \p{Word} also keeps letters such as 'é'
  content.gsub(/[^\p{Word}\s]/, '')
end
```
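File locking issues occur when multiple processes attempt simultaneous access to the same file. Ruby exposes advisory locking through `File#flock`. A lock only coordinates processes that also call `flock` on that file. Code that takes several locks must acquire them in a consistent order to avoid deadlocks.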
```ruby
# Problematic: no coordination between processes
def unsafe_counter_increment(counter_file)
  current = File.read(counter_file).to_i
  File.write(counter_file, (current + 1).to_s)
end

# Correct: file locking prevents race conditions
def safe_counter_increment(counter_file)
  File.open(counter_file, File::RDWR | File::CREAT, 0644) do |file|
    file.flock(File::LOCK_EX) # Exclusive lock; blocks until available
    current = file.read.to_i
    file.rewind
    file.truncate(0)
    file.write((current + 1).to_s)
    file.flush
    # Lock automatically released when file closes
  end
rescue Errno::ENOLCK
  # File locking not supported on this filesystem
  fallback_atomic_increment(counter_file)
end

# Fallback without locks: the rename replaces the file atomically on POSIX,
# so readers never see a half-written counter, but two concurrent callers can
# still read the same value and lose an increment
def fallback_atomic_increment(counter_file)
  temp_file = "#{counter_file}.tmp.#{Process.pid}"
  begin
    current = File.exist?(counter_file) ? File.read(counter_file).to_i : 0
    File.write(temp_file, (current + 1).to_s)
    File.rename(temp_file, counter_file) # Atomic replacement on POSIX systems
  ensure
    File.unlink(temp_file) if File.exist?(temp_file)
  end
end
```
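`File::LOCK_NB` turns a blocking lock request into a try-lock. Instead of waiting, `flock` returns `false` when another open file description holds a conflicting lock. Because `flock` locks belong to the open file description, two separate `File.open` calls in the same process are enough to demonstrate this on Linux and macOS. This is a sketch; behavior on other platforms and on network filesystems may differ. `try_exclusive_lock` is an illustrative helper.

```ruby
require 'tempfile'

# Returns true if `path` could be exclusively locked right now without waiting
def try_exclusive_lock(path)
  File.open(path, 'r') do |file|
    locked = file.flock(File::LOCK_EX | File::LOCK_NB) # 0 on success, false if busy
    file.flock(File::LOCK_UN) if locked
    locked ? true : false
  end
end

Tempfile.create('lock-demo') do |tmp|
  p try_exclusive_lock(tmp.path) # true (nobody holds the lock)

  File.open(tmp.path, 'r') do |holder|
    holder.flock(File::LOCK_EX)
    p try_exclusive_lock(tmp.path) # false (holder's lock blocks it)
  end
end
```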
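Binary file corruption occurs when text-mode operations are applied to binary data. On Windows, text mode translates line endings (`\r\n` to `\n` on read, and back on write). On any platform, setting `Encoding.default_internal` makes text-mode reads transcode, which mangles or rejects non-text bytes.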
```ruby
require 'digest'

class DataCorruption < StandardError; end

# Problematic: text mode corrupts binary data
def broken_file_copy(source, destination)
  content = File.read(source)      # Text mode may corrupt binary
  File.write(destination, content) # Corruption persists to destination
end

# Correct: explicit binary mode preserves data integrity
def correct_file_copy(source, destination)
  File.open(source, 'rb') do |input|
    File.open(destination, 'wb') do |output|
      while (chunk = input.read(8192))
        output.write(chunk)
      end
    end
  end
end

# Verify binary file integrity
def verify_file_copy(original, copy)
  original_digest = Digest::SHA256.file(original).hexdigest
  copy_digest = Digest::SHA256.file(copy).hexdigest
  unless original_digest == copy_digest
    raise DataCorruption, "File copy verification failed"
  end
end
```
Temporary file cleanup is a common source of resource leaks when exception handling doesn't account for cleanup requirements. Ruby provides automatic temporary file management through the `Tempfile` class.
```ruby
require 'tempfile'
require 'tmpdir'

# Problematic: temp files may not be cleaned up
def unsafe_temp_processing(data)
  temp_name = File.join(Dir.tmpdir, "process_#{Process.pid}_#{rand(10000)}")
  File.write(temp_name, data)
  result = external_processor(temp_name) # May raise exception
  File.unlink(temp_name)                 # Never reached if exception occurs
  result
end

# Better: ensure cleanup with begin/ensure
# (a predictable name can still collide with an existing file)
def manual_temp_cleanup(data)
  temp_name = File.join(Dir.tmpdir, "process_#{Process.pid}_#{rand(10000)}")
  begin
    File.write(temp_name, data)
    external_processor(temp_name)
  ensure
    File.unlink(temp_name) if File.exist?(temp_name)
  end
end

# Best: Tempfile.create picks a unique name and deletes the file after the block
def automatic_temp_cleanup(data)
  Tempfile.create(['process', '.dat']) do |temp_file|
    temp_file.write(data)
    temp_file.flush # Ensure data is written before another program reads it
    external_processor(temp_file.path)
  end # File deleted here, even if the block raises
end
```
Permission and ownership issues cause file operations to fail in deployment environments where application user differs from file owner. Applications must handle permission errors gracefully.
```ruby
def handle_permission_issues(filename, content)
  chmod_attempted = false
  begin
    File.write(filename, content)
  rescue Errno::EACCES
    # Try once to fix permissions if we own the file; a second EACCES means
    # the problem is elsewhere (e.g. the directory), so don't loop forever
    if !chmod_attempted && File.owned?(filename)
      chmod_attempted = true
      File.chmod(0644, filename)
      retry
    end
    # Write to an alternative location instead
    fallback_path = File.join(Dir.home, File.basename(filename))
    File.write(fallback_path, content)
    logger.warn "Wrote to fallback location: #{fallback_path}"
  rescue Errno::EROFS
    # Read-only filesystem: find a writable location
    # (find_writable_directory is an application-specific helper)
    writable_path = find_writable_directory
    target = File.join(writable_path, File.basename(filename))
    File.write(target, content)
    logger.warn "Filesystem read-only, wrote to: #{target}"
  end
end
```
Reference
File Class Methods
| Method | Parameters | Returns | Description |
|---|---|---|---|
| `File.read(name, length = nil, offset = nil, **opts)` | name (String), length (Integer), offset (Integer), options (Hash) | String | Read the whole file (or `length` bytes from `offset`) into a string |
| `File.write(name, data, offset = nil, **opts)` | name (String), data (String), offset (Integer), options (Hash) | Integer | Write data to file, return bytes written |
| `File.open(name, mode = 'r', perm = nil, **opts)` | name (String), mode (String), perm (Integer), options (Hash) | File or block result | Open a file handle, or yield it to a block and close it afterwards |
| `File.exist?(path)` | path (String) | Boolean | Check whether the path exists (file or directory) |
| `File.size(path)` | path (String) | Integer | Return file size in bytes |
| `File.stat(path)` | path (String) | File::Stat | Return file metadata object |
| `File.chmod(mode, *paths)` | mode (Integer), paths (String) | Integer | Change permissions; returns the number of files processed |
| `File.rename(old, new)` | old (String), new (String) | 0 | Rename file (atomic on POSIX within one filesystem) |
File Instance Methods
| Method | Parameters | Returns | Description |
|---|---|---|---|
| `#read(length = nil, buffer = nil)` | length (Integer), buffer (String) | String or nil | Read up to `length` bytes, or the rest of the file if `length` is nil |
| `#write(*objects)` | objects (converted with `to_s`) | Integer | Write objects to file, return bytes written |
| `#gets(separator = $/, limit = nil, chomp: false)` | separator (String), limit (Integer), chomp (Boolean) | String or nil | Read next line from file |
| `#puts(*objects)` | objects (Array) | nil | Write objects as lines to file |
| `#seek(offset, whence = IO::SEEK_SET)` | offset (Integer), whence (Integer) | 0 | Move file position pointer |
| `#tell` | None | Integer | Return current file position |
| `#eof?` | None | Boolean | Check if at end of file |
| `#flush` | None | File | Flush Ruby's write buffer to the operating system |
| `#fsync` | None | 0 | Flush buffers and force data to physical storage |
| `#close` | None | nil | Close file handle |
File Access Modes
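| Mode | Description | File Pointer | Truncates | Creates if missing |
|---|---|---|---|---|
| `'r'` | Read only | Beginning | No | No |
| `'w'` | Write only | Beginning | Yes | Yes |
| `'a'` | Write only | End (every write appends) | No | Yes |
| `'r+'` | Read/write | Beginning | No | No |
| `'w+'` | Read/write | Beginning | Yes | Yes |
| `'a+'` | Read/write | End for writes (every write appends) | No | Yes |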
Binary Mode Modifiers
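| Modifier | Description |
|---|---|
| `'b'` | Binary mode: no newline conversion; external encoding defaults to ASCII-8BIT |
| `'t'` | Text mode (default): newline conversion on Windows, encoding conversion when an internal encoding is set |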
Common Options Hash Keys
| Option | Type | Description |
|---|---|---|
| `:encoding` | String or Encoding | Character encoding for file content; `'ext:int'` sets external and internal encodings |
| `:mode` | String | File access mode (alternative to the positional parameter) |
| `:perm` | Integer | File permissions for newly created files (octal, e.g. `0644`) |
| `:flags` | Integer | Extra `File::Constants` open flags (e.g. `File::EXCL`), ORed with the mode |
| `:external_encoding` | String or Encoding | Encoding of file content |
| `:internal_encoding` | String or Encoding | Encoding that strings are converted to when read |
| `:textmode` | Boolean | Enable text mode processing |
| `:binmode` | Boolean | Enable binary mode processing |
| `:autoclose` | Boolean | If false, the underlying file descriptor is left open when the object is closed or garbage collected |
File System Constants
| Constant | Value | Description |
|---|---|---|
| `File::RDONLY` | Platform-specific | Read-only access flag |
| `File::WRONLY` | Platform-specific | Write-only access flag |
| `File::RDWR` | Platform-specific | Read-write access flag |
| `File::CREAT` | Platform-specific | Create file if it doesn't exist |
| `File::EXCL` | Platform-specific | Fail if file already exists |
| `File::TRUNC` | Platform-specific | Truncate file to zero length |
| `File::APPEND` | Platform-specific | Open for appending |
| `File::NONBLOCK` | Platform-specific | Non-blocking I/O mode |
File Lock Constants
| Constant | Description |
|---|---|
| `File::LOCK_SH` | Shared lock (multiple readers) |
| `File::LOCK_EX` | Exclusive lock (single writer) |
| `File::LOCK_UN` | Unlock file |
| `File::LOCK_NB` | Non-blocking: OR with `LOCK_SH` or `LOCK_EX`; `flock` returns false instead of waiting |
Common Exception Types
| Exception | Cause |
|---|---|
| `Errno::ENOENT` | File or directory not found |
| `Errno::EACCES` | Permission denied |
| `Errno::EISDIR` | Is a directory (when file expected) |
| `Errno::ENOTDIR` | Not a directory (when directory expected) |
| `Errno::ENOSPC` | No space left on device |
| `Errno::EIO` | Input/output error |
| `Errno::EMFILE` | Too many open files |
| `Encoding::InvalidByteSequenceError` | Invalid byte sequence for encoding |
| `Encoding::UndefinedConversionError` | Character cannot be converted to target encoding |