CrackedRuby

Overview

File system architecture defines how operating systems organize, store, and retrieve data on storage devices. The architecture determines the logical structure that maps physical storage blocks to files and directories that applications can access. Modern file systems balance competing requirements: fast access, efficient space usage, data integrity, concurrent access, and crash recovery.

File system architecture operates at multiple abstraction layers. At the lowest level, physical sectors on disk store raw bytes. The block layer groups sectors into blocks, the fundamental unit of file system operations. Above this, the allocation layer tracks which blocks contain data and which remain free. The directory layer organizes files into a navigable hierarchy. The metadata layer stores information about files: ownership, permissions, timestamps, and physical locations. The top layer presents a consistent API to applications regardless of underlying implementation details.

Application Layer
  File API (open, read, write, close)
Virtual File System (VFS) Layer
Specific File System (ext4, NTFS, HFS+)
Block Device Layer
Physical Storage (HDD, SSD)

File system design impacts system behavior profoundly. Sequential read performance, random write latency, memory overhead, crash resilience, and maximum file sizes all stem from architectural decisions. A file system optimized for large sequential reads performs differently than one designed for small random writes. Understanding these architectural choices helps developers write code that works efficiently with the underlying storage system.

Key Principles

Hierarchical Organization

Most file systems organize data in tree structures. A root directory contains files and subdirectories, which contain more files and subdirectories. This hierarchy provides namespace organization, access control boundaries, and logical groupings. Each node in the tree represents either a file (leaf node) or directory (internal node). The path from root to any file uniquely identifies that file within the system.

Hierarchical organization separates concerns. Operating system files reside in system directories. User data occupies home directories. Application data lives in program-specific locations. This separation enables different backup policies, security controls, and lifecycle management for different data categories.

Inode Architecture

Unix-derived systems separate file metadata from file data through inodes (index nodes). An inode stores file attributes: permissions, ownership, timestamps, and pointers to data blocks. Directory entries map human-readable names to inode numbers. Multiple directory entries can reference the same inode, creating hard links where different paths access identical file data.

Directory Entry: "document.txt" → Inode 47892
Inode 47892:
  - Size: 8192 bytes
  - Permissions: rw-r--r--
  - Owner: user_id 1000
  - Modified: 2025-10-07 14:23:16
  - Data blocks: [2048, 2049, 2050, 2051]

This indirection provides powerful capabilities. Renaming a file changes only the directory entry mapping, not the inode or data blocks. File permissions attach to inodes, not names, so all hard links share permissions. The inode tracks reference counts; when the last directory entry disappears and no processes hold the file open, the system reclaims the inode and data blocks.
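This indirection is observable from Ruby. A minimal sketch (filenames are illustrative) creates a hard link and compares inode numbers through File::Stat:

```ruby
require 'tmpdir'

same_inode = nil
link_count = nil

Dir.mktmpdir do |dir|
  original   = File.join(dir, 'document.txt')
  alias_path = File.join(dir, 'alias.txt')

  File.write(original, 'shared data')
  File.link(original, alias_path)  # second directory entry, same inode

  a = File.stat(original)
  b = File.stat(alias_path)
  same_inode = (a.ino == b.ino)  # both names map to one inode
  link_count = a.nlink           # inode reference count is now 2
end

puts same_inode  # => true
puts link_count  # => 2
```

Deleting either name decrements nlink; the inode and data blocks survive until the count reaches zero.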

Block Allocation Strategies

File systems must track free and allocated storage blocks. Different strategies optimize for different access patterns. Contiguous allocation places file data in sequential blocks, maximizing sequential read performance but causing fragmentation. Linked allocation chains blocks together through pointers, eliminating external fragmentation but slowing random access. Indexed allocation uses an index block containing pointers to data blocks, combining benefits of both approaches.

Modern file systems often use extent-based allocation. An extent describes a contiguous range of blocks: starting block number and length. Large files need fewer extent entries than individual block pointers, reducing metadata overhead. Extent allocation naturally encourages contiguous layouts, improving sequential performance.

Virtual File System Layer

Operating systems abstract file system specifics behind a virtual file system (VFS) layer. Applications call standard operations: open, read, write, close. The VFS translates these generic operations to file-system-specific implementations. This abstraction allows mounting different file system types simultaneously. An ext4 partition, NTFS volume, network file system, and tmpfs RAM disk all present identical interfaces to applications.

The VFS maintains in-memory structures representing open files, mounted file systems, and cached metadata. File descriptors act as handles to open files, abstracting inode references. The VFS coordinates concurrent access, handles caching, and manages the page cache where file data resides in memory before writing to disk.
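From a process's perspective, that handle is just a small integer. A quick Ruby illustration:

```ruby
# File descriptors index the process's open-file table maintained by the VFS
fd = nil
File.open('/etc/hosts') do |f|
  fd = f.fileno  # small non-negative integer; 0-2 are stdin/stdout/stderr
end
puts fd
```

The descriptor is valid only while the file is open; Ruby's block form closes it (and releases the VFS entry) automatically.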

Journaling and Consistency

File systems must maintain consistency despite crashes or power failures. Journaling records planned changes before modifying actual file system structures. After a crash, the system replays the journal to complete interrupted operations or roll them back. This prevents inconsistencies like allocated blocks marked as free, or directory entries pointing to deallocated inodes.

Different journaling modes balance performance and durability. Metadata journaling logs only file system structure changes, not file data, providing fast recovery of file system integrity while risking file data corruption. Full journaling logs both metadata and data, guaranteeing consistency at performance cost. Ordered journaling writes data blocks before logging metadata changes, protecting against exposing uninitialized blocks without full data journaling overhead.

Permission Models

File systems implement access control through permission models. Unix permission bits define read, write, and execute rights for owner, group, and others. Access control lists (ACLs) extend this model with fine-grained permissions for specific users or groups. Mandatory access control (MAC) systems like SELinux add security labels, enforcing policies beyond user discretion.

Permissions apply at file system boundaries. Directory permissions control listing contents (read), creating entries (write), and accessing entries (execute/search). Execute permission on directories gates traversal; without it, even knowing a file path prevents access. This layered permission model creates natural security boundaries within the hierarchy.

Ruby Implementation

Ruby provides extensive file system interaction through built-in classes and standard library modules. The File class handles individual file operations. Dir manages directories. FileUtils offers high-level file manipulation utilities. Pathname provides object-oriented path handling with method chaining.

File Operations

The File class wraps low-level file system operations. Opening files returns file objects supporting read, write, and seek operations. Ruby manages file descriptor allocation and cleanup automatically when using block forms.

# Basic file reading
content = File.read('/etc/hosts')

# Reading with explicit encoding
content = File.read('data.txt', encoding: 'UTF-8')

# Block form ensures file closure
File.open('output.txt', 'w') do |file|
  file.write("Line 1\n")
  file.write("Line 2\n")
end

# Binary mode for non-text files
File.open('image.png', 'rb') do |file|
  header = file.read(16)
  # Process binary data
end

File modes control access intent and behavior. Read mode ('r') opens existing files. Write mode ('w') truncates existing content. Append mode ('a') positions at file end. Combined modes like 'r+' allow reading and writing. Adding 'b' enables binary mode, disabling text translation on Windows.

File stat operations retrieve metadata without reading file contents. Size, modification time, permissions, and file type information come from inode attributes accessed through stat system calls.

stat = File.stat('/var/log/system.log')

stat.size          # => 1048576
stat.mtime         # => 2025-10-07 14:23:16 -0500
stat.mode.to_s(8)  # => "100644" (regular file, rw-r--r--)
stat.uid           # => 0 (root)
stat.gid           # => 0 (wheel)

# Type checking
stat.file?         # => true
stat.directory?    # => false
stat.symlink?      # => false
stat.readable?     # => true
stat.writable?     # => false

File class methods provide atomic operations for common tasks. File.rename moves or renames files atomically. File.delete removes files. File.chmod modifies permissions. These operations translate directly to system calls, executing atomically from the application perspective.

# Atomic rename (atomic within same filesystem)
File.rename('temp.txt', 'final.txt')

# Permission modification
File.chmod(0644, 'config.yml')  # rw-r--r--
File.chmod(0755, 'script.sh')   # rwxr-xr-x

# Ownership changes (requires appropriate privileges)
File.chown(user_id, group_id, 'important.dat')

# Create hard link
File.link('original.txt', 'linked.txt')

# Create symbolic link
File.symlink('/usr/bin/ruby', '/usr/local/bin/ruby')

Directory Operations

The Dir class manages directory navigation and manipulation. Listing entries returns arrays of filenames. Pattern matching filters results using glob patterns. Directory iteration processes entries efficiently without loading entire listings into memory.

# List directory contents
entries = Dir.entries('/tmp')
# => [".", "..", "file1.txt", "subdir", ...]

# Exclude . and ..
entries = Dir.children('/tmp')

# Glob pattern matching
ruby_files = Dir.glob('/app/**/*.rb')        # Recursive search
configs = Dir.glob('/etc/*.{yml,yaml}')      # Multiple extensions
logs = Dir.glob('/var/log/app-*.log')        # Pattern matching

# Directory iteration
Dir.foreach('/data') do |entry|
  next if entry.start_with?('.')
  puts "Processing #{entry}"
end

Dir.glob accepts flags modifying match behavior. File::FNM_DOTMATCH includes hidden files. File::FNM_CASEFOLD performs case-insensitive matching. Combining flags with bitwise OR customizes search behavior for specific needs.

# '*' skips dotfiles unless FNM_DOTMATCH is set
everything = Dir.glob('/tmp/*', File::FNM_DOTMATCH)

# Case-insensitive matching (no need to enumerate case variants)
images = Dir.glob('/photos/*.{jpg,jpeg}', File::FNM_CASEFOLD)

# Current directory tracking
original_dir = Dir.pwd
Dir.chdir('/tmp') do
  # Operations in /tmp
  puts Dir.pwd  # => "/tmp"
end
puts Dir.pwd    # => original_dir (restored after block)

Directory creation handles nested paths through mkdir_p, creating intermediate directories as needed. This mirrors the shell's mkdir -p behavior, succeeding even if the directory already exists.

# Create single directory (fails if parent missing)
Dir.mkdir('new_dir')

# Create nested directories
Dir.mkdir('deeply/nested/structure')  # Raises Errno::ENOENT if parents missing

# Ruby standard library solution
require 'fileutils'
FileUtils.mkdir_p('deeply/nested/structure')  # Creates all levels

FileUtils Module

FileUtils provides high-level file manipulation matching common shell operations. Copy, move, remove, and compare operations handle files and directory trees. Method names follow Unix command conventions: cp, mv, rm, touch.

require 'fileutils'

# Copy files
FileUtils.cp('source.txt', 'dest.txt')
FileUtils.cp_r('source_dir', 'dest_dir')  # Recursive directory copy

# Move/rename
FileUtils.mv('old_name.txt', 'new_name.txt')
FileUtils.mv(Dir.glob('*.tmp'), '/tmp')   # Move multiple files

# Remove files and directories
FileUtils.rm('unwanted.txt')
FileUtils.rm_rf('cache_dir')              # Recursive force removal

# Create directory trees
FileUtils.mkdir_p('project/src/lib')

# Touch files (update timestamp or create)
FileUtils.touch('timestamp.txt')

FileUtils supports dry-run mode, showing operations without executing them. Verbose mode prints operations as they execute. These modes help debug file operations and verify logic before committing changes.

# Dry run - show what would happen
FileUtils.rm_rf('/important/data', noop: true, verbose: true)
# => rm -rf /important/data (no actual removal)

# Verbose execution
FileUtils.cp_r('backup', 'restore', verbose: true)
# => cp -r backup restore (performs actual copy)

# Preserve attributes during copy
FileUtils.cp('original.txt', 'copy.txt', preserve: true)
# Preserves timestamps and permissions (ownership too, given sufficient privileges)

Pathname Objects

The Pathname class provides object-oriented path manipulation. Pathname objects represent file paths with methods for traversal, combination, and querying. Method chaining produces readable code compared to string concatenation.

require 'pathname'

path = Pathname.new('/usr/local/bin')

# Path combination
script = path / 'deploy.sh'  # => #<Pathname:/usr/local/bin/deploy.sh>

# Parent directory navigation
parent = script.parent       # => #<Pathname:/usr/local/bin>
grandparent = script.parent.parent  # => #<Pathname:/usr/local>

# Path components
script.basename              # => #<Pathname:deploy.sh>
script.extname               # => ".sh"
script.dirname               # => #<Pathname:/usr/local/bin>

# Existence and type checking
script.exist?
script.file?
script.directory?
script.executable?

Pathname methods provide clean interfaces to File and Dir operations. Reading, writing, listing directories, and globbing all work through Pathname objects, eliminating class name prefixes.

config_dir = Pathname.new('/etc/app')

# Read file
config_content = config_dir.join('config.yml').read

# List directory
config_dir.children.each do |entry|
  puts entry if entry.file?
end

# Find files recursively
config_dir.glob('**/*.conf').each do |conf_file|
  puts "Found config: #{conf_file}"
end

# Relative path calculation
relative = Pathname.new('/usr/local/bin/script.sh')
  .relative_path_from(Pathname.new('/usr/local'))
# => #<Pathname:bin/script.sh>

File System Monitoring

Ruby applications can watch file system changes through various gems. The listen gem provides cross-platform file system notification. Applications register callbacks triggered when files change, enabling automatic reloading, build systems, and synchronization tools.

require 'listen'

listener = Listen.to('app') do |modified, added, removed|
  modified.each { |file| puts "Modified: #{file}" }
  added.each { |file| puts "Added: #{file}" }
  removed.each { |file| puts "Removed: #{file}" }
end

listener.start  # Begins watching in a background thread
sleep           # Keep the main thread alive while listening

# Stop watching when finished
listener.stop

Implementation Approaches

Hierarchical File Systems

Hierarchical organization structures data in tree form with directories containing files and subdirectories. This approach dominates modern systems because it matches human mental models of organization. Categories nest within categories, creating natural boundaries for grouping related data.

Implementation requires tracking parent-child relationships. Each directory stores references to contained entries. Unix systems store directory entries as name-to-inode mappings. Directory data blocks contain arrays of directory entry structures: inode number, entry length, name length, and name string. Reading a directory returns these entries, which applications can filter and process.

Hard link limitations follow from hierarchical structure. Hard links must reference inodes within the same file system because inode numbers have meaning only within their allocation table. Symbolic links work across file systems by storing target paths as strings, requiring additional lookup when accessed. These trade-offs influence how applications handle file references.
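The difference is visible from Ruby: a symbolic link stores only the target string, which File.readlink returns verbatim, while File.realpath performs the extra resolution step. A small sketch using a throwaway directory:

```ruby
require 'tmpdir'

stored = nil
resolved_match = nil

Dir.mktmpdir do |dir|
  target = File.join(dir, 'data.txt')
  link   = File.join(dir, 'shortcut')
  File.write(target, 'hello')
  File.symlink(target, link)

  stored = File.readlink(link)  # the literal target string, no resolution
  resolved_match = (File.realpath(link) == File.realpath(target))
end

puts resolved_match  # => true
```

Because the link is just a stored path, deleting or moving the target leaves a dangling symlink; a hard link cannot dangle.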

Path resolution walks the hierarchy from root to target. Each component requires reading a directory, finding the next component, and repeating until reaching the final entry. Deep hierarchies cost more than shallow ones. Systems cache directory entries and inodes to reduce this overhead, but deeply nested structures still impact performance.
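Pathname makes the per-component cost visible; descend enumerates exactly the directories a resolver must visit on the way to the target:

```ruby
require 'pathname'

path = Pathname.new('/usr/local/bin/ruby')

# Each component below the root is one directory lookup during resolution
components = path.each_filename.to_a
# => ["usr", "local", "bin", "ruby"]

# descend yields the cumulative walk from root to target
steps = path.descend.to_a.map(&:to_s)
# => ["/", "/usr", "/usr/local", "/usr/local/bin", "/usr/local/bin/ruby"]
```

Four components mean four directory reads in the worst case; the dentry cache usually short-circuits most of them.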

Extent-Based Allocation

Extent-based systems group contiguous blocks into ranges described by start position and length. Instead of storing individual block pointers, the file system maintains an extent tree or list. A file occupying blocks 1000-1999 needs only one extent entry rather than 1000 block pointers.

This approach reduces metadata overhead for large files. Extent lists fit in fewer disk blocks than equivalent pointer lists. Fewer metadata blocks mean faster file opening and reduced memory consumption for file system caches. Sequential file operations benefit most because consecutive data blocks enable streaming reads without seek overhead.

Fragmentation remains possible but impacts performance differently. A fragmented file requires multiple extents rather than scattered block pointers. Sequential reads experience seeks only between extents, not between individual blocks. Write patterns affect fragmentation. Append-mostly workloads maintain contiguous layouts. Random writes and frequent deletions gradually fragment files.

Extent allocation strategies balance space efficiency with contiguousness. The allocator searches for free extent runs matching or exceeding requested sizes. Finding perfect matches wastes time compared to accepting first-fit extents. Splitting large free extents satisfies requests quickly but creates fragmentation. Delayed allocation batches writes, allowing better placement decisions.

Copy-on-Write Architecture

Copy-on-write (CoW) file systems never modify data in place. Write operations allocate new blocks, update pointers, and mark old blocks free. This design provides inherent crash consistency because updates commit atomically when root pointers change. Incomplete writes leave old data accessible.

Snapshot support follows naturally from CoW. Creating a snapshot freezes current root pointers. Subsequent writes allocate new blocks without affecting snapshot data. Multiple snapshots share unchanged blocks, consuming space only for differences. This enables cheap, frequent backups and versioning.

Initial state:
Root → Block A → Block B → Block C

After write to Block B:
Root → Block A → Block B' → Block C
Snapshot → Block A → Block B → Block C

CoW amplifies random write overhead. Each small write triggers metadata updates cascading up the tree to the root. Writing one data block may require writing multiple metadata blocks. Solid-state drives mitigate this because random writes cost less than on rotating media. Hard disk CoW systems buffer writes or use hybrid approaches.

Fragmentation characteristics differ from traditional file systems. Sequential writes scatter across the device as the allocator finds free space. Over time, even sequentially written files become fragmented. Some CoW systems implement background defragmentation, moving blocks to restore sequential layout without downtime.

Log-Structured File Systems

Log-structured file systems write all data and metadata sequentially in a circular log. Instead of updating files in place, the system appends changes to the log tail. Periodic cleaning reclaims space by copying live data out of old segments.

This organization transforms random writes into sequential writes, maximizing write throughput on both hard disks and SSDs. Cleaning introduces write amplification, trading some of that throughput for reclaimed space. Active workloads need aggressive cleaning; write-once workloads barely clean at all.

Log segments:
[Seg 0: File A v1, File B v1]
[Seg 1: File A v2, File C v1]
[Seg 2: File B v2, File A v3]
[Seg 3: Free]

After cleaning Seg 0:
[Seg 0: File B v2, File A v3]  (copied from later segments)
[Seg 1: Partially free]
[Seg 2: Marked for cleaning]

Reading requires indirection through inode maps. The map translates inode numbers to current log positions. Updates append new map entries to the log. Checkpoint operations flush map changes, establishing recovery points. After crash, the system recovers by scanning from the last checkpoint.

The log cleaner chooses segments to compact based on cost-benefit analysis. Segments with little live data (a high dead ratio) cost less to clean because less copying occurs. Cold segments containing stable data clean more efficiently than hot segments with frequently updated files. Adaptive cleaning adjusts to workload characteristics.

B-Tree File Systems

B-tree file systems organize all file system structures in balanced trees. Directories, extents, and free space maps all reside in B-trees indexed by appropriate keys. This uniform structure simplifies implementation and enables efficient range queries.

Directory B-trees index entries by filename hash or lexicographic order. Range lookups support glob patterns without scanning entire directories. Large directories scale to millions of entries without performance degradation because tree height grows logarithmically.

Extent B-trees index physical block allocation by logical file offset. Looking up offset 1,000,000 finds the extent containing that offset without scanning all prior extents. This enables efficient sparse file support and random access to large files.

Free space B-trees track available blocks, enabling fast allocation of specific sizes or regions. Range queries find contiguous free extents matching size requirements. Buddy allocation systems can layer over B-tree free space maps for different allocation strategies.

Concurrent access uses B-tree locking protocols. Readers hold read locks on traversed nodes. Writers hold write locks on modified nodes. Latching upper levels as shared during descent and exclusive during modification enables concurrent operations. Modern implementations use optimistic protocols, validating consistency after reading without locks.

Design Considerations

Sequential vs Random Access Patterns

Access pattern prediction influences file system selection. Workloads dominated by sequential reads favor systems optimizing contiguous layout: extent-based allocation, delayed allocation, or log-structured approaches. Sequential writes benefit from batching and write-ahead logging that converts random writes to sequential log appends.

Random access workloads prefer indexed allocation structures. B-tree file systems excel at finding arbitrary file offsets without scanning. Hash-based directories locate entries without walking linear lists. Memory mapping bypasses system call overhead for random access, treating files as address space regions.

Mixed workloads complicate decisions. Database systems perform both sequential scans and random index lookups. Streaming writes feed sequential ingestion pipelines. The file system should not pessimize common operations to optimize uncommon ones. Profiling actual workloads reveals dominant patterns worth optimizing.

Metadata Overhead Trade-offs

Metadata volume impacts memory consumption and access latency. Inode-based systems store fixed-size metadata structures, wasting space for small files but enabling O(1) metadata access. B-tree systems pack metadata efficiently but require tree traversal for each access.

Large numbers of small files amplify metadata costs. A thousand 1KB files consume more space for inodes and directory entries than file data. Some systems employ inline data, storing small file contents directly in inodes or directory entries, eliminating data block allocation overhead.
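The gap between logical size and allocated space is observable through File::Stat. Block counts are reported in 512-byte units; exact values vary by file system (some inline small files and report fewer blocks):

```ruby
require 'tmpdir'

logical = nil
allocated = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'tiny.txt')
  File.write(path, 'x' * 100)

  st = File.stat(path)
  logical   = st.size         # 100 bytes of data
  allocated = st.blocks * 512 # allocated space, often a full block (e.g. 4096)
end
```

On a 4KB-block file system, the 100-byte file still occupies a whole block; only inline-data schemes avoid that overhead.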

Extended attributes and access control lists increase metadata size. Applications storing metadata in extended attributes (checksums, labels, provenance) multiply metadata volume. File systems limiting metadata size force applications to store attributes in separate files, complicating management.

Metadata journaling costs differ from data journaling. Metadata changes occur more frequently than data changes in many workloads. Creating, deleting, and renaming files generate metadata writes without touching file data. Separating metadata and data journals allows independent tuning of each.

Durability and Performance Balance

Synchronous writes guarantee durability at a performance cost: each write waits for confirmation from the storage device. Applications requiring durability call fsync() explicitly, flushing dirty data to stable storage before proceeding. File systems cannot safely ignore fsync(); doing so risks data loss despite the application's explicit request for durability.
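In Ruby, IO#fsync provides the explicit durability point (the log filename here is illustrative):

```ruby
require 'tmpdir'

persisted = nil

Dir.mktmpdir do |dir|
  log = File.join(dir, 'journal.log')
  File.open(log, 'a') do |f|
    f.write("txn 42 committed\n")
    f.fsync  # block until data and metadata reach stable storage
    # f.fdatasync flushes data only, skipping some metadata, where supported
  end
  persisted = File.read(log)
end
```

Without the fsync, the write sits in the page cache and could vanish on power loss even after the method returns.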

Asynchronous writes improve throughput by batching operations. Write combining merges adjacent writes into larger requests. Caching delays writes, hoping to absorb overwrites before committing to storage. This trades latency for throughput; applications tolerate delayed writes as long as data eventually persists.

Barriers order operations without forcing synchronous writes. Write barriers ensure prior operations complete before subsequent operations start, establishing ordering guarantees needed for consistency. Barriers allow more concurrency than full synchronization while preventing problematic reorderings.

Tuning dirty page thresholds controls memory pressure and latency. Low thresholds trigger frequent writeback, reducing memory consumption but increasing write overhead. High thresholds batch more writes but risk long stalls when memory fills. The optimal threshold depends on workload and available memory.

Scalability Dimensions

File count scalability determines maximum files per directory and per file system. Linear directory scans limit large directories to thousands of entries before performance degrades. Hashed or B-tree directories scale to millions. Inode table size caps total file count; exhausting inodes renders remaining free space unusable for new files.

File size scalability impacts maximum file size and sparse file support. 32-bit block pointers limit files to several terabytes. 64-bit pointers extend this to exabytes theoretically, though practical limits remain lower. Sparse file support allows creating files with large logical sizes consuming space only for written regions.
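Sparse files can be created from Ruby by seeking past the end before writing; blocks inside the hole are never allocated (assuming the underlying file system supports sparse files, as ext4, XFS, APFS, and tmpfs do):

```ruby
require 'tmpdir'

logical_size = nil
allocated = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'sparse.dat')
  File.open(path, 'wb') do |f|
    f.seek(10 * 1024 * 1024)  # leave a 10 MB hole
    f.write('end')
  end

  st = File.stat(path)
  logical_size = st.size        # the hole counts toward logical size
  allocated    = st.blocks * 512 # far less space actually allocated
end

puts logical_size  # => 10485763
```

Reading within the hole returns zero bytes without any disk I/O; copying tools that are not sparse-aware will expand the hole into real blocks.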

Concurrent access scalability determines multi-core and multi-client performance. Coarse-grained locks serialize all operations; fine-grained locks enable parallelism. Lock-free algorithms maximize concurrency but complicate correctness. The optimal granularity balances overhead and parallelism.

Storage device count impacts throughput and capacity. File systems striping across multiple devices aggregate bandwidth. RAID configurations provide redundancy. Software RAID in file system layers (like Btrfs RAID) tightly integrates redundancy with file system metadata. Separate RAID layers simplify file system design but lose optimization opportunities.

Performance Considerations

Read-Ahead and Write-Behind

Sequential read patterns trigger read-ahead, prefetching subsequent blocks before applications request them. Effective read-ahead hides latency by keeping the next data block ready when needed. Too much read-ahead wastes memory and bandwidth on data never accessed. Too little read-ahead causes stalls waiting for next blocks.

# Sequential read benefits from read-ahead
File.open('large_log.txt') do |file|
  file.each_line do |line|
    # Process line
    # File system prefetches next blocks during processing
  end
end

# Random access defeats read-ahead
File.open('database.dat', 'rb') do |file|
  [100, 5000, 234, 9999].each do |offset|
    file.seek(offset * 512)
    record = file.read(512)
    # Each seek prevents read-ahead prediction
  end
end

Write-behind delays writes to stable storage, batching dirty pages in memory. Applications continue executing while background threads flush data. The page cache absorbs burst writes, smoothing spiky workloads into steady streams. Crash risk exists; dirty pages in memory disappear on power loss unless applications call fsync().

Kernel writeback mechanisms (historically pdflush/bdflush, now per-device flusher threads on Linux) control write-behind behavior. Dirty page age limits prevent stale data lingering too long. Memory pressure triggers flushing when free memory drops below thresholds. Explicit sync calls force immediate flush, establishing consistency points.

Block Alignment and Sizing

Misaligned I/O reads or writes partial blocks, forcing read-modify-write cycles. Reading 1000 bytes starting at offset 50 touches three 512-byte blocks (bytes 50 through 1049), from which the relevant range is extracted. Writing unaligned data reads the existing blocks, modifies the affected bytes, and writes the blocks back.

# Aligned reads (efficient - one I/O per block)
File.open('data.bin', 'rb') do |file|
  chunk = file.read(4096)  # Reads exactly one 4KB block
end

# Unaligned reads (less efficient - multiple I/Os)
File.open('data.bin', 'rb') do |file|
  file.seek(100)           # Offset not block-aligned
  chunk = file.read(5000)  # Size not block-multiple
  # Reads multiple blocks, extracts relevant bytes
end

Larger block sizes reduce metadata overhead but increase internal fragmentation. A file system with 4KB blocks stores a 1000-byte file in one block, wasting about 3KB. The same file in a 512-byte block system occupies two blocks, wasting only 24 bytes. Large blocks favor large files; small blocks favor small files. Configuring block size at format time requires predicting workload characteristics.

SSD alignment matters more than hard disk alignment. SSDs program entire pages (often 4-16KB) and erase entire blocks (often 512KB or more). Unaligned writes spanning page boundaries force read-modify-write cycles, amplifying write traffic. Aligning file system structures to SSD page boundaries maximizes write efficiency.

Caching Strategies

The page cache holds recently accessed file data in memory, serving subsequent reads without device I/O. Cache hits eliminate latency and preserve bandwidth for uncached data. Cache management evicts least-recently-used pages when memory fills, keeping hot data cached.

# First read loads from disk
start_time = Time.now
File.read('/var/log/syslog')
first_read_time = Time.now - start_time
# => 0.045 seconds

# Second read serves from cache
start_time = Time.now
File.read('/var/log/syslog')
second_read_time = Time.now - start_time
# => 0.002 seconds (20x faster)

Directory entry caching (dcache) holds name-to-inode mappings. Path resolution queries the dcache before reading directory blocks. Negative caching remembers nonexistent paths, avoiding repeated failed lookups. Cache invalidation on file creation and deletion maintains consistency.

Inode caching holds in-memory inodes for open files and recently accessed files. Open file table entries reference cached inodes, eliminating repeated disk reads. Modified inodes remain cached as dirty until writeback flushes them. The VFS layer coordinates caching across file system types.

Application-level caching adds another layer. Reading configuration files once and caching contents in application memory prevents repeated file system access. Watching files for changes (using inotify or similar) enables cache invalidation when files change. This pattern reduces load but complicates cache coherency.
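A minimal sketch of this pattern, using mtime comparison rather than inotify for invalidation (the class name is illustrative, not a standard API):

```ruby
# Re-read a file only when its modification time changes
class CachedFile
  def initialize(path)
    @path  = path
    @mtime = nil
    @data  = nil
  end

  def contents
    mtime = File.mtime(@path)
    if mtime != @mtime        # cache miss, or file changed on disk
      @data  = File.read(@path)
      @mtime = mtime
    end
    @data
  end
end
```

Note that mtime granularity is file-system dependent (as coarse as one second on some systems), so rapid successive writes may not invalidate the cache; inotify-style watching closes that gap.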

Memory-Mapped Files

Memory mapping projects file contents into process address space. Accessing mapped regions reads or writes file data through page faults rather than system calls. This eliminates read/write overhead for random access workloads, treating files as memory arrays.

# Standard I/O approach
File.open('data.dat', 'rb') do |file|
  1000.times do
    offset = rand(file.size)
    file.seek(offset)
    byte = file.read(1)
    # System call overhead per read
  end
end

# Memory-mapped approach (requires C extension)
# Conceptual example - actual Ruby implementation varies
mapped = mmap('data.dat')
1000.times do
  offset = rand(mapped.size)
  byte = mapped[offset]
  # Memory access, no system call
end

Write sharing between processes requires careful coordination. Private mappings (MAP_PRIVATE) copy-on-write, isolating modifications. Shared mappings (MAP_SHARED) persist writes to files, visible to other processes. Synchronizing shared mappings requires explicit msync() calls or file locking.

Memory mapping large files consumes address space but not physical memory; the kernel pages data in on access. 32-bit processes hit address space limits around 2-3GB. 64-bit processes have effectively unlimited address space and can map terabyte-scale files without concern.

Optimization Patterns

Buffering I/O operations amortizes system call overhead. Writing single bytes calls write() thousands of times; buffering accumulates bytes and writes them in blocks, reducing call frequency dramatically. Ruby's IO buffering handles this automatically, but applications can tune buffer sizes.

# Unbuffered writes (inefficient)
File.open('output.txt', 'w') do |file|
  10000.times { |i| file.write("#{i}\n"); file.flush }
end

# Buffered writes (efficient)
File.open('output.txt', 'w') do |file|
  10000.times { |i| file.write("#{i}\n") }
  # Automatic flush on close
end

# Application-level batching (huge_data and format_record stand in
# for application code)
File.open('huge.log', 'w') do |file|
  file.sync = false  # keep Ruby's internal buffering enabled (the default for files)
  buffer = []
  huge_data.each do |record|
    buffer << format_record(record)
    if buffer.size >= 1000
      file.write(buffer.join)
      buffer.clear
    end
  end
  file.write(buffer.join) if buffer.any?
end

Batch operations reduce metadata update overhead. Creating 1000 files in individual operations forces 1000 metadata syncs. Grouping operations and syncing once reduces writes. The fsync() call on the parent directory commits directory modifications together.
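On Linux this can be sketched by opening the parent directory and fsyncing its descriptor once after the batch, rather than syncing per file. `Dir#fileno` and directory fsync are Linux-specific assumptions here:

```ruby
require 'tmpdir'

# Sketch (Linux): create files in a batch, then fsync the parent
# directory once to commit all the new directory entries together
dir = Dir.mktmpdir
100.times { |i| File.write(File.join(dir, "rec#{i}.dat"), 'payload') }

Dir.open(dir) do |d|
  # Dir#fileno exposes the directory's descriptor; wrap it without
  # taking ownership so Dir#close still works afterwards
  IO.for_fd(d.fileno, autoclose: false).fsync
end
```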

Parallel I/O exploits multiple devices and CPU cores. Reading multiple files concurrently saturates bandwidth. Writing to independent directories avoids lock contention. Thread pools or process pools distribute work, maximizing hardware utilization.
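A minimal thread-per-file sketch (file names here are illustrative); Ruby threads release the GVL during blocking I/O, so the reads overlap even under MRI:

```ruby
require 'tmpdir'

# Prepare sample files
dir = Dir.mktmpdir
paths = Array.new(4) do |i|
  path = File.join(dir, "part#{i}.dat")
  File.write(path, "chunk #{i}")
  path
end

# One thread per file; Thread#value joins and returns the result
contents = paths
  .map { |path| Thread.new { File.read(path) } }
  .map(&:value)
# contents => ["chunk 0", "chunk 1", "chunk 2", "chunk 3"]
```

For CPU-heavy post-processing of the data, a process pool avoids the GVL entirely; for pure I/O, threads are usually sufficient.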

Security Implications

Permission Models and Enforcement

Unix permission bits control file access through owner, group, and other categories. Each category has read, write, and execute permissions. Directory permissions control listing contents (read), creating entries (write), and accessing entries (execute/search). Execute permission on directories gates traversal; without it, files within remain inaccessible even if file permissions allow access.

# Setting permissions
File.chmod(0600, 'secret.txt')  # Owner: rw-, Group: ---, Other: ---
File.chmod(0644, 'public.txt')  # Owner: rw-, Group: r--, Other: r--
File.chmod(0755, 'script.sh')   # Owner: rwx, Group: r-x, Other: r-x

# Checking permissions
stat = File.stat('file.txt')
stat.readable?          # Check read permission
stat.writable?          # Check write permission
stat.executable?        # Check execute permission

owner_read = (stat.mode & 0400) != 0
group_write = (stat.mode & 0020) != 0
other_exec = (stat.mode & 0001) != 0
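
The directory search (execute) bit can be demonstrated directly. Clearing it makes a file inside unreachable even though the file's own mode permits reading; note that root bypasses this check entirely:

```ruby
require 'tmpdir'

base = Dir.mktmpdir
sub  = File.join(base, 'locked')
Dir.mkdir(sub)
File.write(File.join(sub, 'note.txt'), 'hello')
File.chmod(0600, sub)  # rw- for owner, search bit cleared

result = begin
  File.read(File.join(sub, 'note.txt'))
rescue Errno::EACCES
  :traversal_denied    # expected for unprivileged users
end

File.chmod(0700, sub)  # restore search permission so cleanup works
```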

Setuid and setgid bits modify execution behavior. Executables with setuid run as file owner regardless of caller identity. This enables privilege escalation for specific operations, like changing passwords (setuid root programs). Setuid risks include privilege abuse and security vulnerabilities in setuid binaries.

Sticky bits on directories restrict deletion. Files in sticky directories (like /tmp) can only be deleted by file owners, directory owners, or root. This prevents users from deleting each other's temporary files despite shared write access to the directory.
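Reproducing /tmp-style permissions takes one chmod; `File::Stat#sticky?` reports the bit:

```ruby
require 'tmpdir'

# World-writable directory with the sticky bit, like /tmp:
# only a file's owner, the directory owner, or root may delete entries
shared = Dir.mktmpdir
File.chmod(01777, shared)

stat = File.stat(shared)
stat.sticky?                     # => true
format('%o', stat.mode & 07777)  # => "1777"
```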

Access Control Lists

ACLs extend basic permissions with user and group-specific rules. A file might allow read access to user alice, write access to group developers, and deny access to everyone else. ACLs provide fine-grained control beyond three-category Unix permissions.

# Ruby's standard library has no ACL API; ffi bindings or the
# getfacl/setfacl utilities (Linux) fill the gap

# Conceptual example of ACL structure
acl = [
  { type: :user, id: 1000, perms: [:read, :write] },
  { type: :group, id: 500, perms: [:read] },
  { type: :other, perms: [] }
]

# Reading ACLs by shelling out to getfacl
current_acl = `getfacl --omit-header document.pdf`

# Setting ACL entries with setfacl
system('setfacl', '-m', 'u:alice:rw-', 'document.pdf')
system('setfacl', '-m', 'u:bob:r--', 'document.pdf')
system('setfacl', '-m', 'g:editors:rw-', 'document.pdf')

ACL evaluation follows precedence rules. User-specific ACLs override group ACLs. Explicit deny entries prevent access even if other rules grant it. Understanding precedence prevents accidental exposure or unexpected denial.

Symbolic Link Attacks

Symbolic links introduce race conditions in file operations. Time-of-check-to-time-of-use (TOCTOU) attacks exploit windows between checking file properties and operating on files. An attacker replaces a symlink target between check and use, redirecting operations to unintended files.

# Vulnerable pattern (TOCTOU race)
if File.exist?('/tmp/user_file') && File.owned?('/tmp/user_file')
  # Attacker replaces symlink here
  content = File.read('/tmp/user_file')  # May read different file
end

# Safer pattern: open and stat file descriptor
File.open('/tmp/user_file') do |file|
  stat = file.stat
  if stat.owned?
    content = file.read  # Same file we statted
  end
end

# Safest: avoid /tmp, use controlled directories
require 'tmpdir'
require 'fileutils'

tmpdir = Dir.mktmpdir('app-')  # private directory, created with 0700 permissions
File.open("#{tmpdir}/user_file", 'w') do |file|
  # In a controlled directory; no other users can interfere
  file.write(user_data)
end
FileUtils.remove_entry_secure(tmpdir)

Following symlinks during privileged operations risks exploitation. Opening files in world-writable directories requires validating parent directory ownership. The O_NOFOLLOW flag makes open() fail (with ELOOP) when the final path component is a symbolic link.
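Ruby exposes the flag as File::NOFOLLOW on platforms that support it; on Linux, opening a symlink with it raises Errno::ELOOP:

```ruby
require 'tmpdir'

dir    = Dir.mktmpdir
target = File.join(dir, 'real.txt')
link   = File.join(dir, 'link.txt')
File.write(target, 'data')
File.symlink(target, link)

result = begin
  # NOFOLLOW: refuse to traverse a symlink at the final path component
  File.open(link, File::RDONLY | File::NOFOLLOW) { |f| f.read }
rescue Errno::ELOOP
  :symlink_refused
end
# result => :symlink_refused on Linux
```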

Secure File Creation

Race-free file creation uses atomic operations. Opening with O_CREAT | O_EXCL fails if files exist, preventing accidental overwrites. Temporary files need unpredictable names and secure permissions.

require 'tempfile'

# Secure temporary file
Tempfile.create('upload-') do |tmp|
  # File created with 0600 permissions
  # Name unpredictable
  tmp.write(user_data)
  process_file(tmp.path)
  # Automatic cleanup on block exit
end

# Manual secure creation
require 'securerandom'
tmp_name = "temp-#{SecureRandom.hex(16)}"
tmp_path = "/secure/dir/#{tmp_name}"

File.open(tmp_path, File::CREAT | File::EXCL | File::WRONLY, 0600) do |file|
  file.write(data)
end

World-writable files risk unauthorized modification. Attackers inject malicious content or replace files entirely. Applications should minimize writable permissions, granting write access only to authorized users or groups. Validating file integrity through checksums detects tampering.
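One way to sketch the integrity check: record a SHA-256 digest at write time with the standard digest library, then compare before trusting the file again (the file name here is illustrative):

```ruby
require 'digest'
require 'tmpdir'

path = File.join(Dir.mktmpdir, 'settings.yml')
File.write(path, "timeout: 30\n")

# Record the checksum when the file is known-good
expected = Digest::SHA256.file(path).hexdigest

# Later, before use: re-hash and compare
tampered = Digest::SHA256.file(path).hexdigest != expected
# tampered => false while the file is unchanged
```

The expected digest must itself live somewhere the attacker cannot write, or the check proves nothing.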

File System Isolation

Chroot jails confine processes to directory subtrees, preventing access outside jails. Root within jail sees jail root as system root. Chroot provides basic isolation but escapes exist for root processes. Containers improve isolation with namespace and cgroup mechanisms.

# Chroot operation (requires root)
Dir.chroot('/jail/path')
Dir.chdir('/')
# Process now sees /jail/path as /

# Safer: run in container with mounted volumes
# Docker/Podman mount specific directories
# Process sees only mounted volumes
system('docker run -v /data:/app/data image:tag')

Mount namespaces isolate mount points. Processes in separate namespaces see different file system trees. Unprivileged containers mount file systems without affecting host or other containers. This enables secure multi-tenancy without chroot escapes.

Reference

File Operation Methods

Method Description Return Value
File.read Read entire file String content
File.write Write string to file Bytes written
File.open Open file, yield block File object
File.stat Get file metadata File::Stat object
File.size Get file size Integer bytes
File.exist? Check existence Boolean
File.directory? Check if directory Boolean
File.file? Check if regular file Boolean
File.rename Move or rename file 0 on success
File.delete Remove file Integer count
File.chmod Change permissions Integer count
File.chown Change ownership Integer count

Directory Operation Methods

Method Description Return Value
Dir.entries List all entries Array of strings
Dir.children List without . and .. Array of strings
Dir.glob Pattern matching Array of paths
Dir.foreach Iterate entries nil
Dir.mkdir Create directory 0 on success
Dir.pwd Current directory String path
Dir.chdir Change directory 0 on success
Dir.exist? Check existence Boolean
Dir.empty? Check if empty Boolean

File Mode Flags

Flag Description Numeric Value
r Read only -
w Write only, truncate -
a Write only, append -
r+ Read and write -
w+ Read and write, truncate -
a+ Read and write, append -
b Binary mode -
t Text mode -

Permission Bit Masks

Permission Owner Group Other
Read 0400 0040 0004
Write 0200 0020 0002
Execute 0100 0010 0001
Setuid 04000 - -
Setgid 02000 - -
Sticky 01000 - -

File::Stat Predicates

Method Returns True If
file? Regular file
directory? Directory
symlink? Symbolic link
socket? Socket file
pipe? Named pipe
blockdev? Block device
chardev? Character device
readable? Readable by current process
writable? Writable by current process
executable? Executable by current process
owned? Owned by current process
zero? Empty file

FileUtils Methods

Method Description Shell Equivalent
cp Copy file cp
cp_r Copy recursively cp -r
mv Move or rename mv
rm Remove file rm
rm_rf Remove recursively, force rm -rf
mkdir Create directory mkdir
mkdir_p Create nested directories mkdir -p
touch Update timestamp or create touch
ln Create hard link ln
ln_s Create symbolic link ln -s
chmod Change permissions chmod
chown Change ownership chown
pwd Print working directory pwd

Common Glob Patterns

Pattern Matches
* Any characters except /
** Any characters including /
? Single character
[abc] One of a, b, c
[a-z] Character range
{jpg,png} Alternative extensions
**/*.rb All Ruby files recursively

Performance Optimization Checklist

Technique Benefit Use Case
Buffer I/O operations Reduce system calls Small reads/writes
Use memory mapping Eliminate syscall overhead Random access
Align to block boundaries Avoid partial block I/O Large transfers
Enable read-ahead Hide latency Sequential reads
Batch metadata operations Reduce sync overhead Creating many files
Cache file handles Avoid repeated opens Frequently accessed files
Use appropriate block sizes Balance fragmentation vs overhead Format decisions

Security Best Practices

Practice Protects Against Implementation
Validate path traversal Directory escape Check for .. components
Use O_NOFOLLOW Symlink attacks Pass flag to open
Create temp files securely Race conditions Use Tempfile or mkstemp
Set minimal permissions Unauthorized access chmod 0600 for private files
Avoid world-writable directories File replacement Use per-user directories
Verify file ownership Privilege escalation Check stat.uid
Use file locking Concurrent modification flock or lockf
Validate after stat TOCTOU races Stat file descriptor, not path