CrackedRuby

Find Module

Comprehensive guide to Ruby's Find module for directory tree traversal and file system searching operations.


Overview

The Find module provides methods for traversing directory trees in Ruby. This module implements depth-first directory traversal, visiting each file and directory in a systematic manner. Ruby's Find module offers a simple interface for walking file system hierarchies without requiring manual recursion or complex directory handling logic.

The module's primary method, Find.find, accepts one or more starting paths and yields each discovered file and directory to a block. Directories are yielded before their contents. The traversal does not follow symbolic links: a link is yielded as a path, but a link that points to a directory is not descended into. Find is part of the standard library and is loaded with require 'find'.

require 'find'

# Basic directory traversal
Find.find('/home/user/documents') do |path|
  puts path
end

Find.find manages the traversal stack internally, so the block only deals with one path at a time. Paths are yielded as strings built from the starting path, which lets the block filter, process, or collect files and directories as they are discovered.

# Collecting specific file types
ruby_files = []
Find.find('/project/src') do |path|
  ruby_files << path if path.end_with?('.rb')
end

Because each yielded path is an ordinary string, it can be passed directly to File, Dir, and FileUtils methods during traversal. A single call can also take several starting directories, which are traversed one after another.

# Multiple starting points
Find.find('/etc', '/var/log', '/home/user') do |path|
  # Process files from all three locations
  File.lstat(path) # Access file metadata; unlike File.stat, lstat does not fail on broken symlinks
end
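
When Find.find is called without a block it returns an Enumerator, so the usual Enumerable methods can replace the manual collection loop. The sketch below assumes a throwaway directory created with Dir.mktmpdir; the helper name ruby_files_under is hypothetical and only for illustration.

```ruby
require 'find'
require 'tmpdir'

# Hypothetical helper: returns the .rb files under root.
def ruby_files_under(root)
  # Without a block, Find.find returns an Enumerator, so select works directly.
  Find.find(root).select { |path| File.file?(path) && File.extname(path) == '.rb' }
end

# Build a small temporary tree so the example does not depend on real paths.
Dir.mktmpdir do |root|
  Dir.mkdir(File.join(root, 'lib'))
  File.write(File.join(root, 'lib', 'app.rb'), "puts 'hi'\n")
  File.write(File.join(root, 'README.txt'), "docs\n")

  puts ruby_files_under(root)
end
```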

Basic Usage

The Find.find method accepts one or more file or directory paths and yields each discovered path to the provided block; called without a block, it returns an Enumerator. Paths arrive in depth-first order, with each directory yielded before its contents.

require 'find'

# Simple file listing
Find.find('/usr/bin') do |path|
  puts "Found: #{path}"
  puts "  Type: #{File.directory?(path) ? 'directory' : 'file'}"
  # File.file? is false for broken symbolic links, where File.size would raise
  puts "  Size: #{File.size(path)} bytes" if File.file?(path)
end

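The traversal does not follow symbolic links. A link is yielded like any other path, but Find.find does not descend into a link that points to a directory, so link targets are not visited a second time through the link. Methods such as File.directory? and File.stat do follow links, so test File.symlink? first when links need separate handling.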

# Handling symbolic links
Find.find('/var') do |path|
  if File.symlink?(path)
    puts "Link: #{path} -> #{File.readlink(path)}"
  elsif File.directory?(path)
    puts "Directory: #{path}"
  else
    puts "File: #{path}"
  end
end
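
The following sketch checks the link behavior on a temporary tree: a directory named real and a symbolic link named alias that points to it. The names and the yielded_paths helper are assumptions made for the example, and it needs a platform that supports File.symlink.

```ruby
require 'find'
require 'tmpdir'

# Hypothetical helper: the paths Find.find yields under root, relative to root.
def yielded_paths(root)
  Find.find(root).map { |path| path.delete_prefix(root) }
end

Dir.mktmpdir do |root|
  real_dir = File.join(root, 'real')
  Dir.mkdir(real_dir)
  File.write(File.join(real_dir, 'data.txt'), 'x')
  File.symlink(real_dir, File.join(root, 'alias')) # alias -> real

  # The link itself is listed, but nothing is listed beneath it.
  yielded_paths(root).each { |path| puts path.empty? ? '.' : path }
end
```

The file data.txt appears once, under real. To traverse through a link deliberately, resolve it with File.realpath and start a separate Find.find call on the result.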

The Find.prune method skips the current path. Called within the block on a directory, it prevents descent into that directory, which saves work and keeps unwanted paths out of the results. Find.prune does not return: control passes straight to the next path, so nothing after the call runs for the pruned path.

# Skipping hidden directories
Find.find('/home/user') do |path|
  if File.directory?(path) && File.basename(path).start_with?('.')
    Find.prune # Skip hidden directories
  else
    puts path
  end
end
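
One detail worth guarding against: if the starting path itself has a basename that begins with a dot (for example '.'), a hidden-directory check prunes the whole traversal. The sketch below, with a hypothetical visible_paths helper and a temporary tree, exempts the starting directory from the check.

```ruby
require 'find'
require 'tmpdir'

# Hypothetical helper: lists paths under root, skipping hidden directories
# and everything inside them.
def visible_paths(root)
  paths = []
  Find.find(root) do |path|
    # Never prune the starting directory, even if its name starts with a dot.
    if path != root && File.directory?(path) && File.basename(path).start_with?('.')
      Find.prune # Throws :prune; the line below does not run for this path
    end
    paths << path
  end
  paths
end

Dir.mktmpdir do |root|
  Dir.mkdir(File.join(root, '.git'))
  File.write(File.join(root, '.git', 'config'), '')
  Dir.mkdir(File.join(root, 'src'))
  File.write(File.join(root, 'src', 'main.rb'), '')

  puts visible_paths(root)
end
```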

Multiple arguments allow traversal across different file system locations in a single call. Each starting path is traversed completely before the next one begins. Every starting path must exist: Find.find raises Errno::ENOENT before yielding anything if one of them is missing.

# Processing multiple directories
log_files = []
Find.find('/var/log', '/usr/local/log', '/opt/app/logs') do |path|
  if File.file?(path) && path.match?(/\.log$/i)
    log_files << path
  end
end

puts "Found #{log_files.size} log files"
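
Since a missing starting path makes Find.find raise Errno::ENOENT, a list of candidate directories is usually filtered first. A minimal sketch, assuming a hypothetical find_logs helper that reports which starting paths were skipped:

```ruby
require 'find'
require 'tmpdir'

# Hypothetical helper: traverses only the starting paths that exist and
# reports the ones that were skipped.
def find_logs(*start_paths)
  existing, missing = start_paths.partition { |path| File.exist?(path) }
  logs = []
  Find.find(*existing) do |path|
    logs << path if File.file?(path) && path.match?(/\.log\z/i)
  end
  { logs: logs, missing: missing }
end

Dir.mktmpdir do |root|
  File.write(File.join(root, 'app.log'), '')
  result = find_logs(root, File.join(root, 'does-not-exist'))
  puts "Logs: #{result[:logs].size}, skipped start paths: #{result[:missing].size}"
end
```

The check and the traversal are separate steps, so a directory removed in between still raises; wrap the call in a rescue for Errno::ENOENT where that matters.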

Error Handling & Debugging

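File system operations during Find traversal can raise various exceptions. By default Find.find itself skips entries it cannot read, such as a directory without read permission, without reporting them; pass ignore_error: false to have those errors raised instead. Exceptions raised by code inside the block are never suppressed: a permission error from reading a restricted file, or Errno::ENOENT from a broken symbolic link or a file deleted mid-traversal, propagates and stops the traversal unless the block rescues it.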

require 'find'

# Comprehensive error handling
errors = []
processed = 0

Find.find('/') do |path|
  begin
    stat = File.lstat(path) # lstat reports links as 'link'; File.stat would follow them
    processed += 1
    
    # Process file based on type
    case stat.ftype
    when 'file'
      puts "File: #{path} (#{stat.size} bytes)"
    when 'directory'
      puts "Directory: #{path}"
    when 'link'
      puts "Link: #{path}"
    end
    
  rescue Errno::ENOENT
    errors << "Path not found: #{path}"
  rescue Errno::EACCES
    errors << "Permission denied: #{path}"
  rescue Errno::ELOOP
    errors << "Too many symbolic links: #{path}"
  rescue StandardError => e
    errors << "Unexpected error for #{path}: #{e.message}"
  end
end

puts "Processed #{processed} items with #{errors.size} errors"
errors.each { |error| puts "ERROR: #{error}" }
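
The rescue clauses above only see errors raised inside the block. Errors that Find.find hits while reading directories are skipped silently unless ignore_error: false is passed. The sketch below uses a hypothetical strict_walk helper and a temporary directory whose permissions are removed; when run as root the directory stays readable and no error is reported.

```ruby
require 'find'
require 'tmpdir'

# Hypothetical helper: walks root in strict mode and returns the visited
# paths together with the first traversal error, if any.
def strict_walk(root)
  paths = []
  Find.find(root, ignore_error: false) { |path| paths << path }
  { paths: paths, error: nil }
rescue SystemCallError => e
  # With ignore_error: false, a failure to read a directory surfaces here.
  { paths: paths, error: e }
end

Dir.mktmpdir do |root|
  locked = File.join(root, 'locked')
  Dir.mkdir(locked)
  File.chmod(0o000, locked)
  begin
    result = strict_walk(root)
    puts "Visited #{result[:paths].size} paths, error: #{result[:error].inspect}"
  ensure
    File.chmod(0o755, locked) # Restore access so the temporary tree can be removed
  end
end
```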

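Broken symbolic links present specific challenges during traversal. Find.find yields a broken link like any other path because it never resolves link targets. The error appears in the block instead: File.stat and other link-following calls raise Errno::ENOENT when the target no longer exists.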

# Handling broken symbolic links
broken_links = []

Find.find('/usr/local') do |path|
  if File.symlink?(path)
    begin
      File.stat(path) # Test link target accessibility
      puts "Valid link: #{path}"
    rescue Errno::ENOENT
      broken_links << path
      puts "Broken link: #{path}"
    end
  end
end

puts "Found #{broken_links.size} broken symbolic links"
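
The same check can be written without exception handling, because File.symlink? inspects the link itself while File.exist? follows it. A small sketch on a temporary tree, with a hypothetical broken_links_under helper:

```ruby
require 'find'
require 'tmpdir'

# Hypothetical helper: returns the symbolic links under root whose targets
# do not exist.
def broken_links_under(root)
  Find.find(root).select do |path|
    # symlink? looks at the link itself; exist? follows it to the target.
    File.symlink?(path) && !File.exist?(path)
  end
end

Dir.mktmpdir do |root|
  File.write(File.join(root, 'target.txt'), 'x')
  File.symlink(File.join(root, 'target.txt'), File.join(root, 'ok'))
  File.symlink(File.join(root, 'nowhere'), File.join(root, 'dangling'))

  puts broken_links_under(root)
end
```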

Large directory traversals can fail partway through due to changing file system conditions. Implement checkpoint recovery to resume traversal from specific points rather than restarting complete operations.

# Checkpoint-based traversal with recovery
require 'find'
require 'set' # Needed explicitly on Ruby versions that do not autoload Set

class FindWithCheckpoints
  def initialize(checkpoint_file = 'find_checkpoint.txt')
    @checkpoint_file = checkpoint_file
    @processed = load_checkpoint
  end
  
  def traverse(start_paths)
    Find.find(*start_paths) do |path|
      next if @processed.include?(path)
      
      begin
        yield path
        @processed << path
        
        # Save checkpoint every 1000 items
        save_checkpoint if @processed.size % 1000 == 0
        
      rescue => e
        puts "Error processing #{path}: #{e.message}"
        # Continue with next path
      end
    end
    
    # Clean up checkpoint file on successful completion
    File.delete(@checkpoint_file) if File.exist?(@checkpoint_file)
  end
  
  private
  
  def load_checkpoint
    return Set.new unless File.exist?(@checkpoint_file)
    File.readlines(@checkpoint_file, chomp: true).to_set
  end
  
  def save_checkpoint
    File.write(@checkpoint_file, @processed.to_a.join("\n"))
  end
end

# Usage with checkpoint recovery
finder = FindWithCheckpoints.new
finder.traverse(['/large/directory/tree']) do |path|
  # Perform time-consuming operations
  process_file(path) # process_file stands in for application-specific work
end

Performance & Memory

Find module performance depends heavily on file system characteristics and traversal patterns. Large directory trees with many small files create different performance profiles than trees with fewer, larger files.

require 'find'
require 'benchmark'

# Performance measurement for different strategies
def measure_find_performance(path)
  Benchmark.bm(20) do |x|
    # Standard traversal
    x.report("Standard find:") do
      count = 0
      Find.find(path) { |p| count += 1 }
      puts "  Found #{count} items"
    end
    
    # Filtered traversal
    x.report("Filtered find:") do
      count = 0
      Find.find(path) do |p|
        if File.directory?(p) && File.basename(p).start_with?('.')
          Find.prune
        else
          count += 1
        end
      end
      puts "  Found #{count} items"
    end
    
    # Stat-heavy operations
    x.report("With file stats:") do
      total_size = 0
      Find.find(path) do |p|
        total_size += File.size(p) rescue 0
      end
      puts "  Total size: #{total_size} bytes"
    end
  end
end

measure_find_performance('/usr/share')

Find.find does not load the whole tree up front: it yields one path at a time and keeps only a stack of pending entries, which grows with the size of the directories along the current branch rather than with the total number of files. Collecting results in arrays or hashes, however, can consume significant memory for large traversals.

# Memory-efficient file processing
def process_large_directory(start_path)
  file_count = 0
  total_size = 0
  largest_file = { path: nil, size: 0 }
  
  Find.find(start_path) do |path|
    next if File.directory?(path)
    
    file_count += 1
    size = File.size(path) rescue 0
    total_size += size
    
    if size > largest_file[:size]
      largest_file = { path: path, size: size }
    end
    
    # Report progress without storing paths
    puts "Progress: #{file_count} files processed" if file_count % 10000 == 0
  end
  
  {
    file_count: file_count,
    total_size: total_size,
    largest_file: largest_file,
    average_size: file_count.zero? ? 0 : total_size / file_count # Avoid ZeroDivisionError on empty trees
  }
end

stats = process_large_directory('/home')
puts "Files: #{stats[:file_count]}"
puts "Total: #{stats[:total_size]} bytes"
puts "Average: #{stats[:average_size]} bytes per file"
puts "Largest: #{stats[:largest_file][:path]} (#{stats[:largest_file][:size]} bytes)"
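
When only the first few matches are needed, the Enumerator returned by Find.find can be made lazy so the walk stops as soon as enough results are found. This is a sketch with a hypothetical first_large_files helper and made-up size thresholds, not a tuned implementation.

```ruby
require 'find'
require 'tmpdir'

# Hypothetical helper: returns up to `limit` regular files larger than
# min_bytes, ending the traversal once the limit is reached.
def first_large_files(root, min_bytes:, limit:)
  Find.find(root)
      .lazy
      .select { |path| File.file?(path) && File.size(path) > min_bytes }
      .first(limit)
end

Dir.mktmpdir do |root|
  5.times { |i| File.write(File.join(root, "big#{i}.bin"), 'x' * 10) }
  File.write(File.join(root, 'small.bin'), 'x')

  puts first_large_files(root, min_bytes: 5, limit: 2)
end
```

Nothing is accumulated beyond the requested number of paths, which keeps memory flat even on very large trees.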

Strategic use of Find.prune improves performance by avoiding unnecessary directory descents, often substantially when the skipped directories are large (node_modules or .git, for example). Combine pruning with early filtering to minimize file system operations.

# Optimized traversal with strategic pruning
class OptimizedFinder
  SKIP_DIRECTORIES = %w[.git node_modules .svn .hg __pycache__].freeze
  SKIP_PATTERNS = [/\.tmp$/, /\.cache$/, /\.backup$/].freeze
  
  def find_source_files(start_paths)
    source_files = []
    
    Find.find(*start_paths) do |path|
      basename = File.basename(path)
      
      # Skip known unproductive directories
      if File.directory?(path) && SKIP_DIRECTORIES.include?(basename)
        Find.prune
        next
      end
      
      # Skip files matching problematic patterns
      if SKIP_PATTERNS.any? { |pattern| path.match?(pattern) }
        next
      end
      
      # Collect source files
      if File.file?(path) && path.match?(/\.(rb|py|js|cpp|h)$/)
        source_files << {
          path: path,
          size: File.size(path),
          modified: File.mtime(path)
        }
      end
    end
    
    source_files
  end
end

finder = OptimizedFinder.new
files = finder.find_source_files(['/project/src', '/project/lib'])
puts "Found #{files.size} source files"
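
The effect of pruning is easy to see by counting how many paths reach the block. The sketch below builds a temporary tree with a bulky node_modules directory; the count_visited helper and the file counts are assumptions chosen for the example.

```ruby
require 'find'
require 'tmpdir'

# Hypothetical helper: counts the paths the block sees, optionally pruning
# directories whose basename is listed in skip.
def count_visited(root, skip: [])
  visited = 0
  Find.find(root) do |path|
    # Check the basename first: it is a plain string comparison, while
    # File.directory? needs a system call.
    Find.prune if skip.include?(File.basename(path)) && File.directory?(path)
    visited += 1
  end
  visited
end

Dir.mktmpdir do |root|
  Dir.mkdir(File.join(root, 'node_modules'))
  50.times { |i| File.write(File.join(root, 'node_modules', "dep#{i}.js"), '') }
  Dir.mkdir(File.join(root, 'src'))
  File.write(File.join(root, 'src', 'a.rb'), '')
  File.write(File.join(root, 'src', 'b.rb'), '')

  puts "Without pruning: #{count_visited(root)} paths"
  puts "With pruning:    #{count_visited(root, skip: ['node_modules'])} paths"
end
```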

Production Patterns

Production applications often require robust file discovery with logging, monitoring, and error recovery. Find module operations should integrate with application logging systems and provide operational visibility.

require 'find'
require 'logger'
require 'json'

class ProductionFileFinder
  def initialize(logger: Logger.new(STDOUT))
    @logger = logger
    @stats = {
      files_processed: 0,
      directories_processed: 0,
      errors: 0,
      start_time: nil,
      end_time: nil
    }
  end
  
  def find_and_process(paths, &processor)
    @stats[:start_time] = Time.now
    @logger.info("Starting file discovery: paths=#{paths.inspect}")
    
    begin
      Find.find(*paths) do |path|
        process_path(path, &processor)
      end
    rescue => e
      @logger.error("Fatal error during traversal: #{e.class}: #{e.message}")
      raise
    ensure
      @stats[:end_time] = Time.now
      log_completion_stats
    end
    
    @stats
  end
  
  private
  
  def process_path(path, &processor)
    begin
      if File.directory?(path)
        handle_directory(path)
      else
        handle_file(path, &processor)
      end
    rescue => e
      @stats[:errors] += 1
      @logger.warn("Error processing #{path}: #{e.message}")
    end
  end
  
  def handle_directory(path)
    @stats[:directories_processed] += 1
    
    # Skip problematic directories in production
    basename = File.basename(path)
    if basename.start_with?('.') && basename != '.'
      @logger.debug("Skipping hidden directory: #{path}")
      Find.prune
    end
  end
  
  def handle_file(path, &processor)
    @stats[:files_processed] += 1
    
    # Log progress periodically
    if @stats[:files_processed] % 5000 == 0
      @logger.info("Processing progress: #{@stats[:files_processed]} files")
    end
    
    # Execute custom processing
    processor&.call(path)
  end
  
  def log_completion_stats
    duration = @stats[:end_time] - @stats[:start_time]
    
    # Guard against a zero duration before computing the rate
    rate = duration.positive? ? (@stats[:files_processed] / duration).round(2) : 0.0
    @logger.info("File discovery completed: duration=#{duration.round(2)}s " \
                 "files=#{@stats[:files_processed]} " \
                 "directories=#{@stats[:directories_processed]} " \
                 "errors=#{@stats[:errors]} files_per_second=#{rate}")
  end
end

# Production usage with monitoring
logger = Logger.new('/var/log/file_processor.log')
finder = ProductionFileFinder.new(logger: logger)

# Process configuration files across multiple locations
config_files = []
stats = finder.find_and_process(['/etc', '/usr/local/etc', '/opt/app/config']) do |path|
  if path.match?(/\.(conf|cfg|ini|yaml|json)$/i)
    config_files << path
    logger.info("Found config file: #{path} (#{File.size(path)} bytes)")
  end
end

puts "Discovered #{config_files.size} configuration files"
puts "Processing completed in #{stats[:end_time] - stats[:start_time]} seconds"
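
Ruby's standard Logger takes a single message argument per call, so keyword arguments such as path: or size: raise ArgumentError. If structured context is wanted, one option is a small wrapper that serializes the fields into the message. The log_event helper below is hypothetical and the JSON suffix is just one possible format.

```ruby
require 'logger'
require 'json'
require 'stringio'

# Hypothetical helper: appends context fields to the message as JSON, since
# Logger#info and friends accept one message argument, not keyword arguments.
def log_event(logger, level, message, **fields)
  line = fields.empty? ? message : "#{message} #{JSON.generate(fields)}"
  logger.public_send(level, line)
end

output = StringIO.new
logger = Logger.new(output)
log_event(logger, :info, 'Found config file', path: '/etc/app.conf', size: 120)
puts output.string
```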

Application deployment scenarios often require file synchronization and validation. Find module supports comparing directory structures and identifying changes between deployments.

# Deployment validation with Find module
require 'find'
require 'set' # Needed explicitly on Ruby versions that do not autoload Set

class DeploymentValidator
  def initialize(expected_files_manifest)
    @expected_files = Set.new(expected_files_manifest)
    @found_files = Set.new
    @extra_files = Set.new
    @missing_files = Set.new
  end
  
  def validate_deployment(deployment_path)
    # Discover all files in deployment
    Find.find(deployment_path) do |path|
      next if File.directory?(path)
      
      relative_path = path.sub("#{deployment_path}/", '')
      @found_files << relative_path
      
      unless @expected_files.include?(relative_path)
        @extra_files << relative_path
      end
    end
    
    # Identify missing files
    @missing_files = @expected_files - @found_files
    
    {
      valid: @missing_files.empty? && @extra_files.empty?,
      found_count: @found_files.size,
      expected_count: @expected_files.size,
      missing_files: @missing_files.to_a,
      extra_files: @extra_files.to_a
    }
  end
end

# Load expected files from deployment manifest
expected_files = File.readlines('deployment_manifest.txt', chomp: true)
validator = DeploymentValidator.new(expected_files)

# Validate current deployment
result = validator.validate_deployment('/opt/application')

if result[:valid]
  puts "Deployment validation passed"
else
  puts "Deployment validation failed:"
  puts "Missing files: #{result[:missing_files]}"
  puts "Extra files: #{result[:extra_files]}"
  exit 1
end
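
The manifest that the validator reads has to come from somewhere. One way to produce it, assuming a known-good deployment directory is available, is to walk that directory and record each file's path relative to the root. The build_manifest helper below is a hypothetical sketch of that step.

```ruby
require 'find'
require 'pathname'
require 'tmpdir'

# Hypothetical helper: returns the sorted relative paths of all regular
# files under root, one entry per manifest line.
def build_manifest(root)
  base = Pathname.new(root)
  Find.find(root)
      .select { |path| File.file?(path) }
      .map { |path| Pathname.new(path).relative_path_from(base).to_s }
      .sort
end

Dir.mktmpdir do |root|
  Dir.mkdir(File.join(root, 'bin'))
  File.write(File.join(root, 'bin', 'run'), '')
  Dir.mkdir(File.join(root, 'config'))
  File.write(File.join(root, 'config', 'app.yml'), '')

  puts build_manifest(root)
end
```

Writing the result with File.write('deployment_manifest.txt', build_manifest(root).join("\n")) yields a file in the format the validator expects.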

Common Pitfalls

Symbolic link handling causes the most common Find module misunderstandings. Find.find never follows symbolic links, so links that form cycles or point to parent directories cannot trap the traversal in an infinite loop. The pitfalls run the other way: contents behind a linked directory are silently absent from the results, File.directory? and File.stat inside the block do follow links, and hard links or bind mounts can expose the same file under several paths.

# Misleading: File.directory? follows links, but Find.find does not descend into them
# Find.find('/home/user') do |path|
#   puts path if File.directory?(path)  # Also true for links to directories
# end

# Safer: Use lstat and track device/inode pairs to skip entries already seen
require 'set'
seen_inodes = Set.new

Find.find('/home/user') do |path|
  begin
    stat = File.lstat(path)  # Use lstat to get link info, not target
    inode_key = "#{stat.dev}:#{stat.ino}"
    
    if seen_inodes.include?(inode_key)
      puts "Already seen under another path: #{path}"
      Find.prune if stat.directory?
      next
    end
    
    seen_inodes << inode_key
    puts path
    
  rescue => e
    puts "Error accessing #{path}: #{e.message}"
  end
end

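Exceptions raised inside the block halt the traversal. Find.find silently skips directories it cannot read, but a File.read or File.stat call in the block that hits a restricted file raises Errno::EACCES and ends the walk. Production code must handle these failures while continuing to process accessible paths.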

# Problematic: Unhandled permission errors stop traversal
# Find.find('/') do |path|
#   File.read(path) if File.file?(path)  # Fails on restricted files
# end

# Better: Graceful permission handling
accessible_files = []
permission_errors = []

Find.find('/var') do |path|
  next if File.directory?(path)
  
  begin
    # Test file accessibility before processing
    File.readable?(path) ? accessible_files << path : permission_errors << path
  rescue Errno::EACCES
    permission_errors << path
  end
end

puts "Accessible: #{accessible_files.size}, Restricted: #{permission_errors.size}"

Path modification during traversal creates race conditions and unexpected behavior. Files deleted or moved during Find operations may cause errors or missing results.

# Race condition: Directory contents change during traversal
class SafeTraversal
  def initialize
    @processing_errors = []
    @snapshot_time = Time.now
  end
  
  def find_with_stability_check(paths)
    results = []
    
    Find.find(*paths) do |path|
      begin
        # Check if file still exists and unchanged since traversal start
        stat = File.stat(path)
        
        if stat.mtime > @snapshot_time
          puts "Warning: #{path} modified during traversal"
        end
        
        results << {
          path: path,
          size: stat.size,
          modified: stat.mtime,
          processed_at: Time.now
        }
        
      rescue Errno::ENOENT
        @processing_errors << "File disappeared: #{path}"
      rescue => e
        @processing_errors << "Error processing #{path}: #{e.message}"
      end
    end
    
    { results: results, errors: @processing_errors }
  end
end

traversal = SafeTraversal.new
data = traversal.find_with_stability_check(['/tmp'])
puts "Found #{data[:results].size} files with #{data[:errors].size} errors"

String encoding issues arise when file names contain bytes that are not valid UTF-8. Find.find does not validate the bytes of the names it reads, so a yielded path can be tagged as UTF-8 yet contain invalid byte sequences, causing comparison, printing, and pattern-matching failures.

# Encoding-aware path processing
def process_paths_safely(start_paths)
  valid_paths = []
  encoding_errors = []
  
  Find.find(*start_paths) do |path|
    begin
      # Tag a copy as UTF-8 and validate it, leaving the yielded string untouched
      utf8_path = path.dup.force_encoding('UTF-8')
      
      unless utf8_path.valid_encoding?
        # Replace only the invalid byte sequences; the result is safe to print
        # and compare, but it no longer names the file on disk
        utf8_path = utf8_path.scrub('?')
        encoding_errors << "Encoding fixed: #{path.inspect} -> #{utf8_path}"
      end
      
      valid_paths << utf8_path
      
    rescue => e
      encoding_errors << "Encoding error for #{path.inspect}: #{e.message}"
    end
  end
  
  { paths: valid_paths, encoding_errors: encoding_errors }
end

result = process_paths_safely(['/mixed/encoding/directory'])
puts "Valid paths: #{result[:paths].size}"
puts "Encoding issues: #{result[:encoding_errors].size}"

Reference

Core Methods

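Method | Parameters | Returns | Description
Find.find(*paths, ignore_error: true) | paths (String...), ignore_error (Boolean) | nil with a block; Enumerator without one | Traverses each path depth-first, yielding every file and directory
Find.prune | None | Does not return (throws :prune) | Skips the current path; for a directory, also skips its contents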

Module Constants

Constant | Value | Description
Find::VERSION | String | Module version identifier

Block Yielded Values

The Find.find method yields string paths to the block in depth-first order:

Path Type | Example | Characteristics
Starting directory | /home/user | First yielded path
Subdirectory | /home/user/documents | Yielded before its contents
Regular file | /home/user/documents/file.txt | Leaf node in traversal
Symbolic link | /home/user/link | Yielded but not followed

Error Handling Patterns

Common exceptions during Find operations:

Exception | Cause | Handling Strategy
Errno::ENOENT | Missing starting path, broken symlink, or file deleted mid-traversal | Skip and continue
Errno::EACCES | Permission denied | Log and continue
Errno::ELOOP | Too many levels of symbolic links while resolving a path | Skip and continue
Errno::ENAMETOOLONG | Path too long | Skip
SystemCallError | Generic system error (parent class of Errno::*) | Log and continue

Performance Characteristics

Operation | Time Complexity | Memory Usage | Notes
Directory traversal | O(n) where n = total paths | Proportional to pending entries on the current branch | One path yielded at a time
Find.prune | O(1) | O(1) | Immediate directory skip
Symbolic links | O(1) per link | O(1) | Yielded but never followed

Integration Patterns

Common usage patterns with other Ruby classes:

Pattern | Example | Use Case
File filtering | File.extname(path) == '.rb' | Type-based selection
Size checking | File.size(path) > 1024 | Size-based filtering
Time comparison | File.mtime(path) > Time.now - 86_400 | Temporal filtering (last 24 hours)
Permission testing | File.readable?(path) | Access validation
Directory operations | FileUtils.mkdir_p(path) (requires 'fileutils') | Structure modification

Find Module Workflow

require 'find'

# Complete traversal pattern
Find.find('/start/path') do |path|
  # 1. Path yielded as string
  # 2. Perform file system tests
  # 3. Process or collect results  
  # 4. Optionally call Find.prune
  # 5. Handle exceptions appropriately
end