Overview
The Find module provides methods for traversing directory trees in Ruby. This module implements depth-first directory traversal, visiting each file and directory in a systematic manner. Ruby's Find module offers a simple interface for walking file system hierarchies without requiring manual recursion or complex directory handling logic.
The module's primary method, Find.find, accepts one or more starting paths and yields each discovered file and directory to a block. Each directory is yielded before its contents. Symbolic links are yielded as paths, but the traversal does not follow them into the directories they point to. Because Find belongs to the standard library, load it with require 'find'.
require 'find'
# Basic directory traversal
Find.find('/home/user/documents') do |path|
puts path
end
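Find.find can also be called without a block, in which case it returns an Enumerator that chains with the usual Enumerable methods. The following is a minimal sketch; the temporary directory and the file names in it are made up for illustration.

```ruby
require 'find'
require 'tmpdir'

# Build a small throwaway tree so the example is self-contained.
rb_files = Dir.mktmpdir do |root|
  Dir.mkdir(File.join(root, 'lib'))
  File.write(File.join(root, 'lib', 'a.rb'), '')
  File.write(File.join(root, 'notes.txt'), '')

  # Without a block, Find.find returns an Enumerator.
  Find.find(root)
      .select { |path| File.file?(path) && File.extname(path) == '.rb' }
      .map { |path| path.delete_prefix("#{root}/") }
end

p rb_files # => ["lib/a.rb"]
```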
The Find module manages the traversal stack itself, so the block only has to decide what to do with each path. Paths are yielded as strings, which lets the block filter, process, or collect the discovered files and directories.
# Collecting specific file types
ruby_files = []
Find.find('/project/src') do |path|
ruby_files << path if path.end_with?('.rb')
end
Because each path is a plain string, the block can pass it straight to Ruby's File, Dir, and FileUtils methods. A single call can also start from several directories, even when they live on different file systems.
# Multiple starting points
Find.find('/etc', '/var/log', '/home/user') do |path|
# Process files from all three locations
File.lstat(path) # Access file metadata without following links, so broken links do not raise
end
Basic Usage
The Find.find method accepts directory paths as arguments and yields each discovered path to the provided block. The method processes paths in depth-first order, visiting directories before exploring their contents.
require 'find'
# Simple file listing
Find.find('/usr/bin') do |path|
puts "Found: #{path}"
puts " Type: #{File.directory?(path) ? 'directory' : 'file'}"
puts " Size: #{File.size(path)} bytes" if File.file?(path) # Skips directories and broken links
end
The traversal does not follow symbolic links into directories: a link is yielded as a path, but Find inspects it with lstat and never descends into its target. File tests such as File.directory? and File.stat do resolve links, so check File.symlink? first when the distinction matters.
# Handling symbolic links
Find.find('/var') do |path|
if File.symlink?(path)
puts "Link: #{path} -> #{File.readlink(path)}"
elsif File.directory?(path)
puts "Directory: #{path}"
else
puts "File: #{path}"
end
end
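The behavior around directory links can be checked with a small experiment. This sketch assumes a platform that supports symbolic links; the names real and alias are arbitrary.

```ruby
require 'find'
require 'tmpdir'

seen = Dir.mktmpdir do |root|
  real = File.join(root, 'real')
  Dir.mkdir(real)
  File.write(File.join(real, 'file.txt'), 'data')
  File.symlink(real, File.join(root, 'alias')) # symlink to a directory

  # Collect every yielded path relative to the temporary root.
  Find.find(root).map { |path| path.delete_prefix(root) }
end

puts seen
# "/alias" is listed, but "/alias/file.txt" is not: the link is not descended.
```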
The module provides the Find.prune method to skip the current path. Call Find.prune within the block to prevent descent into a directory, which improves performance and keeps unwanted paths out of the results. Find.prune exits the block immediately, so any code after the call does not run for that path.
# Skipping hidden directories
Find.find('/home/user') do |path|
if File.directory?(path) && File.basename(path).start_with?('.')
Find.prune # Skip hidden directories
else
puts path
end
end
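To make the effect of pruning visible, the sketch below records every path that reaches the end of the block. The tree layout (.git and src) is an assumption chosen for the example.

```ruby
require 'find'
require 'tmpdir'

visited = []
Dir.mktmpdir do |root|
  Dir.mkdir(File.join(root, '.git'))
  File.write(File.join(root, '.git', 'config'), '')
  Dir.mkdir(File.join(root, 'src'))
  File.write(File.join(root, 'src', 'main.rb'), '')

  Find.find(root) do |path|
    Find.prune if File.basename(path) == '.git'
    # Not reached for the pruned directory: Find.prune leaves the block.
    visited << path.delete_prefix(root)
  end
end

p visited # => ["", "/src", "/src/main.rb"]
```

Neither the .git directory nor its config file appears in the result, because the prune happens before the path is recorded.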
Multiple directory arguments allow traversal across different file system locations in a single operation. The module processes each starting directory completely before moving to the next argument.
# Processing multiple directories
log_files = []
Find.find('/var/log', '/usr/local/log', '/opt/app/logs') do |path|
if File.file?(path) && path.match?(/\.log$/i)
log_files << path
end
end
puts "Found #{log_files.size} log files"
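The ordering across several starting points can be observed directly. In this sketch the directory names first and second are hypothetical, and they are deliberately passed in reverse alphabetical order to show that argument order, not name order, decides which tree is walked first.

```ruby
require 'find'
require 'tmpdir'

order = Dir.mktmpdir do |root|
  %w[first second].each do |name|
    Dir.mkdir(File.join(root, name))
    File.write(File.join(root, name, 'item.txt'), '')
  end

  # Each starting directory is exhausted before the next one begins.
  Find.find(File.join(root, 'second'), File.join(root, 'first'))
      .map { |path| path.delete_prefix("#{root}/") }
end

p order # => ["second", "second/item.txt", "first", "first/item.txt"]
```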
Error Handling & Debugging
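File system operations during Find traversal can raise various exceptions. By default Find.find silently skips entries it cannot read itself (its ignore_error option defaults to true), so most exceptions come from code inside the block: permission errors when accessing restricted directories or files, and path-related errors from broken symbolic links or files deleted during traversal.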
require 'find'
# Comprehensive error handling
errors = []
processed = 0
Find.find('/') do |path|
begin
stat = File.lstat(path) # lstat describes the link itself, so the 'link' branch is reachable
processed += 1
# Process file based on type
case stat.ftype
when 'file'
puts "File: #{path} (#{stat.size} bytes)"
when 'directory'
puts "Directory: #{path}"
when 'link'
puts "Link: #{path}"
end
rescue Errno::ENOENT
errors << "Path not found: #{path}"
rescue Errno::EACCES
errors << "Permission denied: #{path}"
rescue Errno::ELOOP
errors << "Too many symbolic links: #{path}"
rescue StandardError => e
errors << "Unexpected error for #{path}: #{e.message}"
end
end
puts "Processed #{processed} items with #{errors.size} errors"
errors.each { |error| puts "ERROR: #{error}" }
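Two behaviors of Find.find itself are worth separating from errors raised by block code: a starting path that does not exist raises Errno::ENOENT immediately, and the ignore_error keyword controls whether errors met while reading directories are skipped or re-raised. The sketch below uses a throwaway directory; the names in it are placeholders.

```ruby
require 'find'
require 'tmpdir'

start_error = nil
strict_count = 0

Dir.mktmpdir do |root|
  File.write(File.join(root, 'readable.txt'), 'ok')

  # A starting path that does not exist raises before anything is yielded.
  begin
    Find.find(File.join(root, 'no-such-dir')) { |path| puts path }
  rescue Errno::ENOENT => e
    start_error = e
  end

  # ignore_error: false makes Find.find re-raise errors it would otherwise
  # skip while reading directories. On a fully readable tree nothing changes.
  Find.find(root, ignore_error: false) { strict_count += 1 }
end

puts "Start path error: #{start_error.class}"
puts "Entries seen in strict mode: #{strict_count}"
```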
Broken symbolic links present specific challenges during traversal. Find yields a broken link like any other path, but File.stat follows the link and raises Errno::ENOENT when its target no longer exists.
# Handling broken symbolic links
broken_links = []
Find.find('/usr/local') do |path|
if File.symlink?(path)
begin
File.stat(path) # Test link target accessibility
puts "Valid link: #{path}"
rescue Errno::ENOENT
broken_links << path
puts "Broken link: #{path}"
end
end
end
puts "Found #{broken_links.size} broken symbolic links"
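An alternative that avoids rescuing exceptions is to combine File.symlink? with File.exist?, since File.exist? resolves the link and returns false when the target is missing. A minimal sketch, assuming symbolic links are supported and using made-up file names:

```ruby
require 'find'
require 'tmpdir'

broken = Dir.mktmpdir do |root|
  File.write(File.join(root, 'target.txt'), '')
  File.symlink(File.join(root, 'target.txt'), File.join(root, 'good'))
  File.symlink(File.join(root, 'gone.txt'), File.join(root, 'dangling'))

  # A link whose target cannot be resolved is reported as broken.
  Find.find(root)
      .select { |path| File.symlink?(path) && !File.exist?(path) }
      .map { |path| File.basename(path) }
end

p broken # => ["dangling"]
```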
Large directory traversals can fail partway through due to changing file system conditions. Implement checkpoint recovery to resume traversal from specific points rather than restarting complete operations.
# Checkpoint-based traversal with recovery
require 'set' # Needed for Set and Array#to_set on older Ruby versions
class FindWithCheckpoints
def initialize(checkpoint_file = 'find_checkpoint.txt')
@checkpoint_file = checkpoint_file
@processed = load_checkpoint
end
def traverse(start_paths)
Find.find(*start_paths) do |path|
next if @processed.include?(path)
begin
yield path
@processed << path
# Save checkpoint every 1000 items
save_checkpoint if @processed.size % 1000 == 0
rescue => e
puts "Error processing #{path}: #{e.message}"
# Continue with next path
end
end
# Clean up checkpoint file on successful completion
File.delete(@checkpoint_file) if File.exist?(@checkpoint_file)
end
private
def load_checkpoint
return Set.new unless File.exist?(@checkpoint_file)
File.readlines(@checkpoint_file, chomp: true).to_set
end
def save_checkpoint
File.write(@checkpoint_file, @processed.to_a.join("\n"))
end
end
# Usage with checkpoint recovery
finder = FindWithCheckpoints.new
finder.traverse(['/large/directory/tree']) do |path|
# Perform time-consuming operations
process_file(path)
end
Performance & Memory
Find module performance depends heavily on file system characteristics and traversal patterns. Large directory trees with many small files create different performance profiles than trees with fewer, larger files.
require 'find'
require 'benchmark'
# Performance measurement for different strategies
def measure_find_performance(path)
Benchmark.bm(20) do |x|
# Standard traversal
x.report("Standard find:") do
count = 0
Find.find(path) { |p| count += 1 }
puts " Found #{count} items"
end
# Filtered traversal
x.report("Filtered find:") do
count = 0
Find.find(path) do |p|
if File.directory?(p) && File.basename(p).start_with?('.')
Find.prune
else
count += 1
end
end
puts " Found #{count} items"
end
# Stat-heavy operations
x.report("With file stats:") do
total_size = 0
Find.find(path) do |p|
total_size += File.size(p) rescue 0
end
puts " Total size: #{total_size} bytes"
end
end
end
measure_find_performance('/usr/share')
Memory usage stays low during Find operations because the module yields one path at a time and keeps only a list of pending entries rather than the entire directory structure. However, collecting results in arrays or hashes can consume significant memory for large traversals.
# Memory-efficient file processing
def process_large_directory(start_path)
file_count = 0
total_size = 0
largest_file = { path: nil, size: 0 }
Find.find(start_path) do |path|
next if File.directory?(path)
file_count += 1
size = File.size(path) rescue 0
total_size += size
if size > largest_file[:size]
largest_file = { path: path, size: size }
end
# Report progress without storing paths
puts "Progress: #{file_count} files processed" if file_count % 10000 == 0
end
{
file_count: file_count,
total_size: total_size,
largest_file: largest_file,
average_size: file_count.zero? ? 0 : total_size / file_count # Avoid ZeroDivisionError on empty trees
}
end
stats = process_large_directory('/home')
puts "Files: #{stats[:file_count]}"
puts "Total: #{stats[:total_size]} bytes"
puts "Average: #{stats[:average_size]} bytes per file"
puts "Largest: #{stats[:largest_file][:path]} (#{stats[:largest_file][:size]} bytes)"
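When only the first few matches are needed, a lazy enumerator avoids both storing every path and walking the rest of the tree. This is a sketch under the assumption that stopping early is acceptable; the counter examined exists only to show how many paths were inspected.

```ruby
require 'find'
require 'tmpdir'

examined = 0
first_two = Dir.mktmpdir do |root|
  10.times { |i| File.write(File.join(root, format('file%02d.log', i)), 'x') }

  # lazy stops pulling paths from Find.find once two files have been found.
  Find.find(root).lazy
      .select { |path| examined += 1; File.file?(path) }
      .map { |path| File.basename(path) }
      .first(2)
end

puts "Matches: #{first_two.inspect}"
puts "Paths examined: #{examined} of 11"
```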
Strategic use of Find.prune can greatly reduce traversal time by avoiding unnecessary directory descents. Combine pruning with early filtering to minimize file system operations.
# Optimized traversal with strategic pruning
class OptimizedFinder
SKIP_DIRECTORIES = %w[.git node_modules .svn .hg __pycache__ .DS_Store].freeze
SKIP_PATTERNS = [/\.tmp$/, /\.cache$/, /\.backup$/].freeze
def find_source_files(start_paths)
source_files = []
Find.find(*start_paths) do |path|
basename = File.basename(path)
# Skip known unproductive directories
if File.directory?(path) && SKIP_DIRECTORIES.include?(basename)
Find.prune
next
end
# Skip files matching problematic patterns
if SKIP_PATTERNS.any? { |pattern| path.match?(pattern) }
next
end
# Collect source files
if File.file?(path) && path.match?(/\.(rb|py|js|cpp|h)$/)
source_files << {
path: path,
size: File.size(path),
modified: File.mtime(path)
}
end
end
source_files
end
end
finder = OptimizedFinder.new
files = finder.find_source_files(['/project/src', '/project/lib'])
puts "Found #{files.size} source files"
Production Patterns
Production applications often require robust file discovery with logging, monitoring, and error recovery. Find module operations should integrate with application logging systems and provide operational visibility.
require 'find'
require 'logger'
require 'json'
class ProductionFileFinder
def initialize(logger: Logger.new(STDOUT))
@logger = logger
@stats = {
files_processed: 0,
directories_processed: 0,
errors: 0,
start_time: nil,
end_time: nil
}
end
def find_and_process(paths, &processor)
@stats[:start_time] = Time.now
@logger.info("Starting file discovery: paths=#{paths.inspect}")
begin
Find.find(*paths) do |path|
process_path(path, &processor)
end
rescue => e
@logger.error("Fatal error during traversal: #{e.message}")
raise
ensure
@stats[:end_time] = Time.now
log_completion_stats
end
@stats
end
private
def process_path(path, &processor)
begin
if File.directory?(path)
handle_directory(path)
else
handle_file(path, &processor)
end
rescue => e
@stats[:errors] += 1
@logger.warn("Error processing #{path}: #{e.message}")
end
end
def handle_directory(path)
@stats[:directories_processed] += 1
# Skip problematic directories in production
basename = File.basename(path)
if basename.start_with?('.') && basename != '.'
@logger.debug("Skipping hidden directory: #{path}")
Find.prune
end
end
def handle_file(path, &processor)
@stats[:files_processed] += 1
# Log progress periodically
if @stats[:files_processed] % 5000 == 0
@logger.info("Processing progress: #{@stats[:files_processed]} files")
end
# Execute custom processing
processor&.call(path)
end
def log_completion_stats
duration = @stats[:end_time] - @stats[:start_time]
rate = duration.positive? ? (@stats[:files_processed] / duration).round(2) : 0.0
# Logger methods take a single message, so the fields are serialized as JSON
@logger.info("File discovery completed " + JSON.generate(
duration: duration,
files_processed: @stats[:files_processed],
directories_processed: @stats[:directories_processed],
errors: @stats[:errors],
files_per_second: rate))
end
end
# Production usage with monitoring
logger = Logger.new('/var/log/file_processor.log')
finder = ProductionFileFinder.new(logger: logger)
# Process configuration files across multiple locations
config_files = []
stats = finder.find_and_process(['/etc', '/usr/local/etc', '/opt/app/config']) do |path|
if path.match?(/\.(conf|cfg|ini|yaml|json)$/i)
config_files << path
logger.info("Found config file: #{path} (#{File.size(path)} bytes)")
end
end
puts "Discovered #{config_files.size} configuration files"
puts "Processing completed in #{stats[:end_time] - stats[:start_time]} seconds"
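Ruby's standard Logger accepts one message (or a block) per call, so structured fields have to be folded into that message. The sketch below shows one way to do it; log_fields is a hypothetical helper, not part of Logger, and the StringIO target is used only to keep the example self-contained.

```ruby
require 'json'
require 'logger'
require 'stringio'

# Hypothetical helper: renders a message plus optional fields as one line.
def log_fields(message, **fields)
  fields.empty? ? message : "#{message} #{JSON.generate(fields)}"
end

output = StringIO.new
logger = Logger.new(output)
logger.info(log_fields('Found config file', path: '/etc/app.conf', size: 120))

line = output.string
puts line
```

Keeping the fields as JSON at the end of the line makes the log easy to grep while remaining parseable by log shippers.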
Application deployment scenarios often require file synchronization and validation. The Find module can enumerate a deployed tree so that it can be compared against a manifest to identify missing and unexpected files.
# Deployment validation with Find module
require 'set' # Needed for Set on older Ruby versions
class DeploymentValidator
def initialize(expected_files_manifest)
@expected_files = Set.new(expected_files_manifest)
@found_files = Set.new
@extra_files = Set.new
@missing_files = Set.new
end
def validate_deployment(deployment_path)
# Discover all files in deployment
Find.find(deployment_path) do |path|
next if File.directory?(path)
relative_path = path.sub("#{deployment_path}/", '')
@found_files << relative_path
unless @expected_files.include?(relative_path)
@extra_files << relative_path
end
end
# Identify missing files
@missing_files = @expected_files - @found_files
{
valid: @missing_files.empty? && @extra_files.empty?,
found_count: @found_files.size,
expected_count: @expected_files.size,
missing_files: @missing_files.to_a,
extra_files: @extra_files.to_a
}
end
end
# Load expected files from deployment manifest
expected_files = File.readlines('deployment_manifest.txt', chomp: true)
validator = DeploymentValidator.new(expected_files)
# Validate current deployment
result = validator.validate_deployment('/opt/application')
if result[:valid]
puts "Deployment validation passed"
else
puts "Deployment validation failed:"
puts "Missing files: #{result[:missing_files]}"
puts "Extra files: #{result[:extra_files]}"
exit 1
end
Common Pitfalls
Symbolic link handling creates the most common Find module pitfalls. Find.find does not follow symbolic links into directories, so link cycles cannot trap the traversal itself, and files reachable only through a directory link are never yielded. The same file can still appear more than once through hard links, and block code that resolves links on its own can revisit directories.
# Misleading: expecting the contents of linked directories
# Find.find('/home/user') do |path|
# puts path # Entries behind directory symlinks are never printed
# end
# Safe: Track device and inode numbers to process each entry once
require 'set'
visited_inodes = Set.new
Find.find('/home/user') do |path|
begin
stat = File.lstat(path) # Use lstat to get link info, not target
inode_key = "#{stat.dev}:#{stat.ino}"
if visited_inodes.include?(inode_key)
puts "Already visited: #{path}"
Find.prune if File.directory?(path)
next
end
visited_inodes << inode_key
puts path
rescue => e
puts "Error accessing #{path}: #{e.message}"
end
end
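Permission errors raised inside the block can halt processing unexpectedly. Find.find skips directories it cannot read, but an exception from block code such as File.read propagates out of Find.find and ends the traversal. Production code must handle permission failures gracefully while continuing to process accessible paths.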
# Problematic: Unhandled permission errors stop traversal
# Find.find('/') do |path|
# File.read(path) if File.file?(path) # Fails on restricted files
# end
# Better: Graceful permission handling
accessible_files = []
permission_errors = []
Find.find('/var') do |path|
next if File.directory?(path)
begin
# Test file accessibility before processing
File.readable?(path) ? accessible_files << path : permission_errors << path
rescue Errno::EACCES
permission_errors << path
end
end
puts "Accessible: #{accessible_files.size}, Restricted: #{permission_errors.size}"
Path modification during traversal creates race conditions and unexpected behavior. Files deleted or moved during Find operations may cause errors or missing results.
# Race condition: Directory contents change during traversal
class SafeTraversal
def initialize
@processing_errors = []
@snapshot_time = Time.now
end
def find_with_stability_check(paths)
results = []
Find.find(*paths) do |path|
begin
# Check if file still exists and unchanged since traversal start
stat = File.stat(path)
if stat.mtime > @snapshot_time
puts "Warning: #{path} modified during traversal"
end
results << {
path: path,
size: stat.size,
modified: stat.mtime,
processed_at: Time.now
}
rescue Errno::ENOENT
@processing_errors << "File disappeared: #{path}"
rescue => e
@processing_errors << "Error processing #{path}: #{e.message}"
end
end
{ results: results, errors: @processing_errors }
end
end
traversal = SafeTraversal.new
data = traversal.find_with_stability_check(['/tmp'])
puts "Found #{data[:results].size} files with #{data[:errors].size} errors"
String encoding issues arise when file paths contain non-UTF-8 bytes. The Find module returns paths as strings whose bytes may not be valid in their tagged encoding, which causes comparison and processing failures.
# Encoding-aware path processing
def process_paths_safely(start_paths)
valid_paths = []
encoding_errors = []
Find.find(*start_paths) do |path|
begin
# Force UTF-8 encoding and validate
utf8_path = path.dup.force_encoding('UTF-8') # dup leaves the yielded string unmodified
unless utf8_path.valid_encoding?
# Handle non-UTF-8 paths
utf8_path = path.encode('UTF-8', 'binary',
invalid: :replace,
undef: :replace,
replace: '?')
encoding_errors << "Encoding fixed: #{path.inspect} -> #{utf8_path}"
end
valid_paths << utf8_path
rescue => e
encoding_errors << "Encoding error for #{path.inspect}: #{e.message}"
end
end
{ paths: valid_paths, encoding_errors: encoding_errors }
end
result = process_paths_safely(['/mixed/encoding/directory'])
puts "Valid paths: #{result[:paths].size}"
puts "Encoding issues: #{result[:encoding_errors].size}"
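The two usual repair strategies can be compared on a single file name. This sketch assumes the name was written by a system using ISO-8859-1; when the original encoding is unknown, only the scrub approach applies.

```ruby
# Bytes for "café.txt" as an ISO-8859-1 file system would store them.
raw = "caf\xE9.txt".b

as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding? # => false

# Option 1: replace invalid bytes when the source encoding is unknown.
scrubbed = as_utf8.scrub('?')

# Option 2: transcode when the source encoding is known.
decoded = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

puts scrubbed # => caf?.txt
puts decoded  # => café.txt
```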
Reference
Core Methods
Method | Parameters | Returns | Description |
---|---|---|---|
Find.find(*paths, ignore_error: true) | paths (String...) | nil with a block, Enumerator without | Traverses directories depth-first, yielding each path |
Find.prune | None | Does not return (exits the block) | Skips the current file or directory during traversal |
Module Constants
Constant | Value | Description |
---|---|---|
Find::VERSION | String | Module version identifier |
Block Yielded Values
The Find.find method yields string paths to the block in depth-first order:
Path Type | Example | Characteristics |
---|---|---|
Starting directory | /home/user | First yielded path |
Subdirectory | /home/user/documents | Yielded before contents |
Regular file | /home/user/documents/file.txt | Leaf nodes in traversal |
Symbolic link | /home/user/link | Yielded, but not followed |
Error Handling Patterns
Common exceptions during Find operations:
Exception | Cause | Handling Strategy |
---|---|---|
Errno::ENOENT | Path not found, broken symlink | Skip and continue |
Errno::EACCES | Permission denied | Log and continue |
Errno::ELOOP | Symbolic link loop | Detect cycles, prune |
Errno::ENAMETOOLONG | Path too long | Truncate or skip |
SystemCallError | Generic system error | Log and continue |
Performance Characteristics
Operation | Time Complexity | Memory Usage | Notes |
---|---|---|---|
Directory traversal | O(n) where n = total paths | Proportional to pending entries | Holds unvisited entries along the current branch, not the whole tree |
Find.prune | O(1) | O(1) | Immediate directory skip |
Symbolic links | O(1) per link | O(1) | Yielded but not followed |
Integration Patterns
Common usage patterns with other Ruby classes:
Pattern | Example | Use Case |
---|---|---|
File filtering | File.extname(path) == '.rb' | Type-based selection |
Size checking | File.size(path) > 1024 | Size-based filtering |
Time comparison | File.mtime(path) > Time.now - 86_400 | Temporal filtering |
Permission testing | File.readable?(path) | Access validation |
Directory operations | FileUtils.mkdir_p(path) | Structure modification |
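Several of these patterns are often combined in one block. The sketch below selects Ruby files larger than 1 KiB that were modified within the last day; the file names and the thresholds are assumptions chosen for the example.

```ruby
require 'find'
require 'tmpdir'

recent_large = Dir.mktmpdir do |root|
  File.write(File.join(root, 'big.rb'), 'x' * 2048)
  File.write(File.join(root, 'small.rb'), 'x')
  old = File.join(root, 'old.rb')
  File.write(old, 'x' * 2048)
  week_ago = Time.now - 7 * 86_400
  File.utime(week_ago, week_ago, old) # Backdate access and modification times

  cutoff = Time.now - 86_400
  Find.find(root).select do |path|
    File.file?(path) &&
      File.extname(path) == '.rb' &&
      File.size(path) > 1024 &&
      File.mtime(path) > cutoff
  end.map { |path| File.basename(path) }
end

p recent_large # => ["big.rb"]
```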
Find Module Workflow
require 'find'
# Complete traversal pattern
Find.find('/start/path') do |path|
# 1. Path yielded as string
# 2. Perform file system tests
# 3. Process or collect results
# 4. Optionally call Find.prune
# 5. Handle exceptions appropriately
end