CrackedRuby CrackedRuby

String Matching with Wildcards

Overview

String matching with wildcards extends basic string comparison by introducing metacharacters that match variable portions of text. The two primary wildcard characters are the asterisk (*) which matches zero or more characters, and the question mark (?) which matches exactly one character. This pattern matching mechanism appears throughout computing systems, from file globbing in shells to search functionality in applications.

The wildcard matching problem requires determining whether a pattern containing wildcards matches a given string. Unlike regular expressions, wildcard patterns use simpler syntax with fewer metacharacters, making them more accessible for end users while remaining computationally interesting for developers.

# Basic wildcard matching concept
pattern = "*.txt"
filename = "document.txt"
# Pattern matches: * represents "document"

pattern = "file?.log"
filename = "file1.log"
# Pattern matches: ? represents "1"

pattern = "test*"
filename = "testing"
# Pattern matches: * represents "ing"

The matching problem becomes non-trivial when patterns contain multiple wildcards or when asterisks need to match variable amounts of text. A pattern like "a*b*c" matching against "aXbYc" requires the algorithm to correctly partition the string to satisfy all pattern components.

Wildcard matching differs from substring search because the entire string must satisfy the pattern structure. The pattern "*test*" matches any string containing "test", but "test*" only matches strings beginning with "test".

Key Principles

Wildcard matching operates on three fundamental components: literal character matching, single-character wildcards, and multi-character wildcards. Literal characters in the pattern must match exactly at their corresponding positions. The question mark wildcard matches any single character but cannot match zero characters or multiple characters. The asterisk wildcard matches any sequence of zero or more characters, providing flexibility in pattern specification.

The matching process evaluates whether a pattern can map to a string such that all literal characters align correctly and wildcards consume appropriate portions of the string. When no wildcards exist, the problem reduces to string equality. With only question marks, the pattern and string must have equal length with matching characters at non-wildcard positions. Asterisks introduce complexity because they can match variable amounts of text, creating multiple potential matches that the algorithm must explore.

The core challenge involves handling asterisk ambiguity. Consider matching "a*b*c" against "aXbYbZc". The first asterisk could match "X" or "XbY", and the second asterisk must adjust accordingly. The algorithm must explore different partitioning strategies to determine if any valid match exists.

Greedy matching, where asterisks consume as much text as possible, does not guarantee correct results. The pattern "*ab*" against "aabab" demonstrates this: a greedy first asterisk matching "aab" would leave "ab" for the second asterisk, but the pattern requires literal "ab" between the wildcards. A correct match requires the first asterisk to match "a" and the second to match "".

Character position plays a critical role in wildcard semantics. Unlike regular expressions with quantifiers, wildcards in string matching typically operate character-by-character. The pattern "a*" matching "abc" means the asterisk consumes "bc", but the pattern tracks this as consuming two characters, not as a grouped match.

Pattern structure determines algorithmic complexity. Patterns without wildcards complete in linear time with direct comparison. Patterns with only question marks require single-pass validation. Patterns with asterisks potentially require exploring multiple match possibilities, leading to exponential worst-case complexity without optimization.

The empty string presents edge cases that algorithms must handle explicitly. An empty pattern matches only an empty string. A pattern consisting solely of asterisks matches any string, including empty strings. A pattern with literals or question marks cannot match an empty string.

Anchoring behavior distinguishes wildcard matching from substring search. The pattern "test" as a wildcard pattern matches only the exact string "test", whereas as a substring search it matches any string containing "test". Wildcard patterns implicitly anchor to both string start and end unless asterisks provide flexibility.

Ruby Implementation

Ruby provides multiple approaches for wildcard matching depending on the use case. The File.fnmatch method implements shell-style wildcard matching, supporting asterisk and question mark wildcards along with additional shell patterns. The Dir.glob method uses wildcard patterns for file system traversal. Converting wildcards to regular expressions offers another implementation strategy with full regex engine power.

The File.fnmatch method accepts a pattern and string, returning a boolean indicating whether the string matches the pattern:

# Basic File.fnmatch usage
File.fnmatch("*.txt", "document.txt")
# => true

File.fnmatch("file?.log", "file1.log")
# => true

File.fnmatch("test*", "testing")
# => true

File.fnmatch("test*", "production")
# => false

File.fnmatch supports flags that modify matching behavior. The File::FNM_CASEFOLD flag enables case-insensitive matching, while File::FNM_PATHNAME prevents asterisks from matching directory separators:

# Case-insensitive matching
File.fnmatch("*.TXT", "document.txt", File::FNM_CASEFOLD)
# => true

# Pathname-aware matching
File.fnmatch("*/*.txt", "docs/file.txt", File::FNM_PATHNAME)
# => true

File.fnmatch("*.txt", "docs/file.txt", File::FNM_PATHNAME)
# => false (asterisk doesn't cross directory separator)

The File::FNM_DOTMATCH flag allows wildcards to match leading dots in filenames, which are typically hidden files in Unix systems:

# Default behavior excludes dotfiles
File.fnmatch("*", ".hidden")
# => false

# FNM_DOTMATCH includes dotfiles
File.fnmatch("*", ".hidden", File::FNM_DOTMATCH)
# => true

Custom wildcard matching implementation in Ruby demonstrates the algorithmic approach. A recursive solution explores different match possibilities when encountering wildcards:

def wildcard_match(pattern, string)
  return string.empty? if pattern.empty?
  return false if string.empty? && pattern != "*" * pattern.length
  
  if pattern[0] == '?'
    return false if string.empty?
    return wildcard_match(pattern[1..-1], string[1..-1])
  elsif pattern[0] == '*'
    # Try matching asterisk to zero characters
    return true if wildcard_match(pattern[1..-1], string)
    # Try matching asterisk to one or more characters
    return !string.empty? && wildcard_match(pattern, string[1..-1])
  else
    # Literal character must match
    return false if string.empty? || pattern[0] != string[0]
    return wildcard_match(pattern[1..-1], string[1..-1])
  end
end

wildcard_match("a*b*c", "aXbYc")
# => true

wildcard_match("*.rb", "script.rb")
# => true

Dynamic programming improves performance for patterns with multiple asterisks by caching subproblem results:

def wildcard_match_dp(pattern, string)
  m, n = pattern.length, string.length
  dp = Array.new(m + 1) { Array.new(n + 1, false) }
  
  # Empty pattern matches empty string
  dp[0][0] = true
  
  # Pattern with only asterisks can match empty string
  (1..m).each do |i|
    dp[i][0] = dp[i-1][0] && pattern[i-1] == '*'
  end
  
  (1..m).each do |i|
    (1..n).each do |j|
      if pattern[i-1] == '*'
        # Match asterisk to zero characters or one+ characters
        dp[i][j] = dp[i-1][j] || dp[i][j-1]
      elsif pattern[i-1] == '?' || pattern[i-1] == string[j-1]
        # Match single character wildcard or literal
        dp[i][j] = dp[i-1][j-1]
      end
    end
  end
  
  dp[m][n]
end

wildcard_match_dp("a*b*c", "aXbYbZc")
# => true

Converting wildcard patterns to regular expressions enables using Ruby's regex engine:

def wildcard_to_regex(pattern)
  regex_pattern = Regexp.escape(pattern)
  regex_pattern.gsub!('\*', '.*')
  regex_pattern.gsub!('\?', '.')
  Regexp.new("\\A#{regex_pattern}\\z")
end

regex = wildcard_to_regex("test*.txt")
regex.match?("test_file.txt")
# => true

regex.match?("testing.txt")
# => true

The Dir.glob method uses wildcard patterns for file system operations, supporting both asterisk and double-asterisk for recursive directory matching:

# Match all Ruby files in current directory
Dir.glob("*.rb")

# Match Ruby files in any subdirectory
Dir.glob("**/*.rb")

# Match files with specific character in name
Dir.glob("file?.txt")

# Combine patterns with braces
Dir.glob("*.{rb,py}")

Implementation Approaches

Several algorithmic strategies solve the wildcard matching problem, each with distinct performance characteristics and implementation complexity. The choice of approach depends on pattern complexity, string length, and performance requirements.

Recursive backtracking explores match possibilities by trying different ways to consume wildcard characters. When encountering an asterisk, the algorithm attempts matching it to zero characters first, then recursively tries matching one character at a time. This approach naturally handles multiple wildcards and complex patterns but may revisit the same subproblems multiple times.

The recursive approach maintains simplicity but suffers from exponential time complexity in worst cases. A pattern like "a*a*a*a*b" against a string of many 'a' characters followed by 'c' will explore numerous failing match attempts before determining no match exists.

Dynamic programming eliminates redundant computation by building a table of subproblem results. The table entry dp[i][j] indicates whether pattern[0..i-1] matches string[0..j-1]. The algorithm fills the table iteratively, with each entry depending only on previously computed values.

Dynamic programming guarantees polynomial time complexity, specifically O(mn) where m is pattern length and n is string length. The space complexity is also O(mn) for the table, though optimization can reduce space to O(n) by maintaining only two rows.

Greedy matching with backtracking offers a middle ground. The algorithm processes the pattern greedily, with asterisks initially matching as much as possible. When a mismatch occurs, the algorithm backtracks to the most recent asterisk and tries a shorter match. This approach performs well on typical inputs while handling worst cases better than pure greedy matching.

Two-pointer techniques provide efficient solutions for restricted pattern types. When patterns contain a single asterisk, two pointers can scan from both string ends, matching prefix and suffix portions separately. This linear-time approach handles common cases like "prefix*suffix" efficiently.

Finite automata compilation converts patterns into state machines that process strings in linear time after construction. The pattern becomes a deterministic or non-deterministic finite automaton where states represent pattern positions and transitions represent character matches or wildcard consumption. This approach excels when matching many strings against the same pattern.

Hybrid approaches combine techniques based on pattern structure analysis. Patterns without wildcards use direct comparison. Patterns with only question marks use single-pass validation. Complex patterns with multiple asterisks employ dynamic programming or backtracking as needed.

Practical Examples

File filtering represents a primary application of wildcard matching. Applications use wildcard patterns to select files based on naming conventions:

def filter_files(directory, pattern)
  Dir.entries(directory).select do |filename|
    File.fnmatch(pattern, filename)
  end
end

# Find all text files
filter_files("/docs", "*.txt")
# => ["report.txt", "notes.txt"]

# Find log files from specific date
filter_files("/logs", "2024-01-*.log")
# => ["2024-01-15.log", "2024-01-16.log"]

# Find temporary files
filter_files("/tmp", "temp_?.dat")
# => ["temp_1.dat", "temp_2.dat"]

Search functionality in text editors and IDEs employs wildcards for flexible file discovery:

class FileSearch
  def initialize(root_path)
    @root_path = root_path
  end
  
  def find_files(pattern, recursive: false)
    glob_pattern = recursive ? "**/#{pattern}" : pattern
    Dir.glob(File.join(@root_path, glob_pattern))
  end
  
  def find_by_extension(extension)
    find_files("*.#{extension}", recursive: true)
  end
  
  def find_by_name_pattern(pattern)
    find_files(pattern, recursive: true)
  end
end

search = FileSearch.new("/project")
search.find_by_extension("rb")
# => ["/project/lib/app.rb", "/project/spec/app_spec.rb"]

search.find_by_name_pattern("*_controller.rb")
# => ["/project/app/controllers/users_controller.rb"]

Command-line argument parsing uses wildcards to specify multiple files:

class CommandProcessor
  def process_files(pattern)
    matching_files = Dir.glob(pattern)
    
    if matching_files.empty?
      puts "No files match pattern: #{pattern}"
      return
    end
    
    matching_files.each do |file|
      process_single_file(file)
    end
  end
  
  def process_single_file(file)
    puts "Processing: #{file}"
    # File processing logic
  end
end

processor = CommandProcessor.new
processor.process_files("data/*.csv")
# Processing: data/sales.csv
# Processing: data/inventory.csv

URL routing in web frameworks employs wildcard-like pattern matching:

class Router
  def initialize
    @routes = {}
  end
  
  def add_route(pattern, handler)
    @routes[pattern] = handler
  end
  
  def match_route(path)
    @routes.each do |pattern, handler|
      # Convert URL pattern to wildcard pattern
      wildcard_pattern = pattern.gsub(':id', '*')
      if File.fnmatch(wildcard_pattern, path)
        return handler
      end
    end
    nil
  end
end

router = Router.new
router.add_route("/users/:id", :show_user)
router.add_route("/posts/:id/comments", :show_comments)

router.match_route("/users/123")
# => :show_user

Configuration file matching enables environment-specific settings:

class ConfigLoader
  def load_configs(environment)
    pattern = "config.#{environment}.*.yml"
    config_files = Dir.glob(pattern)
    
    config_files.each_with_object({}) do |file, configs|
      key = file.match(/config\.#{environment}\.(.*?)\.yml/)[1]
      configs[key] = YAML.load_file(file)
    end
  end
end

loader = ConfigLoader.new
loader.load_configs("production")
# Loads config.production.database.yml, config.production.cache.yml, etc.

Email address validation using simplified wildcard patterns:

def validate_email_pattern(email, allowed_pattern)
  # Convert pattern like *@company.com to wildcard match
  File.fnmatch(allowed_pattern, email)
end

validate_email_pattern("john@company.com", "*@company.com")
# => true

validate_email_pattern("jane@example.com", "*@company.com")
# => false

# Allow specific departments
validate_email_pattern("support@company.com", "support@*.com")
# => true

Common Patterns

The suffix wildcard pattern "*suffix" matches strings ending with specific text. This pattern appears in file filtering, where extensions determine file types:

# Match files by extension
File.fnmatch("*.rb", "script.rb")      # => true
File.fnmatch("*.json", "config.json")  # => true

# Match log files
File.fnmatch("*.log", "application.log")  # => true

The prefix wildcard pattern "prefix*" matches strings beginning with specific text:

# Match files by prefix
File.fnmatch("test*", "test_user.rb")     # => true
File.fnmatch("config*", "config.yml")     # => true

# Match dated files
File.fnmatch("2024-01*", "2024-01-15.log")  # => true

The infix wildcard pattern "*middle*" locates strings containing specific substrings:

# Find files containing specific text
File.fnmatch("*controller*", "users_controller.rb")  # => true
File.fnmatch("*test*", "user_test.rb")                # => true

Multiple wildcard segments "prefix*middle*suffix" enable complex pattern matching:

# Match version-specific files
File.fnmatch("app-*-*.tar.gz", "app-1.2-linux.tar.gz")
# => true

# Match dated log files with levels
File.fnmatch("2024-*-*.log", "2024-01-15.log")
# => true

The question mark for exact-length matching specifies fixed positions:

# Match files with single-character variations
File.fnmatch("file?.txt", "file1.txt")  # => true
File.fnmatch("file?.txt", "file10.txt") # => false

# Match date patterns
File.fnmatch("2024-??-??.log", "2024-01-15.log")  # => true

Character class simulation combines patterns:

# Match multiple extensions
def match_any_extension(filename, extensions)
  extensions.any? { |ext| File.fnmatch("*.#{ext}", filename) }
end

match_any_extension("script.rb", ["rb", "py", "js"])
# => true

Negation patterns require explicit checking:

# Match files NOT matching pattern
def exclude_pattern(filename, exclude)
  !File.fnmatch(exclude, filename)
end

exclude_pattern("important.txt", "temp*")
# => true

exclude_pattern("temp_file.txt", "temp*")
# => false

The exact match pattern without wildcards validates specific strings:

File.fnmatch("config.yml", "config.yml")
# => true

File.fnmatch("config.yml", "config.yaml")
# => false

Directory-aware patterns with pathname flag prevent crossing directory boundaries:

# Match files in specific directory depth
File.fnmatch("docs/*.txt", "docs/file.txt", File::FNM_PATHNAME)
# => true

File.fnmatch("docs/*.txt", "docs/sub/file.txt", File::FNM_PATHNAME)
# => false (asterisk doesn't match separator)

# Use double asterisk for recursive matching
File.fnmatch("docs/**/*.txt", "docs/sub/file.txt", File::FNM_PATHNAME)
# => true

Case-insensitive patterns for cross-platform compatibility:

# Match regardless of case
File.fnmatch("*.TXT", "document.txt", File::FNM_CASEFOLD)
# => true

# Case-insensitive prefix matching
File.fnmatch("README*", "readme.md", File::FNM_CASEFOLD)
# => true

Performance Considerations

Algorithm selection significantly impacts performance. Patterns without wildcards complete in O(n) time with direct string comparison. Patterns with only question marks also achieve O(n) complexity through single-pass validation. Asterisk-containing patterns introduce complexity ranging from O(n) for optimized cases to O(mn) for dynamic programming approaches.

The recursive backtracking approach exhibits exponential worst-case complexity. A pattern like "a*a*a*a*b" against a string of 'a' characters creates a search tree where each asterisk branches into multiple possibilities. Without memoization, the algorithm explores 2^k paths for k asterisks.

# Worst case for recursive approach
pattern = "a*a*a*a*b"
string = "a" * 100 + "c"
# Explores numerous failing paths before determining no match

Dynamic programming guarantees O(mn) time complexity but requires O(mn) space for the memoization table. For patterns and strings of length 1000, this means one million table entries.

# Space-optimized DP using only two rows
def wildcard_match_optimized(pattern, string)
  m, n = pattern.length, string.length
  prev = Array.new(n + 1, false)
  curr = Array.new(n + 1, false)
  
  prev[0] = true
  
  (1..m).each do |i|
    curr[0] = prev[0] && pattern[i-1] == '*'
    
    (1..n).each do |j|
      if pattern[i-1] == '*'
        curr[j] = prev[j] || curr[j-1]
      elsif pattern[i-1] == '?' || pattern[i-1] == string[j-1]
        curr[j] = prev[j-1]
      else
        curr[j] = false
      end
    end
    
    prev, curr = curr, prev
  end
  
  prev[n]
end

Pattern structure affects performance dramatically. Patterns with asterisks at the beginning or end perform better than patterns with asterisks in the middle. The pattern "*test" or "test*" allows early termination when checking suffix or prefix, while "te*st" requires processing the entire string.

Memoization transforms exponential recursive solutions into polynomial time. Storing results of subproblems prevents redundant computation:

def wildcard_match_memo(pattern, string, memo = {})
  key = [pattern.length, string.length]
  return memo[key] if memo.key?(key)
  
  result = if pattern.empty?
    string.empty?
  elsif string.empty?
    pattern.chars.all? { |c| c == '*' }
  elsif pattern[0] == '?'
    wildcard_match_memo(pattern[1..-1], string[1..-1], memo)
  elsif pattern[0] == '*'
    wildcard_match_memo(pattern[1..-1], string, memo) ||
      wildcard_match_memo(pattern, string[1..-1], memo)
  else
    pattern[0] == string[0] &&
      wildcard_match_memo(pattern[1..-1], string[1..-1], memo)
  end
  
  memo[key] = result
end

Early termination optimization exits immediately when matches become impossible. If a pattern contains literals that exceed the remaining string length, the match must fail:

def can_match?(pattern, string)
  # Count minimum characters needed
  min_length = pattern.chars.count { |c| c != '*' }
  return false if string.length < min_length
  
  # Early termination during matching
  # Implementation continues matching
  true
end

String allocation impacts Ruby performance. Creating substrings with slicing operations allocates new string objects. Index-based approaches avoid this overhead:

def wildcard_match_indexed(pattern, string, pi = 0, si = 0)
  return si == string.length if pi == pattern.length
  return false if si == string.length && pi < pattern.length
  
  if pattern[pi] == '?'
    return false if si == string.length
    return wildcard_match_indexed(pattern, string, pi + 1, si + 1)
  elsif pattern[pi] == '*'
    return true if wildcard_match_indexed(pattern, string, pi + 1, si)
    return si < string.length && 
           wildcard_match_indexed(pattern, string, pi, si + 1)
  else
    return false if si == string.length || pattern[pi] != string[si]
    return wildcard_match_indexed(pattern, string, pi + 1, si + 1)
  end
end

Compiling patterns to regular expressions incurs upfront cost but enables efficient multi-string matching. The regex engine's optimization makes repeated matching faster than recalculating wildcard logic:

# Pattern compilation amortizes cost
compiled_pattern = wildcard_to_regex("test*.txt")

# Efficient for matching many strings
filenames.select { |f| compiled_pattern.match?(f) }

Reference

Wildcard Character Semantics

Wildcard Matches Examples
* Zero or more characters *test matches test, mytest, testing
? Exactly one character test? matches test1, testA, not test
[chars] One character from set test[123] matches test1, test2, test3
[!chars] One character not in set test[!0] matches test1, not test0
** Recursive directory match **/*.rb matches all Ruby files recursively

File.fnmatch Flags

Flag Description Effect
FNM_CASEFOLD Case insensitive *.TXT matches file.txt
FNM_PATHNAME Slash-aware matching * does not match /
FNM_DOTMATCH Match leading dots * matches .hidden
FNM_NOESCAPE Disable backslash escaping Treat backslash as literal
FNM_EXTGLOB Extended globbing Enables brace expansion

Complexity Analysis

Pattern Type Time Complexity Space Complexity Notes
No wildcards O(n) O(1) Direct comparison
Only ? wildcards O(n) O(1) Single pass validation
Single * wildcard O(n) O(1) Two-pointer approach
Multiple * wildcards O(mn) O(mn) Dynamic programming
Recursive backtracking O(2^k * n) O(k) k is asterisk count
Memoized recursion O(mn) O(mn) Cached subproblems

Common Pattern Templates

Pattern Purpose Use Case
*.ext Match by extension File filtering
prefix* Match by prefix Name-based search
*suffix Match by suffix Ending validation
keyword Substring search Content search
prefix*suffix Sandwich match Versioned files
???_data Fixed-length prefix Formatted names
dir/**/*.ext Recursive extension Deep file search
*.[ext1,ext2] Multiple extensions Multi-format match

Implementation Comparison

Approach Pros Cons Best For
File.fnmatch Native implementation, fast Limited to string/path matching File operations
Recursive Simple implementation Exponential worst case Simple patterns
Dynamic programming Polynomial guarantee Memory overhead Complex patterns
Regex conversion Regex engine power Compilation overhead Repeated matching
Two-pointer Linear time Limited pattern types Single asterisk

Method Reference

Method Parameters Returns Description
File.fnmatch pattern, path, flags Boolean Shell-style pattern matching
Dir.glob pattern, flags Array Find files matching pattern
Regexp.escape string String Escape regex metacharacters
String#match? pattern Boolean Test string against pattern

Pattern Edge Cases

Scenario Pattern String Result Reason
Empty pattern "" "" true Both empty
Empty pattern "" "text" false Pattern requires empty
Empty string "text" "" false String too short
Only asterisk "*" "anything" true Asterisk matches all
Multiple asterisks "***" "" true All match zero
No match needed "aa*" "a" true Asterisks match zero
Greedy failure "a*ab" "aab" true Requires backtracking