Overview
String matching with wildcards extends basic string comparison by introducing metacharacters that match variable portions of text. The two primary wildcard characters are the asterisk (*) which matches zero or more characters, and the question mark (?) which matches exactly one character. This pattern matching mechanism appears throughout computing systems, from file globbing in shells to search functionality in applications.
The wildcard matching problem requires determining whether a pattern containing wildcards matches a given string. Unlike regular expressions, wildcard patterns use simpler syntax with fewer metacharacters, making them more accessible for end users while remaining computationally interesting for developers.
# Basic wildcard matching concept
pattern = "*.txt"
filename = "document.txt"
# Pattern matches: * represents "document"
pattern = "file?.log"
filename = "file1.log"
# Pattern matches: ? represents "1"
pattern = "test*"
filename = "testing"
# Pattern matches: * represents "ing"
The matching problem becomes non-trivial when patterns contain multiple wildcards or when asterisks need to match variable amounts of text. A pattern like "a*b*c" matching against "aXbYc" requires the algorithm to correctly partition the string to satisfy all pattern components.
Wildcard matching differs from substring search because the entire string must satisfy the pattern structure. The pattern "*test*" matches any string containing "test", but "test*" only matches strings beginning with "test".
Key Principles
Wildcard matching operates on three fundamental components: literal character matching, single-character wildcards, and multi-character wildcards. Literal characters in the pattern must match exactly at their corresponding positions. The question mark wildcard matches any single character but cannot match zero characters or multiple characters. The asterisk wildcard matches any sequence of zero or more characters, providing flexibility in pattern specification.
The matching process evaluates whether a pattern can map to a string such that all literal characters align correctly and wildcards consume appropriate portions of the string. When no wildcards exist, the problem reduces to string equality. With only question marks, the pattern and string must have equal length with matching characters at non-wildcard positions. Asterisks introduce complexity because they can match variable amounts of text, creating multiple potential matches that the algorithm must explore.
The core challenge involves handling asterisk ambiguity. Consider matching "a*b*c" against "aXbYbZc". The first asterisk could match "X" or "XbY", and the second asterisk must adjust accordingly. The algorithm must explore different partitioning strategies to determine if any valid match exists.
Greedy matching, where asterisks consume as much text as possible, does not guarantee correct results. The pattern "*ab*" against "aabab" demonstrates this: a greedy first asterisk matching "aab" would leave "ab" for the second asterisk, but the pattern requires literal "ab" between the wildcards. A correct match requires the first asterisk to match "a" and the second to match "".
Character position plays a critical role in wildcard semantics. Unlike regular expressions with quantifiers, wildcards in string matching typically operate character-by-character. The pattern "a*" matching "abc" means the asterisk consumes "bc", but the pattern tracks this as consuming two characters, not as a grouped match.
Pattern structure determines algorithmic complexity. Patterns without wildcards complete in linear time with direct comparison. Patterns with only question marks require single-pass validation. Patterns with asterisks potentially require exploring multiple match possibilities, leading to exponential worst-case complexity without optimization.
The empty string presents edge cases that algorithms must handle explicitly. An empty pattern matches only an empty string. A pattern consisting solely of asterisks matches any string, including empty strings. A pattern with literals or question marks cannot match an empty string.
Anchoring behavior distinguishes wildcard matching from substring search. The pattern "test" as a wildcard pattern matches only the exact string "test", whereas as a substring search it matches any string containing "test". Wildcard patterns implicitly anchor to both string start and end unless asterisks provide flexibility.
Ruby Implementation
Ruby provides multiple approaches for wildcard matching depending on the use case. The File.fnmatch method implements shell-style wildcard matching, supporting asterisk and question mark wildcards along with additional shell patterns. The Dir.glob method uses wildcard patterns for file system traversal. Converting wildcards to regular expressions offers another implementation strategy with full regex engine power.
The File.fnmatch method accepts a pattern and string, returning a boolean indicating whether the string matches the pattern:
# Basic File.fnmatch usage
File.fnmatch("*.txt", "document.txt")
# => true
File.fnmatch("file?.log", "file1.log")
# => true
File.fnmatch("test*", "testing")
# => true
File.fnmatch("test*", "production")
# => false
File.fnmatch supports flags that modify matching behavior. The File::FNM_CASEFOLD flag enables case-insensitive matching, while File::FNM_PATHNAME prevents asterisks from matching directory separators:
# Case-insensitive matching
File.fnmatch("*.TXT", "document.txt", File::FNM_CASEFOLD)
# => true
# Pathname-aware matching
File.fnmatch("*/*.txt", "docs/file.txt", File::FNM_PATHNAME)
# => true
File.fnmatch("*.txt", "docs/file.txt", File::FNM_PATHNAME)
# => false (asterisk doesn't cross directory separator)
The File::FNM_DOTMATCH flag allows wildcards to match leading dots in filenames, which are typically hidden files in Unix systems:
# Default behavior excludes dotfiles
File.fnmatch("*", ".hidden")
# => false
# FNM_DOTMATCH includes dotfiles
File.fnmatch("*", ".hidden", File::FNM_DOTMATCH)
# => true
Custom wildcard matching implementation in Ruby demonstrates the algorithmic approach. A recursive solution explores different match possibilities when encountering wildcards:
def wildcard_match(pattern, string)
return string.empty? if pattern.empty?
return false if string.empty? && pattern != "*" * pattern.length
if pattern[0] == '?'
return false if string.empty?
return wildcard_match(pattern[1..-1], string[1..-1])
elsif pattern[0] == '*'
# Try matching asterisk to zero characters
return true if wildcard_match(pattern[1..-1], string)
# Try matching asterisk to one or more characters
return !string.empty? && wildcard_match(pattern, string[1..-1])
else
# Literal character must match
return false if string.empty? || pattern[0] != string[0]
return wildcard_match(pattern[1..-1], string[1..-1])
end
end
wildcard_match("a*b*c", "aXbYc")
# => true
wildcard_match("*.rb", "script.rb")
# => true
Dynamic programming improves performance for patterns with multiple asterisks by caching subproblem results:
def wildcard_match_dp(pattern, string)
m, n = pattern.length, string.length
dp = Array.new(m + 1) { Array.new(n + 1, false) }
# Empty pattern matches empty string
dp[0][0] = true
# Pattern with only asterisks can match empty string
(1..m).each do |i|
dp[i][0] = dp[i-1][0] && pattern[i-1] == '*'
end
(1..m).each do |i|
(1..n).each do |j|
if pattern[i-1] == '*'
# Match asterisk to zero characters or one+ characters
dp[i][j] = dp[i-1][j] || dp[i][j-1]
elsif pattern[i-1] == '?' || pattern[i-1] == string[j-1]
# Match single character wildcard or literal
dp[i][j] = dp[i-1][j-1]
end
end
end
dp[m][n]
end
wildcard_match_dp("a*b*c", "aXbYbZc")
# => true
Converting wildcard patterns to regular expressions enables using Ruby's regex engine:
def wildcard_to_regex(pattern)
regex_pattern = Regexp.escape(pattern)
regex_pattern.gsub!('\*', '.*')
regex_pattern.gsub!('\?', '.')
Regexp.new("\\A#{regex_pattern}\\z")
end
regex = wildcard_to_regex("test*.txt")
regex.match?("test_file.txt")
# => true
regex.match?("testing.txt")
# => true
The Dir.glob method uses wildcard patterns for file system operations, supporting both asterisk and double-asterisk for recursive directory matching:
# Match all Ruby files in current directory
Dir.glob("*.rb")
# Match Ruby files in any subdirectory
Dir.glob("**/*.rb")
# Match files with specific character in name
Dir.glob("file?.txt")
# Combine patterns with braces
Dir.glob("*.{rb,py}")
Implementation Approaches
Several algorithmic strategies solve the wildcard matching problem, each with distinct performance characteristics and implementation complexity. The choice of approach depends on pattern complexity, string length, and performance requirements.
Recursive backtracking explores match possibilities by trying different ways to consume wildcard characters. When encountering an asterisk, the algorithm attempts matching it to zero characters first, then recursively tries matching one character at a time. This approach naturally handles multiple wildcards and complex patterns but may revisit the same subproblems multiple times.
The recursive approach maintains simplicity but suffers from exponential time complexity in worst cases. A pattern like "a*a*a*a*b" against a string of many 'a' characters followed by 'c' will explore numerous failing match attempts before determining no match exists.
Dynamic programming eliminates redundant computation by building a table of subproblem results. The table entry dp[i][j] indicates whether pattern[0..i-1] matches string[0..j-1]. The algorithm fills the table iteratively, with each entry depending only on previously computed values.
Dynamic programming guarantees polynomial time complexity, specifically O(mn) where m is pattern length and n is string length. The space complexity is also O(mn) for the table, though optimization can reduce space to O(n) by maintaining only two rows.
Greedy matching with backtracking offers a middle ground. The algorithm processes the pattern greedily, with asterisks initially matching as much as possible. When a mismatch occurs, the algorithm backtracks to the most recent asterisk and tries a shorter match. This approach performs well on typical inputs while handling worst cases better than pure greedy matching.
Two-pointer techniques provide efficient solutions for restricted pattern types. When patterns contain a single asterisk, two pointers can scan from both string ends, matching prefix and suffix portions separately. This linear-time approach handles common cases like "prefix*suffix" efficiently.
Finite automata compilation converts patterns into state machines that process strings in linear time after construction. The pattern becomes a deterministic or non-deterministic finite automaton where states represent pattern positions and transitions represent character matches or wildcard consumption. This approach excels when matching many strings against the same pattern.
Hybrid approaches combine techniques based on pattern structure analysis. Patterns without wildcards use direct comparison. Patterns with only question marks use single-pass validation. Complex patterns with multiple asterisks employ dynamic programming or backtracking as needed.
Practical Examples
File filtering represents a primary application of wildcard matching. Applications use wildcard patterns to select files based on naming conventions:
def filter_files(directory, pattern)
Dir.entries(directory).select do |filename|
File.fnmatch(pattern, filename)
end
end
# Find all text files
filter_files("/docs", "*.txt")
# => ["report.txt", "notes.txt"]
# Find log files from specific date
filter_files("/logs", "2024-01-*.log")
# => ["2024-01-15.log", "2024-01-16.log"]
# Find temporary files
filter_files("/tmp", "temp_?.dat")
# => ["temp_1.dat", "temp_2.dat"]
Search functionality in text editors and IDEs employs wildcards for flexible file discovery:
class FileSearch
def initialize(root_path)
@root_path = root_path
end
def find_files(pattern, recursive: false)
glob_pattern = recursive ? "**/#{pattern}" : pattern
Dir.glob(File.join(@root_path, glob_pattern))
end
def find_by_extension(extension)
find_files("*.#{extension}", recursive: true)
end
def find_by_name_pattern(pattern)
find_files(pattern, recursive: true)
end
end
search = FileSearch.new("/project")
search.find_by_extension("rb")
# => ["/project/lib/app.rb", "/project/spec/app_spec.rb"]
search.find_by_name_pattern("*_controller.rb")
# => ["/project/app/controllers/users_controller.rb"]
Command-line argument parsing uses wildcards to specify multiple files:
class CommandProcessor
def process_files(pattern)
matching_files = Dir.glob(pattern)
if matching_files.empty?
puts "No files match pattern: #{pattern}"
return
end
matching_files.each do |file|
process_single_file(file)
end
end
def process_single_file(file)
puts "Processing: #{file}"
# File processing logic
end
end
processor = CommandProcessor.new
processor.process_files("data/*.csv")
# Processing: data/sales.csv
# Processing: data/inventory.csv
URL routing in web frameworks employs wildcard-like pattern matching:
class Router
def initialize
@routes = {}
end
def add_route(pattern, handler)
@routes[pattern] = handler
end
def match_route(path)
@routes.each do |pattern, handler|
# Convert URL pattern to wildcard pattern
wildcard_pattern = pattern.gsub(':id', '*')
if File.fnmatch(wildcard_pattern, path)
return handler
end
end
nil
end
end
router = Router.new
router.add_route("/users/:id", :show_user)
router.add_route("/posts/:id/comments", :show_comments)
router.match_route("/users/123")
# => :show_user
Configuration file matching enables environment-specific settings:
class ConfigLoader
def load_configs(environment)
pattern = "config.#{environment}.*.yml"
config_files = Dir.glob(pattern)
config_files.each_with_object({}) do |file, configs|
key = file.match(/config\.#{environment}\.(.*?)\.yml/)[1]
configs[key] = YAML.load_file(file)
end
end
end
loader = ConfigLoader.new
loader.load_configs("production")
# Loads config.production.database.yml, config.production.cache.yml, etc.
Email address validation using simplified wildcard patterns:
def validate_email_pattern(email, allowed_pattern)
# Convert pattern like *@company.com to wildcard match
File.fnmatch(allowed_pattern, email)
end
validate_email_pattern("john@company.com", "*@company.com")
# => true
validate_email_pattern("jane@example.com", "*@company.com")
# => false
# Allow specific departments
validate_email_pattern("support@company.com", "support@*.com")
# => true
Common Patterns
The suffix wildcard pattern "*suffix" matches strings ending with specific text. This pattern appears in file filtering, where extensions determine file types:
# Match files by extension
File.fnmatch("*.rb", "script.rb") # => true
File.fnmatch("*.json", "config.json") # => true
# Match log files
File.fnmatch("*.log", "application.log") # => true
The prefix wildcard pattern "prefix*" matches strings beginning with specific text:
# Match files by prefix
File.fnmatch("test*", "test_user.rb") # => true
File.fnmatch("config*", "config.yml") # => true
# Match dated files
File.fnmatch("2024-01*", "2024-01-15.log") # => true
The infix wildcard pattern "*middle*" locates strings containing specific substrings:
# Find files containing specific text
File.fnmatch("*controller*", "users_controller.rb") # => true
File.fnmatch("*test*", "user_test.rb") # => true
Multiple wildcard segments "prefix*middle*suffix" enable complex pattern matching:
# Match version-specific files
File.fnmatch("app-*-*.tar.gz", "app-1.2-linux.tar.gz")
# => true
# Match dated log files with levels
File.fnmatch("2024-*-*.log", "2024-01-15.log")
# => true
The question mark for exact-length matching specifies fixed positions:
# Match files with single-character variations
File.fnmatch("file?.txt", "file1.txt") # => true
File.fnmatch("file?.txt", "file10.txt") # => false
# Match date patterns
File.fnmatch("2024-??-??.log", "2024-01-15.log") # => true
Character class simulation combines patterns:
# Match multiple extensions
def match_any_extension(filename, extensions)
extensions.any? { |ext| File.fnmatch("*.#{ext}", filename) }
end
match_any_extension("script.rb", ["rb", "py", "js"])
# => true
Negation patterns require explicit checking:
# Match files NOT matching pattern
def exclude_pattern(filename, exclude)
!File.fnmatch(exclude, filename)
end
exclude_pattern("important.txt", "temp*")
# => true
exclude_pattern("temp_file.txt", "temp*")
# => false
The exact match pattern without wildcards validates specific strings:
File.fnmatch("config.yml", "config.yml")
# => true
File.fnmatch("config.yml", "config.yaml")
# => false
Directory-aware patterns with pathname flag prevent crossing directory boundaries:
# Match files in specific directory depth
File.fnmatch("docs/*.txt", "docs/file.txt", File::FNM_PATHNAME)
# => true
File.fnmatch("docs/*.txt", "docs/sub/file.txt", File::FNM_PATHNAME)
# => false (asterisk doesn't match separator)
# Use double asterisk for recursive matching
File.fnmatch("docs/**/*.txt", "docs/sub/file.txt", File::FNM_PATHNAME)
# => true
Case-insensitive patterns for cross-platform compatibility:
# Match regardless of case
File.fnmatch("*.TXT", "document.txt", File::FNM_CASEFOLD)
# => true
# Case-insensitive prefix matching
File.fnmatch("README*", "readme.md", File::FNM_CASEFOLD)
# => true
Performance Considerations
Algorithm selection significantly impacts performance. Patterns without wildcards complete in O(n) time with direct string comparison. Patterns with only question marks also achieve O(n) complexity through single-pass validation. Asterisk-containing patterns introduce complexity ranging from O(n) for optimized cases to O(mn) for dynamic programming approaches.
The recursive backtracking approach exhibits exponential worst-case complexity. A pattern like "a*a*a*a*b" against a string of 'a' characters creates a search tree where each asterisk branches into multiple possibilities. Without memoization, the algorithm explores 2^k paths for k asterisks.
# Worst case for recursive approach
pattern = "a*a*a*a*b"
string = "a" * 100 + "c"
# Explores numerous failing paths before determining no match
Dynamic programming guarantees O(mn) time complexity but requires O(mn) space for the memoization table. For patterns and strings of length 1000, this means one million table entries.
# Space-optimized DP using only two rows
def wildcard_match_optimized(pattern, string)
m, n = pattern.length, string.length
prev = Array.new(n + 1, false)
curr = Array.new(n + 1, false)
prev[0] = true
(1..m).each do |i|
curr[0] = prev[0] && pattern[i-1] == '*'
(1..n).each do |j|
if pattern[i-1] == '*'
curr[j] = prev[j] || curr[j-1]
elsif pattern[i-1] == '?' || pattern[i-1] == string[j-1]
curr[j] = prev[j-1]
else
curr[j] = false
end
end
prev, curr = curr, prev
end
prev[n]
end
Pattern structure affects performance dramatically. Patterns with asterisks at the beginning or end perform better than patterns with asterisks in the middle. The pattern "*test" or "test*" allows early termination when checking suffix or prefix, while "te*st" requires processing the entire string.
Memoization transforms exponential recursive solutions into polynomial time. Storing results of subproblems prevents redundant computation:
def wildcard_match_memo(pattern, string, memo = {})
key = [pattern.length, string.length]
return memo[key] if memo.key?(key)
result = if pattern.empty?
string.empty?
elsif string.empty?
pattern.chars.all? { |c| c == '*' }
elsif pattern[0] == '?'
wildcard_match_memo(pattern[1..-1], string[1..-1], memo)
elsif pattern[0] == '*'
wildcard_match_memo(pattern[1..-1], string, memo) ||
wildcard_match_memo(pattern, string[1..-1], memo)
else
pattern[0] == string[0] &&
wildcard_match_memo(pattern[1..-1], string[1..-1], memo)
end
memo[key] = result
end
Early termination optimization exits immediately when matches become impossible. If a pattern contains literals that exceed the remaining string length, the match must fail:
def can_match?(pattern, string)
# Count minimum characters needed
min_length = pattern.chars.count { |c| c != '*' }
return false if string.length < min_length
# Early termination during matching
# Implementation continues matching
true
end
String allocation impacts Ruby performance. Creating substrings with slicing operations allocates new string objects. Index-based approaches avoid this overhead:
def wildcard_match_indexed(pattern, string, pi = 0, si = 0)
return si == string.length if pi == pattern.length
return false if si == string.length && pi < pattern.length
if pattern[pi] == '?'
return false if si == string.length
return wildcard_match_indexed(pattern, string, pi + 1, si + 1)
elsif pattern[pi] == '*'
return true if wildcard_match_indexed(pattern, string, pi + 1, si)
return si < string.length &&
wildcard_match_indexed(pattern, string, pi, si + 1)
else
return false if si == string.length || pattern[pi] != string[si]
return wildcard_match_indexed(pattern, string, pi + 1, si + 1)
end
end
Compiling patterns to regular expressions incurs upfront cost but enables efficient multi-string matching. The regex engine's optimization makes repeated matching faster than recalculating wildcard logic:
# Pattern compilation amortizes cost
compiled_pattern = wildcard_to_regex("test*.txt")
# Efficient for matching many strings
filenames.select { |f| compiled_pattern.match?(f) }
Reference
Wildcard Character Semantics
| Wildcard | Matches | Examples |
|---|---|---|
| * | Zero or more characters | *test matches test, mytest, testing |
| ? | Exactly one character | test? matches test1, testA, not test |
| [chars] | One character from set | test[123] matches test1, test2, test3 |
| [!chars] | One character not in set | test[!0] matches test1, not test0 |
| ** | Recursive directory match | **/*.rb matches all Ruby files recursively |
File.fnmatch Flags
| Flag | Description | Effect |
|---|---|---|
| FNM_CASEFOLD | Case insensitive | *.TXT matches file.txt |
| FNM_PATHNAME | Slash-aware matching | * does not match / |
| FNM_DOTMATCH | Match leading dots | * matches .hidden |
| FNM_NOESCAPE | Disable backslash escaping | Treat backslash as literal |
| FNM_EXTGLOB | Extended globbing | Enables brace expansion |
Complexity Analysis
| Pattern Type | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| No wildcards | O(n) | O(1) | Direct comparison |
| Only ? wildcards | O(n) | O(1) | Single pass validation |
| Single * wildcard | O(n) | O(1) | Two-pointer approach |
| Multiple * wildcards | O(mn) | O(mn) | Dynamic programming |
| Recursive backtracking | O(2^k * n) | O(k) | k is asterisk count |
| Memoized recursion | O(mn) | O(mn) | Cached subproblems |
Common Pattern Templates
| Pattern | Purpose | Use Case |
|---|---|---|
| *.ext | Match by extension | File filtering |
| prefix* | Match by prefix | Name-based search |
| *suffix | Match by suffix | Ending validation |
| keyword | Substring search | Content search |
| prefix*suffix | Sandwich match | Versioned files |
| ???_data | Fixed-length prefix | Formatted names |
| dir/**/*.ext | Recursive extension | Deep file search |
| *.[ext1,ext2] | Multiple extensions | Multi-format match |
Implementation Comparison
| Approach | Pros | Cons | Best For |
|---|---|---|---|
| File.fnmatch | Native implementation, fast | Limited to string/path matching | File operations |
| Recursive | Simple implementation | Exponential worst case | Simple patterns |
| Dynamic programming | Polynomial guarantee | Memory overhead | Complex patterns |
| Regex conversion | Regex engine power | Compilation overhead | Repeated matching |
| Two-pointer | Linear time | Limited pattern types | Single asterisk |
Method Reference
| Method | Parameters | Returns | Description |
|---|---|---|---|
| File.fnmatch | pattern, path, flags | Boolean | Shell-style pattern matching |
| Dir.glob | pattern, flags | Array | Find files matching pattern |
| Regexp.escape | string | String | Escape regex metacharacters |
| String#match? | pattern | Boolean | Test string against pattern |
Pattern Edge Cases
| Scenario | Pattern | String | Result | Reason |
|---|---|---|---|---|
| Empty pattern | "" | "" | true | Both empty |
| Empty pattern | "" | "text" | false | Pattern requires empty |
| Empty string | "text" | "" | false | String too short |
| Only asterisk | "*" | "anything" | true | Asterisk matches all |
| Multiple asterisks | "***" | "" | true | All match zero |
| No match needed | "aa*" | "a" | true | Asterisks match zero |
| Greedy failure | "a*ab" | "aab" | true | Requires backtracking |