Overview
Named captures assign meaningful names to capture groups in regular expressions, making pattern matching more readable and maintainable. Ruby implements named captures through the (?<name>pattern) syntax, which lets code access matched content by name rather than by numeric index.
The MatchData object returned by matching operations exposes named captures through the [] operator with string or symbol keys. Named captures work with Ruby's regex matching methods, including String#match, String#scan, Regexp#match, and String#=~.
text = "Contact: john.doe@example.com"
pattern = /Contact: (?<first>\w+)\.(?<last>\w+)@(?<host>[\w.]+)/
match = text.match(pattern)
match[:first] # => "john"
match[:last] # => "doe"
match[:host] # => "example.com"
Named captures integrate with Ruby's global variables $1, $2, etc., maintaining backward compatibility while providing named access. The MatchData object supports numeric and named access at the same time.
pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
"2023-12-15".match(pattern)
$1 # => "2023" (first capture)
$~[:year] # => "2023" (named access)
$~[:month] # => "12"
$~[:day] # => "15"
Ruby accepts named and unnamed groups in the same pattern, but once a pattern contains a named group, plain parentheses stop capturing: they behave like non-capturing groups, so only the named groups appear in the MatchData. Named captures become particularly valuable when working with complex patterns containing multiple groups or when patterns are reused across different contexts.
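A short sketch of how plain parentheses behave next to a named group may help here. The pattern and variable names below are made up for illustration.

```ruby
# A named group next to a plain parenthesized group.
mixed = /(?<year>\d{4})-(\d{2})/
m = mixed.match("2023-12")

names    = m.names     # => ["year"]
captures = m.captures  # => ["2023"]   (the plain group did not capture)
whole    = m[0]        # => "2023-12"
```

If both parts are needed, naming every group is usually the simplest fix.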
Basic Usage
The fundamental syntax for named captures is (?<name>pattern), where name becomes the identifier for accessing the matched content. The name must be a valid Ruby identifier starting with a letter or underscore.
email_pattern = /(?<local>[^@]+)@(?<domain>[^.]+)\.(?<tld>\w+)/
email = "admin@company.org"
match = email.match(email_pattern)
match[:local] # => "admin"
match[:domain] # => "company"
match[:tld] # => "org"
String matching methods return a MatchData object when the match succeeds and nil when it fails. The MatchData object provides access to named captures through bracket notation accepting strings, symbols, or numeric indices.
log_pattern = /\[(?<timestamp>[^\]]+)\] (?<level>\w+): (?<message>.*)/
log_line = "[2023-12-15 14:30:15] ERROR: Database connection failed"
match = log_line.match(log_pattern)
if match
  puts "Time: #{match[:timestamp]}"
  puts "Level: #{match[:level]}"
  puts "Message: #{match[:message]}"
end
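Because a failed match returns nil, code that only needs one capture can avoid the explicit if. The sketch below shows two options that exist in core Ruby: safe navigation on the MatchData, and String#[] with a regexp plus a capture name. The sample strings and variable names are illustrative.

```ruby
log_pattern = /\[(?<timestamp>[^\]]+)\] (?<level>\w+): (?<message>.*)/

good = "[2023-12-15 14:30:15] ERROR: Database connection failed"
bad  = "not a log line"

# Safe navigation: &.[] is skipped when match returns nil.
level_ok      = good.match(log_pattern)&.[](:level)  # => "ERROR"
level_missing = bad.match(log_pattern)&.[](:level)   # => nil

# String#[] with a regexp and a capture name returns the capture or nil.
slice_ok      = good[log_pattern, "level"]           # => "ERROR"
slice_missing = bad[log_pattern, "level"]            # => nil
```

The String#[] form is compact for a single field; a full MatchData is still the better fit when several captures are needed.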
The String#scan method works with named captures but returns arrays of captured values rather than MatchData objects. Each inner array contains the captured values in the order the groups appear in the pattern.
text = "User: alice, Age: 25; User: bob, Age: 30"
pattern = /User: (?<name>\w+), Age: (?<age>\d+)/
matches = text.scan(pattern)
# => [["alice", "25"], ["bob", "30"]]
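Since scan drops the names, one way to get them back is to pair each result with Regexp#names, which lists the group names in pattern order. This is a minimal sketch; the records variable is an illustrative name.

```ruby
text = "User: alice, Age: 25; User: bob, Age: 30"
pattern = /User: (?<name>\w+), Age: (?<age>\d+)/

# Zip each array of values with the pattern's group names.
records = text.scan(pattern).map { |values| pattern.names.zip(values).to_h }
# => [{"name"=>"alice", "age"=>"25"}, {"name"=>"bob", "age"=>"30"}]
```

The keys are strings, matching what MatchData#named_captures returns.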
Named captures support quantifiers and optional groups like standard captures. Optional named groups return nil when they don't match, while repeated named groups capture only the last match.
url_pattern = /https?:\/\/(?<subdomain>\w+\.)?(?<domain>\w+)\.(?<tld>\w+)/
urls = [
  "https://www.example.com",
  "http://example.com",
  "https://api.service.org"
]
urls.each do |url|
  match = url.match(url_pattern)
  puts "Domain: #{match[:domain]}, Subdomain: #{match[:subdomain]}"
end
# Domain: example, Subdomain: www.
# Domain: example, Subdomain:
# Domain: service, Subdomain: api.
Advanced Usage
Named captures support nesting and alternation. When several groups share the same name across alternation branches, looking up that name returns the value from the branch that actually matched.
phone_pattern = /
  (?:
    (?<country>\+\d{1,3})\s*
    (?<area>\(\d{3}\)|\d{3})[-.\s]*
    (?<exchange>\d{3})[-.\s]*
    (?<number>\d{4})
  |
    (?<country>\+\d{1,3})[-.\s]*
    (?<area>\d{2,4})[-.\s]*
    (?<exchange>\d{3,4})[-.\s]*
    (?<number>\d{4})
  )
/x
phones = [
  "+1 (555) 123-4567",
  "+44 20 7946 0958"
]
phones.each do |phone|
  match = phone.match(phone_pattern)
  if match
    puts "Country: #{match[:country]}"
    puts "Area: #{match[:area]}"
    puts "Exchange: #{match[:exchange]}"
    puts "Number: #{match[:number]}"
    puts "---"
  end
end
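To see what Ruby does with a duplicated name, it can help to inspect the group numbering directly. The sketch below uses a deliberately tiny pattern (an illustrative one, not the phone pattern above): Regexp#named_captures maps a name to every group index that carries it.

```ruby
pattern = /(?<id>\d+)|(?<id>[a-z]+)/

index_map = pattern.named_captures   # => {"id"=>[1, 2]}

letters = pattern.match("abc")
letters_groups = letters.captures    # => [nil, "abc"]
letters_id     = letters[:id]        # => "abc"

digits = pattern.match("123")
digits_groups = digits.captures      # => ["123", nil]
digits_id     = digits[:id]          # => "123"
```

Each branch still has its own numbered slot; the name simply resolves to whichever slot took part in the match.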
Named captures integrate with lookahead and lookbehind assertions, enabling complex validation patterns while maintaining readable capture names.
# A constant is used because a top-level local variable is not visible inside def.
PASSWORD_PATTERN = /
  (?=.*(?<has_lower>[a-z]))
  (?=.*(?<has_upper>[A-Z]))
  (?=.*(?<has_digit>\d))
  (?=.*(?<has_special>[!@#$%^&*]))
  (?<password>.{8,})
/x
def validate_password(password)
  match = password.match(PASSWORD_PATTERN)
  return false unless match
  # Every lookahead must succeed for the match to exist, so these checks
  # mainly document which named group covers which requirement.
  validations = {
    lowercase: !match[:has_lower].nil?,
    uppercase: !match[:has_upper].nil?,
    digit: !match[:has_digit].nil?,
    special: !match[:has_special].nil?,
    length: match[:password].length >= 8
  }
  validations.all? { |_, valid| valid }
end
The MatchData#named_captures method returns a hash of all named captures, useful for dynamic processing or debugging complex patterns.
csv_pattern = /
  (?:"(?<quoted_field>[^"]*)"|(?<unquoted_field>[^,\n]*))
  (?:,(?:"(?<next_quoted>[^"]*)"|(?<next_unquoted>[^,\n]*)))*
/x
csv_line = 'John Doe,"123 Main St, Apt 4",555-1234'
match = csv_line.match(csv_pattern)
match.named_captures.each do |name, value|
  puts "#{name}: #{value.inspect}" if value
end
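The hash returned by named_captures has string keys and keeps nil for groups that did not participate. A small sketch of two common follow-up steps, using a made-up key=value pattern:

```ruby
pattern = /(?<key>\w+)=(?<value>\w+)?/
m = pattern.match("timeout=")

raw       = m.named_captures                           # keys "key" and "value"; "value" maps to nil
present   = m.named_captures.compact                   # only the "key" entry remains
symbolized = m.named_captures.transform_keys(&:to_sym) # keys :key and :value
```

Hash#compact and Hash#transform_keys are ordinary Hash methods, so nothing here is specific to MatchData beyond the initial call.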
A pattern can refer back to an earlier named capture with the \k<name> syntax, which requires the same text to appear again. This enables scenarios such as matching opening and closing delimiters or detecting repeated text.
balanced_quotes = /
  (?<quote>['"])
  (?<content>(?:[^\\]|\\.)*?)
  \k<quote>
/x
text = 'He said "Hello there" and she replied \'Good morning\''
matches = text.scan(balanced_quotes)
matches.each do |quote_char, content|
  puts "Found quoted text: #{content} (using #{quote_char})"
end
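Another small use of \k<name> is spotting an accidentally doubled word. The sketch below is illustrative; the pattern name and sample sentences are assumptions, not part of the original examples.

```ruby
# \k<word> must match the same text that (?<word>...) captured.
doubled = /\b(?<word>\w+)\s+\k<word>\b/i

m = doubled.match("This is is a test")
repeated_word = m[:word]   # => "is"
full_match    = m[0]       # => "is is"

clean = doubled.match?("No repeats here")   # => false
```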
Production Patterns
Named captures excel in web application contexts where pattern matching drives routing, validation, and data extraction. Rails applications commonly use named captures for custom route constraints and parameter parsing.
class CustomRouteConstraint
  def initialize(pattern)
    @pattern = pattern
  end

  def matches?(request)
    match = request.path.match(@pattern)
    return false unless match
    request.env[:route_captures] = match.named_captures
    true
  end
end
# Route constraint for API versioning
api_constraint = CustomRouteConstraint.new(
  /\/api\/v(?<version>\d+)\/(?<resource>\w+)\/(?<id>\d+)/
)
# Usage in routes.rb would access captures via request.env[:route_captures]
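The matching logic can be exercised without Rails. The sketch below uses a hypothetical FakeRequest struct as a stand-in for a real request object (only path and env are modeled), and the \A and \z anchors in API_PATTERN are an added assumption so that longer paths do not match by accident.

```ruby
# Hypothetical stand-in for a Rack/Rails request; not a real framework class.
FakeRequest = Struct.new(:path, :env)

API_PATTERN = %r{\A/api/v(?<version>\d+)/(?<resource>\w+)/(?<id>\d+)\z}

request = FakeRequest.new("/api/v2/users/42", {})
if (match = request.path.match(API_PATTERN))
  request.env["route_captures"] = match.named_captures
end

route_captures = request.env["route_captures"]
# => {"version"=>"2", "resource"=>"users", "id"=>"42"}

rejected = FakeRequest.new("/api/v2/users/42/extra", {}).path.match?(API_PATTERN)
# => false
```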
Log parsing represents another common production use case where named captures improve maintainability and debugging capability compared to positional captures.
require "time" # Time.strptime is defined by the time standard library

class LogParser
  ACCESS_LOG_PATTERN = /
    (?<ip>\d+\.\d+\.\d+\.\d+)\s+
    -\s+
    -\s+
    \[(?<timestamp>[^\]]+)\]\s+
    "(?<method>\w+)\s+(?<path>[^\s]+)\s+HTTP\/(?<http_version>[\d.]+)"\s+
    (?<status>\d+)\s+
    (?<size>\d+|-)\s+
    "(?<referer>[^"]*)"\s+
    "(?<user_agent>[^"]*)"
  /x

  def self.parse_line(line)
    match = line.match(ACCESS_LOG_PATTERN)
    return nil unless match
    {
      ip_address: match[:ip],
      timestamp: Time.strptime(match[:timestamp], "%d/%b/%Y:%H:%M:%S %z"),
      http_method: match[:method],
      request_path: match[:path],
      http_version: match[:http_version],
      status_code: match[:status].to_i,
      response_size: match[:size] == "-" ? 0 : match[:size].to_i,
      referer: match[:referer],
      user_agent: match[:user_agent]
    }
  rescue ArgumentError
    # Raised by Time.strptime when the timestamp does not fit the format
    nil
  end
end
# Process log file
File.foreach("access.log") do |line|
  parsed = LogParser.parse_line(line.chomp)
  next unless parsed
  # Process parsed log entry
  if parsed[:status_code] >= 400
    puts "Error: #{parsed[:status_code]} for #{parsed[:request_path]}"
  end
end
Data validation and sanitization workflows benefit from named captures when extracting and validating structured input. Named captures make validation logic more maintainable and error messages more descriptive.
class DataValidator
  # Note: real American Express numbers have 15 digits, so the 3[47] branch of
  # the 16-digit credit_card pattern below is illustrative only.
  PATTERNS = {
    email: /\A(?<local>[a-zA-Z0-9._-]+)@(?<domain>[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})\z/,
    phone: /\A(?:\+(?<country>\d{1,3})[-.\s]?)?(?<area>\d{3})[-.\s]?(?<exchange>\d{3})[-.\s]?(?<number>\d{4})\z/,
    credit_card: /\A(?<type>4\d{3}|5[1-5]\d{2}|3[47]\d{2})-?(?<group1>\d{4})-?(?<group2>\d{4})-?(?<group3>\d{4})\z/
  }.freeze

  def self.validate_and_extract(type, value)
    pattern = PATTERNS[type]
    return { valid: false, error: "Unknown validation type" } unless pattern
    match = value.match(pattern)
    return { valid: false, error: "Invalid format" } unless match
    result = { valid: true, captures: match.named_captures }
    case type
    when :email
      result[:normalized] = "#{match[:local]}@#{match[:domain].downcase}"
    when :phone
      digits = [match[:area], match[:exchange], match[:number]].join
      result[:normalized] = match[:country] ? "+#{match[:country]}#{digits}" : digits
    when :credit_card
      result[:masked] = "**** **** **** #{match[:group3]}"
      result[:type] = case match[:type][0]
                      when '4' then 'Visa'
                      when '5' then 'MasterCard'
                      when '3' then 'American Express'
                      end
    end
    result
  end
end
# Usage in form processing
validation_result = DataValidator.validate_and_extract(:email, "user@EXAMPLE.COM")
if validation_result[:valid]
  email = validation_result[:normalized] # => "user@example.com"
  local_part = validation_result[:captures]["local"] # => "user"
end
Common Pitfalls
Optional groups and alternations need care with named captures, just as they do with numbered ones. An optional group that does not participate in the match returns nil rather than an empty string, which can cause unexpected errors in string concatenation.
optional_pattern = /(?<prefix>\w+:)?(?<value>\w+)/
match = "test".match(optional_pattern)
# match[:prefix] is nil, not ""
# match[:value] is "test"
# This raises NoMethodError because nil does not respond to +
begin
  result = match[:prefix] + match[:value]
rescue NoMethodError => e
  puts "Error: #{e.message}"
end
# Correct approach: interpolation converts nil to an empty string
result = "#{match[:prefix]}#{match[:value]}" # => "test"
Repeated named captures only preserve the last match, not all matches. This behavior catches developers who expect array-like behavior from repeated named groups.
text = "apple, banana, cherry"
# The repeated group keeps only its last iteration, so :word is "cherry"
match = text.match(/(?<word>\w+)(?:,\s*(?<word>\w+))*/)
match[:word] # => "cherry", not ["apple", "banana", "cherry"]
# Correct approach for multiple captures
words = text.scan(/(?<word>\w+)/).flatten
# => ["apple", "banana", "cherry"]
Named captures in alternation branches can produce confusing results when multiple branches contain the same capture name but different semantic meanings.
# Problematic pattern with same name in different contexts
ambiguous_pattern = /
(?:user:(?<identifier>\w+))|
(?:email:(?<identifier>[^@]+@[^.]+\.\w+))
/x
"user:john123".match(ambiguous_pattern)[:identifier] # => "john123"
"email:john@example.com".match(ambiguous_pattern)[:identifier] # => "john@example.com"
# The same :identifier name has different meanings
# Better approach uses distinct names
clear_pattern = /
(?:user:(?<username>\w+))|
(?:email:(?<email_address>[^@]+@[^.]+\.\w+))
/x
Performance can matter in hot loops. MatchData#named_captures builds a new Hash on every call, so calling it for every line of a large input adds allocations that direct access by name avoids. Whether the difference is noticeable depends on the workload, so measure before optimizing.
# Performance-sensitive code should avoid building a hash per line
heavy_pattern = /
  (?<field1>\w+)\s+
  (?<field2>\w+)\s+
  (?<field3>\w+)\s+
  (?<field4>\w+)\s+
  (?<field5>\w+)\s+
  (?<field6>\w+)\s+
  (?<field7>\w+)
/x
large_file_lines.each do |line|
  match = line.match(heavy_pattern)
  next unless match
  # This creates a new hash on every iteration
  data = match.named_captures
  process_data(data)
end
# More efficient approach
large_file_lines.each do |line|
  match = line.match(heavy_pattern)
  next unless match
  # Direct access avoids hash creation
  process_fields(match[:field1], match[:field2], match[:field3])
end
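A rough way to check whether this matters for a given workload is to time both styles with the Benchmark standard library. The sketch below is only a measurement harness under assumed inputs (synthetic lines, a three-field pattern); the numbers vary by machine and Ruby version, so no expected output is shown.

```ruby
require "benchmark"

pattern = /(?<field1>\w+)\s+(?<field2>\w+)\s+(?<field3>\w+)/
lines = Array.new(50_000) { |i| "alpha#{i} beta#{i} gamma#{i}" }

# Style 1: build a Hash of all captures for every line.
hash_time = Benchmark.realtime do
  lines.each do |line|
    match = pattern.match(line)
    next unless match
    match.named_captures
  end
end

# Style 2: read only the captures that are needed.
direct_time = Benchmark.realtime do
  lines.each do |line|
    match = pattern.match(line)
    next unless match
    [match[:field1], match[:field2], match[:field3]]
  end
end

puts format("named_captures: %.4fs, direct access: %.4fs", hash_time, direct_time)
```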
Global variable behavior with named captures can surprise developers. The special variables $1, $2, etc., still work with named captures, but they follow the order in which the groups appear in the pattern, not the alphabetical order of the names.
pattern = /(?<zebra>\w+)\s+(?<alpha>\w+)/
"first second".match(pattern)
$1 # => "first" (zebra capture, not alpha)
$2 # => "second" (alpha capture)
# Named access doesn't affect global variable order
$~[:zebra] # => "first"
$~[:alpha] # => "second"
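A related surprise involves =~. When a regexp literal with named groups sits on the left-hand side of =~, Ruby also assigns the captures to local variables of the same names; with the string on the left, no locals are created. The sketch below illustrates both forms with made-up names.

```ruby
# Regexp literal on the left: year and month become local variables.
if /(?<year>\d{4})-(?<month>\d{2})/ =~ "2023-12"
  summary = "#{year}/#{month}"   # => "2023/12"
end

# String on the left: no locals are assigned, so read from $~ instead.
"2024-01" =~ /(?<yr>\d{4})-(?<mo>\d{2})/
from_last_match = $~[:yr]        # => "2024"
```

The assignment only happens for a literal regexp without interpolation, so a pattern stored in a variable or constant behaves like the second form.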
Unicode names in captures work, but they can be awkward to type, search for, and share across editors, tools, and teams. ASCII names are the safer default.
# Works but potentially problematic
unicode_pattern = /(?<数字>\d+)(?<文字>\w+)/
text = "123abc"
match = text.match(unicode_pattern)
match[:数字] # => "123"
# Safer approach
ascii_pattern = /(?<digits>\d+)(?<letters>\w+)/
match = text.match(ascii_pattern)
match[:digits] # => "123"
Reference
MatchData Methods
Method | Parameters | Returns | Description |
---|---|---|---|
#[] | name (String/Symbol/Integer) | String or nil | Access named or numbered capture |
#named_captures | None | Hash | Hash of all named captures |
#names | None | Array<String> | Array of all capture names |
#values_at | *names (String/Symbol/Integer) | Array | Multiple captures by name or index |
String Methods with Named Captures
Method | Parameters | Returns | Description |
---|---|---|---|
#match | pattern, pos=0 | MatchData or nil | Match pattern, return MatchData |
#match? | pattern, pos=0 | Boolean | Test if pattern matches |
#scan | pattern | Array | All matches as arrays of captures |
#gsub | pattern, replacement | String | Replace with access to named captures |
Regexp Methods
Method | Parameters | Returns | Description |
---|---|---|---|
#match | string, pos=0 | MatchData or nil | Match against string |
#match? | string, pos=0 | Boolean | Test match against string |
#named_captures | None | Hash | Maps each capture name to an array of its group indices |
#names | None | Array<String> | Array of capture names in pattern |
Named Capture Syntax
Syntax | Description | Example |
---|---|---|
(?<name>pattern) | Basic named capture | (?<word>\w+) |
(?'name'pattern) | Alternative syntax | (?'word'\w+) |
\k<name> | Reference named capture | (?<quote>['"])\w+\k<quote> |
\k'name' | Alternative reference | (?'quote'['"])\w+\k'quote' |
Global Variables
Variable | Description | Type |
---|---|---|
$~ | Last MatchData object | MatchData or nil |
$1, $2, ... | Numbered captures (work with named) | String or nil |
$+ | Last capture group that matched | String or nil |
$& | Entire match | String or nil |
Replacement Patterns in gsub
Pattern | Description | Example |
---|---|---|
\k<name> | Named capture reference | "text".gsub(/(?<word>\w+)/, '[\k<word>]') |
\k'name' | Not recognized in replacement strings; use \k<name> | (none) |
Block form | Block receives the matched String; named captures are available through $~ | "text".gsub(/(?<word>\w+)/) { $~[:word].upcase } |
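The two replacement styles can be sketched side by side. The input string and variable names below are illustrative.

```ruby
name = "john smith"

# Replacement string: \k<name> refers to a named group.
# Single quotes keep the backslash literal.
swapped = name.gsub(/(?<first>\w+) (?<last>\w+)/, '\k<last>, \k<first>')
# => "smith, john"

# Block form: the block argument is the matched String, so named
# captures are read from $~ (the MatchData of the current match).
capitalized = name.gsub(/(?<word>\w+)/) { $~[:word].capitalize }
# => "John Smith"
```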
MatchData Hash Access
Access Method | Result Type | Description |
---|---|---|
match[:name] | String or nil | Symbol key access |
match["name"] | String or nil | String key access |
match[0] | String | Entire match (index 0) |
match[1] | String or nil | First capture group |
Error Conditions
Condition | Exception | Description |
---|---|---|
Invalid capture name | RegexpError | Name contains invalid characters |
Duplicate capture name | None | Allowed; name lookup returns the last group with that name that matched |
Missing closing > | RegexpError | Malformed named capture syntax |
Empty capture name | RegexpError | (?<>pattern) is invalid |
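These conditions are easiest to observe with Regexp.new, which raises at runtime (a malformed regexp literal is rejected when the file is parsed instead). The sketch below also shows one more case worth knowing: asking a MatchData for a name the pattern never defined raises IndexError rather than returning nil.

```ruby
# An empty group name is rejected when the pattern is compiled.
compile_error = begin
  Regexp.new("(?<>x)")
  nil
rescue RegexpError => e
  e
end
compile_error.class   # => RegexpError

# Looking up an undefined group name raises IndexError.
m = /(?<word>\w+)/.match("hello")
lookup_message = begin
  m[:missing]
rescue IndexError => e
  e.message
end
```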