CrackedRuby

Named Captures

Named captures in Ruby regular expressions allow referencing matched groups by descriptive names instead of numeric positions.

Core Built-in Classes Regexp and MatchData
2.7.4

Overview

Named captures provide a way to assign meaningful names to capture groups in regular expressions, making pattern matching more readable and maintainable. Ruby implements named captures through the (?<name>pattern) syntax within regular expressions, allowing developers to access matched content by name rather than numeric index.

The MatchData object returned by matching operations contains named capture information accessible through the [] operator with string or symbol keys. Named captures work with all Ruby regex matching methods including String#match, String#scan, Regexp#match, and String#=~.

text = "Contact: john.doe@example.com"
pattern = /Contact: (?<username>\w+)\.(?<domain>\w+)@(?<host>[\w.]+)/
match = text.match(pattern)

match[:username]  # => "john"
match[:domain]    # => "doe"
match[:host]      # => "example.com"
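Ruby also offers a shortcut for reading named groups: when a regexp literal (without interpolation) appears on the left-hand side of =~, each named group is assigned to a local variable of the same name. The sketch below illustrates this; the method name describe_release and the sample strings are assumptions made for the example.

```ruby
# Hypothetical helper: relies on the local variables that a regexp literal
# on the left of =~ assigns for each named group.
def describe_release(line)
  if /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/ =~ line
    "#{day}/#{month}/#{year}"   # year, month, and day are plain locals here
  end
end

describe_release("Released 2023-12-15")  # => "15/12/2023"
describe_release("no date here")         # => nil
```

The assignment does not happen when the string is on the left (line =~ /.../) or when the regexp is held in a variable; in those cases, read the groups from $~ or from the MatchData returned by String#match.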

Named captures integrate with Ruby's global variables $1, $2, etc., maintaining backward compatibility while providing named access. The MatchData object preserves both numeric and named access methods simultaneously.

pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
"2023-12-15".match(pattern)

$1          # => "2023" (first capture)
$~[:year]   # => "2023" (named access)
$~[:month]  # => "12"
$~[:day]    # => "15"

Ruby accepts patterns that mix named and unnamed groups, but once a pattern contains a named group, plain parentheses no longer capture and behave like non-capturing (?:...) groups, so mixing the two styles is best avoided. Named captures become particularly valuable when working with complex patterns containing multiple groups or when patterns are reused across different contexts.
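One detail is worth checking before combining the two styles: in Ruby, plain parentheses stop capturing as soon as the pattern contains a named group. A short sketch, with a pattern and names made up for illustration:

```ruby
# Illustrative pattern: one named group followed by one plain group.
MIXED_PATTERN = /(?<year>\d{4})-(\d{2})/

# Returns the captured values, or nil when the text does not match.
def mixed_captures(text)
  match = MIXED_PATTERN.match(text)
  match && match.captures
end

mixed_captures("2023-12")  # => ["2023"]  (the plain (\d{2}) group is not captured)
MIXED_PATTERN.names        # => ["year"]
```

If the month is needed as well, giving it a name is the simplest fix.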

Basic Usage

The fundamental syntax for named captures uses (?<name>pattern) where name becomes the identifier for accessing the matched content. The name must be a valid Ruby identifier starting with a letter or underscore.

email_pattern = /(?<local>[^@]+)@(?<domain>[^.]+)\.(?<tld>\w+)/
email = "admin@company.org"
match = email.match(email_pattern)

match[:local]   # => "admin"
match[:domain]  # => "company"
match[:tld]     # => "org"

String#match and Regexp#match return a MatchData object when the match succeeds and nil when it fails. The MatchData object provides access to named captures through bracket notation accepting strings, symbols, or numeric indices.

log_pattern = /\[(?<timestamp>[^\]]+)\] (?<level>\w+): (?<message>.*)/
log_line = "[2023-12-15 14:30:15] ERROR: Database connection failed"

match = log_line.match(log_pattern)
if match
  puts "Time: #{match[:timestamp]}"
  puts "Level: #{match[:level]}"
  puts "Message: #{match[:message]}"
end
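The three bracket forms are interchangeable for a given group. The sketch below reuses the log pattern from above; the constant and method names are chosen only for this example.

```ruby
LEVEL_PATTERN = /\[(?<timestamp>[^\]]+)\] (?<level>\w+): (?<message>.*)/

# Looks up the same group by symbol, by string, and by position.
def level_lookups(line)
  match = line.match(LEVEL_PATTERN)
  return nil unless match

  [match[:level], match["level"], match[2]]  # level is the second group
end

level_lookups("[2023-12-15 14:30:15] ERROR: Database connection failed")
# => ["ERROR", "ERROR", "ERROR"]
level_lookups("unstructured line")  # => nil
```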

The String#scan method works with named captures but returns arrays of captured values rather than MatchData objects. When using named captures with scan, Ruby returns arrays containing the captured values in the order they appear in the pattern.

text = "User: alice, Age: 25; User: bob, Age: 30"
pattern = /User: (?<name>\w+), Age: (?<age>\d+)/
matches = text.scan(pattern)
# => [["alice", "25"], ["bob", "30"]]
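Because scan discards the names, one way to get them back is to pair each result row with Regexp#names. This is a sketch that assumes every group name in the pattern is unique; scan_named is a hypothetical helper, not part of Ruby.

```ruby
USER_PATTERN = /User: (?<name>\w+), Age: (?<age>\d+)/

# Turns each row returned by scan into a Hash keyed by group name.
def scan_named(text, pattern)
  text.scan(pattern).map { |values| pattern.names.zip(values).to_h }
end

scan_named("User: alice, Age: 25; User: bob, Age: 30", USER_PATTERN)
# => [{"name"=>"alice", "age"=>"25"}, {"name"=>"bob", "age"=>"30"}]
```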

Named captures support quantifiers and optional groups like standard captures. Optional named groups return nil when they don't match, while repeated named groups capture only the last match.

url_pattern = /https?:\/\/(?<subdomain>\w+\.)?(?<domain>\w+)\.(?<tld>\w+)/
urls = [
  "https://www.example.com",
  "http://example.com",
  "https://api.service.org"
]

urls.each do |url|
  match = url.match(url_pattern)
  puts "Domain: #{match[:domain]}, Subdomain: #{match[:subdomain]}"
end
# Domain: example, Subdomain: www.
# Domain: example, Subdomain: 
# Domain: service, Subdomain: api.

Advanced Usage

Named captures support complex nesting and alternation patterns. When multiple named groups share the same name within alternation branches, Ruby captures whichever branch matches successfully.

phone_pattern = /
  (?:
    (?<country>\+\d{1,3})\s*
    (?<area>\(\d{3}\)|\d{3})[-.\s]*
    (?<exchange>\d{3})[-.\s]*
    (?<number>\d{4})
  |
    (?<country>\+\d{1,3})[-.\s]*
    (?<area>\d{2,4})[-.\s]*
    (?<exchange>\d{3,4})[-.\s]*
    (?<number>\d{4})
  )
/x

phones = [
  "+1 (555) 123-4567",
  "+44 20 7946 0958"
]

phones.each do |phone|
  match = phone.match(phone_pattern)
  if match
    puts "Country: #{match[:country]}"
    puts "Area: #{match[:area]}"
    puts "Exchange: #{match[:exchange]}"
    puts "Number: #{match[:number]}"
    puts "---"
  end
end
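A smaller pattern makes the duplicate-name behavior easier to see. In this sketch (the pattern is illustrative), Regexp#named_captures shows that one name maps to two group numbers, and [] returns the value from whichever group matched.

```ruby
# Two alternation branches that share the group name "id".
ID_PATTERN = /(?<id>\d+)|(?<id>[a-z]+)/

ID_PATTERN.named_captures     # => {"id"=>[1, 2]}
ID_PATTERN.match("42")[:id]   # => "42"   (group 1 matched)
ID_PATTERN.match("abc")[:id]  # => "abc"  (group 2 matched)
```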

Named captures integrate with lookahead and lookbehind assertions, enabling complex validation patterns while maintaining readable capture names.

# A constant is used because a method body cannot see top-level local variables.
PASSWORD_PATTERN = /
  (?=.*(?<has_lower>[a-z]))
  (?=.*(?<has_upper>[A-Z]))
  (?=.*(?<has_digit>\d))
  (?=.*(?<has_special>[!@#$%^&*]))
  (?<password>.{8,})
/x

def validate_password(password)
  match = password.match(PASSWORD_PATTERN)
  return false unless match

  # Each lookahead group holds the character that satisfied its rule.
  validations = {
    lowercase: !match[:has_lower].nil?,
    uppercase: !match[:has_upper].nil?,
    digit: !match[:has_digit].nil?,
    special: !match[:has_special].nil?,
    length: match[:password].length >= 8
  }

  validations.all? { |_, valid| valid }
end

The MatchData#named_captures method returns a hash of all named captures, useful for dynamic processing or debugging complex patterns.

csv_pattern = /
  (?:"(?<quoted_field>[^"]*)"|(?<unquoted_field>[^,\n]*))
  (?:,(?:"(?<next_quoted>[^"]*)"|(?<next_unquoted>[^,\n]*)))*
/x

csv_line = 'John Doe,"123 Main St, Apt 4",555-1234'
match = csv_line.match(csv_pattern)

match.named_captures.each do |name, value|
  puts "#{name}: #{value.inspect}" if value
end
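On a simpler pattern the shape of the returned hash is easier to see. Keys are strings by default; the sketch below converts them to symbols with Hash#transform_keys. The names DATE_PATTERN and date_parts are illustrative.

```ruby
DATE_PATTERN = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/

# Returns the named captures with symbol keys, or an empty Hash on no match.
def date_parts(text)
  match = DATE_PATTERN.match(text)
  return {} unless match

  match.named_captures.transform_keys(&:to_sym)
end

date_parts("2023-12-15")  # => {year: "2023", month: "12", day: "15"}
date_parts("n/a")         # => {}
```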

A named capture can be referenced later in the same pattern with the \k<name> backreference syntax, which matches the same text the named group captured. This is useful for requiring the same quote character at both ends of a string or for detecting repeated text.

balanced_quotes = /
  (?<quote>['"])
  (?<content>(?:[^\\]|\\.)*?)
  \k<quote>
/x

text = 'He said "Hello there" and she replied \'Good morning\''
matches = text.scan(balanced_quotes)

matches.each do |quote_char, content|
  puts "Found quoted text: #{content} (using #{quote_char})"
end
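The same backreference idea applies to other paired delimiters. The sketch below requires a closing tag to repeat the opening tag's name; it is a toy pattern for flat input, not an HTML parser, and the names are made up for the example.

```ruby
# \k<tag> must match exactly what the tag group captured.
TAG_PATTERN = /<(?<tag>\w+)>(?<body>.*?)<\/\k<tag>>/

# Returns [tag, body] for the first well-paired element, or nil.
def tag_body(html)
  match = TAG_PATTERN.match(html)
  match && [match[:tag], match[:body]]
end

tag_body("<b>bold</b>")  # => ["b", "bold"]
tag_body("<b>bold</i>")  # => nil  (closing tag does not repeat "b")
```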

Production Patterns

Named captures fit web application code where pattern matching drives routing, validation, and data extraction. In a Rails application, for example, they can back custom route constraints and parameter parsing.

class CustomRouteConstraint
  def initialize(pattern)
    @pattern = pattern
  end
  
  def matches?(request)
    match = request.path.match(@pattern)
    return false unless match
    
    request.env[:route_captures] = match.named_captures
    true
  end
end

# Route constraint for API versioning
api_constraint = CustomRouteConstraint.new(
  /\/api\/v(?<version>\d+)\/(?<resource>\w+)\/(?<id>\d+)/
)

# Usage in routes.rb would access captures via request.env[:route_captures]

Log parsing represents another common production use case where named captures improve maintainability and debugging capability compared to positional captures.

require "time"  # provides Time.strptime

class LogParser
  ACCESS_LOG_PATTERN = /
    (?<ip>\d+\.\d+\.\d+\.\d+)\s+
    -\s+
    -\s+
    \[(?<timestamp>[^\]]+)\]\s+
    "(?<method>\w+)\s+(?<path>[^\s]+)\s+HTTP\/(?<http_version>[\d.]+)"\s+
    (?<status>\d+)\s+
    (?<size>\d+|-)\s+
    "(?<referer>[^"]*)"\s+
    "(?<user_agent>[^"]*)"
  /x
  
  def self.parse_line(line)
    match = line.match(ACCESS_LOG_PATTERN)
    return nil unless match
    
    {
      ip_address: match[:ip],
      timestamp: Time.strptime(match[:timestamp], "%d/%b/%Y:%H:%M:%S %z"),
      http_method: match[:method],
      request_path: match[:path],
      http_version: match[:http_version],
      status_code: match[:status].to_i,
      response_size: match[:size] == "-" ? 0 : match[:size].to_i,
      referer: match[:referer],
      user_agent: match[:user_agent]
    }
  rescue ArgumentError
    nil
  end
end

# Process log file
File.foreach("access.log") do |line|
  parsed = LogParser.parse_line(line.chomp)
  next unless parsed
  
  # Process parsed log entry
  if parsed[:status_code] >= 400
    puts "Error: #{parsed[:status_code]} for #{parsed[:request_path]}"
  end
end

Data validation and sanitization workflows benefit from named captures when extracting and validating structured input. Named captures make validation logic more maintainable and error messages more descriptive.

class DataValidator
  PATTERNS = {
    email: /\A(?<local>[a-zA-Z0-9._-]+)@(?<domain>[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})\z/,
    phone: /\A(?:\+(?<country>\d{1,3})[-.\s]?)?(?<area>\d{3})[-.\s]?(?<exchange>\d{3})[-.\s]?(?<number>\d{4})\z/,
    credit_card: /\A(?<type>4\d{3}|5[1-5]\d{2}|3[47]\d{2})-?(?<group1>\d{4})-?(?<group2>\d{4})-?(?<group3>\d{4})\z/
  }.freeze
  
  def self.validate_and_extract(type, value)
    pattern = PATTERNS[type]
    return { valid: false, error: "Unknown validation type" } unless pattern
    
    match = value.match(pattern)
    return { valid: false, error: "Invalid format" } unless match
    
    result = { valid: true, captures: match.named_captures }
    
    case type
    when :email
      result[:normalized] = "#{match[:local]}@#{match[:domain].downcase}"
    when :phone
      digits = [match[:area], match[:exchange], match[:number]].join
      result[:normalized] = match[:country] ? "+#{match[:country]}#{digits}" : digits
    when :credit_card
      result[:masked] = "**** **** **** #{match[:group3]}"
      result[:type] = case match[:type][0]
                     when '4' then 'Visa'
                     when '5' then 'MasterCard'
                     when '3' then 'American Express'
                     end
    end
    
    result
  end
end

# Usage in form processing
validation_result = DataValidator.validate_and_extract(:email, "user@EXAMPLE.COM")
if validation_result[:valid]
  email = validation_result[:normalized]  # => "user@example.com"
  local_part = validation_result[:captures]["local"]  # => "user"
end

Common Pitfalls

Optional groups and alternations are a common source of surprises. As with numbered captures, a named group that does not participate in the match returns nil rather than an empty string, which breaks code that concatenates the result directly.

optional_pattern = /(?<prefix>\w+:)?(?<value>\w+)/

match = "test".match(optional_pattern)
# match[:prefix] is nil, not ""
# match[:value] is "test"

# This raises NoMethodError because nil does not define +
begin
  result = match[:prefix] + match[:value]
rescue NoMethodError => e
  puts "Error: #{e.message}"
end

# Correct approach
result = "#{match[:prefix]}#{match[:value]}"  # => "test"

Repeated named captures only preserve the last match, not all matches. This behavior catches developers who expect array-like behavior from repeated named groups.

text = "apple, banana, cherry"

# The repeated group keeps only its last iteration, "cherry"
match = text.match(/(?<word>\w+)(?:,\s*(?<word>\w+))*/)
match[:word]  # => "cherry", not ["apple", "banana", "cherry"]

# Correct approach for multiple captures
words = text.scan(/(?<word>\w+)/).flatten
# => ["apple", "banana", "cherry"]

Named captures in alternation branches can produce confusing results when multiple branches contain the same capture name but different semantic meanings.

# Problematic pattern with same name in different contexts
ambiguous_pattern = /
  (?:user:(?<identifier>\w+))|
  (?:email:(?<identifier>[^@]+@[^.]+\.\w+))
/x

"user:john123".match(ambiguous_pattern)[:identifier]  # => "john123"
"email:john@example.com".match(ambiguous_pattern)[:identifier]  # => "john@example.com"

# The same :identifier name has different meanings
# Better approach uses distinct names
clear_pattern = /
  (?:user:(?<username>\w+))|
  (?:email:(?<email_address>[^@]+@[^.]+\.\w+))
/x

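Named captures can add overhead in hot loops. Looking a group up by name may cost slightly more than indexing by number, and MatchData#named_captures allocates a new Hash on every call, so calling it once per line of a large file adds up. The size of the effect depends on the Ruby version and the pattern, so measure before optimizing.

# Performance-sensitive code should avoid building a Hash for every line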
heavy_pattern = /
  (?<field1>\w+)\s+
  (?<field2>\w+)\s+
  (?<field3>\w+)\s+
  (?<field4>\w+)\s+
  (?<field5>\w+)\s+
  (?<field6>\w+)\s+
  (?<field7>\w+)
/x

large_file_lines.each do |line|
  match = line.match(heavy_pattern)
  next unless match
  
  # This creates hash repeatedly - expensive
  data = match.named_captures
  process_data(data)
end

# More efficient approach
large_file_lines.each do |line|
  match = line.match(heavy_pattern)
  next unless match
  
  # Direct access avoids hash creation
  process_fields(match[:field1], match[:field2], match[:field3])
end
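Whether the difference matters is best settled by measuring on real input. The following is a rough timing sketch under stated assumptions: the three-field pattern, the synthetic lines, and the helper names are all invented for the example, and no particular result is implied.

```ruby
# Hypothetical three-field pattern and synthetic input, for illustration only.
FIELDS_PATTERN = /(?<field1>\w+)\s+(?<field2>\w+)\s+(?<field3>\w+)/
SAMPLE_LINES = Array.new(20_000) { |i| "alpha#{i} beta#{i} gamma#{i}" }

# Builds a Hash for every line, then reads three values out of it.
def fields_via_hash(lines)
  lines.map do |line|
    match = FIELDS_PATTERN.match(line)
    match.named_captures.values_at("field1", "field2", "field3")
  end
end

# Reads the three groups straight from the MatchData object.
def fields_via_index(lines)
  lines.map do |line|
    match = FIELDS_PATTERN.match(line)
    [match[:field1], match[:field2], match[:field3]]
  end
end

# Wall-clock seconds taken by the block, using the monotonic clock.
def seconds_for
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

hash_time  = seconds_for { fields_via_hash(SAMPLE_LINES) }
index_time = seconds_for { fields_via_index(SAMPLE_LINES) }
puts format("named_captures: %.4fs, direct access: %.4fs", hash_time, index_time)
```

Both helpers return the same values, so the comparison isolates the cost of building the Hash. A single run like this is noisy; repeating it several times gives a more trustworthy picture.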

Global variable behavior with named captures can surprise developers. The special variables $1, $2, etc., still work with named captures but follow numeric order, not alphabetical order of names.

pattern = /(?<zebra>\w+)\s+(?<alpha>\w+)/
"first second".match(pattern)

$1  # => "first" (zebra capture, not alpha)
$2  # => "second" (alpha capture)

# Named access doesn't affect global variable order
$~[:zebra]  # => "first"
$~[:alpha]  # => "second"

Unicode identifier names in captures work but can cause portability issues across different Ruby versions and systems. ASCII names provide better compatibility.

# Works but potentially problematic
unicode_pattern = /(?<数字>\d+)(?<文字>\w+)/
text = "123abc"
match = text.match(unicode_pattern)
match[:数字]  # => "123"

# Safer approach
ascii_pattern = /(?<digits>\d+)(?<letters>\w+)/
match = text.match(ascii_pattern)
match[:digits]  # => "123"

Reference

MatchData Methods

| Method | Parameters | Returns | Description |
| #[] | name (String/Symbol/Integer) | String or nil | Access named or numbered capture |
| #named_captures | None | Hash | Hash of all named captures |
| #names | None | Array<String> | Array of all capture names |
| #values_at | *names (String/Symbol/Integer) | Array | Multiple captures by name or index |

String Methods with Named Captures

| Method | Parameters | Returns | Description |
| #match | pattern, pos=0 | MatchData or nil | Match pattern, return MatchData |
| #match? | pattern, pos=0 | Boolean | Test if pattern matches |
| #scan | pattern | Array | All matches as arrays of captures |
| #gsub | pattern, replacement | String | Replace with access to named captures |

Regexp Methods

| Method | Parameters | Returns | Description |
| #match | string, pos=0 | MatchData or nil | Match against string |
| #match? | string, pos=0 | Boolean | Test match against string |
| #named_captures | None | Hash | Maps each capture name to an array of its group indexes |
| #names | None | Array<String> | Array of capture names in pattern |

Named Capture Syntax

| Syntax | Description | Example |
| (?<name>pattern) | Basic named capture | (?<word>\w+) |
| (?'name'pattern) | Alternative syntax | (?'word'\w+) |
| \k<name> | Reference named capture | (?<quote>['"])\w+\k<quote> |
| \k'name' | Alternative reference | (?'quote'['"])\w+\k'quote' |

Global Variables

| Variable | Description | Type |
| $~ | Last MatchData object | MatchData or nil |
| $1, $2, ... | Numbered captures (work with named) | String or nil |
| $+ | Last capture group | String or nil |
| $& | Entire match | String or nil |

Replacement Patterns in gsub

| Pattern | Description | Example |
| \k<name> | Named capture reference | "text".gsub(/(?<word>\w+)/, '[\k<word>]') |
| \k'name' | Not recognized in replacement strings; use \k<name> instead | (none) |
| Block form | Block receives the matched String; named captures are available through $~ | "text".gsub(/(?<word>\w+)/) { $~[:word].upcase } |
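Both replacement styles can be sketched briefly. The pattern and helper names below are illustrative; the block form reads the current match from $~ because the block argument itself is the matched String.

```ruby
ISO_DATE = /(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/

# Replacement-string form: \k<name> inserts the text of the named group.
def to_day_first(text)
  text.gsub(ISO_DATE, '\k<d>/\k<m>/\k<y>')
end

# Block form: $~ holds the MatchData for the match being replaced.
def capitalize_words(text)
  text.gsub(/(?<word>\w+)/) { $~[:word].capitalize }
end

to_day_first("Due 2023-12-15")  # => "Due 15/12/2023"
capitalize_words("john smith")  # => "John Smith"
```

The replacement string is single-quoted so that the backslashes reach gsub unchanged.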

MatchData Hash Access

| Access Method | Result Type | Description |
| match[:name] | String or nil | Symbol key access |
| match["name"] | String or nil | String key access |
| match[0] | String | Entire match (index 0) |
| match[1] | String or nil | First capture group |

Error Conditions

| Condition | Exception | Description |
| Invalid capture name | RegexpError | Name contains invalid characters |
| Duplicate capture name | None | Allowed; [] returns the last group with that name that matched |
| Missing closing > | RegexpError | Malformed named capture syntax |
| Empty capture name | RegexpError | (?<>pattern) is invalid |
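These conditions can be observed by building the pattern at runtime with Regexp.new. The sketch also shows a related lookup error: asking a MatchData for a name the pattern never defined raises IndexError. The helper names are hypothetical.

```ruby
# Returns the exception class raised while compiling source, or nil.
def compile_error_for(source)
  Regexp.new(source)
  nil
rescue RegexpError => e
  e.class
end

# Returns the exception class raised by looking up name, or nil.
def lookup_error_for(match, name)
  match[name]
  nil
rescue IndexError => e
  e.class
end

compile_error_for("(?<>x)")      # => RegexpError (empty capture name)
compile_error_for("(?<word>x)")  # => nil
lookup_error_for(/(?<word>\w+)/.match("hi"), :missing)  # => IndexError
lookup_error_for(/(?<word>\w+)/.match("hi"), :word)     # => nil
```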