CrackedRuby - bytebegin and byteend

Overview

Ruby's MatchData#bytebegin and MatchData#byteend methods provide byte-level offset information for regular expression matches. These methods complement the existing character-based begin and end methods by operating at the byte level rather than the character level, which becomes crucial when working with multi-byte encoded strings.

Both methods return integer values representing byte positions within the original string. The bytebegin method returns the starting byte offset of a match or capture group, while byteend returns the ending byte offset. These methods accept either numeric indices for capture groups or string/symbol names for named captures.

# Basic byte offset retrieval
match = /ruby/.match("I love ruby programming")
match.bytebegin(0)  # => 7 (start of full match)
match.byteend(0)    # => 11 (end of full match)

# Multi-byte string demonstration
match = /(こ)(ん)(に)(ち)(は)/.match("こんにちは世界")
match.bytebegin(1)  # => 0 (first character starts at byte 0)
match.byteend(1)    # => 3 (first character ends at byte 3)
match.bytebegin(2)  # => 3 (second character starts at byte 3)
match.byteend(2)    # => 6 (second character ends at byte 6)

The fundamental difference between these byte-level methods and their character-based counterparts becomes apparent when dealing with UTF-8 or other multi-byte encodings. While character methods count logical characters, byte methods count the actual bytes in the string's internal representation.

Basic Usage

The bytebegin and byteend methods support multiple parameter types for flexible match position retrieval. Both methods accept integer indices starting from 0 for the full match, with positive integers referencing capture groups in order of appearance.

# Working with capture groups by index
pattern = /(\w+)\s+(\d+)\s+(\w+)/
match = pattern.match("Product 42 available")

# Full match (index 0)
match.bytebegin(0)  # => 0
match.byteend(0)    # => 20

# Individual capture groups
match.bytebegin(1)  # => 0 (start of "Product")
match.byteend(1)    # => 7 (end of "Product")
match.bytebegin(2)  # => 8 (start of "42")
match.byteend(2)    # => 10 (end of "42")
match.bytebegin(3)  # => 11 (start of "available")
match.byteend(3)    # => 20 (end of "available")

Named capture groups provide more readable access to match positions using string or symbol identifiers. This approach improves code maintainability when working with complex patterns containing multiple captures.

# Named capture group access
email_pattern = /(?<user>[\w._%+-]+)@(?<domain>[\w.-]+\.[A-Z]{2,})/i
match = email_pattern.match("contact@example.com")

# Using string names
match.bytebegin("user")    # => 0
match.byteend("user")      # => 7

# Using symbol names
match.bytebegin(:domain)   # => 8
match.byteend(:domain)     # => 19

# Extract substrings using byte positions
original = match.string
user_part = original.byteslice(match.bytebegin(:user), 
                              match.byteend(:user) - match.bytebegin(:user))
# => "contact"

Multi-byte characters demonstrate the key distinction between byte and character positioning. Each UTF-8 character may occupy multiple bytes, making byte-level access essential for low-level string manipulation and binary data processing.

# Multi-byte character handling
japanese = "プログラム開発"
pattern = /(プ)(ログ)(ラム)(開発)/
match = pattern.match(japanese)

# Character vs byte positions
match.begin(1)      # => 0 (character position)
match.bytebegin(1)  # => 0 (byte position)
match.end(1)        # => 1 (character position)
match.byteend(1)    # => 3 (byte position - プ is 3 bytes)

match.begin(2)      # => 1 (character position)
match.bytebegin(2)  # => 3 (byte position)
match.end(2)        # => 3 (character position)
match.byteend(2)    # => 9 (byte position - ログ is 6 bytes total)

# Demonstrate byte extraction
first_char_bytes = japanese.byteslice(match.bytebegin(1), 
                                     match.byteend(1) - match.bytebegin(1))
# => "プ"

The methods integrate seamlessly with Ruby's string slicing operations, particularly byteslice, enabling precise extraction of matched content at the byte level.

# Advanced extraction patterns
log_pattern = /\[(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (?<level>\w+): (?<message>.*)/
log_line = "[2024-01-15 10:30:45] ERROR: Database connection failed"
match = log_pattern.match(log_line)

# Extract components using byte positions
original = match.string
timestamp = original.byteslice(match.bytebegin(:timestamp), 
                              match.byteend(:timestamp) - match.bytebegin(:timestamp))
level = original.byteslice(match.bytebegin(:level), 
                          match.byteend(:level) - match.bytebegin(:level))
message = original.byteslice(match.bytebegin(:message), 
                            match.byteend(:message) - match.bytebegin(:message))

puts "Timestamp: #{timestamp}"  # => "2024-01-15 10:30:45"
puts "Level: #{level}"          # => "ERROR"
puts "Message: #{message}"      # => "Database connection failed"

Advanced Usage

The byte-level methods enable sophisticated pattern matching scenarios that require precise control over string manipulation. Complex patterns with nested groups and back-references benefit from byte-accurate positioning, because each group reports its own byte range within the original string.

# Complex nested pattern analysis
html_pattern = /(?<tag><(?<name>\w+)(?<attrs>[^>]*)>)(?<content>.*?)(?<closing><\/\k<name>>)/m
html = '<div class="container" data-role="main">Hello World</div>'
match = html_pattern.match(html)

# Build comprehensive match analysis
analysis = {
  full_match: {
    start: match.bytebegin(0),
    end: match.byteend(0),
    content: html.byteslice(match.bytebegin(0), match.byteend(0) - match.bytebegin(0))
  },
  opening_tag: {
    start: match.bytebegin(:tag),
    end: match.byteend(:tag),
    content: html.byteslice(match.bytebegin(:tag), match.byteend(:tag) - match.bytebegin(:tag))
  },
  tag_name: {
    start: match.bytebegin(:name),
    end: match.byteend(:name),
    content: html.byteslice(match.bytebegin(:name), match.byteend(:name) - match.bytebegin(:name))
  },
  attributes: {
    start: match.bytebegin(:attrs),
    end: match.byteend(:attrs),
    content: html.byteslice(match.bytebegin(:attrs), match.byteend(:attrs) - match.bytebegin(:attrs))
  },
  inner_content: {
    start: match.bytebegin(:content),
    end: match.byteend(:content),
    content: html.byteslice(match.bytebegin(:content), match.byteend(:content) - match.bytebegin(:content))
  }
}

analysis.each do |component, data|
  puts "#{component}: bytes #{data[:start]}-#{data[:end]} => '#{data[:content]}'"
end

Binary data parsing represents another advanced application where byte-level precision becomes essential. The methods work effectively with binary strings and encoded data streams.

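# Binary data pattern matching
# .b returns a binary (ASCII-8BIT) copy, so the string's encoding agrees with the /n pattern
binary_data = "\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01\x00".b
# /m lets . also match newline bytes (0x0A) that may occur inside the length or width fields
png_pattern = /(?<signature>\x89PNG\r\n\x1a\n)(?<ihdr_length>.{4})(?<ihdr_type>IHDR)(?<width>.{4})/nm
match = png_pattern.match(binary_data)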

# Extract binary components with precise byte positioning
signature_bytes = binary_data.byteslice(match.bytebegin(:signature), 
                                       match.byteend(:signature) - match.bytebegin(:signature))
ihdr_length_bytes = binary_data.byteslice(match.bytebegin(:ihdr_length), 
                                         match.byteend(:ihdr_length) - match.bytebegin(:ihdr_length))
width_bytes = binary_data.byteslice(match.bytebegin(:width), 
                                   match.byteend(:width) - match.bytebegin(:width))

# Convert binary data to meaningful values (unpack1 returns the first decoded value)
ihdr_length = ihdr_length_bytes.unpack1('N')  # => 13
width = width_bytes.unpack1('N')              # => 256

puts "PNG signature found at bytes #{match.bytebegin(:signature)}-#{match.byteend(:signature)}"
puts "IHDR chunk length: #{ihdr_length} bytes"
puts "Image width: #{width} pixels"

Method chaining with byte operations enables fluent interfaces for complex string processing workflows. The consistency of return types allows for predictable composition patterns.

# Fluent processing chain
class RegexProcessor
  def initialize(text)
    @text = text
    @matches = []
  end

  def find_pattern(regex)
    # scan yields captured strings, so the MatchData is read from Regexp.last_match
    @text.scan(regex) { @matches << Regexp.last_match }
    self
  end

  def extract_bytes(group_name)
    @matches.map do |match|
      start_byte = match.bytebegin(group_name)
      end_byte = match.byteend(group_name)
      @text.byteslice(start_byte, end_byte - start_byte)
    end
  end

  def byte_ranges(group_name)
    @matches.map do |match|
      (match.bytebegin(group_name)...match.byteend(group_name))
    end
  end
end

# Usage example
text = "Email: user@domain.com Phone: +1-555-0123 Email: admin@site.org"
processor = RegexProcessor.new(text)

emails = processor
  .find_pattern(/Email: (?<email>[\w._%+-]+@[\w.-]+\.[A-Z]{2,})/i)
  .extract_bytes(:email)
# => ["user@domain.com", "admin@site.org"]

email_ranges = processor.byte_ranges(:email)
# => [7...22, 49...63]

Integration with string modification operations requires careful byte position tracking as modifications can shift subsequent byte positions.

# Position-aware string modification
def replace_matches_by_bytes(text, pattern, &block)
  matches = []
  text.scan(pattern) { matches << Regexp.last_match.dup }
  
  # Process matches in reverse order to maintain byte positions
  matches.reverse.each do |match|
    replacement = block.call(match)
    start_byte = match.bytebegin(0)
    end_byte = match.byteend(0)
    
    # Replace by byte range; String#[]= would treat these offsets as character indices
    text.bytesplice(start_byte...end_byte, replacement)
  end
  
  text
end

# Transform URLs with byte precision
html = "Visit <a href='http://example.com'>Example</a> and <a href='https://ruby-lang.org'>Ruby</a>"
pattern = /<a href='(?<url>[^']+)'>(?<text>[^<]+)<\/a>/

result = replace_matches_by_bytes(html, pattern) do |match|
  url = html.byteslice(match.bytebegin(:url), match.byteend(:url) - match.bytebegin(:url))
  text = html.byteslice(match.bytebegin(:text), match.byteend(:text) - match.bytebegin(:text))
  "[#{text}](#{url})"
end
# => "Visit [Example](http://example.com) and [Ruby](https://ruby-lang.org)"

Error Handling & Debugging

The bytebegin and byteend methods raise specific exceptions when encountering invalid input conditions. Understanding these error patterns enables robust error handling and effective debugging strategies.

# Index out of bounds handling
pattern = /(foo)(bar)/
match = pattern.match("foobar")

begin
  match.bytebegin(5)  # Group 5 doesn't exist
rescue IndexError => e
  puts "Error: #{e.message}"  # => "index 5 out of matches"
end

# Named group error handling
named_pattern = /(?<first>\w+) (?<second>\w+)/
named_match = named_pattern.match("hello world")

begin
  named_match.bytebegin(:third)  # Named group doesn't exist
rescue IndexError => e
  puts "Error: #{e.message}"  # => "undefined group name reference: third"
end

# Nil match handling
nil_pattern = /(optional_group)?required/
nil_match = nil_pattern.match("required")

# Group 1 exists but didn't match anything
p nil_match.bytebegin(1)  # => nil (not an error)
p nil_match.byteend(1)    # => nil (not an error)

Defensive programming patterns help handle edge cases gracefully while maintaining code reliability. Validation methods prevent runtime errors in production environments.

# Robust match processing
def safe_extract_bytes(match, group_identifier)
  return nil unless match
  
  begin
    start_pos = match.bytebegin(group_identifier)
    end_pos = match.byteend(group_identifier)
    
    # Handle nil positions (the group did not participate in the match)
    return nil if start_pos.nil? || end_pos.nil?
    
    # Extract using validated positions
    match.string.byteslice(start_pos, end_pos - start_pos)
  rescue IndexError => e
    warn "Match extraction failed: #{e.message}"
    nil
  end
end

# Usage with error handling
# Anchors plus a lazy \w+? keep the greedy main group from swallowing "_suf"
pattern = /\A(?<prefix>pre_)?(?<main>\w+?)(?<suffix>_suf)?\z/
test_strings = [
  "pre_content_suf",  # All groups match
  "content_suf",      # Prefix is nil
  "pre_content",      # Suffix is nil  
  "content",          # Only main group matches
  "no match!"         # No match at all (space and ! are not word characters)
]

test_strings.each do |str|
  match = pattern.match(str)
  next unless match
  
  prefix = safe_extract_bytes(match, :prefix)
  main = safe_extract_bytes(match, :main)
  suffix = safe_extract_bytes(match, :suffix)
  
  puts "String: #{str}"
  puts "  Prefix: #{prefix.inspect}"
  puts "  Main: #{main.inspect}"
  puts "  Suffix: #{suffix.inspect}"
end

Debugging byte position issues often involves comparing character-based and byte-based offsets to identify encoding-related problems. Printing both sets of offsets side by side for each group helps diagnose complex multi-byte scenarios.

# Debugging helper for byte/character position comparison
def debug_match_positions(text, match)
  puts "String: #{text.inspect}"
  puts "Encoding: #{text.encoding}"
  puts "Character length: #{text.length}"
  puts "Byte length: #{text.bytesize}"
  puts
  
  (0...match.size).each do |index|
    next if match[index].nil?
    
    char_start = match.begin(index)
    char_end = match.end(index)
    byte_start = match.bytebegin(index)
    byte_end = match.byteend(index)
    
    puts "Group #{index}: #{match[index].inspect}"
    puts "  Character positions: #{char_start}..#{char_end}"
    puts "  Byte positions: #{byte_start}..#{byte_end}"
    
    # Show the actual bytes
    extracted = text.byteslice(byte_start, byte_end - byte_start)
    puts "  Extracted bytes: #{extracted.inspect}"
    puts
  end
end

# Debugging multi-byte strings
mixed_text = "Hello 世界 Ruby"
pattern = /(Hello)\s+(世界)\s+(Ruby)/
match = pattern.match(mixed_text)
debug_match_positions(mixed_text, match)

Common debugging scenarios involve validating that byte operations produce expected results, especially when working with different encodings or binary data.

# Comprehensive validation for byte operations
def validate_byte_extraction(original, match, group_id)
  begin
    # Get positions
    start_byte = match.bytebegin(group_id)
    end_byte = match.byteend(group_id)
    
    return false if start_byte.nil? || end_byte.nil?
    
    # Validate position sanity
    unless start_byte >= 0 && end_byte >= start_byte && end_byte <= original.bytesize
      puts "Invalid byte positions: #{start_byte}..#{end_byte} for string of #{original.bytesize} bytes"
      return false
    end
    
    # Extract and compare with match group
    extracted = original.byteslice(start_byte, end_byte - start_byte)
    expected = match[group_id]
    
    unless extracted == expected
      puts "Extraction mismatch for group #{group_id}:"
      puts "  Expected: #{expected.inspect}"
      puts "  Extracted: #{extracted.inspect}"
      puts "  Byte range: #{start_byte}..#{end_byte}"
      return false
    end
    
    true
  rescue => e
    puts "Validation error for group #{group_id}: #{e.message}"
    false
  end
end

# Test validation across different scenarios  
test_cases = [
  { text: "ASCII only", pattern: /(\w+)\s+(\w+)/ },
  { text: "Mixed 文字 encoding", pattern: /(Mixed)\s+(文字)\s+(\w+)/ },
  { text: "Émojis 🚀 included", pattern: /(Émojis)\s+(🚀)\s+(\w+)/ }
]

test_cases.each do |test_case|
  match = test_case[:pattern].match(test_case[:text])
  next unless match
  
  puts "Testing: #{test_case[:text]}"
  (0...match.size).each do |i|
    valid = validate_byte_extraction(test_case[:text], match, i)
    puts "  Group #{i}: #{valid ? 'VALID' : 'INVALID'}"
  end
  puts
end

Common Pitfalls

The most frequent mistake when using bytebegin and byteend involves confusing byte positions with character positions. This confusion becomes critical when working with multi-byte encoded strings where characters occupy more than one byte.

# Pitfall: Assuming bytes equal characters
problematic_text = "café résumé"  # Contains accented characters
pattern = /(\p{L}+)\s+(\p{L}+)/  # \p{L} matches any letter; Ruby's \w is ASCII-only and stops at é
match = pattern.match(problematic_text)

# WRONG: Using character methods with byte operations
wrong_start = match.begin(1)           # => 0 (character position)
wrong_length = match.end(1) - match.begin(1)  # => 4 (character count)
wrong_extraction = problematic_text.byteslice(wrong_start, wrong_length)
# => "caf\xC3" (4 bytes: "caf" plus only the first byte of the 2-byte é)

# CORRECT: Using byte methods consistently  
correct_start = match.bytebegin(1)     # => 0 (byte position)
correct_end = match.byteend(1)         # => 5 (byte position)
correct_extraction = problematic_text.byteslice(correct_start, correct_end - correct_start)
# => "café" (complete word)

puts "Wrong extraction: #{wrong_extraction.inspect}"     # => "caf\xC3"
puts "Correct extraction: #{correct_extraction.inspect}" # => "café"

Nil handling represents another common source of errors. A group that exists in the pattern but did not participate in the match, such as an unmatched optional group, yields nil from both methods, and arithmetic on that nil raises NoMethodError at runtime.

# Pitfall: Assuming all groups captured successfully
optional_pattern = /(required)(?:_(\w+))?(?:\.(\w+))?/
test_inputs = [
  "required_optional.ext",  # All groups match
  "required_optional",      # Group 3 is nil
  "required.ext",          # Group 2 is nil  
  "required"               # Groups 2 and 3 are nil
]

# WRONG: Not checking for nil values
test_inputs.each do |input|
  match = optional_pattern.match(input)
  
  begin
    # bytebegin/byteend return nil for groups that did not participate,
    # so the subtraction raises NoMethodError on nil
    part2_length = match.byteend(2) - match.bytebegin(2)
    part3_length = match.byteend(3) - match.bytebegin(3)
    
    puts "Input: #{input}"
    puts "  Part 2 length: #{part2_length}"
    puts "  Part 3 length: #{part3_length}"
  rescue NoMethodError => e
    puts "Error processing #{input}: #{e.message}"
  end
end

# CORRECT: Proper nil handling
test_inputs.each do |input|
  match = optional_pattern.match(input)
  
  puts "Input: #{input}"
  puts "  Part 1: #{match.bytebegin(1) || 'not matched'}"
  puts "  Part 2: #{match.bytebegin(2) || 'not matched'}"
  puts "  Part 3: #{match.bytebegin(3) || 'not matched'}"
end

String mutation during processing creates subtle bugs where byte positions become invalid after modifications. This problem occurs frequently in text processing pipelines.

# Pitfall: Modifying strings while using stored byte positions
text = "Replace FOO with bar and FOO with baz"
pattern = /FOO/
matches = []

# WRONG: Collecting positions then modifying string
text.scan(pattern) { matches << Regexp.last_match.dup }
matches.each do |match|
  start_pos = match.bytebegin(0)
  end_pos = match.byteend(0)
  
  # Each replacement shifts every later position by the length difference
  text.bytesplice(start_pos...end_pos, "REPLACED")
end
# The second replacement lands 5 bytes too early because the first one grew the string

# CORRECT: Process matches in reverse order
text = "Replace FOO with bar and FOO with baz"  # Reset
matches = []
text.scan(pattern) { matches << Regexp.last_match.dup }

# Process from end to beginning to maintain position validity
matches.reverse.each_with_index do |match, index|
  start_pos = match.bytebegin(0)
  end_pos = match.byteend(0)
  replacement = index.zero? ? "baz" : "bar"
  
  text.bytesplice(start_pos...end_pos, replacement)
end
puts text  # => "Replace bar with bar and baz with baz"

Encoding mismatch issues arise when working with strings in different encodings. The byte methods always report offsets into the string's raw bytes, so the same offsets can describe a different number of characters depending on the encoding the string is tagged with.

# Pitfall: Encoding assumptions
utf8_string = "Testing 测试"
ascii_pattern = /(\w+)\s+(.+)/

# Reinterpret the same bytes as binary (ASCII-8BIT)
ascii_string = utf8_string.dup.force_encoding('ASCII-8BIT')
match_utf8 = ascii_pattern.match(utf8_string)
match_ascii = ascii_pattern.match(ascii_string)

puts "UTF-8 string: #{utf8_string.inspect}"
puts "Binary string: #{ascii_string.inspect}"

# Byte positions are identical; the encoding of the extracted string differs
if match_utf8 && match_ascii
  utf8_extraction = utf8_string.byteslice(match_utf8.bytebegin(2), 
                                         match_utf8.byteend(2) - match_utf8.bytebegin(2))
  ascii_extraction = ascii_string.byteslice(match_ascii.bytebegin(2),
                                           match_ascii.byteend(2) - match_ascii.bytebegin(2))
  
  puts "UTF-8 extraction: #{utf8_extraction.inspect}"    # "测试"
  puts "Binary extraction: #{ascii_extraction.inspect}"  # "\xE6\xB5\x8B\xE8\xAF\x95"
  # Same six bytes, read as two characters in UTF-8 and as six raw bytes in binary
end

Named capture confusion occurs when developers use inconsistent naming conventions or attempt to access groups that don't exist in the pattern.

# Pitfall: Inconsistent named group access
pattern = /(?<user_name>\w+)@(?<domain_name>[\w.-]+)/
email = "user@example.com"
match = pattern.match(email)

# WRONG: Inconsistent naming conventions
begin
  user = match.bytebegin(:user)      # Error: group is :user_name
  domain = match.bytebegin("domain") # Error: group is :domain_name
rescue IndexError => e
  puts "Named group error: #{e.message}"
end

# CORRECT: Use exact group names from pattern
user = match.bytebegin(:user_name)     # Works correctly
domain = match.bytebegin(:domain_name) # Works correctly

# Helper method to avoid naming confusion
def extract_named_groups_bytes(match)
  result = {}
  match.names.each do |name|
    start_pos = match.bytebegin(name)
    end_pos = match.byteend(name)
    next if start_pos.nil?
    
    result[name] = match.string.byteslice(start_pos, end_pos - start_pos)
  end
  result
end

extracted = extract_named_groups_bytes(match)
puts extracted  # => {"user_name" => "user", "domain_name" => "example.com"}

Performance assumptions about byte operations can lead to misplaced optimization. Each bytebegin or byteend call is a cheap lookup into offsets the regex engine already recorded, so rearranging such calls rarely matters; measure before optimizing. Where both offsets are needed, byteoffset (Ruby 3.2+) returns them in a single call.

# Pitfall: Assuming one call pattern is faster without measuring
large_text = "word " * 10000  # Large text
pattern = /(\w+)/
matches = []
large_text.scan(pattern) { matches << Regexp.last_match }

# Times a block with the monotonic clock, which is unaffected by system clock changes
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Variant A: separate bytebegin and byteend calls
separate_time = elapsed do
  matches.map do |match|
    start_pos = match.bytebegin(0)
    end_pos = match.byteend(0)
    {
      content: large_text.byteslice(start_pos, end_pos - start_pos),
      start: start_pos,
      end: end_pos,
      length: end_pos - start_pos
    }
  end
end

# Variant B: one byteoffset call returning both positions
combined_time = elapsed do
  matches.map do |match|
    start_pos, end_pos = match.byteoffset(0)
    {
      content: large_text.byteslice(start_pos, end_pos - start_pos),
      start: start_pos,
      end: end_pos,
      length: end_pos - start_pos
    }
  end
end

puts "Separate calls: #{separate_time.round(4)} seconds"
puts "byteoffset:     #{combined_time.round(4)} seconds"
# Timings vary by machine and Ruby build; treat any difference as something to measure, not assume

Reference

Method Signatures

Method	Parameters	Returns	Description
`bytebegin(n)`	`n` (Integer)	`Integer` or `nil`	Returns starting byte offset of nth capture group
`bytebegin(name)`	`name` (String/Symbol)	`Integer` or `nil`	Returns starting byte offset of named capture group
`byteend(n)`	`n` (Integer)	`Integer` or `nil`	Returns ending byte offset of nth capture group
`byteend(name)`	`name` (String/Symbol)	`Integer` or `nil`	Returns ending byte offset of named capture group

Parameter Details

Numeric Parameters:

0: Full match (entire matched string)
1, 2, 3, ...: Capture groups in order of appearance
Negative values: Not supported (raises IndexError)

Named Parameters:

String: Named capture group identifier (case-sensitive)
Symbol: Named capture group identifier (case-sensitive)
Must exactly match names defined in regex pattern
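
The accepted argument forms can be illustrated with a short sketch. It assumes Ruby 3.4 or later; DATE_PATTERN and start_or_invalid are hypothetical names introduced only for this illustration, and the helper turns IndexError into the symbol :invalid so that valid and invalid references can be shown side by side.

```ruby
# Requires Ruby 3.4+ (MatchData#bytebegin).
DATE_PATTERN = /(?<year>\d{4})-(?<month>\d{2})/

# Hypothetical helper: returns the starting byte offset for a group
# reference, or :invalid when the reference raises IndexError.
def start_or_invalid(match, group)
  match.bytebegin(group)
rescue IndexError
  :invalid
end

match = DATE_PATTERN.match("日付: 2024-01")

start_or_invalid(match, 0)        # => 8 (whole match; "日付" is 6 bytes, ": " is 2)
start_or_invalid(match, 2)        # => 13 (second group by index)
start_or_invalid(match, "month")  # => 13 (same group by string name)
start_or_invalid(match, :month)   # => 13 (same group by symbol name)
start_or_invalid(match, :Month)   # => :invalid (names are case-sensitive)
start_or_invalid(match, -1)       # => :invalid (negative indices are rejected)
start_or_invalid(match, 3)        # => :invalid (only groups 0..2 exist)
```

Named groups remain reachable by number, which is why index 2 and the name month resolve to the same offset.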

Return Values

Success Cases:

Integer: Valid byte offset within original string
Range: 0 to string.bytesize
bytebegin: Inclusive start position
byteend: Exclusive end position

Failure Cases:

nil: Group exists but didn't match (optional groups)
IndexError: Invalid group index or unknown group name
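
A minimal sketch of these return values, assuming Ruby 3.4 or later. The helper byte_span is a hypothetical name, not part of Ruby; it reports the start, the exclusive end, and their difference, which equals the byte size of the matched text.

```ruby
# Requires Ruby 3.4+ (MatchData#bytebegin / #byteend).
# Hypothetical helper: [start, end, bytesize] for a group, or nil when the
# group exists in the pattern but did not participate in the match.
def byte_span(match, group)
  start = match.bytebegin(group)
  return nil if start.nil?

  finish = match.byteend(group)
  [start, finish, finish - start]
end

text = "→ 東京"  # "→" is 3 bytes, the space 1 byte, "東京" 6 bytes
match = /(?<word>\p{L}+)(?<bang>!)?/.match(text)

byte_span(match, :word)  # => [4, 10, 6]
byte_span(match, :bang)  # => nil (optional group did not participate)
text.bytesize            # => 10, so an exclusive end may equal the byte size
```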

Exception Hierarchy

IndexError
├── "index N out of matches" (invalid numeric group)
└── "undefined group name reference: NAME" (invalid named group)

Related Methods Comparison

Method	Position Type	Return Value	Use Case
`begin(n)`	Character offset	`Integer`/`nil`	Text processing with character granularity
`end(n)`	Character offset	`Integer`/`nil`	Text processing with character granularity
`bytebegin(n)`	Byte offset	`Integer`/`nil`	Binary data, encoding-aware processing
`byteend(n)`	Byte offset	`Integer`/`nil`	Binary data, encoding-aware processing
`byteoffset(n)`	Byte range	`[Integer, Integer]`	Combined start/end byte positions
`offset(n)`	Character range	`[Integer, Integer]`	Combined start/end character positions
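
The relationship between the methods in this table can be checked with a small sketch, assuming Ruby 3.4 or later. The helper name positions is hypothetical and exists only to collect the four results for one group.

```ruby
# Requires Ruby 3.4+ for bytebegin/byteend; byteoffset exists since Ruby 3.2.
# Hypothetical helper gathering character and byte positions for one group.
def positions(match, group)
  {
    chars: [match.begin(group), match.end(group)],
    bytes: [match.bytebegin(group), match.byteend(group)],
    offset: match.offset(group),
    byteoffset: match.byteoffset(group)
  }
end

match = /(\S+) (\S+)/.match("東京 タワー")
result = positions(match, 2)

result[:chars]       # => [3, 6]
result[:offset]      # => [3, 6]  (same pair as begin/end)
result[:bytes]       # => [7, 16] (each character here is 3 bytes, the space 1)
result[:byteoffset]  # => [7, 16] (same pair as bytebegin/byteend)
```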

Common Patterns

Basic Extraction:

start_pos = match.bytebegin(group)
end_pos = match.byteend(group)
content = string.byteslice(start_pos, end_pos - start_pos)

Safe Extraction with Validation:

start_pos = match.bytebegin(group)
end_pos = match.byteend(group)
content = (start_pos && end_pos) ? string.byteslice(start_pos, end_pos - start_pos) : nil

Range-based Operations:

byte_range = match.bytebegin(group)...match.byteend(group)
content = string.byteslice(byte_range)

Performance Characteristics

Time Complexity:

O(1) for byte position retrieval
O(n) for character position retrieval on multi-byte strings, where byte offsets must be converted to character offsets (n = string length)

Memory Usage:

No additional memory allocation
Returns primitive integer values

Encoding Impact:

Multi-byte encodings: Byte methods are faster
ASCII-only strings: Minimal performance difference
Binary data: Byte methods are required

Compatibility Notes

Ruby Versions:

Available: Ruby 3.4+
Feature Request: #20576
Not available in earlier Ruby versions
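
For code that must also run on interpreters older than 3.4, one possible fallback is sketched below. portable_bytebegin and portable_byteend are hypothetical helpers, not Ruby APIs; they prefer the native methods, then MatchData#byteoffset (Ruby 3.2+), and finally derive the offset from the byte size of the text that precedes the character offset.

```ruby
# Hypothetical helpers, not part of Ruby: portable byte offsets for a group.
def portable_bytebegin(match, group)
  return match.bytebegin(group) if match.respond_to?(:bytebegin)         # Ruby 3.4+
  return match.byteoffset(group).first if match.respond_to?(:byteoffset) # Ruby 3.2+

  # Oldest fallback: byte size of the text preceding the character offset.
  char_start = match.begin(group)
  char_start && match.string[0, char_start].bytesize
end

def portable_byteend(match, group)
  return match.byteend(group) if match.respond_to?(:byteend)
  return match.byteoffset(group).last if match.respond_to?(:byteoffset)

  char_end = match.end(group)
  char_end && match.string[0, char_end].bytesize
end

match = /(b+)(x)?/.match("ああbbb")
portable_bytebegin(match, 1)  # => 6 ("ああ" occupies 6 bytes)
portable_byteend(match, 1)    # => 9
portable_bytebegin(match, 2)  # => nil (optional group did not participate)
```

The last fallback slices the string by characters, so it costs more than the native lookups on long multi-byte strings.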

Encoding Support:

All string encodings supported
Binary data compatible (ASCII-8BIT encoding)
UTF-8, UTF-16, UTF-32 fully supported
Position values always in bytes regardless of encoding

Platform Differences:

Consistent behavior across platforms
No platform-specific variations
Thread-safe operations
