Overview
Ruby's MatchData#bytebegin and MatchData#byteend methods provide byte-level offset information for regular expression matches. These methods complement the existing character-based begin and end methods by operating at the byte level rather than the character level, which becomes crucial when working with multi-byte encoded strings.
Both methods return integer values representing byte positions within the original string. The bytebegin method returns the starting byte offset of a match or capture group, while byteend returns the ending byte offset. These methods accept either numeric indices for capture groups or string/symbol names for named captures.
# Basic byte offset retrieval
match = /ruby/.match("I love ruby programming")
match.bytebegin(0) # => 7 (start of full match)
match.byteend(0) # => 11 (end of full match)
# Multi-byte string demonstration
match = /(こ)(ん)(に)(ち)(は)/.match("こんにちは世界")
match.bytebegin(1) # => 0 (first character starts at byte 0)
match.byteend(1) # => 3 (first character ends at byte 3)
match.bytebegin(2) # => 3 (second character starts at byte 3)
match.byteend(2) # => 6 (second character ends at byte 6)
The fundamental difference between these byte-level methods and their character-based counterparts becomes apparent when dealing with UTF-8 or other multi-byte encodings. While character methods count logical characters, byte methods count the actual bytes in the string's internal representation.
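To make the distinction concrete, the sketch below walks a short UTF-8 string and records, for each character, its character index next to the byte index at which it starts. The helper name char_and_byte_offsets is made up for illustration; it relies only on String#each_char and String#bytesize, so it does not depend on the byte-offset methods themselves.

```ruby
# Hypothetical helper (not part of Ruby): list each character together with
# the character index and the byte index at which it starts.
def char_and_byte_offsets(text)
  byte_pos = 0
  text.each_char.with_index.map do |char, char_pos|
    entry = [char, char_pos, byte_pos]
    byte_pos += char.bytesize # advance by the encoded width of this character
    entry
  end
end

# "a", "é", "世", "🚀": characters that take 1, 2, 3 and 4 bytes in UTF-8
sample = "a\u00E9\u4E16\u{1F680}"
char_and_byte_offsets(sample).each do |char, char_pos, byte_pos|
  puts "#{char.inspect}: character #{char_pos}, byte #{byte_pos}"
end
```

The last character sits at character index 3 but byte index 6, which is the kind of gap that separates a character-based offset from a byte-based one for a match starting there.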
Basic Usage
The bytebegin and byteend methods support multiple parameter types for flexible match position retrieval. Both methods accept integer indices starting from 0 for the full match, with positive integers referencing capture groups in order of appearance.
# Working with capture groups by index
pattern = /(\w+)\s+(\d+)\s+(\w+)/
match = pattern.match("Product 42 available")
# Full match (index 0)
match.bytebegin(0) # => 0
match.byteend(0) # => 20
# Individual capture groups
match.bytebegin(1) # => 0 (start of "Product")
match.byteend(1) # => 7 (end of "Product")
match.bytebegin(2) # => 8 (start of "42")
match.byteend(2) # => 10 (end of "42")
match.bytebegin(3) # => 11 (start of "available")
match.byteend(3) # => 20 (end of "available")
Named capture groups provide more readable access to match positions using string or symbol identifiers. This approach improves code maintainability when working with complex patterns containing multiple captures.
# Named capture group access
email_pattern = /(?<user>[\w._%+-]+)@(?<domain>[\w.-]+\.[A-Z]{2,})/i
match = email_pattern.match("contact@example.com")
# Using string names
match.bytebegin("user") # => 0
match.byteend("user") # => 7
# Using symbol names
match.bytebegin(:domain) # => 8
match.byteend(:domain) # => 19
# Extract substrings using byte positions
original = match.string
user_part = original.byteslice(match.bytebegin(:user),
match.byteend(:user) - match.bytebegin(:user))
# => "contact"
Multi-byte characters demonstrate the key distinction between byte and character positioning. Each UTF-8 character may occupy multiple bytes, making byte-level access essential for low-level string manipulation and binary data processing.
# Multi-byte character handling
japanese = "プログラム開発"
pattern = /(プ)(ログ)(ラム)(開発)/
match = pattern.match(japanese)
# Character vs byte positions
match.begin(1) # => 0 (character position)
match.bytebegin(1) # => 0 (byte position)
match.end(1) # => 1 (character position)
match.byteend(1) # => 3 (byte position - プ is 3 bytes)
match.begin(2) # => 1 (character position)
match.bytebegin(2) # => 3 (byte position)
match.end(2) # => 3 (character position)
match.byteend(2) # => 9 (byte position - ログ is 6 bytes total)
# Demonstrate byte extraction
first_char_bytes = japanese.byteslice(match.bytebegin(1),
match.byteend(1) - match.bytebegin(1))
# => "プ"
The methods integrate seamlessly with Ruby's string slicing operations, particularly byteslice, enabling precise extraction of matched content at the byte level.
# Advanced extraction patterns
log_pattern = /\[(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (?<level>\w+): (?<message>.*)/
log_line = "[2024-01-15 10:30:45] ERROR: Database connection failed"
match = log_pattern.match(log_line)
# Extract components using byte positions
original = match.string
timestamp = original.byteslice(match.bytebegin(:timestamp),
match.byteend(:timestamp) - match.bytebegin(:timestamp))
level = original.byteslice(match.bytebegin(:level),
match.byteend(:level) - match.bytebegin(:level))
message = original.byteslice(match.bytebegin(:message),
match.byteend(:message) - match.bytebegin(:message))
puts "Timestamp: #{timestamp}" # => "2024-01-15 10:30:45"
puts "Level: #{level}" # => "ERROR"
puts "Message: #{message}" # => "Database connection failed"
Advanced Usage
The byte-level methods enable sophisticated pattern matching scenarios that require precise control over string manipulation. Complex patterns with nested groups and overlapping matches benefit from byte-accurate positioning for advanced text processing operations.
# Complex nested pattern analysis
html_pattern = /(?<tag><(?<name>\w+)(?<attrs>[^>]*)>)(?<content>.*?)(?<closing><\/\k<name>>)/m
html = '<div class="container" data-role="main">Hello World</div>'
match = html_pattern.match(html)
# Build comprehensive match analysis
analysis = {
full_match: {
start: match.bytebegin(0),
end: match.byteend(0),
content: html.byteslice(match.bytebegin(0), match.byteend(0) - match.bytebegin(0))
},
opening_tag: {
start: match.bytebegin(:tag),
end: match.byteend(:tag),
content: html.byteslice(match.bytebegin(:tag), match.byteend(:tag) - match.bytebegin(:tag))
},
tag_name: {
start: match.bytebegin(:name),
end: match.byteend(:name),
content: html.byteslice(match.bytebegin(:name), match.byteend(:name) - match.bytebegin(:name))
},
attributes: {
start: match.bytebegin(:attrs),
end: match.byteend(:attrs),
content: html.byteslice(match.bytebegin(:attrs), match.byteend(:attrs) - match.bytebegin(:attrs))
},
inner_content: {
start: match.bytebegin(:content),
end: match.byteend(:content),
content: html.byteslice(match.bytebegin(:content), match.byteend(:content) - match.bytebegin(:content))
}
}
analysis.each do |component, data|
puts "#{component}: bytes #{data[:start]}-#{data[:end]} => '#{data[:content]}'"
end
Binary data parsing represents another advanced application where byte-level precision becomes essential. The methods work effectively with binary strings and encoded data streams.
# Binary data pattern matching
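binary_data = "\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01\x00".b # .b tags the string as binary (ASCII-8BIT) so it is compatible with the /n pattern
png_pattern = /(?<signature>\x89PNG\r\n\x1a\n)(?<ihdr_length>.{4})(?<ihdr_type>IHDR)(?<width>.{4})/nm # m lets . match newline bytes inside binary fields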
match = png_pattern.match(binary_data)
# Extract binary components with precise byte positioning
signature_bytes = binary_data.byteslice(match.bytebegin(:signature),
match.byteend(:signature) - match.bytebegin(:signature))
ihdr_length_bytes = binary_data.byteslice(match.bytebegin(:ihdr_length),
match.byteend(:ihdr_length) - match.bytebegin(:ihdr_length))
width_bytes = binary_data.byteslice(match.bytebegin(:width),
match.byteend(:width) - match.bytebegin(:width))
# Convert binary data to meaningful values
ihdr_length = ihdr_length_bytes.unpack('N')[0] # => 13
width = width_bytes.unpack('N')[0] # => 256
puts "PNG signature found at bytes #{match.bytebegin(:signature)}-#{match.byteend(:signature)}"
puts "IHDR chunk length: #{ihdr_length} bytes"
puts "Image width: #{width} pixels"
Method chaining with byte operations enables fluent interfaces for complex string processing workflows. The consistency of return types allows for predictable composition patterns.
# Fluent processing chain
class RegexProcessor
  def initialize(text)
    @text = text
    @matches = []
  end

  def find_pattern(regex)
    @text.scan(regex) { @matches << Regexp.last_match }
    self
  end

  def extract_bytes(group_name)
    @matches.map do |match|
      start_byte = match.bytebegin(group_name)
      end_byte = match.byteend(group_name)
      @text.byteslice(start_byte, end_byte - start_byte)
    end
  end

  def byte_ranges(group_name)
    @matches.map do |match|
      (match.bytebegin(group_name)...match.byteend(group_name))
    end
  end
end
# Usage example
text = "Email: user@domain.com Phone: +1-555-0123 Email: admin@site.org"
processor = RegexProcessor.new(text)
emails = processor
.find_pattern(/Email: (?<email>[\w._%+-]+@[\w.-]+\.[A-Z]{2,})/i)
.extract_bytes(:email)
# => ["user@domain.com", "admin@site.org"]
email_ranges = processor.byte_ranges(:email)
# => [7...22, 49...63]
Integration with string modification operations requires careful byte position tracking as modifications can shift subsequent byte positions.
# Position-aware string modification
def replace_matches_by_bytes(text, pattern, &block)
  matches = []
  text.scan(pattern) { matches << Regexp.last_match.dup }
  # Process matches in reverse order to maintain byte positions
  matches.reverse.each do |match|
    replacement = block.call(match)
    start_byte = match.bytebegin(0)
    end_byte = match.byteend(0)
    # String#[]= indexes by character, so splice by bytes instead (Ruby 3.2+)
    text.bytesplice(start_byte...end_byte, replacement)
  end
  text
end
# Transform URLs with byte precision
html = "Visit <a href='http://example.com'>Example</a> and <a href='https://ruby-lang.org'>Ruby</a>"
pattern = /<a href='(?<url>[^']+)'>(?<text>[^<]+)<\/a>/
result = replace_matches_by_bytes(html, pattern) do |match|
url = html.byteslice(match.bytebegin(:url), match.byteend(:url) - match.bytebegin(:url))
text = html.byteslice(match.bytebegin(:text), match.byteend(:text) - match.bytebegin(:text))
"[#{text}](#{url})"
end
# => "Visit [Example](http://example.com) and [Ruby](https://ruby-lang.org)"
Error Handling & Debugging
The bytebegin and byteend methods raise specific exceptions when encountering invalid input conditions. Understanding these error patterns enables robust error handling and effective debugging strategies.
# Index out of bounds handling
pattern = /(foo)(bar)/
match = pattern.match("foobar")
begin
match.bytebegin(5) # Group 5 doesn't exist
rescue IndexError => e
puts "Error: #{e.message}" # => "index 5 out of matches"
end
# Named group error handling
named_pattern = /(?<first>\w+) (?<second>\w+)/
named_match = named_pattern.match("hello world")
begin
named_match.bytebegin(:third) # Named group doesn't exist
rescue IndexError => e
puts "Error: #{e.message}" # => "undefined group name reference: third"
end
# Optional group that did not participate in the match
nil_pattern = /(optional_group)?required/
nil_match = nil_pattern.match("required")
# Group 1 exists in the pattern but matched nothing
p nil_match.bytebegin(1) # => nil (not an error)
p nil_match.byteend(1) # => nil (not an error)
Defensive programming patterns help handle edge cases gracefully while maintaining code reliability. Validation methods prevent runtime errors in production environments.
# Robust match processing
def safe_extract_bytes(match, group_identifier)
  return nil unless match

  begin
    start_pos = match.bytebegin(group_identifier)
    end_pos = match.byteend(group_identifier)
    # Handle nil positions (the group exists but did not participate in the match)
    return nil if start_pos.nil? || end_pos.nil?

    # Extract using validated positions
    match.string.byteslice(start_pos, end_pos - start_pos)
  rescue IndexError => e
    warn "Match extraction failed: #{e.message}"
    nil
  end
end
# Usage with error handling
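# Anchors plus [a-z]+ keep the main group from swallowing "_suf" (\w also matches "_")
pattern = /\A(?<prefix>pre_)?(?<main>[a-z]+)(?<suffix>_suf)?\z/
test_strings = [
"pre_content_suf", # All groups match
"content_suf", # Prefix is nil
"pre_content", # Suffix is nil
"content", # Only main group matches
"nomatch123" # No match at all (digits are not allowed by the pattern)
]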
test_strings.each do |str|
match = pattern.match(str)
next unless match
prefix = safe_extract_bytes(match, :prefix)
main = safe_extract_bytes(match, :main)
suffix = safe_extract_bytes(match, :suffix)
puts "String: #{str}"
puts " Prefix: #{prefix.inspect}"
puts " Main: #{main.inspect}"
puts " Suffix: #{suffix.inspect}"
end
Debugging byte position issues often involves comparing character-based and byte-based offsets to identify encoding-related problems. Visualization tools help diagnose complex multi-byte scenarios.
# Debugging helper for byte/character position comparison
def debug_match_positions(text, match)
puts "String: #{text.inspect}"
puts "Encoding: #{text.encoding}"
puts "Character length: #{text.length}"
puts "Byte length: #{text.bytesize}"
puts
(0...match.size).each do |index|
next if match[index].nil?
char_start = match.begin(index)
char_end = match.end(index)
byte_start = match.bytebegin(index)
byte_end = match.byteend(index)
puts "Group #{index}: #{match[index].inspect}"
puts " Character positions: #{char_start}..#{char_end}"
puts " Byte positions: #{byte_start}..#{byte_end}"
# Show the actual bytes
extracted = text.byteslice(byte_start, byte_end - byte_start)
puts " Extracted bytes: #{extracted.inspect}"
puts
end
end
# Debugging multi-byte strings
mixed_text = "Hello 世界 Ruby"
pattern = /(Hello)\s+(世界)\s+(Ruby)/
match = pattern.match(mixed_text)
debug_match_positions(mixed_text, match)
Common debugging scenarios involve validating that byte operations produce expected results, especially when working with different encodings or binary data.
# Comprehensive validation for byte operations
def validate_byte_extraction(original, match, group_id)
begin
# Get positions
start_byte = match.bytebegin(group_id)
end_byte = match.byteend(group_id)
return false if start_byte.nil? || end_byte.nil?
# Validate position sanity
unless start_byte >= 0 && end_byte >= start_byte && end_byte <= original.bytesize
puts "Invalid byte positions: #{start_byte}..#{end_byte} for string of #{original.bytesize} bytes"
return false
end
# Extract and compare with match group
extracted = original.byteslice(start_byte, end_byte - start_byte)
expected = match[group_id]
unless extracted == expected
puts "Extraction mismatch for group #{group_id}:"
puts " Expected: #{expected.inspect}"
puts " Extracted: #{extracted.inspect}"
puts " Byte range: #{start_byte}..#{end_byte}"
return false
end
true
rescue => e
puts "Validation error for group #{group_id}: #{e.message}"
false
end
end
# Test validation across different scenarios
test_cases = [
{ text: "ASCII only", pattern: /(\w+)\s+(\w+)/ },
{ text: "Mixed 文字 encoding", pattern: /(Mixed)\s+(文字)\s+(\w+)/ },
{ text: "Émojis 🚀 included", pattern: /(Émojis)\s+(🚀)\s+(\w+)/ }
]
test_cases.each do |test_case|
match = test_case[:pattern].match(test_case[:text])
next unless match
puts "Testing: #{test_case[:text]}"
(0...match.size).each do |i|
valid = validate_byte_extraction(test_case[:text], match, i)
puts " Group #{i}: #{valid ? 'VALID' : 'INVALID'}"
end
puts
end
Common Pitfalls
The most frequent mistake when using bytebegin and byteend involves confusing byte positions with character positions. This confusion becomes critical when working with multi-byte encoded strings where characters occupy more than one byte.
# Pitfall: Assuming bytes equal characters
problematic_text = "café résumé" # Contains accented characters
pattern = /(\p{Word}+)\s+(\p{Word}+)/ # \w is ASCII-only in Ruby and would stop at é
match = pattern.match(problematic_text)
# WRONG: Using character methods with byte operations
wrong_start = match.begin(1) # => 0 (character position)
wrong_length = match.end(1) - match.begin(1) # => 4 (character count)
wrong_extraction = problematic_text.byteslice(wrong_start, wrong_length)
# => "caf\xC3" (4 bytes: "caf" plus only the first byte of the 2-byte é)
# CORRECT: Using byte methods consistently
correct_start = match.bytebegin(1) # => 0 (byte position)
correct_end = match.byteend(1) # => 5 (byte position)
correct_extraction = problematic_text.byteslice(correct_start, correct_end - correct_start)
# => "café" (complete word)
puts "Wrong extraction: #{wrong_extraction.inspect}" # => "caf\xC3"
puts "Correct extraction: #{correct_extraction.inspect}" # => "café"
Index validation represents another common source of errors. Developers often fail to verify that capture groups exist before accessing their byte positions, leading to runtime exceptions in production.
# Pitfall: Assuming all groups captured successfully
optional_pattern = /(required)(?:_(\w+))?(?:\.(\w+))?/
test_inputs = [
"required_optional.ext", # All groups match
"required_optional", # Group 3 is nil
"required.ext", # Group 2 is nil
"required" # Groups 2 and 3 are nil
]
# WRONG: Not checking for nil values
test_inputs.each do |input|
  match = optional_pattern.match(input)
  begin
    # Group 2 is nil when the "_suffix" part is absent, so the arithmetic below raises
    part2_length = match.byteend(2) - match.bytebegin(2)
    puts "Input: #{input}"
    puts " Part 2 length: #{part2_length} bytes"
  rescue NoMethodError => e
    puts "Error processing #{input}: #{e.message}"
  end
end
# CORRECT: Proper nil handling
test_inputs.each do |input|
match = optional_pattern.match(input)
puts "Input: #{input}"
puts " Part 1: #{match.bytebegin(1) || 'not matched'}"
puts " Part 2: #{match.bytebegin(2) || 'not matched'}"
puts " Part 3: #{match.bytebegin(3) || 'not matched'}"
end
String mutation during processing creates subtle bugs where byte positions become invalid after modifications. This problem occurs frequently in text processing pipelines.
# Pitfall: Modifying strings while using stored byte positions
text = "Replace FOO with bar and FOO with baz"
pattern = /FOO/
matches = []
# WRONG: Collecting positions then modifying string
text.scan(pattern) { matches << Regexp.last_match.dup }
matches.each do |match|
start_pos = match.bytebegin(0)
end_pos = match.byteend(0)
# Each replacement changes the length, so later offsets point at the wrong bytes
text.bytesplice(start_pos...end_pos, "REPLACED")
end
# Result: "Replace REPLACED with barREPLACEDd FOO with baz" (second offset has drifted)
# CORRECT: Process matches in reverse order
text = "Replace FOO with bar and FOO with baz" # Reset
matches = []
text.scan(pattern) { matches << Regexp.last_match.dup }
# Process from end to beginning to maintain position validity
matches.reverse.each_with_index do |match, index|
start_pos = match.bytebegin(0)
end_pos = match.byteend(0)
replacement = index.zero? ? "baz" : "bar"
text.bytesplice(start_pos...end_pos, replacement) # byte-indexed, unlike String#[]=
end
puts text # => "Replace bar with bar and baz with baz"
Encoding mismatch issues arise when working with strings in different encodings. The byte methods always work with the string's internal byte representation, which may not match expectations.
# Pitfall: Encoding assumptions
utf8_string = "Testing 测试"
ascii_pattern = /(\w+)\s+(.+)/
# Force different encoding interpretation
ascii_string = utf8_string.dup.force_encoding('ASCII-8BIT')
match_utf8 = ascii_pattern.match(utf8_string)
match_ascii = ascii_pattern.match(ascii_string)
puts "UTF-8 string: #{utf8_string.inspect}"
puts "ASCII string: #{ascii_string.inspect}"
# Byte positions may be the same, but extraction differs
if match_utf8 && match_ascii
utf8_extraction = utf8_string.byteslice(match_utf8.bytebegin(2),
match_utf8.byteend(2) - match_utf8.bytebegin(2))
ascii_extraction = ascii_string.byteslice(match_ascii.bytebegin(2),
match_ascii.byteend(2) - match_ascii.bytebegin(2))
puts "UTF-8 extraction: #{utf8_extraction.inspect}"
puts "ASCII extraction: #{ascii_extraction.inspect}"
# Different results due to encoding interpretation
end
Named capture confusion occurs when developers use inconsistent naming conventions or attempt to access groups that don't exist in the pattern.
# Pitfall: Inconsistent named group access
pattern = /(?<user_name>\w+)@(?<domain_name>[\w.-]+)/
email = "user@example.com"
match = pattern.match(email)
# WRONG: Inconsistent naming conventions
begin
user = match.bytebegin(:user) # Error: group is :user_name
domain = match.bytebegin("domain") # Error: group is :domain_name
rescue IndexError => e
puts "Named group error: #{e.message}"
end
# CORRECT: Use exact group names from pattern
user = match.bytebegin(:user_name) # Works correctly
domain = match.bytebegin(:domain_name) # Works correctly
# Helper method to avoid naming confusion
def extract_named_groups_bytes(match)
result = {}
match.names.each do |name|
start_pos = match.bytebegin(name)
end_pos = match.byteend(name)
next if start_pos.nil?
result[name] = match.string.byteslice(start_pos, end_pos - start_pos)
end
result
end
extracted = extract_named_groups_bytes(match)
puts extracted # => {"user_name" => "user", "domain_name" => "example.com"}
Performance assumptions about byte operations can lead to misplaced optimization effort. Byte offsets are cheap to read, so the cost of a loop is usually dominated by slicing and object allocation rather than by the offset calls themselves. Measure before restructuring code.
# Pitfall: Inefficient repeated byte position calls
large_text = "word " * 10000 # Large text
pattern = /(\w+)/
matches = []
large_text.scan(pattern) { matches << Regexp.last_match.dup }
# Baseline: separate bytebegin and byteend calls per match
clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
start_time = clock.call
results = matches.map do |match|
  start_pos = match.bytebegin(0)
  end_pos = match.byteend(0)
  {
    content: large_text.byteslice(start_pos, end_pos - start_pos),
    start: start_pos,
    end: end_pos,
    length: end_pos - start_pos
  }
end
separate_time = clock.call - start_time
# Alternative: one byteoffset call returns both positions
start_time = clock.call
results = matches.map do |match|
  start_pos, end_pos = match.byteoffset(0)
  {
    content: large_text.byteslice(start_pos...end_pos),
    start: start_pos,
    end: end_pos,
    length: end_pos - start_pos
  }
end
combined_time = clock.call - start_time
puts "Separate calls: #{separate_time.round(4)} seconds"
puts "Single byteoffset call: #{combined_time.round(4)} seconds"
puts "Difference: #{((separate_time - combined_time) / separate_time * 100).round(1)}%"
Reference
Method Signatures
Method | Parameters | Returns | Description |
---|---|---|---|
bytebegin(n) | n (Integer) | Integer or nil | Returns starting byte offset of nth capture group |
bytebegin(name) | name (String/Symbol) | Integer or nil | Returns starting byte offset of named capture group |
byteend(n) | n (Integer) | Integer or nil | Returns ending byte offset of nth capture group |
byteend(name) | name (String/Symbol) | Integer or nil | Returns ending byte offset of named capture group |
Parameter Details
Numeric Parameters:
- 0: Full match (entire matched string)
- 1, 2, 3, ...: Capture groups in order of appearance
- Negative values: Not supported (raises IndexError)
Named Parameters:
- String: Named capture group identifier (case-sensitive)
- Symbol: Named capture group identifier (case-sensitive)
- Must exactly match names defined in regex pattern
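The parameter rules above can be checked before a lookup is attempted. The following is a minimal sketch, assuming a helper named valid_group? (a made-up name, not part of Ruby) built on MatchData#size and MatchData#names.

```ruby
# Hypothetical helper: true when +group+ is an index or name that the match
# knows about, so a byte-offset lookup should not raise IndexError.
def valid_group?(match, group)
  case group
  when Integer
    (0...match.size).cover?(group) # 0 is the full match; negatives are rejected
  when String, Symbol
    match.names.include?(group.to_s) # names are compared case-sensitively
  else
    false
  end
end

match = /(?<user>\w+)@(?<domain>[\w.-]+)/.match("contact@example.com")
p valid_group?(match, 0)      # => true
p valid_group?(match, 3)      # => false
p valid_group?(match, -1)     # => false
p valid_group?(match, :user)  # => true
p valid_group?(match, "User") # => false
```

A valid identifier only rules out IndexError. The lookup can still return nil when an optional group did not participate in the match.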
Return Values
Success Cases:
- Integer: Valid byte offset within original string
- Range: 0 to string.bytesize
- bytebegin: Inclusive start position
- byteend: Exclusive end position
Failure Cases:
- nil: Group exists but didn't match (optional groups)
- IndexError: Invalid group index or unknown group name
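The inclusive-start, exclusive-end convention means a string can be cut into "before", "matched" and "after" pieces without duplicating or dropping a byte. The sketch below illustrates this with a made-up helper, split_at_match_bytes. As an assumption for portability, it derives the two offsets from pre_match and post_match; on Ruby 3.4+ these values should coincide with bytebegin(0) and byteend(0).

```ruby
# Hypothetical helper: split +text+ around the first match of +pattern+ using
# byte offsets. Returns [before, matched, after], or nil when nothing matches.
def split_at_match_bytes(text, pattern)
  match = pattern.match(text)
  return nil unless match

  start = match.pre_match.bytesize                  # inclusive start offset
  stop  = text.bytesize - match.post_match.bytesize # exclusive end offset
  [
    text.byteslice(0, start),
    text.byteslice(start...stop),
    text.byteslice(stop, text.bytesize - stop)
  ]
end

text = "日本語 2024 年"
parts = split_at_match_bytes(text, /\d+/)
p parts              # => ["日本語 ", "2024", " 年"]
p parts.join == text # => true
```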
Exception Hierarchy
IndexError
├── "index N out of matches" (invalid numeric group)
└── "undefined group name reference: NAME" (invalid named group)
Related Methods Comparison
Method | Position Type | Return Value | Use Case |
---|---|---|---|
begin(n) | Character offset | Integer / nil | Text processing with character granularity |
end(n) | Character offset | Integer / nil | Text processing with character granularity |
bytebegin(n) | Byte offset | Integer / nil | Binary data, encoding-aware processing |
byteend(n) | Byte offset | Integer / nil | Binary data, encoding-aware processing |
byteoffset(n) | Byte range | [Integer, Integer] | Combined start/end byte positions |
offset(n) | Character range | [Integer, Integer] | Combined start/end character positions |
Common Patterns
Basic Extraction:
start_pos = match.bytebegin(group)
end_pos = match.byteend(group)
content = string.byteslice(start_pos, end_pos - start_pos)
Safe Extraction with Validation:
start_pos = match.bytebegin(group)
end_pos = match.byteend(group)
content = (start_pos && end_pos) ? string.byteslice(start_pos, end_pos - start_pos) : nil
Range-based Operations:
byte_range = match.bytebegin(group)...match.byteend(group)
content = string.byteslice(byte_range)
Performance Characteristics
Time Complexity:
- O(1) for byte position retrieval
- O(n) for character position retrieval (where n = string length)
Memory Usage:
- No additional memory allocation
- Returns primitive integer values
Encoding Impact:
- Multi-byte encodings: Byte methods are faster
- ASCII-only strings: Minimal performance difference
- Binary data: Byte methods are required
Compatibility Notes
Ruby Versions:
- Available: Ruby 3.4+
- Feature Request: #20576
- Not available in earlier Ruby versions
Encoding Support:
- All string encodings supported
- Binary data compatible (ASCII-8BIT encoding)
- UTF-8, UTF-16, UTF-32 fully supported
- Position values always in bytes regardless of encoding
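One way to see that offsets are counted in bytes is to match the same pattern against a binary (ASCII-8BIT) copy of a string, where every byte counts as one character and character offsets therefore equal byte offsets. The helper below, char_and_byte_start, is a made-up name for illustration. It assumes the pattern contains only ASCII characters, since a pattern with non-ASCII literals would not be compatible with the binary copy.

```ruby
# Hypothetical helper: character offset and byte offset of the first match.
# The byte offset is read from a binary copy of the string, in which each
# byte is its own character. Assumes +pattern+ is ASCII-only.
def char_and_byte_start(text, pattern)
  char_match = pattern.match(text)
  byte_match = pattern.match(text.b) # String#b returns an ASCII-8BIT copy
  return nil unless char_match && byte_match

  [char_match.begin(0), byte_match.begin(0)]
end

p char_and_byte_start("Hello 世界 Ruby", /Ruby/) # => [9, 13]
p char_and_byte_start("Hello Ruby", /Ruby/)      # => [6, 6]
```

The two numbers diverge only when multi-byte characters precede the match, which is exactly the situation in which byte-based and character-based offsets differ.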
Platform Differences:
- Consistent behavior across platforms
- No platform-specific variations
- Thread-safe operations