
CrackedRuby

Core Class Extensions

Ruby String core class extensions provide essential methods for text manipulation, encoding conversion, and pattern matching operations.


Overview

Ruby's String class provides numerous built-in methods for text processing, encoding transformations, and pattern operations. These methods form the backbone of text manipulation in Ruby applications, covering case conversion, substring extraction, pattern matching, and character encoding.

The String class includes methods for modifying content (#gsub, #tr, #squeeze), extracting information (#scan, #match, #include?), and transforming format (#upcase, #downcase, #capitalize). Ruby handles string encoding through methods like #encode, #force_encoding, and #valid_encoding?, supporting multiple character encodings including UTF-8, ASCII, and ISO-8859-1.

text = "Hello, World!"
text.upcase                    # => "HELLO, WORLD!"
text.gsub(/[aeiou]/, '*')      # => "H*ll*, W*rld!"
text.include?("World")         # => true
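The difference between #encode and #force_encoding is easy to miss: #encode converts the bytes into another encoding, while #force_encoding only relabels the bytes that are already there. A minimal sketch, assuming the source file is UTF-8 (Ruby's default); the sample strings are made up for illustration:

```ruby
# encode transcodes: "é" is two bytes in UTF-8 but one byte in ISO-8859-1
latin1 = "café".encode('ISO-8859-1')
"café".bytesize                 # => 5
latin1.bytesize                 # => 4

# force_encoding relabels the bytes in place and returns the receiver
raw = "caf\xC3\xA9".b           # UTF-8 bytes for "café", labeled as binary
labeled = raw.force_encoding('UTF-8')
labeled.valid_encoding?         # => true
labeled == "café"               # => true

# Relabeling bytes that are not UTF-8 produces an invalid string
broken = "caf\xE9".b.force_encoding('UTF-8')  # 0xE9 is "é" in ISO-8859-1
broken.valid_encoding?          # => false
broken.scrub('?')               # => "caf?"
```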

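String interpolation works with any of these methods. Note that #titleize is not part of core Ruby (it comes from Rails' ActiveSupport); with core methods alone, the same result needs #split, #capitalize, and #join:

name = "ruby developer"
puts "Welcome #{name.split.map(&:capitalize).join(' ')}!"  # prints "Welcome Ruby Developer!"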

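Encoding conversion is explicit: #encode transcodes a string, and characters that have no equivalent in the target encoding raise Encoding::UndefinedConversionError unless a replacement is requested:

utf8_string = "café"                                   # literals in a UTF-8 source file are UTF-8
ascii_string = utf8_string.encode('ASCII', undef: :replace)
# => "caf?"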

Basic Usage

String case conversion methods transform text between different capitalization formats. Ruby provides #upcase, #downcase, #capitalize, and #swapcase; since Ruby 2.4 they apply full Unicode case mapping and accept options such as :ascii and :turkic.

text = "Mixed Case String"
text.upcase        # => "MIXED CASE STRING"
text.downcase      # => "mixed case string"  
text.capitalize    # => "Mixed case string"
text.swapcase      # => "mIXED cASE sTRING"
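Because the default mapping is full Unicode, non-ASCII letters change case too, and the :ascii option restricts conversion to A-Z. A short sketch of both, plus #casecmp?, which compares strings case-insensitively using Unicode case folding (the sample strings are illustrative):

```ruby
title = "ÉCOLE Straße"

title.downcase            # => "école straße" (full Unicode mapping)
title.downcase(:ascii)    # => "École straße" (only A-Z are changed)
"straße".upcase           # => "STRASSE" (Unicode maps ß to SS)

# casecmp? returns true/false using Unicode case folding
"äöü".casecmp?("ÄÖÜ")     # => true
"äöü".casecmp?("aou")     # => false
```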

The #gsub method performs pattern-based substitutions using regular expressions or strings. The method accepts blocks for complex replacement logic.

text = "The quick brown fox jumps"
text.gsub(/\b\w{5}\b/, '[WORD]')           # => "The [WORD] [WORD] fox [WORD]"
text.gsub(/(\w+)/) { |word| word.reverse }  # => "ehT kciuq nworb xof spmuj"
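Besides a plain string and a block, #gsub and #sub accept a replacement string containing numbered or named backreferences, and a hash that maps each matched string to its replacement. A few illustrative calls (the input strings are made up):

```ruby
# Numbered backreferences (single quotes keep \1 and \2 literal for gsub)
"apple:3 banana:12".gsub(/(\w+):(\d+)/, '\2 x \1')
# => "3 x apple 12 x banana"

# Named backreferences use \k<name>
"2024-01-15".sub(/(?<y>\d+)-(?<m>\d+)-(?<d>\d+)/, '\k<d>/\k<m>/\k<y>')
# => "15/01/2024"

# A hash maps each matched string to its replacement
"cat hat".gsub(/[ch]/, { 'c' => 'k', 'h' => 'm' })
# => "kat mat"
```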

Character translation occurs through #tr and #tr_s methods. These methods map character sets to replacement characters.

"hello world".tr('l', 'x')      # => "hexxo worxd"
"hello world".tr('a-z', 'A-Z')  # => "HELLO WORLD"
"bookkeeper".tr_s('k', 'c')     # => "booceeper"

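The character sets passed to #tr use the same notation as #delete and #squeeze: ranges such as a-z, and a leading ^ that negates the set. A short sketch with illustrative inputs:

```ruby
"hello".tr('el', 'ip')             # => "hippo" (e becomes i, l becomes p)
"hello world".tr('^aeiou', '*')    # => "*e**o**o***" (everything except vowels)
"hello world".delete('l')          # => "heo word"
"mississippi".squeeze('sp')        # => "misisipi" (only runs of s and p collapse)
"aaabbbccc".squeeze                # => "abc" (no argument: every run collapses)
```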
String scanning with #scan extracts matching patterns into arrays. The method works with regular expressions and string patterns.

text = "Phone: 555-1234, Fax: 555-5678"
text.scan(/\d{3}-\d{4}/)        # => ["555-1234", "555-5678"]
text.scan(/(\w+):\s*([\d-]+)/)  # => [["Phone", "555-1234"], ["Fax", "555-5678"]]
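With a block, #scan yields each match instead of building an array, and when the pattern has capture groups the block receives one argument per group. A string pattern is matched literally. A minimal sketch reusing the sample text above:

```ruby
text = "Phone: 555-1234, Fax: 555-5678"

# Block form: collect label/number pairs as they are found
contacts = {}
text.scan(/(\w+):\s*([\d-]+)/) { |label, number| contacts[label] = number }
contacts["Fax"]                            # => "555-5678"

# Without a block, two-element capture arrays convert directly to a hash
text.scan(/(\w+):\s*([\d-]+)/).to_h.keys   # => ["Phone", "Fax"]

# String patterns are literal, not regular expressions
"banana".scan("an")                        # => ["an", "an"]
"1.5 + 2.5".scan(".")                      # => [".", "."]
```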

Advanced Usage

Non-destructive String methods return new strings, so they chain naturally into multi-step transformations. The #gsub method accepts full regular expressions, including named captures and lookarounds.

html_text = "<p>Hello <strong>world</strong>!</p>"
clean_text = html_text
  .gsub(/<[^>]+>/, '')           # Remove HTML tags
  .squeeze(' ')                   # Collapse multiple spaces
  .strip                         # Remove leading/trailing whitespace
# => "Hello world!"
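Lookarounds match a position without consuming text, which makes them useful for inserting separators or extracting text next to a marker. Inside a #gsub block, $~ holds the current MatchData, so named groups can be read by name. A short sketch with illustrative inputs:

```ruby
# Insert thousands separators: a position preceded by a digit and
# followed by a multiple of three digits up to the end of the string
"1234567".gsub(/(?<=\d)(?=(?:\d{3})+\z)/, ',')   # => "1,234,567"

# Lookbehind: extract the digits that follow a dollar sign
"price: $42"[/(?<=\$)\d+/]                       # => "42"

# Named captures in a gsub block via $~
"john smith".gsub(/(?<first>\w+) (?<last>\w+)/) { "#{$~[:last]}, #{$~[:first]}" }
# => "smith, john"
```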

Named captures keep complex extractions readable, since each group can be read from the MatchData by name:

log_entry = "2024-01-15 14:30:25 ERROR DatabaseConnection timeout after 30s"
pattern = /(?<date>\d{4}-\d{2}-\d{2})\s+(?<time>\d{2}:\d{2}:\d{2})\s+(?<level>\w+)\s+(?<message>.*)/

match = log_entry.match(pattern)
{
  timestamp: "#{match[:date]} #{match[:time]}",
  severity: match[:level],
  details: match[:message]
}
# => {:timestamp=>"2024-01-15 14:30:25", :severity=>"ERROR", :details=>"DatabaseConnection timeout after 30s"}
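#match returns nil when the pattern does not match, so code that indexes the result should check it first. A sketch of that guard, plus two related forms: #match? for a plain true/false test, and a regex literal on the left of =~, which assigns named captures to local variables (the log lines are made up):

```ruby
pattern = /(?<date>\d{4}-\d{2}-\d{2})\s+(?<level>[A-Z]+)/

# Guard against nil before reading captures
if (m = "2024-01-15 ERROR boom".match(pattern))
  level = m[:level]                  # => "ERROR"
end

"not a log line".match(pattern)      # => nil
"not a log line".match?(pattern)     # => false (and does not set $~)

# Named captures become local variables only when a regex literal
# without interpolation is on the left-hand side of =~
if /(?<year>\d{4})-(?<month>\d{2})/ =~ "2024-01-15"
  puts "#{year}/#{month}"            # prints "2024/01"
end
```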

The #partition and #rpartition methods split a string around a separator and return a three-element array: the part before the separator, the separator itself, and the part after it. #partition splits at the first occurrence; #rpartition splits at the last.

email = "user@example.com"
username, at_sign, domain = email.partition('@')
# username => "user", at_sign => "@", domain => "example.com"

filepath = "/home/user/documents/file.txt"
directory, separator, filename = filepath.rpartition('/')
# directory => "/home/user/documents", separator => "/", filename => "file.txt"
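Both methods always return three elements, even when the separator is missing: #partition returns [str, "", ""] and #rpartition returns ["", "", str]. With a regular expression separator, the middle element is the text that actually matched. A short sketch with illustrative strings:

```ruby
"a.b.c".partition('.')               # => ["a", ".", "b.c"]
"a.b.c".rpartition('.')              # => ["a.b", ".", "c"]

# Separator not found
"no-at-sign".partition('@')          # => ["no-at-sign", "", ""]
"no-slash".rpartition('/')           # => ["", "", "no-slash"]

# Regexp separator: the middle element is the matched text
"key = value".partition(/\s*=\s*/)   # => ["key", " = ", "value"]
```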

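The #encode method takes options that control what happens to invalid byte sequences and to characters undefined in the target encoding: :invalid and :undef enable replacement, :replace sets the replacement string, and :fallback computes a replacement for each undefined character.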

mixed_encoding = "Café naïve résumé".encode('UTF-8')

# Convert with replacement characters
ascii_version = mixed_encoding.encode('ASCII', 
  invalid: :replace, 
  undef: :replace, 
  replace: '?')
# => "Caf? na?ve r?sum?"

# Convert undefined characters to decimal character references.
# :replace only accepts a String, so a per-character replacement needs :fallback
xml_safe = mixed_encoding.encode('ASCII',
  fallback: ->(char) { "&##{char.ord};" })
# => "Caf&#233; na&#239;ve r&#233;sum&#233;"
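Two related tools are worth knowing here: the xml: :text option escapes &, <, and > and turns undefined characters into hexadecimal character references, and #scrub repairs invalid byte sequences without changing the encoding. A minimal sketch with made-up inputs:

```ruby
# XML text escaping while converting to ASCII
"Café <b>".encode('ASCII', xml: :text)   # => "Caf&#xE9; &lt;b&gt;"

bad = "abc\xFFdef"                       # 0xFF never appears in valid UTF-8
bad.valid_encoding?                      # => false
bad.scrub('?')                           # => "abc?def"
bad.scrub                                # => "abc\uFFFDdef" (U+FFFD replacement character)
```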

Common Pitfalls

Pairs of destructive and non-destructive methods are a common source of bugs. Methods ending with an exclamation mark, such as #upcase! and #gsub!, modify the receiver, while their counterparts return a new string and leave the original untouched.

original = "hello world"
result = original.upcase    # original unchanged, result = "HELLO WORLD"  
original.upcase!           # original modified to "HELLO WORLD"

# Common mistake: expecting mutation
text = "sample text"
text.gsub(/\s+/, '_')      # text still equals "sample text"
text = text.gsub(/\s+/, '_')  # Correct: reassign result
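Bang methods also differ in their return value: they return nil when they make no change, which breaks method chains. A short sketch (the strings are illustrative; #dup gives a mutable copy even when string literals are frozen):

```ruby
name = "ruby".dup
name.upcase!                 # => "RUBY" (string changed)
name.upcase!                 # => nil (already uppercase, nothing changed)
name                         # => "RUBY"

input = "already clean".dup
input.strip!                 # => nil (no surrounding whitespace to remove)
# input.strip!.downcase      # would raise NoMethodError (undefined method for nil)

cleaned = input.strip.downcase   # non-bang methods always return a String
```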

Interpolating user input directly into a regular expression causes problems when the input contains metacharacters such as $, ., +, or ?. Regexp.escape escapes them so the input is matched literally.

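user_input = "What is $5.00 + $3.50?"
text = "Question: What is $5.00 + $3.50?"

# Wrong: $, ., +, and ? are regex metacharacters, so the pattern
# compiles but never matches the literal text
text.gsub(/#{user_input}/, 'REDACTED')                 # => "Question: What is $5.00 + $3.50?"

# Correct: escape special characters
text.gsub(/#{Regexp.escape(user_input)}/, 'REDACTED')  # => "Question: REDACTED"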
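When several literal strings need to be matched at once, Regexp.union builds a single pattern and escapes each string argument. A plain String passed to #gsub is always matched literally, so it needs no escaping at all. A minimal sketch with made-up inputs:

```ruby
banned = ["$5.00", "a+b", "(test)"]
pattern = Regexp.union(banned)       # each string is escaped automatically
"Pay $5.00 for a+b (test)".gsub(pattern, '[X]')
# => "Pay [X] for [X] [X]"

# A String pattern is literal
"What is $5.00?".gsub("$5.00", 'REDACTED')   # => "What is REDACTED?"
```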

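Encoding issues arise when combining strings with different encodings or reading files without specifying an encoding. Ruby raises Encoding::CompatibilityError when both operands contain non-ASCII characters in incompatible encodings; ASCII-only strings combine freely with any ASCII-compatible encoding.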

utf8_string   = "résumé"                      # UTF-8 literal
latin1_string = "café".encode('ISO-8859-1')   # non-ASCII text in a different encoding

begin
  # Both operands contain non-ASCII bytes in different encodings
  result = utf8_string + latin1_string        # raises Encoding::CompatibilityError
rescue Encoding::CompatibilityError
  # Transcode to a common encoding before combining
  result = utf8_string + latin1_string.encode('UTF-8')
end
result  # => "résumécafé"
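Encoding.compatible? reports whether two strings can be combined without raising, and passing File.read an "external:internal" encoding transcodes data while reading. A minimal sketch using a temporary file from the standard library's Tempfile; the file contents are made up for illustration:

```ruby
require 'tempfile'

utf8   = "résumé"
latin1 = "café".encode('ISO-8859-1')
Encoding.compatible?(utf8, latin1)                      # => nil (+ would raise)
Encoding.compatible?(utf8, "hello".encode('US-ASCII'))  # => #<Encoding:UTF-8>

# Write Latin-1 bytes, then read them back transcoded to UTF-8
decoded = Tempfile.create('latin1') do |file|
  file.binmode
  file.write("caf\xE9".b)   # 0xE9 is "é" in ISO-8859-1
  file.flush
  File.read(file.path, encoding: 'ISO-8859-1:UTF-8')
end
decoded                     # => "café"
decoded.encoding            # => #<Encoding:UTF-8>
```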

Case conversion follows Unicode's default mappings, which are incorrect for some languages. Turkish, for example, distinguishes dotted and dotless i, so the core case methods accept a :turkic option.

turkish_text = "İstanbul"
turkish_text.downcase           # => "i̇stanbul" (i plus combining dot above, wrong for Turkish)
turkish_text.downcase(:turkic)  # => "istanbul"
"istanbul".upcase(:turkic)      # => "İSTANBUL"

The #tr method performs character-by-character replacement, not substring replacement. This creates unexpected results when replacing multi-character sequences.

text = "hello"
text.tr('ll', 'x')     # => "hexxo" (each 'l' becomes 'x')
text.gsub('ll', 'x')   # => "hexo" (substring 'll' becomes 'x')

# For multi-character replacement, use gsub
"bookkeeper".tr('kk', 'c')    # => "boocceeper" (each 'k' becomes 'c')
"bookkeeper".gsub('kk', 'c')  # => "booceeper" (substring 'kk' becomes 'c')

Reference

Case Conversion Methods

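Method | Parameters | Returns | Description
#upcase | options (Symbol, optional) | String | Returns uppercase copy
#upcase! | options (Symbol, optional) | String/nil | Modifies string to uppercase; returns nil if unchanged
#downcase | options (Symbol, optional) | String | Returns lowercase copy
#downcase! | options (Symbol, optional) | String/nil | Modifies string to lowercase; returns nil if unchanged
#capitalize | options (Symbol, optional) | String | Returns copy with first character uppercase and the rest lowercase
#capitalize! | options (Symbol, optional) | String/nil | Modifies string, uppercasing the first character and lowercasing the rest
#swapcase | options (Symbol, optional) | String | Returns copy with case swapped
#swapcase! | options (Symbol, optional) | String/nil | Modifies string swapping case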

Pattern Matching and Substitution

Method | Parameters | Returns | Description
#gsub(pattern, replacement) | pattern (Regexp/String), replacement (String/Hash) or block | String | Returns copy with every match replaced
#gsub!(pattern, replacement) | pattern (Regexp/String), replacement (String/Hash) or block | String/nil | Modifies string replacing every match
#sub(pattern, replacement) | pattern (Regexp/String), replacement (String/Hash) or block | String | Returns copy with first match replaced
#sub!(pattern, replacement) | pattern (Regexp/String), replacement (String/Hash) or block | String/nil | Modifies string replacing first match
#scan(pattern) | pattern (Regexp/String) | Array | Returns array of pattern matches
#match(pattern, pos=0) | pattern (Regexp/String), pos (Integer) | MatchData/nil | Returns match data or nil

Character Translation

Method | Parameters | Returns | Description
#tr(from_str, to_str) | from_str (String), to_str (String) | String | Returns copy with characters translated
#tr!(from_str, to_str) | from_str (String), to_str (String) | String/nil | Modifies string translating characters
#tr_s(from_str, to_str) | from_str (String), to_str (String) | String | Returns copy with characters translated and squeezed
#tr_s!(from_str, to_str) | from_str (String), to_str (String) | String/nil | Modifies string translating and squeezing
#delete(other_str) | other_str (String) | String | Returns copy with characters removed
#delete!(other_str) | other_str (String) | String/nil | Modifies string removing characters
#squeeze(other_str=nil) | other_str (String) | String | Returns copy with consecutive characters squeezed
#squeeze!(other_str=nil) | other_str (String) | String/nil | Modifies string squeezing consecutive characters

String Splitting and Partitioning

Method | Parameters | Returns | Description
#split(pattern=nil, limit=0) | pattern (Regexp/String/nil), limit (Integer) | Array | Splits string into array
#partition(sep) | sep (String/Regexp) | Array | Returns [before, separator, after], split at the first occurrence
#rpartition(sep) | sep (String/Regexp) | Array | Returns [before, separator, after], split at the last occurrence
#lines(separator=$/) | separator (String) | Array | Returns array of lines
#chars | None | Array | Returns array of characters
#bytes | None | Array | Returns array of byte values

Encoding Operations

Method | Parameters | Returns | Description
#encode(encoding, **opts) | encoding (String/Encoding), options (Hash) | String | Returns string in specified encoding
#encode!(encoding, **opts) | encoding (String/Encoding), options (Hash) | String | Modifies string encoding
#force_encoding(encoding) | encoding (String/Encoding) | String | Changes encoding without conversion
#encoding | None | Encoding | Returns current encoding
#valid_encoding? | None | Boolean | Checks if string has valid encoding
#ascii_only? | None | Boolean | Checks if string contains only ASCII

Encoding Options

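Option | Values | Description
:invalid | nil (default), :replace | nil raises Encoding::InvalidByteSequenceError; :replace substitutes invalid bytes
:undef | nil (default), :replace | nil raises Encoding::UndefinedConversionError; :replace substitutes undefined characters
:replace | String | Replacement string for invalid/undefined input (default "\uFFFD" for Unicode targets, "?" otherwise)
:fallback | Hash/Proc/Method | Computes a replacement for each undefined character
:xml | :text, :attr | XML escaping mode; undefined characters become numeric character references
:cr_newline | Boolean | Convert LF to CR
:crlf_newline | Boolean | Convert LF to CRLF
:universal_newline | Boolean | Convert CRLF and CR to LF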