CrackedRuby

Transformation Methods

Ruby transformation methods convert data from one form to another across strings, arrays, hashes, and other objects using built-in methods and custom transformation logic.

3.2.3

Overview

Ruby provides transformation methods across multiple core classes to convert data between different formats, encodings, structures, and representations. The standard library includes string transformation methods like String#encode, String#tr, and String#gsub, array transformation methods like Array#map and Array#collect, and hash transformation methods like Hash#transform_keys and Hash#transform_values.

String transformations handle character encoding conversion, case changes, pattern replacement, and character substitution. The String#encode method converts between character encodings, while String#tr performs character-by-character substitution. Pattern-based transformations use String#gsub and String#sub for regular expression matching and replacement.

# Basic string transformations
text = "Hello World"
text.upcase              # => "HELLO WORLD"
text.tr('l', 'x')       # => "Hexxo Worxd"
text.gsub(/o/, 'O')     # => "HellO WOrld"
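
The difference between String#sub and String#gsub mentioned above can be sketched as follows. The sample strings and variable names are invented for illustration.

```ruby
# Sample input invented for illustration
line = "2024-01-15 error: disk full; error: retrying"

# sub replaces only the first match, gsub replaces every match
first_only = line.sub("error", "ERROR")
all = line.gsub("error", "ERROR")

# Backreferences (\1, \2, \3) reorder the captured groups
us_date = line.sub(/(\d{4})-(\d{2})-(\d{2})/, '\2/\3/\1')

# A hash replacement looks up each match as a key
arrows = "a < b > c".gsub(/[<>]/, { "<" => "lt", ">" => "gt" })
```

The backreference string uses single quotes so that the backslashes reach sub unchanged.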

Array transformations convert elements from one form to another through iteration methods. The Array#map method applies a transformation block to each element, returning a new array with transformed values. Hash transformations operate on keys, values, or both simultaneously.

# Array and hash transformations
numbers = [1, 2, 3, 4]
numbers.map { |n| n * 2 }           # => [2, 4, 6, 8]

hash = { name: "john", age: 30 }
hash.transform_keys(&:to_s)         # => {"name"=>"john", "age"=>30}
hash.transform_values { |v| v.to_s } # => {:name=>"john", :age=>"30"}

Ruby's transformation methods follow consistent patterns across different data types, accepting blocks for custom transformation logic and supporting method chaining for complex data processing pipelines. The non-bang methods return new objects rather than modifying the receiver, while bang variants such as upcase! and transform_values! mutate the receiver in place.
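
A minimal sketch of the distinction between returning a new object and mutating in place, using String#upcase and String#upcase! as the example pair. The variable names are illustrative.

```ruby
original = "hello"

# Non-bang methods return a new object and leave the receiver alone
shouted = original.upcase

# Bang methods mutate the receiver and return nil when nothing changed
buffer = +"HELLO"            # unary plus yields an unfrozen string
no_change = buffer.upcase!   # nil, because buffer was already upper case
buffer.downcase!             # buffer now holds "hello"

# Frozen strings reject in-place mutation
frozen = "hello".freeze
begin
  frozen.upcase!
  mutation_error = nil
rescue FrozenError => e
  mutation_error = e.class
end
```

Because a bang method can return nil, chaining calls off its return value is risky.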

Basic Usage

String encoding transformations convert text between different character encodings using the String#encode method. Ruby tags every string with an encoding but rarely transcodes implicitly, so explicit conversion becomes necessary when exchanging text with file systems, databases, or network protocols that expect a different encoding.

# Encoding transformations
utf8_text = "café"
ascii_text = utf8_text.encode('ASCII', undef: :replace, replace: '?')
# => "caf?"

# Round-trip encoding
original = "naïve résumé"
latin1 = original.encode('ISO-8859-1')
back_to_utf8 = latin1.encode('UTF-8')
# => "naïve résumé"
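
To make the round trip above more concrete, the following sketch compares String#encode, which transcodes bytes, with String#force_encoding, which only relabels them. It assumes a UTF-8 source file; the variable names are illustrative.

```ruby
text = "caf\u00E9"   # "café" written with an escape to keep the source ASCII-only

# encode transcodes: the bytes change, the characters stay the same
latin1 = text.encode("ISO-8859-1")
utf8_bytes = text.bytes       # the accented letter takes two bytes in UTF-8
latin1_bytes = latin1.bytes   # and a single byte in ISO-8859-1

# force_encoding only relabels the existing bytes
relabeled = latin1.dup.force_encoding("UTF-8")
still_valid = relabeled.valid_encoding?

# Transcoding back restores the original string
round_trip = latin1.encode("UTF-8")
```

Relabeling ISO-8859-1 bytes as UTF-8 leaves a string that fails valid_encoding?, which is a common source of later encoding errors.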

Character substitution uses String#tr for simple character-by-character replacement and String#gsub for pattern-based replacement. The tr method accepts character ranges and sets, making it efficient for case conversion and character filtering.

# Character substitution
phone = "(555) 123-4567"
digits_only = phone.tr('^0-9', '')  # => "5551234567"

# Case conversion with tr
mixed_case = "Hello World"
custom_case = mixed_case.tr('a-z', 'A-Z')  # => "HELLO WORLD"

# Pattern replacement
email = "user@DOMAIN.COM"
normalized = email.gsub(/[A-Z]/, &:downcase)  # => "user@domain.com"
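
The set syntax accepted by String#tr is shared with String#delete and String#squeeze. The sketch below shows a few related calls; the input values are invented for illustration.

```ruby
# ROT13: each range in the first set maps onto the rotated range in the second
rot13 = "Hello".tr("A-Za-z", "N-ZA-Mn-za-m")

# A shorter replacement set is padded with its last character
masked = "card 1234".tr("0-9", "#")

# delete and squeeze accept the same set syntax as tr
digits = "(555) 123-4567".delete("^0-9")
squeezed = "a    b   c".squeeze(" ")
```

String#delete states the intent more directly than tr with an empty replacement set, although both remove the listed characters.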

Array transformations apply blocks to each element, creating new arrays with transformed values. The map method handles the most common transformation patterns, while filter_map combines filtering and transformation in a single operation.

# Numeric transformations
prices = [10.99, 25.50, 8.75]
formatted_prices = prices.map { |price| "$#{sprintf('%.2f', price)}" }
# => ["$10.99", "$25.50", "$8.75"]

# String transformations with filtering
words = ["apple", "banana", "", "cherry", nil]
clean_words = words.filter_map { |w| w&.upcase if w && !w.empty? }
# => ["APPLE", "BANANA", "CHERRY"]
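
Beyond map and filter_map, a few neighbouring methods cover common variations. The following sketch uses invented order data to show flat_map and map.with_index alongside filter_map.

```ruby
# Sample data invented for illustration
orders = [
  { id: 1, items: ["pen", "ink"] },
  { id: 2, items: [] },
  { id: 3, items: ["paper"] }
]

# flat_map maps each element to an array and flattens one level
all_items = orders.flat_map { |order| order[:items] }

# map.with_index exposes the position during transformation
numbered = all_items.map.with_index(1) { |item, i| "#{i}. #{item}" }

# filter_map drops block results that are nil or false
ids_with_items = orders.filter_map { |o| o[:id] unless o[:items].empty? }
```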

Hash transformations modify keys, values, or both through dedicated methods. The transform_keys and transform_values methods accept blocks for custom transformation logic, while transform_keys! and transform_values! modify the hash in place.

# Key transformations
user_data = { 'first_name' => 'John', 'last_name' => 'Doe', 'email' => 'john@example.com' }
symbolized = user_data.transform_keys(&:to_sym)
# => {:first_name=>"John", :last_name=>"Doe", :email=>"john@example.com"}

# Value transformations
form_data = { name: 'John Doe', age: '30', active: 'true' }
processed = form_data.transform_values do |value|
  case value
  when /\A\d+\z/ then value.to_i  # \A and \z anchor the whole string, not each line
  when 'true' then true
  when 'false' then false
  else value
  end
end
# => {:name=>"John Doe", :age=>30, :active=>true}
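
When keys and values both need to change, Hash#to_h with a block handles them in one pass. The sketch below assumes Ruby 2.6 or later for to_h with a block and Ruby 3.0 or later for the mapping form of transform_keys; the settings hash is invented for illustration.

```ruby
settings = { "Timeout" => "30", "Retries" => "5" }

# to_h with a block returns a [key, value] pair for each entry,
# so keys and values are transformed together
normalized = settings.to_h { |key, value| [key.downcase.to_sym, value.to_i] }

# transform_keys also accepts a mapping hash;
# keys missing from the mapping are kept as they are
renamed = normalized.transform_keys({ timeout: :timeout_seconds })
```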

Advanced Usage

Complex transformation pipelines combine multiple methods through chaining, enabling sophisticated data processing workflows. Ruby's enumerable methods support lazy evaluation for memory-efficient processing of large datasets.

# Chained transformations with lazy evaluation
data = (1..1000000).lazy
  .map { |n| n.to_s }
  .select { |s| s.include?('5') }
  .map { |s| s.upcase }
  .take(10)
  .to_a
# => ["5", "15", "25", "35", "45", "50", "51", "52", "53", "54"]
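
One way to see what lazy evaluation saves is to count how often the transformation block runs. This is a small sketch with an illustrative counter variable, not a benchmark.

```ruby
calls = 0

# Eager: every element of the range is transformed before first(3) runs
eager = (1..1_000).map { |n| calls += 1; n * 2 }.first(3)
eager_calls = calls

# Lazy: elements are pulled through the chain only until three are collected
calls = 0
lazy = (1..1_000).lazy.map { |n| calls += 1; n * 2 }.first(3)
lazy_calls = calls
```

Both versions return the same three values, but the lazy chain stops invoking the block once enough results exist.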

Custom transformation classes encapsulate complex transformation logic and provide reusable transformation patterns. These classes implement call methods or define transformation-specific interfaces.

class DataNormalizer
  def initialize(options = {})
    @strip_whitespace = options.fetch(:strip_whitespace, true)
    @downcase = options.fetch(:downcase, false)
    @remove_special = options.fetch(:remove_special, false)
  end

  def call(data)
    case data
    when String
      transform_string(data)
    when Array
      data.map { |item| call(item) }
    when Hash
      data.transform_values { |value| call(value) }
    else
      data
    end
  end

  private

  def transform_string(string)
    result = string.dup
    result = result.strip if @strip_whitespace
    result = result.downcase if @downcase
    result = result.gsub(/[^a-zA-Z0-9\s]/, '') if @remove_special
    result
  end
end

normalizer = DataNormalizer.new(strip_whitespace: true, downcase: true)
messy_data = { name: "  JOHN DOE  ", email: "JOHN@EXAMPLE.COM  " }
clean_data = normalizer.call(messy_data)
# => {:name=>"john doe", :email=>"john@example.com"}
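
A callable class can also define to_proc so that instances work with the & shorthand in map and similar methods. The Slugifier class below is a hypothetical example written for this sketch; its name and option are assumptions.

```ruby
# Hypothetical example class; the name and the separator option are illustrative
class Slugifier
  def initialize(separator: "-")
    @separator = separator
  end

  def call(text)
    slug = text.strip.downcase.gsub(/[^a-z0-9]+/, @separator)
    slug.delete_prefix(@separator).delete_suffix(@separator)
  end

  # to_proc lets instances be passed with & to map, each, and similar methods
  def to_proc
    method(:call).to_proc
  end
end

slugify = Slugifier.new
slugs = ["  Hello World! ", "Ruby 3.2 Notes"].map(&slugify)
```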

Metaprogramming techniques create dynamic transformation methods that generate transformation logic based on configuration or runtime conditions. These approaches reduce code duplication and increase flexibility.

class TransformationBuilder
  def self.build_transformer(transformations)
    Class.new do
      transformations.each do |field, transform_proc|
        define_method("transform_#{field}") do |value|
          transform_proc.call(value)
        end
      end

      define_method(:transform) do |data|
        result = {}
        data.each do |key, value|
          transform_method = "transform_#{key}"
          result[key] = respond_to?(transform_method) ? 
            send(transform_method, value) : value
        end
        result
      end
    end.new
  end
end

transformer = TransformationBuilder.build_transformer(
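  # titleize comes from ActiveSupport; plain Ruby can capitalize each word instead
  name: ->(v) { v.to_s.strip.split.map(&:capitalize).join(' ') },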
  age: ->(v) { v.to_i },
  email: ->(v) { v.to_s.downcase.strip }
)

raw_data = { name: "  john doe  ", age: "30", email: "  JOHN@EXAMPLE.COM  " }
transformed = transformer.transform(raw_data)
# => {:name=>"John Doe", :age=>30, :email=>"john@example.com"}

Functional composition creates transformation pipelines using callable objects and method references. This approach separates transformation logic from data structures and enables reusable transformation components.

# Functional transformation composition
TRANSFORMATIONS = {
  strip: ->(s) { s.strip },
  downcase: ->(s) { s.downcase },
  remove_digits: ->(s) { s.gsub(/\d/, '') },
  normalize_whitespace: ->(s) { s.gsub(/\s+/, ' ') }
}.freeze

def compose(*transformations)
  ->(input) { transformations.reduce(input) { |acc, transform| transform.call(acc) } }
end

# strip runs last so that whitespace left behind by digit removal is trimmed too
text_normalizer = compose(
  TRANSFORMATIONS[:downcase],
  TRANSFORMATIONS[:remove_digits],
  TRANSFORMATIONS[:normalize_whitespace],
  TRANSFORMATIONS[:strip]
)

messy_text = "  HELLO    WORLD 123   "
clean_text = text_normalizer.call(messy_text)
# => "hello world"
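
Ruby 2.6 and later also ship composition operators on Proc and Method, which can stand in for a hand-written compose helper. The lambdas below are illustrative.

```ruby
strip = ->(s) { s.strip }
downcase = :downcase.to_proc
squeeze_spaces = ->(s) { s.squeeze(" ") }

# f >> g builds a callable that runs f first, then g
normalize = strip >> downcase >> squeeze_spaces

# f << g runs g first, then f
shout = ->(s) { s + "!" } << :upcase.to_proc

normalized = normalize.call("  Hello    World  ")
shouted = shout.call("hi")
```

Reading >> left to right matches the order in which the steps execute, which tends to suit pipelines.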

Performance & Memory

Large-scale transformations require consideration of memory usage and processing time. Ruby's enumerable methods provide lazy evaluation options that process data incrementally rather than loading entire datasets into memory.

# Memory-efficient file processing
def process_large_file(filename)
  # process_batch is assumed to be defined elsewhere by the caller
  File.foreach(filename).lazy
    .map(&:chomp)
    .reject(&:empty?)
    .map { |line| line.split(',') }
    .select { |fields| fields.length >= 3 }  # id, name, and email must be present
    .map { |fields| { id: fields[0], name: fields[1], email: fields[2] } }
    .each_slice(1000)
    .each { |batch| process_batch(batch) }
end
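
A self-contained variant of the same pipeline can be tried against a temporary file. The CSV content, the batch size of 2, and the batches array are assumptions made for this sketch.

```ruby
require "tempfile"

# Sample CSV content invented for illustration
file = Tempfile.new(["users", ".csv"])
file.write("1,Ann,ann@example.com\n\n2,Bob\n3,Cy,cy@example.com\n")
file.close

batches = []
File.foreach(file.path).lazy
  .map(&:chomp)
  .reject(&:empty?)
  .map { |line| line.split(",") }
  .select { |fields| fields.length >= 3 }
  .map { |fields| { id: fields[0], name: fields[1], email: fields[2] } }
  .each_slice(2) { |batch| batches << batch }   # stand-in for real batch handling

file.unlink
```

The blank line and the two-field row are dropped before batching, so only complete records reach the final block.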

Bang methods such as upcase! modify a string in place and avoid allocating a new string object, which can help in hot loops. Because they mutate the receiver, they are not appropriate when other code holds a reference to the same string.

require 'benchmark'

# Performance comparison: bang vs non-bang methods
text = "hello world" * 1000

Benchmark.bm do |x|
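  x.report("upcase!") do
    # dup keeps text unchanged between iterations, but it also allocates,
    # so this measures dup plus upcase! rather than upcase! alone
    1000.times { text.dup.upcase! }
  end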
  
  x.report("upcase") do
    1000.times { text.upcase }
  end
end

# String encoding performance
large_text = "café" * 10000

Benchmark.bm do |x|
  x.report("encode") do
    100.times { large_text.encode('ISO-8859-1') }
  end
  
  x.report("encode with options") do
    100.times { large_text.encode('ISO-8859-1', undef: :replace) }
  end
end

Hash and array transformations show different performance characteristics depending on the size of the data and the complexity of the transformation logic. Pre-allocated data structures can improve performance for predictable transformation patterns.

# Pre-allocation for better performance
def transform_with_preallocation(data, size_hint = nil)
  result = size_hint ? Array.new(size_hint) : []
  
  data.each_with_index do |item, index|
    transformed = expensive_transformation(item)
    if size_hint
      result[index] = transformed
    else
      result << transformed
    end
  end
  
  result
end

# Comparison of transformation approaches
large_array = (1..100000).to_a

Benchmark.bm do |x|
  x.report("map") do
    large_array.map { |n| n * 2 }
  end
  
  x.report("each_with_object") do
    large_array.each_with_object([]) { |n, acc| acc << n * 2 }
  end
  
  x.report("preallocation") do
    result = Array.new(large_array.size)
    large_array.each_with_index { |n, i| result[i] = n * 2 }
    result
  end
end

Memory profiling helps identify bottlenecks in transformation pipelines. The memory_profiler gem, a third-party library that must be installed separately, tracks object allocation and retention during transformation operations.

require 'memory_profiler'

# Memory profiling for transformations
data = Array.new(10000) { { name: "user#{rand(1000)}", age: rand(100) } }

report = MemoryProfiler.report do
  transformed = data.map do |record|
    {
      display_name: record[:name].upcase,
      age_group: case record[:age]
                 when 0..18 then 'minor'
                 when 19..64 then 'adult'
                 else 'senior'
                 end
    }
  end
end

# pretty_print writes the report to standard output itself
report.pretty_print
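
Where installing a gem is not an option, a rough allocation count can be taken with GC.stat from the core runtime. The allocations helper below is a hypothetical name introduced for this sketch, and the counts it reports vary between Ruby versions.

```ruby
# Hypothetical helper: counts objects allocated while the block runs
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

names = Array.new(1_000) { |i| "user#{i}" }

# map with upcase builds a new array plus one new string per element
copying = allocations { names.map(&:upcase) }

# each with upcase! rewrites the existing strings instead of building a new array
in_place = allocations { names.each(&:upcase!) }
```

Comparing the two numbers gives a quick indication of how much a non-mutating pipeline allocates, without the per-class detail a full profiler provides.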

Error Handling & Debugging

Encoding transformations can fail when converting between incompatible character sets or when encountering invalid byte sequences. Ruby provides several strategies for handling encoding errors gracefully.

# Robust encoding transformation
def safe_encode(text, target_encoding, fallback_encoding = 'UTF-8')
  text.encode(target_encoding)
rescue Encoding::UndefinedConversionError => e
  puts "Conversion error: #{e.message}"
  text.encode(target_encoding, undef: :replace, replace: '?')
rescue Encoding::InvalidByteSequenceError => e
  puts "Invalid byte sequence: #{e.message}"
  text.encode(fallback_encoding, invalid: :replace, replace: '?')
    .encode(target_encoding, undef: :replace, replace: '?')
end

# Example usage with problematic input
corrupted_text = "caf\xE9".force_encoding('UTF-8')  # Invalid UTF-8
safe_result = safe_encode(corrupted_text, 'ASCII')
# Prints a line starting with "Invalid byte sequence:", because the lone \xE9
# byte raises Encoding::InvalidByteSequenceError, not UndefinedConversionError
# => "caf?"
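
Two built-in alternatives to rescuing encoding errors are String#scrub, which repairs invalid bytes, and the fallback: option of String#encode, which supplies substitutes for characters the target encoding lacks. The sketch assumes a UTF-8 source file; the ascii_map table is invented for illustration.

```ruby
broken = "caf\xE9"                   # a lone 0xE9 byte is not valid UTF-8
was_valid = broken.valid_encoding?
repaired = broken.scrub("?")         # replaces only the invalid bytes

# fallback: looks up each unconvertible character and uses the mapped string
ascii_map = { "\u00E9" => "e", "\u00EF" => "i" }
ascii = "na\u00EFve caf\u00E9".encode("US-ASCII", fallback: ascii_map)
```

A fallback table keeps text readable where a blanket replacement character would lose information.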

Pattern matching errors in regular expression transformations require validation of both the pattern and the input data. Complex patterns can cause performance issues or unexpected matches.

def safe_gsub(text, pattern, replacement, max_replacements: Float::INFINITY)
  return text unless text.is_a?(String)
  
  # Validate pattern
  unless pattern.is_a?(Regexp) || pattern.is_a?(String)
    raise ArgumentError, "Pattern must be a Regexp or String"
  end
  
  replacement_count = 0
  result = text.gsub(pattern) do |match|
    # Past the limit, leave the remaining matches unchanged.
    # (break would abort gsub and return only the matched text.)
    next match if replacement_count >= max_replacements
    replacement_count += 1
    
    case replacement
    when String then replacement
    when Proc then replacement.call(match)
    else replacement.to_s
    end
  end
  
  { result: result, replacements: replacement_count }
rescue RegexpError => e
  raise ArgumentError, "Invalid regular expression: #{e.message}"
end

# Usage with error handling
begin
  result = safe_gsub("hello world", /l+/, "X", max_replacements: 2)
  # /l+/ matches "ll" as a single run, so there are two matches in total
  puts result  # => {:result=>"heXo worXd", :replacements=>2}
rescue ArgumentError => e
  puts "Transformation error: #{e.message}"
end
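
When a pattern comes from user input, validation can happen at compile time. The compile_pattern helper below is a hypothetical name for this sketch; it relies on Regexp.new raising RegexpError for malformed sources and on Regexp.escape for literal matching.

```ruby
# Hypothetical helper: returns a Regexp, or nil when the source is malformed
def compile_pattern(source, literal: false)
  Regexp.new(literal ? Regexp.escape(source) : source)
rescue RegexpError
  nil
end

invalid = compile_pattern("(unclosed")            # malformed, so nil
literal = compile_pattern("1+1", literal: true)   # matches the text "1+1" exactly

replaced = "1+1=2".gsub(literal, "two")
```

Escaping turns metacharacters such as + into literals, so the input cannot change how the pattern matches.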

Collection transformation errors occur when transformation blocks raise exceptions or return unexpected values. Implementing error recovery strategies ensures partial results rather than complete failure.

require 'ostruct'  # OpenStruct is not loaded by default

def robust_map(collection, &block)
  return enum_for(:robust_map, collection) unless block_given?
  
  results = []
  errors = []
  
  collection.each_with_index do |item, index|
    begin
      transformed = block.call(item)
      results << { index: index, value: transformed, error: nil }
    rescue StandardError => e
      results << { index: index, value: nil, error: e }
      errors << { index: index, item: item, error: e }
    end
  end
  
  OpenStruct.new(
    results: results.map { |r| r[:value] },
    errors: errors,
    success_count: results.count { |r| r[:error].nil? },
    error_count: errors.length
  )
end

# Example with mixed success/failure
mixed_data = [1, "2", 3, nil, "5", {}]
result = robust_map(mixed_data) { |x| Integer(x) * 2 }

puts "Successful transformations: #{result.success_count}"
puts "Errors: #{result.error_count}"
puts "Results: #{result.results.compact}"
# Integer(nil) and Integer({}) raise TypeError; the other four items convert
# => Successful transformations: 4
# => Errors: 2
# => Results: [2, 4, 6, 10]

Debugging transformation pipelines requires visibility into intermediate steps and data flow. Custom debugging utilities can track transformations through complex processing chains.

class DebuggingTransformer
  def initialize(debug: false)
    @debug = debug
    @steps = []
  end
  
  def transform(data, name = nil, &block)
    input_info = debug_info(data)
    log_step("INPUT", name, input_info) if @debug
    
    result = block.call(data)
    
    output_info = debug_info(result)
    log_step("OUTPUT", name, output_info) if @debug
    
    @steps << { name: name, input: input_info, output: output_info }
    result
  rescue StandardError => e
    error_info = { error: e.class.name, message: e.message }
    log_step("ERROR", name, error_info) if @debug
    @steps << { name: name, input: input_info, error: error_info }
    raise
  end
  
  def summary
    @steps.map do |step|
      if step[:error]
        "#{step[:name]}: ERROR - #{step[:error][:message]}"
      else
        "#{step[:name]}: #{step[:input][:type]}(#{step[:input][:size]}) -> #{step[:output][:type]}(#{step[:output][:size]})"
      end
    end.join("\n")
  end
  
  private
  
  def debug_info(data)
    {
      type: data.class.name,
      size: data.respond_to?(:size) ? data.size : 1,
      sample: data.respond_to?(:first) ? data.first : data
    }
  end
  
  def log_step(type, name, info)
    puts "[DEBUG] #{type} #{name}: #{info}"
  end
end

# Usage example
transformer = DebuggingTransformer.new(debug: true)
data = ["  HELLO  ", "  WORLD  ", "  "]

result = transformer.transform(data, "strip") { |arr| arr.map(&:strip) }
result = transformer.transform(result, "filter_empty") { |arr| arr.reject(&:empty?) }
result = transformer.transform(result, "downcase") { |arr| arr.map(&:downcase) }

puts transformer.summary
# => strip: Array(3) -> Array(3)
# => filter_empty: Array(3) -> Array(2)  
# => downcase: Array(2) -> Array(2)
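
For quick inspection without a dedicated class, Object#tap can record intermediate values inside a chain without altering it. The snapshots array is an illustrative name, and Object#then requires Ruby 2.6 or later.

```ruby
snapshots = []

result = ["  HELLO  ", "  WORLD  ", "  "]
  .map(&:strip)
  .tap { |arr| snapshots << [:strip, arr.dup] }
  .reject(&:empty?)
  .tap { |arr| snapshots << [:reject_empty, arr.dup] }
  .map(&:downcase)
  .then { |arr| arr.join(" ") }
```

tap returns its receiver, so the chain behaves the same with or without the inspection steps, while then passes the value into a block and returns the block's result.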

Reference

String Transformation Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| String#encode(encoding, **opts) | encoding (String/Encoding), options (Hash) | String | Converts string to specified encoding |
| String#tr(from_str, to_str) | from_str (String), to_str (String) | String | Translates characters using character sets |
| String#gsub(pattern, replacement) | pattern (Regexp/String), replacement (String/Hash) or a block | String | Replaces all pattern matches |
| String#sub(pattern, replacement) | pattern (Regexp/String), replacement (String/Hash) or a block | String | Replaces first pattern match |
| String#upcase(*options) | options (Symbols such as :ascii, :turkic) | String | Converts to uppercase |
| String#downcase(*options) | options (Symbols such as :ascii, :turkic, :fold) | String | Converts to lowercase |
| String#swapcase(*options) | options (Symbols such as :ascii, :turkic) | String | Swaps case of each character |
| String#capitalize(*options) | options (Symbols such as :ascii, :turkic) | String | Capitalizes first character and lowercases the rest |

Array Transformation Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| Array#map(&block) | block (Proc) | Array | Transforms each element via block |
| Array#collect(&block) | block (Proc) | Array | Alias for map |
| Array#filter_map(&block) | block (Proc) | Array | Maps and drops nil or false results (Ruby 2.7+) |
| Array#flat_map(&block) | block (Proc) | Array | Maps and flattens results by one level |
| Array#map!(&block) | block (Proc) | Array | Transforms each element in place |

Hash Transformation Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| Hash#transform_keys(&block) | block (Proc) | Hash | Transforms all keys via block |
| Hash#transform_values(&block) | block (Proc) | Hash | Transforms all values via block |
| Hash#transform_keys!(&block) | block (Proc) | Hash | Transforms keys in place |
| Hash#transform_values!(&block) | block (Proc) | Hash | Transforms values in place |

Encoding Options

| Option | Values | Description |
|---|---|---|
| :invalid | :replace | Handling for invalid byte sequences (raises when not set) |
| :undef | :replace | Handling for undefined conversions (raises when not set) |
| :replace | String | Replacement string for invalid/undefined characters |
| :newline | :universal, :crlf, :cr | Newline conversion mode |
| :xml | :text, :attr | XML-safe encoding mode |

Common Character Sets for String#tr

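| Set | Description | Example |
|---|---|---|
| a-z | Lowercase letters | text.tr('a-z', 'A-Z') |
| A-Z | Uppercase letters | text.tr('A-Z', 'a-z') |
| 0-9 | Digits | text.tr('0-9', '') |
| ^0-9 | Non-digits (negated) | text.tr('^0-9', '') |
| " \t\n" | Whitespace, listed explicitly because tr does not support regex classes such as \s | text.tr(" \t\n", '_') |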

Transformation Error Types

| Exception | Cause | Recovery Strategy |
|---|---|---|
| Encoding::UndefinedConversionError | Character cannot be represented in target encoding | Use :undef => :replace option |
| Encoding::InvalidByteSequenceError | Invalid byte sequence in source encoding | Use :invalid => :replace option |
| RegexpError | Invalid regular expression pattern | Validate pattern before use |
| ArgumentError | Invalid transformation parameters | Validate inputs and provide defaults |
| NoMethodError | Transformation method not available | Check object type before transformation |

Performance Characteristics

| Operation | Time Complexity | Memory Usage | Notes |
|---|---|---|---|
| String#encode | O(n) | O(n) | Creates new string object |
| String#tr | O(n) | O(n) | Character-by-character processing |
| String#gsub | O(n*m) | O(n) | Where m is pattern complexity |
| Array#map | O(n) | O(n) | Creates new array |
| Hash#transform_keys | O(n) | O(n) | Creates new hash |
| Lazy evaluation | O(1) initial | O(k) | Where k is taken elements |