Overview
Ruby provides transformation methods across multiple core classes to convert data between different formats, encodings, structures, and representations. These include string transformation methods like String#encode, String#tr, and String#gsub, array transformation methods like Array#map and its alias Array#collect, and hash transformation methods like Hash#transform_keys and Hash#transform_values.
String transformations handle character encoding conversion, case changes, pattern replacement, and character substitution. The String#encode method converts between character encodings, while String#tr performs character-by-character substitution. Pattern-based transformations use String#gsub and String#sub for regular expression matching and replacement.
```ruby
# Basic string transformations
text = "Hello World"
text.upcase # => "HELLO WORLD"
text.tr('l', 'x') # => "Hexxo Worxd"
text.gsub(/o/, 'O') # => "HellO WOrld"
```
Array transformations convert elements from one form to another through iteration methods. The Array#map method applies a transformation block to each element, returning a new array with transformed values. Hash transformations operate on keys, values, or both simultaneously.
```ruby
# Array and hash transformations
numbers = [1, 2, 3, 4]
numbers.map { |n| n * 2 } # => [2, 4, 6, 8]
hash = { name: "john", age: 30 }
hash.transform_keys(&:to_s) # => {"name"=>"john", "age"=>30}
hash.transform_values { |v| v.to_s } # => {:name=>"john", :age=>"30"}
```
Ruby's transformation methods follow consistent patterns across data types: they accept blocks for custom transformation logic and support method chaining for multi-step processing pipelines. The non-bang methods return new objects and leave the receiver unchanged, while bang variants such as upcase! and transform_keys! modify the receiver in place.
Basic Usage
String encoding transformations convert text between different character encodings using the String#encode method. Ruby source files and most strings default to UTF-8, but explicit conversion becomes necessary when exchanging text with file systems, databases, or network protocols that use other encodings.
```ruby
# Encoding transformations
utf8_text = "café"
ascii_text = utf8_text.encode('ASCII', undef: :replace, replace: '?')
# => "caf?"

# Round-trip encoding: every character here exists in ISO-8859-1
original = "naïve résumé"
latin1 = original.encode('ISO-8859-1')
back_to_utf8 = latin1.encode('UTF-8')
# => "naïve résumé"
```
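String#encode transcodes: it rewrites the bytes so the same characters are represented in the target encoding. String#force_encoding, by contrast, only relabels the existing bytes. The sketch below contrasts the two on a short string; the variable names are illustrative.

```ruby
utf8 = "café"
latin1 = utf8.encode('ISO-8859-1')

utf8.bytesize   # => 5 ("é" takes two bytes in UTF-8)
latin1.bytesize # => 4 ("é" is the single byte 0xE9 in ISO-8859-1)
latin1.encode('UTF-8') == utf8 # => true

# force_encoding keeps the bytes and only changes the label,
# so Latin-1 bytes labelled as UTF-8 form an invalid string
mislabelled = latin1.dup.force_encoding('UTF-8')
mislabelled.valid_encoding? # => false
```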
Character substitution uses String#tr for simple character-by-character replacement and String#gsub for pattern-based replacement. The tr method accepts character ranges (a-z) and negated sets (^0-9), which makes it efficient for case conversion and character filtering.
```ruby
# Character substitution
phone = "(555) 123-4567"
# An empty replacement string deletes the matched characters
digits_only = phone.tr('^0-9', '') # => "5551234567"

# Case conversion with tr (ASCII letters only)
mixed_case = "Hello World"
custom_case = mixed_case.tr('a-z', 'A-Z') # => "HELLO WORLD"

# Pattern replacement
email = "user@DOMAIN.COM"
normalized = email.gsub(/[A-Z]/, &:downcase) # => "user@domain.com"
```
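A few related core methods cover cases that tr and gsub handle less directly. tr pairs characters by position and pads a shorter replacement list with its last character. delete and squeeze remove or collapse characters using the same set syntax. gsub also accepts a Hash that maps each match to its replacement. A short sketch:

```ruby
# tr pairs characters positionally; a shorter replacement is padded
# with its last character
"hello".tr('el', 'ip')   # => "hippo"
"hello".tr('aeiou', '*') # => "h*ll*"

# delete and squeeze use the same set syntax as tr
"(555) 123-4567".delete('^0-9') # => "5551234567"
"a   b    c".squeeze(' ')       # => "a b c"

# gsub with a Hash: each match is looked up as a key
"cat hat".gsub(/[ch]/, { 'c' => 'b', 'h' => 'm' }) # => "bat mat"
```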
Array transformations apply blocks to each element, creating new arrays with transformed values. The map method handles the most common transformation patterns, while filter_map (Ruby 2.7+) combines filtering and transformation in a single operation.
```ruby
# Numeric transformations
prices = [10.99, 25.50, 8.75]
formatted_prices = prices.map { |price| format('$%.2f', price) }
# => ["$10.99", "$25.50", "$8.75"]

# String transformations with filtering
words = ["apple", "banana", "", "cherry", nil]
clean_words = words.filter_map { |w| w.upcase unless w.nil? || w.empty? }
# => ["APPLE", "BANANA", "CHERRY"]
```
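The difference between map, filter_map, and flat_map is easiest to see side by side. Note that filter_map discards every falsy result, not only nil, which matters when the block can legitimately return false.

```ruby
numbers = [1, 2, 3, 4]

# filter_map keeps only truthy block results, so both nil and false are dropped
numbers.filter_map { |n| n * 10 if n.even? } # => [20, 40]
numbers.filter_map { |n| n.odd? }            # => [true, true]

# flat_map concatenates array results one level deep
numbers.flat_map { |n| [n, -n] }             # => [1, -1, 2, -2, 3, -3, 4, -4]

# map.with_index exposes the position of each element
numbers.map.with_index { |n, i| n * i }      # => [0, 2, 6, 12]
```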
Hash transformations change keys or values through dedicated methods. The transform_keys and transform_values methods accept blocks for custom transformation logic and return a new hash, while transform_keys! and transform_values! modify the hash in place.
```ruby
# Key transformations
user_data = { 'first_name' => 'John', 'last_name' => 'Doe', 'email' => 'john@example.com' }
symbolized = user_data.transform_keys(&:to_sym)
# => {:first_name=>"John", :last_name=>"Doe", :email=>"john@example.com"}

# Value transformations
form_data = { name: 'John Doe', age: '30', active: 'true' }
processed = form_data.transform_values do |value|
  case value
  when /\A\d+\z/ then value.to_i # the whole string must be digits
  when 'true' then true
  when 'false' then false
  else value
  end
end
# => {:name=>"John Doe", :age=>30, :active=>true}
```
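Two related techniques fill the gaps: Hash#to_h with a block (Ruby 2.6+) transforms keys and values together, and since Ruby 3.0 transform_keys also accepts a Hash that renames specific keys. The sketch also shows that the bang variant changes the receiver itself. The settings data is illustrative.

```ruby
settings = { 'Timeout' => '30', 'Retries' => '3' }

# Transform keys and values together: to_h with a block (Ruby 2.6+)
normalized = settings.to_h { |key, value| [key.downcase.to_sym, value.to_i] }
# => {:timeout=>30, :retries=>3}

# Rename specific keys with a mapping Hash (Ruby 3.0+); unmapped keys are kept
normalized.transform_keys({ timeout: :timeout_seconds })
# => {:timeout_seconds=>30, :retries=>3}

# The bang variant modifies the receiver in place
settings.transform_values!(&:to_i)
settings # => {"Timeout"=>30, "Retries"=>3}
```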
Advanced Usage
Complex transformation pipelines combine multiple methods through chaining, enabling sophisticated data processing workflows. Ruby's enumerable methods support lazy evaluation for memory-efficient processing of large datasets.
```ruby
# Chained transformations with lazy evaluation
data = (1..1_000_000).lazy
  .map { |n| n.to_s }
  .select { |s| s.include?('5') }
  .map { |s| s.upcase }
  .take(10)
  .to_a
# => ["5", "15", "25", "35", "45", "50", "51", "52", "53", "54"]
```
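Laziness pays off because each element passes through the whole chain before the next one is read, so work stops as soon as take or first is satisfied. The counter below makes that visible. It is an illustration, not a benchmark.

```ruby
calls = 0

# Eager: map runs over all 1,000 elements before first(3) is applied
(1..1000).map { |n| calls += 1; n * 2 }.first(3)
eager_calls = calls # => 1000

calls = 0
# Lazy: elements flow through one at a time, so map stops after 3 calls
(1..1000).lazy.map { |n| calls += 1; n * 2 }.first(3)
lazy_calls = calls # => 3
```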
Custom transformation classes encapsulate complex transformation logic and provide reusable transformation patterns. These classes typically implement a call method (the same interface as procs and lambdas) or define a transformation-specific interface.
```ruby
class DataNormalizer
  def initialize(options = {})
    @strip_whitespace = options.fetch(:strip_whitespace, true)
    @downcase = options.fetch(:downcase, false)
    @remove_special = options.fetch(:remove_special, false)
  end

  # Recursively normalizes strings, including those inside arrays and hash values
  def call(data)
    case data
    when String
      transform_string(data)
    when Array
      data.map { |item| call(item) }
    when Hash
      data.transform_values { |value| call(value) }
    else
      data
    end
  end

  private

  def transform_string(string)
    result = string.dup
    result = result.strip if @strip_whitespace
    result = result.downcase if @downcase
    result = result.gsub(/[^a-zA-Z0-9\s]/, '') if @remove_special
    result
  end
end

normalizer = DataNormalizer.new(strip_whitespace: true, downcase: true)
messy_data = { name: " JOHN DOE ", email: "JOHN@EXAMPLE.COM " }
clean_data = normalizer.call(messy_data)
# => {:name=>"john doe", :email=>"john@example.com"}
```
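Because a transformer that exposes call can be passed wherever a lambda is expected, defining to_proc as well lets an instance be used with the & operator. The sketch below shows the idea on a hypothetical, much smaller SlugTransformer class; the name and slug rules are illustrative, not part of any library.

```ruby
# Hypothetical callable transformer: turns titles into URL slugs
class SlugTransformer
  def call(text)
    text.strip.downcase
        .gsub(/[^a-z0-9]+/, '-') # collapse runs of other characters into one dash
        .delete_prefix('-')
        .delete_suffix('-')
  end

  # Lets instances be passed with &, e.g. titles.map(&slugger)
  def to_proc
    method(:call).to_proc
  end
end

slugger = SlugTransformer.new
titles = ["  Hello World! ", "Ruby 3.3 Release"]
titles.map(&slugger) # => ["hello-world", "ruby-3-3-release"]
```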
Metaprogramming techniques create dynamic transformation methods that generate transformation logic based on configuration or runtime conditions. These approaches reduce code duplication and increase flexibility.
```ruby
class TransformationBuilder
  def self.build_transformer(transformations)
    Class.new do
      # Define one transform_<field> method per configured field
      transformations.each do |field, transform_proc|
        define_method("transform_#{field}") do |value|
          transform_proc.call(value)
        end
      end

      # Apply the matching transform_<key> method, or pass the value through
      define_method(:transform) do |data|
        result = {}
        data.each do |key, value|
          transform_method = "transform_#{key}"
          result[key] = respond_to?(transform_method) ?
            send(transform_method, value) : value
        end
        result
      end
    end.new
  end
end

transformer = TransformationBuilder.build_transformer(
  # titleize comes from ActiveSupport, not core Ruby, so capitalize each word instead
  name: ->(v) { v.to_s.strip.split.map(&:capitalize).join(' ') },
  age: ->(v) { v.to_i },
  email: ->(v) { v.to_s.downcase.strip }
)
raw_data = { name: " john doe ", age: "30", email: " JOHN@EXAMPLE.COM " }
transformed = transformer.transform(raw_data)
# => {:name=>"John Doe", :age=>30, :email=>"john@example.com"}
```
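When the set of fields is known up front, the same result can be reached without generating methods: keep the rules in a Hash and look them up per key. This sketch, with hypothetical RULES and apply_rules names, trades the dynamic class for plain data, which is often easier to inspect and test.

```ruby
# Field-level rules as plain data: a Hash of lambdas keyed by field name
RULES = {
  name: ->(v) { v.to_s.strip.split.map(&:capitalize).join(' ') },
  age: ->(v) { v.to_i },
  email: ->(v) { v.to_s.strip.downcase }
}.freeze

def apply_rules(data, rules = RULES)
  # Fields without a rule pass through unchanged (:itself returns its argument)
  data.to_h { |key, value| [key, rules.fetch(key, :itself.to_proc).call(value)] }
end

apply_rules({ name: " john doe ", age: "30", email: " JOHN@EXAMPLE.COM ", id: 7 })
# => {:name=>"John Doe", :age=>30, :email=>"john@example.com", :id=>7}
```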
Functional composition creates transformation pipelines using callable objects and method references. This approach separates transformation logic from data structures and enables reusable transformation components.
```ruby
# Functional transformation composition
TRANSFORMATIONS = {
  strip: ->(s) { s.strip },
  downcase: ->(s) { s.downcase },
  remove_digits: ->(s) { s.gsub(/\d/, '') },
  normalize_whitespace: ->(s) { s.gsub(/\s+/, ' ') }
}.freeze

# Returns a lambda that feeds its input through each transformation in order
def compose(*transformations)
  ->(input) { transformations.reduce(input) { |acc, transform| transform.call(acc) } }
end

# Strip last: removing digits and collapsing whitespace can leave edge spaces
text_normalizer = compose(
  TRANSFORMATIONS[:downcase],
  TRANSFORMATIONS[:remove_digits],
  TRANSFORMATIONS[:normalize_whitespace],
  TRANSFORMATIONS[:strip]
)
messy_text = " HELLO WORLD 123 "
clean_text = text_normalizer.call(messy_text)
# => "hello world"
```
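Ruby 2.6 and later also provide composition operators on Proc and Method: f >> g returns a callable that applies f and then g, and f << g composes in the opposite order. The sketch below builds an equivalent normalizer with them, applying strip last so no trailing space survives digit removal. The lambda names are illustrative.

```ruby
strip     = ->(s) { s.strip }
downcase  = ->(s) { s.downcase }
no_digits = ->(s) { s.gsub(/\d/, '') }
squash_ws = ->(s) { s.gsub(/\s+/, ' ') }

# f >> g applies f first, then g (Ruby 2.6+)
normalize = downcase >> no_digits >> squash_ws >> strip
normalize.call(" HELLO WORLD 123 ") # => "hello world"

# Method objects compose the same way
to_doubled = method(:Integer) >> ->(n) { n * 2 }
to_doubled.call("21") # => 42
```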
Performance & Memory
Large-scale transformations require consideration of memory usage and processing time. Ruby's enumerable methods provide lazy evaluation options that process data incrementally rather than loading entire datasets into memory.
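```ruby
# Memory-efficient file processing
# process_batch is a placeholder for application code that handles each batch.
# split(',') is a naive parser that assumes no quoted commas; each record
# needs at least three fields (id, name, email).
def process_large_file(filename)
  File.foreach(filename).lazy
    .map(&:chomp)
    .reject(&:empty?)
    .map { |line| line.split(',') }
    .select { |fields| fields.length >= 3 }
    .map { |fields| { id: fields[0], name: fields[1], email: fields[2] } }
    .each_slice(1000)
    .each { |batch| process_batch(batch) }
end
```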
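To try the pipeline without a large file, StringIO can stand in for File.foreach, since StringIO#each_line yields lines the same way. The sample data and the batch size of 2 (chosen so the grouping is visible) are illustrative. Real CSV input with quoted fields would call for the csv library instead of split(',').

```ruby
require 'stringio'

# Stand-in for a file: StringIO#each_line yields lines like File.foreach
csv = StringIO.new(<<~DATA)
  1,Ann,ann@example.com,admin

  2,Bob,bob@example.com,user
  3,broken
  4,Cy,cy@example.com,user
DATA

batches = csv.each_line.lazy
  .map(&:chomp)
  .reject(&:empty?)
  .map { |line| line.split(',') }
  .select { |fields| fields.length >= 3 }
  .map { |fields| { id: fields[0], name: fields[1], email: fields[2] } }
  .each_slice(2)
  .to_a

batches.map(&:size) # => [2, 1]
```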
Bang methods such as upcase! and gsub! modify a string in place instead of allocating a new one, which can reduce allocation in tight loops. However, they mutate the original string, which is inappropriate when other code holds a reference to it, and they raise FrozenError on frozen strings.
```ruby
require 'benchmark'

# Performance comparison: bang vs non-bang methods
text = "hello world" * 1000
Benchmark.bm do |x|
  # dup gives upcase! a fresh lowercase copy each time, so both reports
  # allocate one new string per iteration
  x.report("upcase!") do
    1000.times { text.dup.upcase! }
  end
  x.report("upcase") do
    1000.times { text.upcase }
  end
end

# String encoding performance
large_text = "café" * 10000
Benchmark.bm do |x|
  x.report("encode") do
    100.times { large_text.encode('ISO-8859-1') }
  end
  x.report("encode with options") do
    100.times { large_text.encode('ISO-8859-1', undef: :replace) }
  end
end
```
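One behavior to keep in mind when switching to bang methods: most of them return nil when they make no change, rather than returning the receiver. That makes their return value unsafe to chain, as this small sketch shows.

```ruby
# dup gives an unfrozen copy, so in-place modification is allowed
name = "ALICE".dup

# Bang methods return nil when nothing changed...
name.upcase! # => nil
# ...so chaining on their return value can raise NoMethodError:
# name.upcase!.strip  # NoMethodError (undefined method for nil)

# Safer in-place pattern: call the bang methods as separate statements
name.downcase!
name.strip!
name # => "alice"

# Non-bang methods always return a new String, so they chain safely
"  BOB ".downcase.strip # => "bob"
```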
Hash and array transformations show different performance characteristics depending on the size of the data and the complexity of the transformation logic. Pre-allocating the result array is sometimes suggested for predictable sizes, but Array#map already allocates its result at the receiver's size, so measure before assuming a gain.
```ruby
require 'benchmark'

# Pre-allocation for better performance
# expensive_transformation is a placeholder for the per-item work
def transform_with_preallocation(data, size_hint = nil)
  result = size_hint ? Array.new(size_hint) : []
  data.each_with_index do |item, index|
    transformed = expensive_transformation(item)
    if size_hint
      result[index] = transformed
    else
      result << transformed
    end
  end
  result
end

# Comparison of transformation approaches
large_array = (1..100_000).to_a
Benchmark.bm do |x|
  x.report("map") do
    large_array.map { |n| n * 2 }
  end
  x.report("each_with_object") do
    large_array.each_with_object([]) { |n, acc| acc << n * 2 }
  end
  x.report("preallocation") do
    result = Array.new(large_array.size)
    large_array.each_with_index { |n, i| result[i] = n * 2 }
    result
  end
end
```
Memory profiling helps identify bottlenecks in transformation pipelines. The memory_profiler gem (installed separately, not part of the standard library) tracks object allocation and retention while a block of transformation code runs.
```ruby
require 'memory_profiler' # gem install memory_profiler

# Memory profiling for transformations
data = Array.new(10_000) { { name: "user#{rand(1000)}", age: rand(100) } }
report = MemoryProfiler.report do
  data.map do |record|
    {
      display_name: record[:name].upcase,
      age_group: case record[:age]
                 when 0..18 then 'minor'
                 when 19..64 then 'adult'
                 else 'senior'
                 end
    }
  end
end
# pretty_print writes the report to $stdout itself, so no puts is needed
report.pretty_print
```
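When adding a gem is not an option, GC.stat offers a coarse, built-in alternative: the difference in :total_allocated_objects before and after a block approximates how many objects it allocated. The allocations_during helper below is a hypothetical name. Its counts include every allocation in the block, so treat them as rough comparisons rather than exact measurements.

```ruby
# Hypothetical helper: count objects allocated while a block runs
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

strings = Array.new(10_000) { |i| "user#{i}" }

# map allocates a new String per element plus the result Array
copied = allocations_during { strings.map(&:upcase) }

# upcase! rewrites each ASCII string in place, so far fewer objects are created
in_place = allocations_during { strings.each(&:upcase!) }
```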
Error Handling & Debugging
Encoding transformations can fail when converting between incompatible character sets or when encountering invalid byte sequences. Ruby provides several strategies for handling encoding errors gracefully.
```ruby
# Robust encoding transformation
def safe_encode(text, target_encoding)
  text.encode(target_encoding)
rescue Encoding::UndefinedConversionError => e
  # A valid character has no equivalent in the target encoding
  warn "Conversion error: #{e.message}"
  text.encode(target_encoding, invalid: :replace, undef: :replace, replace: '?')
rescue Encoding::InvalidByteSequenceError => e
  # The source bytes are not valid in their own encoding
  warn "Invalid byte sequence: #{e.message}"
  text.encode(target_encoding, invalid: :replace, undef: :replace, replace: '?')
end

# A valid UTF-8 character that ASCII cannot represent
safe_encode("café", 'ASCII')
# warns: Conversion error: U+00E9 from UTF-8 to US-ASCII
# => "caf?"

# Invalid UTF-8 input
corrupted_text = "caf\xE9".force_encoding('UTF-8')
safe_encode(corrupted_text, 'ASCII')
# warns with an "Invalid byte sequence" message
# => "caf?"
```
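Ruby also offers more targeted tools for these two failure modes. valid_encoding? detects invalid bytes up front, String#scrub (Ruby 2.1+) replaces them without changing the encoding, and the :fallback option of encode maps specific undefined characters to chosen replacements instead of a single placeholder. A short sketch:

```ruby
corrupted = "caf\xE9".dup.force_encoding('UTF-8')

corrupted.valid_encoding? # => false
corrupted.scrub('?')      # => "caf?" (invalid bytes replaced, still UTF-8)

# :fallback supplies replacements for characters the target cannot represent
"café".encode('US-ASCII', fallback: { "é" => "e" }) # => "cafe"
```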
Pattern matching errors in regular expression transformations require validation of both the pattern and the input data. Complex patterns can cause performance issues or unexpected matches.
```ruby
def safe_gsub(text, pattern, replacement, max_replacements: Float::INFINITY)
  return { result: text, replacements: 0 } unless text.is_a?(String)

  # Validate pattern
  unless pattern.is_a?(Regexp) || pattern.is_a?(String)
    raise ArgumentError, "Pattern must be a Regexp or String"
  end

  replacement_count = 0
  result = text.gsub(pattern) do |match|
    # Past the limit, return the match itself so it stays unchanged
    # (break would abort gsub and return only the matched text)
    next match if replacement_count >= max_replacements

    replacement_count += 1
    case replacement
    when String then replacement
    when Proc then replacement.call(match)
    else replacement.to_s
    end
  end

  { result: result, replacements: replacement_count }
end

# Usage with error handling: /l+/ matches "ll" and "l", but only one is replaced
begin
  result = safe_gsub("hello world", /l+/, "X", max_replacements: 1)
  puts result # => {:result=>"heXo world", :replacements=>1}
rescue ArgumentError => e
  puts "Transformation error: #{e.message}"
end
```
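RegexpError is raised when a Regexp is built from a string, typically user input, rather than when an existing Regexp object is used. A sketch of validating such input before it reaches gsub, with compile_user_pattern as a hypothetical helper name:

```ruby
# Hypothetical helper: compile a user-supplied pattern, or return nil if invalid
def compile_user_pattern(source, literal: false)
  # Regexp.escape makes every metacharacter match literally
  Regexp.new(literal ? Regexp.escape(source) : source)
rescue RegexpError => e
  warn "Invalid pattern #{source.inspect}: #{e.message}"
  nil
end

compile_user_pattern("a+b")                # => /a+b/
compile_user_pattern("(unclosed")          # => nil (after printing a warning)
compile_user_pattern("1.5", literal: true) # => /1\.5/

"version 1x5 or 1.5".gsub(compile_user_pattern("1.5", literal: true), "2.0")
# => "version 1x5 or 2.0"
```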
Collection transformation errors occur when transformation blocks raise exceptions or return unexpected values. Implementing error recovery strategies ensures partial results rather than complete failure.
```ruby
# Lightweight result container (Struct avoids depending on ostruct)
MapResult = Struct.new(:results, :errors, :success_count, :error_count, keyword_init: true)

def robust_map(collection, &block)
  return enum_for(:robust_map, collection) unless block_given?

  results = []
  errors = []
  collection.each_with_index do |item, index|
    begin
      transformed = block.call(item)
      results << { index: index, value: transformed, error: nil }
    rescue StandardError => e
      results << { index: index, value: nil, error: e }
      errors << { index: index, item: item, error: e }
    end
  end

  MapResult.new(
    results: results.map { |r| r[:value] },
    errors: errors,
    success_count: results.count { |r| r[:error].nil? },
    error_count: errors.length
  )
end

# Example with mixed success/failure: Integer() raises TypeError for nil and {}
mixed_data = [1, "2", 3, nil, "5", {}]
result = robust_map(mixed_data) { |x| Integer(x) * 2 }
puts "Successful transformations: #{result.success_count}"
puts "Errors: #{result.error_count}"
puts "Results: #{result.results.compact}"
# => Successful transformations: 4
# => Errors: 2
# => Results: [2, 4, 6, 10]
```
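For simpler cases, the same separation can be done inline. Since Ruby 2.6, do...end blocks can contain rescue, so each element can be paired with its result or its exception and the pairs split with partition. Rescuing only TypeError and ArgumentError, the errors Integer() raises for bad input, keeps unrelated bugs from being silently swallowed. A sketch:

```ruby
inputs = [1, "2", 3, nil, "5", {}]

# Pair each input with either its result or the exception it raised
outcomes = inputs.map do |x|
  [x, Integer(x) * 2]
rescue TypeError, ArgumentError => e
  [x, e]
end

failures, successes = outcomes.partition { |_, result| result.is_a?(Exception) }
successes.map(&:last) # => [2, 4, 6, 10]
failures.map(&:first) # => [nil, {}]
```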
Debugging transformation pipelines requires visibility into intermediate steps and data flow. Custom debugging utilities can track transformations through complex processing chains.
```ruby
class DebuggingTransformer
  def initialize(debug: false)
    @debug = debug
    @steps = []
  end

  # Runs one named step, recording the input/output shape (or the error)
  def transform(data, name = nil, &block)
    input_info = debug_info(data)
    log_step("INPUT", name, input_info) if @debug
    result = block.call(data)
    output_info = debug_info(result)
    log_step("OUTPUT", name, output_info) if @debug
    @steps << { name: name, input: input_info, output: output_info }
    result
  rescue StandardError => e
    error_info = { error: e.class.name, message: e.message }
    log_step("ERROR", name, error_info) if @debug
    @steps << { name: name, input: input_info, error: error_info }
    raise
  end

  def summary
    @steps.map do |step|
      if step[:error]
        "#{step[:name]}: ERROR - #{step[:error][:message]}"
      else
        "#{step[:name]}: #{step[:input][:type]}(#{step[:input][:size]}) -> #{step[:output][:type]}(#{step[:output][:size]})"
      end
    end.join("\n")
  end

  private

  def debug_info(data)
    {
      type: data.class.name,
      size: data.respond_to?(:size) ? data.size : 1,
      sample: data.respond_to?(:first) ? data.first : data
    }
  end

  def log_step(type, name, info)
    puts "[DEBUG] #{type} #{name}: #{info}"
  end
end

# Usage example
transformer = DebuggingTransformer.new(debug: true)
data = [" HELLO ", " WORLD ", " "]
result = transformer.transform(data, "strip") { |arr| arr.map(&:strip) }
result = transformer.transform(result, "filter_empty") { |arr| arr.reject(&:empty?) }
result = transformer.transform(result, "downcase") { |arr| arr.map(&:downcase) }
puts transformer.summary
# Summary output (printed after the [DEBUG] lines, since debug: true):
# => strip: Array(3) -> Array(3)
# => filter_empty: Array(3) -> Array(2)
# => downcase: Array(2) -> Array(2)
```
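For quick inspection without a dedicated class, Object#tap can be spliced into an existing chain. It yields the intermediate value to a block and then returns that value unchanged, so the pipeline's result is unaffected. The trace array here is illustrative; in practice the block might call p or a logger.

```ruby
trace = []

result = [" HELLO ", " WORLD ", " "]
  .map(&:strip)
  .tap { |arr| trace << [:strip, arr.size] }
  .reject(&:empty?)
  .tap { |arr| trace << [:filter_empty, arr.size] }
  .map(&:downcase)

result # => ["hello", "world"]
trace  # => [[:strip, 3], [:filter_empty, 2]]
```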
Reference
String Transformation Methods
Method | Parameters | Returns | Description |
---|---|---|---|
String#encode(encoding, **opts) | encoding (String/Encoding), options (Hash) | String | Converts string to specified encoding |
String#tr(from_str, to_str) | from_str (String), to_str (String) | String | Translates characters using character sets |
String#gsub(pattern, replacement) | pattern (Regexp/String), replacement (String/Hash) or a block | String | Replaces all pattern matches |
String#sub(pattern, replacement) | pattern (Regexp/String), replacement (String/Hash) or a block | String | Replaces first pattern match |
String#upcase(*options) | options (Symbols such as :ascii, :turkic) | String | Converts to uppercase |
String#downcase(*options) | options (Symbols such as :ascii, :turkic, :fold) | String | Converts to lowercase |
String#swapcase(*options) | options (Symbols such as :ascii, :turkic) | String | Swaps case of each character |
String#capitalize(*options) | options (Symbols such as :ascii, :turkic) | String | Upcases the first character and downcases the rest |
Array Transformation Methods
Method | Parameters | Returns | Description |
---|---|---|---|
Array#map(&block) | block (Proc) | Array | Transforms each element via block |
Array#collect(&block) | block (Proc) | Array | Alias for map |
Array#filter_map(&block) | block (Proc) | Array | Maps, then drops falsy (nil or false) results (Ruby 2.7+) |
Array#flat_map(&block) | block (Proc) | Array | Maps and flattens the results one level |
Hash Transformation Methods
Method | Parameters | Returns | Description |
---|---|---|---|
Hash#transform_keys(&block) | block (Proc) | Hash | Transforms all keys via block |
Hash#transform_values(&block) | block (Proc) | Hash | Transforms all values via block |
Hash#transform_keys!(&block) | block (Proc) | Hash | Transforms keys in place |
Hash#transform_values!(&block) | block (Proc) | Hash | Transforms values in place |
Encoding Options
Option | Values | Description |
---|---|---|
:invalid | :replace | Replaces invalid byte sequences instead of raising |
:undef | :replace | Replaces undefined conversions instead of raising |
:replace | String | Replacement for invalid/undefined input (default "\uFFFD" for Unicode targets, "?" otherwise) |
:newline | :universal, :crlf, :cr | Newline conversion mode |
:xml | :text, :attr | Escapes output for XML text or attribute values |
Common Character Sets for String#tr
Set | Description | Example |
---|---|---|
a-z | Lowercase letters | text.tr('a-z', 'A-Z') |
A-Z | Uppercase letters | text.tr('A-Z', 'a-z') |
0-9 | Digits | text.tr('0-9', '') |
^0-9 | Non-digits (negated) | text.tr('^0-9', '') |
" \t\r\n" | Whitespace (tr has no \s class, so list the characters explicitly) | text.tr(" \t\r\n", '_') |
Transformation Error Types
Exception | Cause | Recovery Strategy |
---|---|---|
Encoding::UndefinedConversionError | Character cannot be represented in target encoding | Use the undef: :replace option |
Encoding::InvalidByteSequenceError | Invalid byte sequence in source encoding | Use the invalid: :replace option or String#scrub |
RegexpError | Invalid regular expression pattern (raised when a Regexp is built from a string) | Validate pattern before use |
ArgumentError | Invalid transformation parameters | Validate inputs and provide defaults |
NoMethodError | Transformation method not available | Check object type before transformation |
Performance Characteristics
Operation | Time Complexity | Memory Usage | Notes |
---|---|---|---|
String#encode | O(n) | O(n) | Creates new string object |
String#tr | O(n) | O(n) | Character-by-character processing |
String#gsub | O(n*m) typical | O(n) | m reflects pattern complexity; backtracking patterns can be far slower |
Array#map | O(n) | O(n) | Creates new array |
Hash#transform_keys | O(n) | O(n) | Creates new hash |
Lazy evaluation | O(1) to build the chain | O(k) | Work and memory scale with the k elements actually consumed |