Overview
Data transformation converts data from one format, structure, or representation to another. This process occurs throughout software systems: when reading files, consuming APIs, processing user input, storing data in databases, or generating output. Data transformation addresses the fundamental problem that different parts of a system use different data representations.
The transformation process involves three stages: reading source data, applying transformation logic, and writing transformed data. Source data arrives in formats like JSON, XML, CSV, binary data, or custom structures. Transformation logic manipulates this data through mapping, filtering, aggregating, reshaping, or combining operations. The output produces data in the target format required by the consuming system.
Ruby excels at data transformation because of its expressive syntax for data manipulation, rich standard library for handling various formats, and ecosystem of gems for specialized transformations. The language treats data transformation as a first-class operation through enumerable methods, pattern matching, and flexible type conversions.
# Basic data transformation from CSV to JSON structure
require 'csv'
require 'json'
csv_data = <<~CSV
name,age,city
Alice,30,NYC
Bob,25,SF
CSV
# Transform CSV to array of hashes
records = CSV.parse(csv_data, headers: true).map(&:to_h)
# => [{"name"=>"Alice", "age"=>"30", "city"=>"NYC"},
# {"name"=>"Bob", "age"=>"25", "city"=>"SF"}]
# Further transform to JSON with type conversion
transformed = records.map do |record|
{
full_name: record['name'],
age: record['age'].to_i,
location: record['city']
}
end
puts JSON.pretty_generate(transformed)
Data transformation challenges include maintaining data integrity during conversion, handling incomplete or malformed data, preserving semantic meaning across format changes, and managing performance with large datasets. The transformation code must balance flexibility with correctness, handling edge cases while remaining maintainable.
Key Principles
Data transformation operates on several fundamental principles that govern how data moves through transformation pipelines. Understanding these principles helps design correct and efficient transformations.
Immutability and Pure Functions
Transformation logic should produce new data structures rather than modifying existing ones. Pure transformation functions take input data and return transformed output without side effects. This principle makes transformations predictable, testable, and composable.
# Pure transformation - returns new data
require 'time' # Time.parse lives in the time stdlib
def normalize_user(user_data)
{
id: user_data[:id],
name: user_data[:first_name] + ' ' + user_data[:last_name],
email: user_data[:email].downcase,
created_at: Time.parse(user_data[:created])
}
end
# Impure transformation - modifies input
def normalize_user_impure!(user_data)
user_data[:name] = user_data.delete(:first_name) + ' ' + user_data.delete(:last_name)
user_data[:email].downcase!
user_data[:created_at] = Time.parse(user_data.delete(:created))
user_data
end
Type Preservation and Coercion
Data types must be handled explicitly during transformation. Type coercion converts values between types, while type preservation maintains the original type system. Different formats have different type capabilities - JSON distinguishes numbers from strings but lacks date types, while Ruby has rich type semantics.
# Explicit type coercion during transformation
require 'bigdecimal'
require 'time'
def transform_api_response(json_data)
{
user_id: json_data['userId'].to_i, # string to integer
active: json_data['isActive'] == 'true', # string to boolean
balance: BigDecimal(json_data['balance']), # string to decimal
last_login: Time.iso8601(json_data['lastLogin']) # string to time
}
end
Schema Mapping
Transformation maps source schema to target schema. Schemas define field names, types, nesting levels, and relationships. Schema mapping handles field renaming, type conversion, denormalization or normalization, and structural changes.
# Schema mapping between flat and nested structures
def map_order_schema(flat_order)
{
id: flat_order[:order_id],
customer: {
id: flat_order[:customer_id],
name: flat_order[:customer_name],
email: flat_order[:customer_email]
},
items: parse_items(flat_order[:items_json]),
total: Money.new(flat_order[:total_cents], 'USD'),
status: flat_order[:status].to_sym
}
end
Composition and Pipelines
Complex transformations compose from simpler transformations. Pipeline architecture chains transformations where each stage's output feeds the next stage's input. This modular approach separates concerns and enables reuse.
# Transformation pipeline with composition
class TransformationPipeline
def initialize
@transformations = []
end
def add(transformation)
@transformations << transformation
self
end
def execute(data)
@transformations.reduce(data) do |current_data, transformation|
transformation.call(current_data)
end
end
end
# Usage
pipeline = TransformationPipeline.new
.add(->(data) { data.map { |item| item.transform_keys(&:to_sym) } })
.add(->(data) { data.select { |item| item[:active] } })
.add(->(data) { data.map { |item| normalize_item(item) } })
result = pipeline.execute(raw_data)
Error Handling and Validation
Transformation processes must handle invalid input gracefully. Validation occurs before transformation to catch malformed data. Error handling strategies include fail-fast approaches that raise exceptions, error collection that gathers all problems, or fallback values for missing data.
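As a minimal sketch of two of these strategies (the field names are illustrative), Hash#fetch supports both: without a default it raises KeyError on missing input, and with a default it falls back.

```ruby
# Fail-fast: fetch without a default raises KeyError on missing input.
def transform_strict(record)
  { id: record.fetch(:id), name: record.fetch(:name) }
end

# Fallback: fetch with a default substitutes a value instead of failing.
def transform_with_fallbacks(record)
  { id: record.fetch(:id, nil), name: record.fetch(:name, 'Unknown') }
end

transform_with_fallbacks({ id: 1 }) # => {:id=>1, :name=>"Unknown"}
```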
Bidirectional Transformation
Some transformations need to work in both directions - serializing data for storage and deserializing it for use. Bidirectional transformations must maintain round-trip consistency: transforming data and then reversing the transformation yields the original data.
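A minimal round-trip sketch using JSON as the storage format (the user shape here is illustrative): Integer survives JSON directly, while Time must be encoded as a string and re-parsed on the way back.

```ruby
require 'json'
require 'time'

# Forward: domain hash to JSON text; Time is encoded as an ISO 8601 string.
def serialize_user(user)
  JSON.generate({ id: user[:id], email: user[:email], created_at: user[:created_at].iso8601 })
end

# Backward: restore Ruby types from the JSON representation.
def deserialize_user(json)
  data = JSON.parse(json, symbolize_names: true)
  { id: data[:id], email: data[:email], created_at: Time.iso8601(data[:created_at]) }
end

user = { id: 7, email: 'a@example.com', created_at: Time.utc(2024, 1, 15, 10, 30) }
round_tripped = deserialize_user(serialize_user(user))
round_tripped == user # => true: the transformation is round-trip consistent
```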
Ruby Implementation
Ruby provides multiple approaches to data transformation through its standard library and enumerable methods. The language's flexibility supports declarative and imperative transformation styles.
Enumerable Transformations
Ruby's Enumerable module forms the foundation for collection transformations. Methods like map, select, reject, and reduce transform collections declaratively.
# Map transforms each element
user_ids = users.map { |user| user[:id] }
user_ids = users.map(&:id) # Symbol#to_proc shorthand - works when elements are objects responding to #id, not hashes
# Select filters elements
active_users = users.select { |user| user[:active] }
# Reduce aggregates data
total_revenue = orders.reduce(0) { |sum, order| sum + order[:amount] }
# Combining transformations
summary = orders
.select { |o| o[:status] == 'completed' }
.map { |o| o[:amount] }
.reduce(0, :+)
Hash Transformations
Hash manipulation is central to data transformation since many formats map to Ruby hashes. Ruby provides methods for transforming hash keys and values.
# Transform keys
snake_case = camel_case_hash.transform_keys { |k| k.to_s.gsub(/([A-Z])/, '_\1').downcase }
# Transform values
doubled = numbers_hash.transform_values { |v| v * 2 }
# Transform both keys and values
normalized = raw_data.transform_keys(&:to_sym).transform_values(&:strip)
# Selective transformation with merge
updated = user.merge(
email: user[:email].downcase,
created_at: Time.parse(user[:created_at])
)
Pattern Matching for Transformation
Ruby's pattern matching (introduced in Ruby 2.7, enhanced in 3.0) enables declarative transformations based on data structure.
def transform_event(event)
case event
in { type: 'user_created', data: { name:, email: } }
{ event_type: :user_created, user: { name: name, email: email.downcase } }
in { type: 'order_placed', data: { order_id:, items: } }
{ event_type: :order_placed, order_id: order_id, item_count: items.size }
in { type: 'payment', data: { amount:, currency: 'USD' } }
{ event_type: :payment, amount_cents: (amount * 100).to_i }
else
{ event_type: :unknown, raw: event }
end
end
Data Class Transformations
Ruby's Data class (added in Ruby 3.2) and Struct provide lightweight structures for transformation targets.
User = Data.define(:id, :name, :email, :created_at)
def transform_to_user(raw_data)
User.new(
id: raw_data['id'].to_i,
name: "#{raw_data['firstName']} #{raw_data['lastName']}",
email: raw_data['email'].downcase,
created_at: Time.iso8601(raw_data['createdAt'])
)
end
# Transform collection
users = raw_users.map { |raw| transform_to_user(raw) }
Format-Specific Transformations
Ruby's standard library includes parsers and generators for common formats. Each format has specific transformation requirements.
require 'json'
require 'yaml'
require 'csv'
# JSON transformation
json_string = '{"userId": 123, "userName": "Alice"}'
parsed = JSON.parse(json_string, symbolize_names: true)
transformed = { user_id: parsed[:userId], name: parsed[:userName] }
output = JSON.generate(transformed)
# YAML transformation with custom types
class Money
def encode_with(coder)
coder['amount'] = @amount
coder['currency'] = @currency
end
end
# CSV transformation with headers
CSV.parse(csv_data, headers: true, header_converters: :symbol) do |row|
transform_row(row.to_h)
end
Lazy Evaluation for Large Datasets
Lazy enumerables defer transformation execution, processing data on-demand rather than eagerly. This approach handles large datasets efficiently.
# Eager evaluation - loads everything into memory
result = huge_dataset.map { |item| transform(item) }.select { |item| item[:valid] }
# Lazy evaluation - processes items one at a time
result = huge_dataset.lazy
.map { |item| transform(item) }
.select { |item| item[:valid] }
.first(100) # Only processes until 100 valid items found
Streaming Transformations
Streaming processes data in chunks rather than loading entire datasets. This technique handles files and streams too large for memory.
def transform_large_csv(input_path, output_path)
CSV.open(output_path, 'w') do |csv_out|
CSV.foreach(input_path, headers: true) do |row|
transformed = transform_csv_row(row.to_h)
csv_out << transformed.values
end
end
end
Practical Examples
API Response Transformation
APIs return data in formats designed for transmission, requiring transformation into domain objects. This example transforms a REST API response into application models.
# API returns nested JSON with different naming conventions
api_response = {
'userData' => {
'userId' => '12345',
'userName' => 'alice_smith',
'userEmail' => 'alice@example.com',
'registeredAt' => '2024-01-15T10:30:00Z',
'accountStatus' => 'active'
},
'userSettings' => {
'emailNotifications' => 'true',
'theme' => 'dark',
'language' => 'en-US'
}
}
# Transform to application domain model
class UserTransformer
def self.from_api(response)
user_data = response['userData']
settings = response['userSettings']
{
id: user_data['userId'].to_i,
username: user_data['userName'],
email: user_data['userEmail'].downcase,
registered_at: Time.iso8601(user_data['registeredAt']),
active: user_data['accountStatus'] == 'active',
settings: transform_settings(settings)
}
end
def self.transform_settings(settings)
{
email_notifications: settings['emailNotifications'] == 'true',
theme: settings['theme'].to_sym,
locale: settings['language']
}
end
end
user = UserTransformer.from_api(api_response)
# => {:id=>12345, :username=>"alice_smith", :email=>"alice@example.com",
# :registered_at=>2024-01-15 10:30:00 UTC, :active=>true,
# :settings=>{:email_notifications=>true, :theme=>:dark, :locale=>"en-US"}}
Database Record Transformation
Database queries return row data that needs transformation into domain objects. This example handles joins, type conversions, and aggregation.
# Raw database rows from SQL query with joins
db_rows = [
{ order_id: 1, customer_name: 'Alice', item_name: 'Widget', quantity: 2, price_cents: 1000 },
{ order_id: 1, customer_name: 'Alice', item_name: 'Gadget', quantity: 1, price_cents: 2000 },
{ order_id: 2, customer_name: 'Bob', item_name: 'Widget', quantity: 5, price_cents: 1000 }
]
# Transform flat joined data into nested order structures
def transform_orders(rows)
rows.group_by { |row| row[:order_id] }.map do |order_id, order_rows|
first_row = order_rows.first
{
id: order_id,
customer: first_row[:customer_name],
items: order_rows.map do |row|
{
name: row[:item_name],
quantity: row[:quantity],
price: Money.new(row[:price_cents], 'USD')
}
end,
total: calculate_total(order_rows)
}
end
end
def calculate_total(rows)
total_cents = rows.sum { |row| row[:quantity] * row[:price_cents] }
Money.new(total_cents, 'USD')
end
orders = transform_orders(db_rows)
# => [{:id=>1, :customer=>"Alice",
# :items=>[{:name=>"Widget", :quantity=>2, :price=>#<Money...>}, ...],
# :total=>#<Money...>}, ...]
File Format Conversion
Converting between file formats requires parsing source format, transforming data structure, and generating target format. This example converts CSV to JSON with data enrichment.
require 'csv'
require 'json'
require 'bigdecimal'
require 'date'
# Source CSV file with sales data
csv_content = <<~CSV
date,product_id,quantity,unit_price
2024-01-15,WIDGET-001,10,25.50
2024-01-15,GADGET-002,5,45.00
2024-01-16,WIDGET-001,8,25.50
CSV
# Product catalog for enrichment
PRODUCTS = {
'WIDGET-001' => { name: 'Premium Widget', category: 'widgets' },
'GADGET-002' => { name: 'Super Gadget', category: 'gadgets' }
}
def transform_sales_data(csv_string)
records = CSV.parse(csv_string, headers: true)
transformed = records.map do |row|
product_id = row['product_id']
product_info = PRODUCTS[product_id] || { name: 'Unknown', category: 'other' }
quantity = row['quantity'].to_i
unit_price = BigDecimal(row['unit_price'])
{
date: Date.parse(row['date']).iso8601,
product: {
id: product_id,
name: product_info[:name],
category: product_info[:category]
},
quantity: quantity,
unit_price: unit_price.to_f,
total: (quantity * unit_price).to_f
}
end
JSON.pretty_generate({ sales: transformed, record_count: transformed.size })
end
json_output = transform_sales_data(csv_content)
Log Data Transformation
Log parsing extracts structured data from unstructured text. This example parses web server logs into structured format for analysis.
# Apache Common Log Format
require 'date' # DateTime.strptime lives in the date stdlib
log_lines = [
'127.0.0.1 - - [15/Jan/2024:10:30:45 +0000] "GET /api/users HTTP/1.1" 200 1234',
'192.168.1.10 - alice [15/Jan/2024:10:31:12 +0000] "POST /api/orders HTTP/1.1" 201 567'
]
LOG_PATTERN = /^(\S+) \S+ (\S+) \[([^\]]+)\] "(\S+) (\S+) \S+" (\d+) (\d+)$/
def transform_log_entry(line)
match = line.match(LOG_PATTERN)
return nil unless match
ip, user, timestamp, method, path, status, bytes = match.captures
{
client: {
ip: ip,
user: user == '-' ? nil : user
},
timestamp: DateTime.strptime(timestamp, '%d/%b/%Y:%H:%M:%S %z'),
request: {
method: method,
path: path,
endpoint: extract_endpoint(path)
},
response: {
status: status.to_i,
bytes: bytes.to_i
}
}
end
def extract_endpoint(path)
path.split('?').first.gsub(/\/\d+/, '/:id')
end
structured_logs = log_lines.map { |line| transform_log_entry(line) }.compact
Common Patterns
Builder Pattern for Complex Transformations
The builder pattern constructs transformed objects step by step, handling optional fields and complex validation.
class UserBuilder
def initialize(source_data)
@source = source_data
@user = {}
@errors = []
end
def with_id
if @source[:id].to_s.match?(/\A\d+\z/)
@user[:id] = @source[:id].to_i
else
@errors << "Invalid id: #{@source[:id]}"
end
self
end
def with_email
email = @source[:email].to_s.downcase
if email.match?(/@/)
@user[:email] = email
else
@errors << "Invalid email: #{email}"
end
self
end
def with_timestamps
@user[:created_at] = parse_timestamp(@source[:created_at])
@user[:updated_at] = parse_timestamp(@source[:updated_at]) || Time.now
self
end
def build
raise "Validation errors: #{@errors.join(', ')}" if @errors.any?
@user
end
private
def parse_timestamp(value)
Time.parse(value.to_s)
rescue ArgumentError
nil
end
end
# Usage
user = UserBuilder.new(raw_data)
.with_id
.with_email
.with_timestamps
.build
Adapter Pattern for Format Abstraction
The adapter pattern provides a uniform interface for transforming different input formats into a common structure.
class TransformationAdapter
def transform(source)
raise NotImplementedError
end
end
class JsonAdapter < TransformationAdapter
def transform(json_string)
data = JSON.parse(json_string, symbolize_names: true)
normalize(data)
end
private
def normalize(data)
{
id: data[:id],
name: data[:name],
attributes: data
}
end
end
class XmlAdapter < TransformationAdapter
def transform(xml_string)
require 'rexml/document'
doc = REXML::Document.new(xml_string)
normalize(doc.root)
end
private
def normalize(element)
{
id: element.attributes['id'],
name: element.elements['name']&.text,
attributes: extract_attributes(element)
}
end
def extract_attributes(element)
element.elements.to_a.each_with_object({}) do |child, hash|
hash[child.name.to_sym] = child.text
end
end
end
# Usage with strategy pattern
def transform_data(source, format:)
adapter = case format
when :json then JsonAdapter.new
when :xml then XmlAdapter.new
else raise "Unsupported format: #{format}"
end
adapter.transform(source)
end
Decorator Pattern for Transformation Chains
Decorators add transformation layers incrementally, composing complex transformations from simple ones.
class BaseTransformer
def transform(data)
data
end
end
class TypeCoercionDecorator
def initialize(transformer)
@transformer = transformer
end
def transform(data)
result = @transformer.transform(data)
coerce_types(result)
end
private
def coerce_types(hash)
hash.transform_values do |value|
case value
when /\A\d+\z/ then value.to_i
when /\A\d+\.\d+\z/ then value.to_f
when 'true' then true
when 'false' then false
else value
end
end
end
end
class KeyNormalizationDecorator
def initialize(transformer)
@transformer = transformer
end
def transform(data)
result = @transformer.transform(data)
result.transform_keys { |k| k.to_s.gsub(/([A-Z])/, '_\1').downcase.to_sym }
end
end
class ValidationDecorator
def initialize(transformer, required_keys:)
@transformer = transformer
@required_keys = required_keys
end
def transform(data)
result = @transformer.transform(data)
validate!(result)
result
end
private
def validate!(data)
missing = @required_keys - data.keys
raise "Missing required keys: #{missing.join(', ')}" if missing.any?
end
end
# Compose transformers
transformer = ValidationDecorator.new(
TypeCoercionDecorator.new(
KeyNormalizationDecorator.new(
BaseTransformer.new
)
),
required_keys: [:id, :name]
)
result = transformer.transform(raw_data)
Registry Pattern for Dynamic Transformers
The registry pattern maps transformation rules dynamically based on data characteristics.
class TransformerRegistry
def initialize
@transformers = {}
end
def register(type, transformer)
@transformers[type] = transformer
end
def transform(data)
type = detect_type(data)
transformer = @transformers[type]
raise "No transformer for type: #{type}" unless transformer
transformer.call(data)
end
private
def detect_type(data)
return :user if data.key?(:email) || data.key?('email')
return :order if data.key?(:order_id) || data.key?('orderId')
return :product if data.key?(:sku) || data.key?('sku')
:generic
end
end
# Setup registry
registry = TransformerRegistry.new
registry.register(:user, ->(data) {
{
id: data[:id] || data['id'],
email: (data[:email] || data['email']).downcase,
name: data[:name] || data['name']
}
})
registry.register(:order, ->(data) {
{
id: data[:order_id] || data['orderId'],
total: data[:total] || data['total'],
status: (data[:status] || data['status']).to_sym
}
})
# Transform based on detected type
result = registry.transform(incoming_data)
Error Handling & Edge Cases
Data transformation encounters numerous error conditions from malformed input, missing fields, type mismatches, and encoding issues. Handling these cases determines transformation reliability.
Validation Before Transformation
Validate input structure before attempting transformation to fail fast on invalid data.
class ValidationError < StandardError; end
class DataValidator
def self.validate!(data, schema)
errors = []
schema.each do |field, rules|
value = data[field]
if rules[:required] && value.nil?
errors << "Missing required field: #{field}"
next
end
next if value.nil? && !rules[:required]
if rules[:type] && !value.is_a?(rules[:type])
errors << "Invalid type for #{field}: expected #{rules[:type]}, got #{value.class}"
end
if rules[:format] && !value.to_s.match?(rules[:format])
errors << "Invalid format for #{field}: #{value}"
end
end
raise ValidationError, errors.join('; ') if errors.any?
end
end
# Usage
schema = {
id: { required: true, type: Integer },
email: { required: true, format: /@/ },
age: { type: Integer }
}
begin
DataValidator.validate!(input_data, schema)
result = transform(input_data)
rescue ValidationError => e
log_error(e)
return default_value
end
Graceful Degradation with Fallbacks
Provide fallback values when optional data is missing or malformed rather than failing completely.
def safe_transform(data)
{
id: extract_id(data) || generate_id,
name: extract_name(data) || 'Unknown',
email: normalize_email(data[:email]), # nil when missing or malformed
age: parse_age(data[:age]) || 0,
created_at: parse_timestamp(data[:created_at]) || Time.now
}
end
def normalize_email(value)
return nil if value.nil? || value.to_s.empty?
email = value.to_s.strip.downcase
email if email.match?(/@/)
end
def parse_age(value)
Integer(value)
rescue ArgumentError, TypeError
nil
end
def parse_timestamp(value)
Time.parse(value.to_s)
rescue ArgumentError
nil
end
Error Collection vs. Fail-Fast
Choose between collecting all errors for batch reporting or failing immediately on first error.
# Fail-fast approach
def transform_strict(items)
items.map do |item|
validate_item!(item)
transform_item(item)
end
end
# Error collection approach
def transform_lenient(items)
results = []
errors = []
items.each_with_index do |item, index|
begin
validate_item!(item)
results << transform_item(item)
rescue StandardError => e
errors << { index: index, item: item, error: e.message }
end
end
{ results: results, errors: errors, success_rate: results.size.to_f / items.size }
end
Encoding and Character Set Issues
Handle text encoding problems during transformation, particularly when reading files or external data.
def safe_read_and_transform(file_path)
content = File.read(file_path, encoding: 'UTF-8')
# File.read does not validate bytes, so check explicitly and re-decode on failure
unless content.valid_encoding?
content = File.read(file_path, encoding: 'ISO-8859-1').encode('UTF-8', invalid: :replace, undef: :replace)
end
transform(content)
end
def sanitize_string(value)
value.to_s
.encode('UTF-8', invalid: :replace, undef: :replace, replace: '')
.scrub('')
end
Partial Transformation Results
Handle scenarios where some transformations succeed while others fail in batch processing.
class TransformationResult
attr_reader :successes, :failures
def initialize
@successes = []
@failures = []
end
def add_success(item)
@successes << item
end
def add_failure(item, error)
@failures << { item: item, error: error }
end
def success?
@failures.empty?
end
def partial_success?
@successes.any? && @failures.any?
end
end
def batch_transform(items)
result = TransformationResult.new
items.each do |item|
begin
transformed = transform_item(item)
result.add_success(transformed)
rescue StandardError => e
result.add_failure(item, e.message)
end
end
result
end
Circular Reference Detection
Detect circular references in nested structures during transformation to prevent infinite recursion.
require 'set'
class CircularReferenceError < StandardError; end
def transform_with_cycle_detection(data, visited = Set.new)
object_id = data.object_id
raise CircularReferenceError, "Circular reference detected" if visited.include?(object_id)
visited.add(object_id)
result = case data
when Hash
data.transform_values { |v| transform_with_cycle_detection(v, visited) }
when Array
data.map { |item| transform_with_cycle_detection(item, visited) }
else
data
end
visited.delete(object_id)
result
end
Performance Considerations
Data transformation performance matters when processing large datasets, real-time streams, or high-throughput systems. Performance optimization balances speed, memory usage, and code clarity.
Lazy Evaluation for Memory Efficiency
Lazy evaluation defers computation until results are needed, processing items one at a time instead of materializing entire collections.
# Eager - loads entire dataset into memory
def eager_transform(file_path)
File.readlines(file_path)
.map { |line| parse_line(line) }
.select { |record| record[:valid] }
.map { |record| transform_record(record) }
end
# Lazy - processes line by line
def lazy_transform(file_path)
File.foreach(file_path).lazy
.map { |line| parse_line(line) }
.select { |record| record[:valid] }
.map { |record| transform_record(record) }
end
# Only processes first 1000 valid records
result = lazy_transform('large_file.txt').first(1000)
Batch Processing for Throughput
Batch processing amortizes overhead by processing multiple items together, particularly effective for database operations or API calls.
def batch_transform_and_save(items, batch_size: 1000)
items.each_slice(batch_size) do |batch|
transformed = batch.map { |item| transform_item(item) }
save_batch(transformed)
end
end
# Parallel batch processing
require 'parallel'
def parallel_batch_transform(items, batch_size: 1000)
batches = items.each_slice(batch_size).to_a
Parallel.map(batches, in_threads: 4) do |batch|
batch.map { |item| transform_item(item) }
end.flatten
end
Caching Expensive Operations
Cache transformation results for repeated data or expensive operations like API lookups.
class CachingTransformer
def initialize
@cache = {}
end
def transform(data)
cache_key = generate_cache_key(data)
@cache[cache_key] ||= expensive_transform(data)
end
private
def generate_cache_key(data)
# Use hash of data or unique identifier
data.hash
end
def expensive_transform(data)
# Expensive transformation logic
sleep 0.1 # Simulating expensive operation
transform_data(data)
end
end
# Memoization for method-level caching
require 'memoist'
class DataTransformer
extend Memoist
def lookup_category(product_id)
# Expensive database or API call
CategoryAPI.fetch(product_id)
end
memoize :lookup_category
end
Streaming for Large Files
Stream processing handles files too large for memory by processing data in chunks.
# Memory-efficient CSV transformation
def stream_transform_csv(input_path, output_path)
CSV.open(output_path, 'w') do |output|
CSV.foreach(input_path, headers: true) do |row|
transformed = transform_row(row)
output << transformed
end
end
end
# Streaming JSON array processing
require 'json/stream'
def stream_transform_json(input_stream, output_stream)
parser = JSON::Stream::Parser.new
parser.key do |key|
# Handle each key
end
parser.value do |value|
transformed = transform_value(value)
output_stream.puts(JSON.generate(transformed))
end
input_stream.each { |chunk| parser << chunk }
end
Algorithmic Complexity Optimization
Choose efficient algorithms and data structures for transformation operations.
# O(n²) - inefficient lookup
def transform_with_lookup_slow(items, lookup_data)
items.map do |item|
match = lookup_data.find { |lookup| lookup[:id] == item[:ref_id] }
item.merge(lookup_info: match)
end
end
# O(n) - efficient hash lookup
def transform_with_lookup_fast(items, lookup_data)
lookup_hash = lookup_data.to_h { |lookup| [lookup[:id], lookup] } # core-Ruby equivalent of ActiveSupport's index_by
items.map do |item|
item.merge(lookup_info: lookup_hash[item[:ref_id]])
end
end
# Benchmark comparison
require 'benchmark'
Benchmark.bmbm do |x|
x.report("slow") { transform_with_lookup_slow(items, lookups) }
x.report("fast") { transform_with_lookup_fast(items, lookups) }
end
Memory Profiling and Optimization
Profile memory usage to identify bottlenecks and optimize allocations.
require 'memory_profiler'
report = MemoryProfiler.report do
transform_large_dataset(data)
end
report.pretty_print
# Reduce allocations by reusing objects
class OptimizedTransformer
def initialize
@result_buffer = [] # reused across batches to avoid repeated array allocation
end
def transform_batch(items)
@result_buffer.clear
items.each do |item|
@result_buffer << transform_item(item)
end
@result_buffer.dup
end
end
Reference
Core Transformation Methods
| Method | Purpose | Example |
|---|---|---|
| map | Transform each element | items.map(&:upcase) |
| select | Filter elements | items.select(&:active?) |
| reject | Exclude elements | items.reject(&:nil?) |
| reduce | Aggregate data | items.reduce(:+) |
| transform_keys | Transform hash keys | hash.transform_keys(&:to_sym) |
| transform_values | Transform hash values | hash.transform_values(&:upcase) |
| group_by | Group by criteria | items.group_by(&:type) |
| partition | Split into two groups | items.partition(&:valid?) |
| each_with_object | Build result object | items.each_with_object({}) |
| flat_map | Map and flatten | items.flat_map(&:items) |
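A compact, self-contained demonstration of the less common methods in this table (the order data is illustrative):

```ruby
orders = [
  { id: 1, status: 'completed', items: %w[widget gadget] },
  { id: 2, status: 'pending', items: %w[widget] },
  { id: 3, status: 'completed', items: %w[gizmo] }
]

# partition splits a collection into matches and non-matches in one pass.
done, open = orders.partition { |o| o[:status] == 'completed' }

# flat_map maps then flattens one level: all item names across orders.
all_items = orders.flat_map { |o| o[:items] }

# each_with_object builds an accumulator without reassigning it in the block.
by_status = orders.each_with_object(Hash.new(0)) { |o, counts| counts[o[:status]] += 1 }
```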
Format Parsing Libraries
| Format | Library | Key Methods |
|---|---|---|
| JSON | json (stdlib) | JSON.parse, JSON.generate |
| YAML | yaml (stdlib) | YAML.safe_load, YAML.dump |
| CSV | csv (stdlib) | CSV.parse, CSV.foreach |
| XML | rexml (stdlib) | REXML::Document.new |
| XML | nokogiri (gem) | Nokogiri::XML, Nokogiri::HTML |
| MessagePack | msgpack (gem) | MessagePack.pack, MessagePack.unpack |
| TOML | toml-rb (gem) | TomlRB.load_file |
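For YAML, safe_load is the explicit, version-independent choice for untrusted input: historically YAML.load could instantiate arbitrary Ruby objects, and since Psych 4 (Ruby 3.1) plain load delegates to safe loading anyway. A short round trip with the stdlib:

```ruby
require 'yaml'

# safe_load restricts deserialization to plain types (String, Integer,
# Float, Array, Hash, ...) instead of arbitrary Ruby objects.
yaml = "name: Alice\nage: 30\n"
parsed = YAML.safe_load(yaml)
# => {"name"=>"Alice", "age"=>30}

# dump completes the round trip back to YAML text.
round_trip = YAML.safe_load(YAML.dump(parsed))
```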
Type Conversion Methods
| Conversion | Method | Notes |
|---|---|---|
| String to Integer | to_i | Returns 0 for non-numeric |
| String to Float | to_f | Returns 0.0 for non-numeric |
| String to Symbol | to_sym | Creates symbol from string |
| Symbol to String | to_s | Converts symbol to string |
| String to Boolean | custom | Use comparison or case |
| String to Date | Date.parse | Requires date library |
| String to Time | Time.parse | Flexible parsing |
| String to Time ISO | Time.iso8601 | Strict ISO format |
| Integer to String | to_s | Basic string conversion |
| Hash to JSON | to_json | Requires json library |
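The lenient-versus-strict distinction in this table is worth making explicit: to_i hides malformed input behind a zero, while Kernel#Integer raises, which a small wrapper can turn into a nil signal (a sketch):

```ruby
# Lenient: to_i never raises; non-numeric input silently becomes 0.
lenient = 'N/A'.to_i # => 0

# Strict: Kernel#Integer raises on malformed input; wrapping it turns
# the failure into an explicit nil instead of a misleading zero.
def strict_int(value)
  Integer(value)
rescue ArgumentError, TypeError
  nil
end

strict_int('42')  # => 42
strict_int('N/A') # => nil
```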
Validation Patterns
| Pattern | Implementation | Use Case |
|---|---|---|
| Required field | value.nil? | Ensure field exists |
| Type check | value.is_a?(Integer) | Verify data type |
| Format match | value.match?(/regex/) | Validate format |
| Range check | (min..max).cover?(value) | Numeric ranges |
| Enum check | allowed.include?(value) | Limited values |
| Custom validation | validator.call(value) | Complex rules |
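These patterns compose naturally when each row is expressed as a callable; a sketch (the rule names and the 18..120 range are illustrative):

```ruby
# Each table row expressed as a callable, so rules compose into a schema.
VALIDATORS = {
  required: ->(v) { !v.nil? },
  type:     ->(v) { v.is_a?(Integer) },
  format:   ->(v) { v.to_s.match?(/@/) },
  range:    ->(v) { (18..120).cover?(v) }
}.freeze

# A field is valid when every named rule accepts its value.
def valid_field?(value, rules)
  rules.all? { |rule| VALIDATORS.fetch(rule).call(value) }
end

valid_field?(42, [:required, :type, :range]) # => true
valid_field?(nil, [:required])               # => false
```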
Error Handling Strategies
| Strategy | Approach | Example |
|---|---|---|
| Fail fast | Raise on first error | validate! method |
| Error collection | Collect all errors | errors array |
| Fallback values | Provide defaults | value or default |
| Silent skip | Ignore invalid items | compact after map |
| Partial results | Return valid subset | filter successes |
| Retry logic | Retry on transient errors | retry 3 times |
Performance Techniques
| Technique | Benefit | Trade-off |
|---|---|---|
| Lazy evaluation | Memory efficient | Deferred computation |
| Batch processing | Higher throughput | Latency per item |
| Caching | Avoid recomputation | Memory usage |
| Streaming | Handle large files | Complexity |
| Parallel processing | Use multiple cores | Coordination overhead |
| Index lookups | Fast access | Build time |
Common Transformation Patterns
| Pattern | Purpose | Implementation |
|---|---|---|
| Map-Reduce | Aggregate data | map then reduce |
| Filter-Map | Transform subset | select then map |
| Flatten | Denormalize nested data | flat_map or flatten |
| Group-Aggregate | Summary by category | group_by then map values |
| Join | Combine datasets | merge or zip |
| Pivot | Reshape data | Custom grouping |
| Normalize | Flatten nested structures | Recursive traversal |
| Denormalize | Embed related data | Hash merging |
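The Group-Aggregate and Pivot rows combine group_by with a per-group reduction; a compact sketch over illustrative sales data:

```ruby
sales = [
  { region: 'east', month: 'Jan', amount: 100 },
  { region: 'east', month: 'Feb', amount: 150 },
  { region: 'west', month: 'Jan', amount: 200 }
]

# Group-Aggregate: one total per region.
totals = sales.group_by { |s| s[:region] }
              .transform_values { |rows| rows.sum { |r| r[:amount] } }
# => {"east"=>250, "west"=>200}

# Pivot: regions become rows, months become columns.
pivot = sales.group_by { |s| s[:region] }.transform_values do |rows|
  rows.to_h { |r| [r[:month], r[:amount]] }
end
# => {"east"=>{"Jan"=>100, "Feb"=>150}, "west"=>{"Jan"=>200}}
```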