Overview
Data transformation converts data from one format, structure, or representation to another. This process occurs throughout software systems: when reading files, consuming APIs, processing user input, storing data in databases, or generating output. Data transformation addresses the fundamental problem that different parts of a system use different data representations.
The transformation process involves three stages: reading source data, applying transformation logic, and writing transformed data. Source data arrives in formats like JSON, XML, CSV, binary data, or custom structures. Transformation logic manipulates this data through mapping, filtering, aggregating, reshaping, or combining operations. The output produces data in the target format required by the consuming system.
Ruby excels at data transformation because of its expressive syntax for data manipulation, rich standard library for handling various formats, and ecosystem of gems for specialized transformations. The language treats data transformation as a first-class operation through enumerable methods, pattern matching, and flexible type conversions.
# Basic data transformation from CSV to JSON structure
require 'csv'
require 'json'
csv_data = <<~CSV
name,age,city
Alice,30,NYC
Bob,25,SF
CSV
# Transform CSV to array of hashes
records = CSV.parse(csv_data, headers: true).map(&:to_h)
# => [{"name"=>"Alice", "age"=>"30", "city"=>"NYC"},
# {"name"=>"Bob", "age"=>"25", "city"=>"SF"}]
# Further transform to JSON with type conversion
transformed = records.map do |record|
{
full_name: record['name'],
age: record['age'].to_i,
location: record['city']
}
end
puts JSON.pretty_generate(transformed)
Data transformation challenges include maintaining data integrity during conversion, handling incomplete or malformed data, preserving semantic meaning across format changes, and managing performance with large datasets. The transformation code must balance flexibility with correctness, handling edge cases while remaining maintainable.
Key Principles
Data transformation operates on several fundamental principles that govern how data moves through transformation pipelines. Understanding these principles helps design correct and efficient transformations.
Immutability and Pure Functions
Transformation logic should produce new data structures rather than modifying existing ones. Pure transformation functions take input data and return transformed output without side effects. This principle makes transformations predictable, testable, and composable.
# Pure transformation - returns new data
require 'time' # Time.parse lives in the time stdlib
def normalize_user(user_data)
{
id: user_data[:id],
name: user_data[:first_name] + ' ' + user_data[:last_name],
email: user_data[:email].downcase,
created_at: Time.parse(user_data[:created])
}
end
# Impure transformation - modifies input
def normalize_user_impure!(user_data)
user_data[:name] = user_data.delete(:first_name) + ' ' + user_data.delete(:last_name)
user_data[:email].downcase!
user_data[:created_at] = Time.parse(user_data.delete(:created))
user_data
end
Type Preservation and Coercion
Data types must be handled explicitly during transformation. Type coercion converts values between types, while type preservation maintains the original type system. Different formats have different type capabilities - JSON distinguishes numbers from strings but lacks date types, while Ruby has rich type semantics.
# Explicit type coercion during transformation
require 'bigdecimal'
require 'time'
def transform_api_response(json_data)
{
user_id: json_data['userId'].to_i, # string to integer
active: json_data['isActive'] == 'true', # string to boolean
balance: BigDecimal(json_data['balance']), # string to decimal
last_login: Time.iso8601(json_data['lastLogin']) # string to time
}
end
Schema Mapping
Transformation maps source schema to target schema. Schemas define field names, types, nesting levels, and relationships. Schema mapping handles field renaming, type conversion, denormalization or normalization, and structural changes.
# Schema mapping between flat and nested structures
def map_order_schema(flat_order)
{
id: flat_order[:order_id],
customer: {
id: flat_order[:customer_id],
name: flat_order[:customer_name],
email: flat_order[:customer_email]
},
items: parse_items(flat_order[:items_json]),
total: Money.new(flat_order[:total_cents], 'USD'),
status: flat_order[:status].to_sym
}
end
Composition and Pipelines
Complex transformations compose from simpler transformations. Pipeline architecture chains transformations where each stage's output feeds the next stage's input. This modular approach separates concerns and enables reuse.
# Transformation pipeline with composition
class TransformationPipeline
def initialize
@transformations = []
end
def add(transformation)
@transformations << transformation
self
end
def execute(data)
@transformations.reduce(data) do |current_data, transformation|
transformation.call(current_data)
end
end
end
# Usage
pipeline = TransformationPipeline.new
.add(->(data) { data.map { |item| item.transform_keys(&:to_sym) } })
.add(->(data) { data.select { |item| item[:active] } })
.add(->(data) { data.map { |item| normalize_item(item) } })
result = pipeline.execute(raw_data)
Error Handling and Validation
Transformation processes must handle invalid input gracefully. Validation occurs before transformation to catch malformed data. Error handling strategies include fail-fast approaches that raise exceptions, error collection that gathers all problems, or fallback values for missing data.
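As a minimal sketch of two of these strategies (the field names are illustrative), Hash#fetch supports both: without a default it raises KeyError on missing input, and with a default it falls back.

```ruby
# Fail-fast: fetch without a default raises KeyError on missing input.
def transform_strict(record)
  { id: record.fetch(:id), name: record.fetch(:name) }
end

# Fallback: fetch with a default substitutes a value instead of failing.
def transform_with_fallbacks(record)
  { id: record.fetch(:id, nil), name: record.fetch(:name, 'Unknown') }
end

transform_with_fallbacks({ id: 1 }) # => {:id=>1, :name=>"Unknown"}
```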
Bidirectional Transformation
Some transformations need to work in both directions - serializing data for storage and deserializing it for use. Bidirectional transformations must maintain round-trip consistency: transforming data and then reversing the transformation yields the original data.
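A minimal round-trip sketch using JSON as the storage format (the user shape here is illustrative): Integer survives JSON directly, while Time must be encoded as a string and re-parsed on the way back.

```ruby
require 'json'
require 'time'

# Forward: domain hash to JSON text; Time is encoded as an ISO 8601 string.
def serialize_user(user)
  JSON.generate({ id: user[:id], email: user[:email], created_at: user[:created_at].iso8601 })
end

# Backward: restore Ruby types from the JSON representation.
def deserialize_user(json)
  data = JSON.parse(json, symbolize_names: true)
  { id: data[:id], email: data[:email], created_at: Time.iso8601(data[:created_at]) }
end

user = { id: 7, email: 'a@example.com', created_at: Time.utc(2024, 1, 15, 10, 30) }
round_tripped = deserialize_user(serialize_user(user))
round_tripped == user # => true: the transformation is round-trip consistent
```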
Ruby Implementation
Ruby provides multiple approaches to data transformation through its standard library and enumerable methods. The language's flexibility supports declarative and imperative transformation styles.
Enumerable Transformations
Ruby's Enumerable module forms the foundation for collection transformations. Methods like map, select, reject, and reduce transform collections declaratively.
# Map transforms each element
user_ids = users.map { |user| user[:id] }
user_ids = users.map(&:id) # Symbol#to_proc shorthand - works when elements are objects responding to #id, not hashes
# Select filters elements
active_users = users.select { |user| user[:active] }
# Reduce aggregates data
total_revenue = orders.reduce(0) { |sum, order| sum + order[:amount] }
# Combining transformations
summary = orders
.select { |o| o[:status] == 'completed' }
.map { |o| o[:amount] }
.reduce(0, :+)
Hash Transformations
Hash manipulation is central to data transformation since many formats map to Ruby hashes. Ruby provides methods for transforming hash keys and values.
# Transform keys
snake_case = camel_case_hash.transform_keys { |k| k.to_s.gsub(/([A-Z])/, '_\1').downcase }
# Transform values
doubled = numbers_hash.transform_values { |v| v * 2 }
# Transform both keys and values
normalized = raw_data.transform_keys(&:to_sym).transform_values(&:strip)
# Selective transformation with merge
updated = user.merge(
email: user[:email].downcase,
created_at: Time.parse(user[:created_at])
)
Pattern Matching for Transformation
Ruby's pattern matching (introduced in Ruby 2.7, enhanced in 3.0) enables declarative transformations based on data structure.
def transform_event(event)
case event
in { type: 'user_created', data: { name:, email: } }
{ event_type: :user_created, user: { name: name, email: email.downcase } }
in { type: 'order_placed', data: { order_id:, items: } }
{ event_type: :order_placed, order_id: order_id, item_count: items.size }
in { type: 'payment', data: { amount:, currency: 'USD' } }
{ event_type: :payment, amount_cents: (amount * 100).to_i }
else
{ event_type: :unknown, raw: event }
end
end
Data Class Transformations
Ruby's Data class (added in Ruby 3.2) and Struct provide lightweight structures for transformation targets.
User = Data.define(:id, :name, :email, :created_at)
def transform_to_user(raw_data)
User.new(
id: raw_data['id'].to_i,
name: "#{raw_data['firstName']} #{raw_data['lastName']}",
email: raw_data['email'].downcase,
created_at: Time.iso8601(raw_data['createdAt'])
)
end
# Transform collection
users = raw_users.map { |raw| transform_to_user(raw) }
Format-Specific Transformations
Ruby's standard library includes parsers and generators for common formats. Each format has specific transformation requirements.
require 'json'
require 'yaml'
require 'csv'
# JSON transformation
json_string = '{"userId": 123, "userName": "Alice"}'
parsed = JSON.parse(json_string, symbolize_names: true)
transformed = { user_id: parsed[:userId], name: parsed[:userName] }
output = JSON.generate(transformed)
# YAML transformation with custom types
class Money
def encode_with(coder)
coder['amount'] = @amount
coder['currency'] = @currency
end
end
# CSV transformation with headers
CSV.parse(csv_data, headers: true, header_converters: :symbol) do |row|
transform_row(row.to_h)
end
Lazy Evaluation for Large Datasets
Lazy enumerables defer transformation execution, processing data on-demand rather than eagerly. This approach handles large datasets efficiently.
# Eager evaluation - loads everything into memory
result = huge_dataset.map { |item| transform(item) }.select { |item| item[:valid] }
# Lazy evaluation - processes items one at a time
result = huge_dataset.lazy
.map { |item| transform(item) }
.select { |item| item[:valid] }
.first(100) # Only processes until 100 valid items found
Streaming Transformations
Streaming processes data in chunks rather than loading entire datasets. This technique handles files and streams too large for memory.
def transform_large_csv(input_path, output_path)
CSV.open(output_path, 'w') do |csv_out|
CSV.foreach(input_path, headers: true) do |row|
transformed = transform_csv_row(row.to_h)
csv_out << transformed.values
end
end
end
Practical Examples
API Response Transformation
APIs return data in formats designed for transmission, requiring transformation into domain objects. This example transforms a REST API response into application models.
# API returns nested JSON with different naming conventions
api_response = {
'userData' => {
'userId' => '12345',
'userName' => 'alice_smith',
'userEmail' => 'alice@example.com',
'registeredAt' => '2024-01-15T10:30:00Z',
'accountStatus' => 'active'
},
'userSettings' => {
'emailNotifications' => 'true',
'theme' => 'dark',
'language' => 'en-US'
}
}
# Transform to application domain model
class UserTransformer
def self.from_api(response)
user_data = response['userData']
settings = response['userSettings']
{
id: user_data['userId'].to_i,
username: user_data['userName'],
email: user_data['userEmail'].downcase,
registered_at: Time.iso8601(user_data['registeredAt']),
active: user_data['accountStatus'] == 'active',
settings: transform_settings(settings)
}
end
def self.transform_settings(settings)
{
email_notifications: settings['emailNotifications'] == 'true',
theme: settings['theme'].to_sym,
locale: settings['language']
}
end
end
user = UserTransformer.from_api(api_response)
# => {:id=>12345, :username=>"alice_smith", :email=>"alice@example.com",
# :registered_at=>2024-01-15 10:30:00 UTC, :active=>true,
# :settings=>{:email_notifications=>true, :theme=>:dark, :locale=>"en-US"}}
Database Record Transformation
Database queries return row data that needs transformation into domain objects. This example handles joins, type conversions, and aggregation.
# Raw database rows from SQL query with joins
db_rows = [
{ order_id: 1, customer_name: 'Alice', item_name: 'Widget', quantity: 2, price_cents: 1000 },
{ order_id: 1, customer_name: 'Alice', item_name: 'Gadget', quantity: 1, price_cents: 2000 },
{ order_id: 2, customer_name: 'Bob', item_name: 'Widget', quantity: 5, price_cents: 1000 }
]
# Transform flat joined data into nested order structures
def transform_orders(rows)
rows.group_by { |row| row[:order_id] }.map do |order_id, order_rows|
first_row = order_rows.first
{
id: order_id,
customer: first_row[:customer_name],
items: order_rows.map do |row|
{
name: row[:item_name],
quantity: row[:quantity],
price: Money.new(row[:price_cents], 'USD')
}
end,
total: calculate_total(order_rows)
}
end
end
def calculate_total(rows)
total_cents = rows.sum { |row| row[:quantity] * row[:price_cents] }
Money.new(total_cents, 'USD')
end
orders = transform_orders(db_rows)
# => [{:id=>1, :customer=>"Alice",
# :items=>[{:name=>"Widget", :quantity=>2, :price=>#<Money...>}, ...],
# :total=>#<Money...>}, ...]
File Format Conversion
Converting between file formats requires parsing source format, transforming data structure, and generating target format. This example converts CSV to JSON with data enrichment.
require 'csv'
require 'json'
require 'bigdecimal'
require 'date'
# Source CSV file with sales data
csv_content = <<~CSV
date,product_id,quantity,unit_price
2024-01-15,WIDGET-001,10,25.50
2024-01-15,GADGET-002,5,45.00
2024-01-16,WIDGET-001,8,25.50
CSV
# Product catalog for enrichment
PRODUCTS = {
'WIDGET-001' => { name: 'Premium Widget', category: 'widgets' },
'GADGET-002' => { name: 'Super Gadget', category: 'gadgets' }
}
def transform_sales_data(csv_string)
records = CSV.parse(csv_string, headers: true)
transformed = records.map do |row|
product_id = row['product_id']
product_info = PRODUCTS[product_id] || { name: 'Unknown', category: 'other' }
quantity = row['quantity'].to_i
unit_price = BigDecimal(row['unit_price'])
{
date: Date.parse(row['date']).iso8601,
product: {
id: product_id,
name: product_info[:name],
category: product_info[:category]
},
quantity: quantity,
unit_price: unit_price.to_f,
total: (quantity * unit_price).to_f
}
end
JSON.pretty_generate({ sales: transformed, record_count: transformed.size })
end
json_output = transform_sales_data(csv_content)
Log Data Transformation
Log parsing extracts structured data from unstructured text. This example parses web server logs into structured format for analysis.
# Apache Common Log Format
require 'date' # DateTime.strptime lives in the date stdlib
log_lines = [
'127.0.0.1 - - [15/Jan/2024:10:30:45 +0000] "GET /api/users HTTP/1.1" 200 1234',
'192.168.1.10 - alice [15/Jan/2024:10:31:12 +0000] "POST /api/orders HTTP/1.1" 201 567'
]
LOG_PATTERN = /^(\S+) \S+ (\S+) \[([^\]]+)\] "(\S+) (\S+) \S+" (\d+) (\d+)$/
def transform_log_entry(line)
match = line.match(LOG_PATTERN)
return nil unless match
ip, user, timestamp, method, path, status, bytes = match.captures
{
client: {
ip: ip,
user: user == '-' ? nil : user
},
timestamp: DateTime.strptime(timestamp, '%d/%b/%Y:%H:%M:%S %z'),
request: {
method: method,
path: path,
endpoint: extract_endpoint(path)
},
response: {
status: status.to_i,
bytes: bytes.to_i
}
}
end
def extract_endpoint(path)
path.split('?').first.gsub(/\/\d+/, '/:id')
end
structured_logs = log_lines.map { |line| transform_log_entry(line) }.compact
Common Patterns
Builder Pattern for Complex Transformations
The builder pattern constructs transformed objects step by step, handling optional fields and complex validation.
class UserBuilder
def initialize(source_data)
@source = source_data
@user = {}
@errors = []
end
def with_id
if @source[:id].to_s.match?(/\A\d+\z/)
@user[:id] = @source[:id].to_i
else
@errors << "Invalid id: #{@source[:id]}"
end
self
end
def with_email
email = @source[:email].to_s.downcase
if email.match?(/@/)
@user[:email] = email
else
@errors << "Invalid email: #{email}"
end
self
end
def with_timestamps
@user[:created_at] = parse_timestamp(@source[:created_at])
@user[:updated_at] = parse_timestamp(@source[:updated_at]) || Time.now
self
end
def build
raise "Validation errors: #{@errors.join(', ')}" if @errors.any?
@user
end
private
def parse_timestamp(value)
Time.parse(value.to_s)
rescue ArgumentError
nil
end
end
# Usage
user = UserBuilder.new(raw_data)
.with_id
.with_email
.with_timestamps
.build
Adapter Pattern for Format Abstraction
The adapter pattern provides a uniform interface for transforming different input formats into a common structure.
class TransformationAdapter
def transform(source)
raise NotImplementedError
end
end
class JsonAdapter < TransformationAdapter
def transform(json_string)
data = JSON.parse(json_string, symbolize_names: true)
normalize(data)
end
private
def normalize(data)
{
id: data[:id],
name: data[:name],
attributes: data
}
end
end
class XmlAdapter < TransformationAdapter
def transform(xml_string)
require 'rexml/document'
doc = REXML::Document.new(xml_string)
normalize(doc.root)
end
private
def normalize(element)
{
id: element.attributes['id'],
name: element.elements['name']&.text,
attributes: extract_attributes(element)
}
end
def extract_attributes(element)
element.elements.to_a.each_with_object({}) do |child, hash|
hash[child.name.to_sym] = child.text
end
end
end
# Usage with strategy pattern
def transform_data(source, format:)
adapter = case format
when :json then JsonAdapter.new
when :xml then XmlAdapter.new
else raise "Unsupported format: #{format}"
end
adapter.transform(source)
end
Decorator Pattern for Transformation Chains
Decorators add transformation layers incrementally, composing complex transformations from simple ones.
class BaseTransformer
def transform(data)
data
end
end
class TypeCoercionDecorator
def initialize(transformer)
@transformer = transformer
end
def transform(data)
result = @transformer.transform(data)
coerce_types(result)
end
private
def coerce_types(hash)
hash.transform_values do |value|
case value
when /\A\d+\z/ then value.to_i
when /\A\d+\.\d+\z/ then value.to_f
when 'true' then true
when 'false' then false
else value
end
end
end
end
class KeyNormalizationDecorator
def initialize(transformer)
@transformer = transformer
end
def transform(data)
result = @transformer.transform(data)
result.transform_keys { |k| k.to_s.gsub(/([A-Z])/, '_\1').downcase.to_sym }
end
end
class ValidationDecorator
def initialize(transformer, required_keys:)
@transformer = transformer
@required_keys = required_keys
end
def transform(data)
result = @transformer.transform(data)
validate!(result)
result
end
private
def validate!(data)
missing = @required_keys - data.keys
raise "Missing required keys: #{missing.join(', ')}" if missing.any?
end
end
# Compose transformers
transformer = ValidationDecorator.new(
TypeCoercionDecorator.new(
KeyNormalizationDecorator.new(
BaseTransformer.new
)
),
required_keys: [:id, :name]
)
result = transformer.transform(raw_data)
Registry Pattern for Dynamic Transformers
The registry pattern maps transformation rules dynamically based on data characteristics.
class TransformerRegistry
def initialize
@transformers = {}
end
def register(type, transformer)
@transformers[type] = transformer
end
def transform(data)
type = detect_type(data)
transformer = @transformers[type]
raise "No transformer for type: #{type}" unless transformer
transformer.call(data)
end
private
def detect_type(data)
return :user if data.key?(:email) || data.key?('email')
return :order if data.key?(:order_id) || data.key?('orderId')
return :product if data.key?(:sku) || data.key?('sku')
:generic
end
end
# Setup registry
registry = TransformerRegistry.new
registry.register(:user, ->(data) {
{
id: data[:id] || data['id'],
email: (data[:email] || data['email']).downcase,
name: data[:name] || data['name']
}
})
registry.register(:order, ->(data) {
{
id: data[:order_id] || data['orderId'],
total: data[:total] || data['total'],
status: (data[:status] || data['status']).to_sym
}
})
# Transform based on detected type
result = registry.transform(incoming_data)
Error Handling & Edge Cases
Data transformation encounters numerous error conditions from malformed input, missing fields, type mismatches, and encoding issues. Handling these cases determines transformation reliability.
Validation Before Transformation
Validate input structure before attempting transformation to fail fast on invalid data.
class ValidationError < StandardError; end
class DataValidator
def self.validate!(data, schema)
errors = []
schema.each do |field, rules|
value = data[field]
if rules[:required] && value.nil?
errors << "Missing required field: #{field}"
next
end
next if value.nil? && !rules[:required]
if rules[:type] && !value.is_a?(rules[:type])
errors << "Invalid type for #{field}: expected #{rules[:type]}, got #{value.class}"
end
if rules[:format] && !value.to_s.match?(rules[:format])
errors << "Invalid format for #{field}: #{value}"
end
end
raise ValidationError, errors.join('; ') if errors.any?
end
end
# Usage
schema = {
id: { required: true, type: Integer },
email: { required: true, format: /@/ },
age: { type: Integer }
}
begin
DataValidator.validate!(input_data, schema)
result = transform(input_data)
rescue ValidationError => e
log_error(e)
return default_value
end
Graceful Degradation with Fallbacks
Provide fallback values when optional data is missing or malformed rather than failing completely.
def safe_transform(data)
{
id: extract_id(data) || generate_id,
name: extract_name(data) || 'Unknown',
email: normalize_email(data[:email]), # nil when missing or malformed
age: parse_age(data[:age]) || 0,
created_at: parse_timestamp(data[:created_at]) || Time.now
}
end
def normalize_email(value)
return nil if value.nil? || value.to_s.empty?
email = value.to_s.strip.downcase
email if email.match?(/@/)
end
def parse_age(value)
Integer(value)
rescue ArgumentError, TypeError
nil
end
def parse_timestamp(value)
Time.parse(value.to_s)
rescue ArgumentError
nil
end
Error Collection vs. Fail-Fast
Choose between collecting all errors for batch reporting or failing immediately on first error.
# Fail-fast approach
def transform_strict(items)
items.map do |item|
validate_item!(item)
transform_item(item)
end
end
# Error collection approach
def transform_lenient(items)
results = []
errors = []
items.each_with_index do |item, index|
begin
validate_item!(item)
results << transform_item(item)
rescue StandardError => e
errors << { index: index, item: item, error: e.message }
end
end
{ results: results, errors: errors, success_rate: results.size.to_f / items.size }
end
Encoding and Character Set Issues
Handle text encoding problems during transformation, particularly when reading files or external data.
def safe_read_and_transform(file_path)
content = File.read(file_path, encoding: 'UTF-8')
# File.read does not validate bytes, so check explicitly and re-decode on failure
unless content.valid_encoding?
content = File.read(file_path, encoding: 'ISO-8859-1').encode('UTF-8', invalid: :replace, undef: :replace)
end
transform(content)
end
def sanitize_string(value)
value.to_s
.encode('UTF-8', invalid: :replace, undef: :replace, replace: '')
.scrub('')
end
Partial Transformation Results
Handle scenarios where some transformations succeed while others fail in batch processing.
class TransformationResult
attr_reader :successes, :failures
def initialize
@successes = []
@failures = []
end
def add_success(item)
@successes << item
end
def add_failure(item, error)
@failures << { item: item, error: error }
end
def success?
@failures.empty?
end
def partial_success?
@successes.any? && @failures.any?
end
end
def batch_transform(items)
result = TransformationResult.new
items.each do |item|
begin
transformed = transform_item(item)
result.add_success(transformed)
rescue StandardError => e
result.add_failure(item, e.message)
end
end
result
end
Circular Reference Detection
Detect circular references in nested structures during transformation to prevent infinite recursion.
require 'set'
class CircularReferenceError < StandardError; end
def transform_with_cycle_detection(data, visited = Set.new)
object_id = data.object_id
raise CircularReferenceError, "Circular reference detected" if visited.include?(object_id)
visited.add(object_id)
result = case data
when Hash
data.transform_values { |v| transform_with_cycle_detection(v, visited) }
when Array
data.map { |item| transform_with_cycle_detection(item, visited) }
else
data
end
visited.delete(object_id)
result
end
Performance Considerations
Data transformation performance matters when processing large datasets, real-time streams, or high-throughput systems. Performance optimization balances speed, memory usage, and code clarity.
Lazy Evaluation for Memory Efficiency
Lazy evaluation defers computation until results are needed, processing items one at a time instead of materializing entire collections.
# Eager - loads entire dataset into memory
def eager_transform(file_path)
File.readlines(file_path)
.map { |line| parse_line(line) }
.select { |record| record[:valid] }
.map { |record| transform_record(record) }
end
# Lazy - processes line by line
def lazy_transform(file_path)
File.foreach(file_path).lazy
.map { |line| parse_line(line) }
.select { |record| record[:valid] }
.map { |record| transform_record(record) }
end
# Only processes first 1000 valid records
result = lazy_transform('large_file.txt').first(1000)
Batch Processing for Throughput
Batch processing amortizes overhead by processing multiple items together, particularly effective for database operations or API calls.
def batch_transform_and_save(items, batch_size: 1000)
items.each_slice(batch_size) do |batch|
transformed = batch.map { |item| transform_item(item) }
save_batch(transformed)
end
end
# Parallel batch processing
require 'parallel'
def parallel_batch_transform(items, batch_size: 1000)
batches = items.each_slice(batch_size).to_a
Parallel.map(batches, in_threads: 4) do |batch|
batch.map { |item| transform_item(item) }
end.flatten
end
Caching Expensive Operations
Cache transformation results for repeated data or expensive operations like API lookups.
class CachingTransformer
def initialize
@cache = {}
end
def transform(data)
cache_key = generate_cache_key(data)
@cache[cache_key] ||= expensive_transform(data)
end
private
def generate_cache_key(data)
# Use hash of data or unique identifier
data.hash
end
def expensive_transform(data)
# Expensive transformation logic
sleep 0.1 # Simulating expensive operation
transform_data(data)
end
end
# Memoization for method-level caching
require 'memoist'
class DataTransformer
extend Memoist
def lookup_category(product_id)
# Expensive database or API call
CategoryAPI.fetch(product_id)
end
memoize :lookup_category
end
Streaming for Large Files
Stream processing handles files too large for memory by processing data in chunks.
# Memory-efficient CSV transformation
def stream_transform_csv(input_path, output_path)
CSV.open(output_path, 'w') do |output|
CSV.foreach(input_path, headers: true) do |row|
transformed = transform_row(row)
output << transformed
end
end
end
# Streaming JSON array processing
require 'json/stream'
def stream_transform_json(input_stream, output_stream)
parser = JSON::Stream::Parser.new
parser.key do |key|
# Handle each key
end
parser.value do |value|
transformed = transform_value(value)
output_stream.puts(JSON.generate(transformed))
end
input_stream.each { |chunk| parser << chunk }
end
Algorithmic Complexity Optimization
Choose efficient algorithms and data structures for transformation operations.
# O(n²) - inefficient lookup
def transform_with_lookup_slow(items, lookup_data)
items.map do |item|
match = lookup_data.find { |lookup| lookup[:id] == item[:ref_id] }
item.merge(lookup_info: match)
end
end
# O(n) - efficient hash lookup
def transform_with_lookup_fast(items, lookup_data)
lookup_hash = lookup_data.to_h { |lookup| [lookup[:id], lookup] } # core-Ruby equivalent of ActiveSupport's index_by
items.map do |item|
item.merge(lookup_info: lookup_hash[item[:ref_id]])
end
end
# Benchmark comparison
require 'benchmark'
Benchmark.bmbm do |x|
x.report("slow") { transform_with_lookup_slow(items, lookups) }
x.report("fast") { transform_with_lookup_fast(items, lookups) }
end
Memory Profiling and Optimization
Profile memory usage to identify bottlenecks and optimize allocations.
require 'memory_profiler'
report = MemoryProfiler.report do
transform_large_dataset(data)
end
report.pretty_print
# Reduce allocations by reusing objects
class OptimizedTransformer
def initialize
@result_buffer = [] # reused across batches to avoid repeated array allocation
end
def transform_batch(items)
@result_buffer.clear
items.each do |item|
@result_buffer << transform_item(item)
end
@result_buffer.dup
end
end
Reference
Core Transformation Methods
| Method | Purpose | Example |
|---|---|---|
| map | Transform each element | items.map(&:upcase) |
| select | Filter elements | items.select(&:active?) |
| reject | Exclude elements | items.reject(&:nil?) |
| reduce | Aggregate data | items.reduce(:+) |
| transform_keys | Transform hash keys | hash.transform_keys(&:to_sym) |
| transform_values | Transform hash values | hash.transform_values(&:upcase) |
| group_by | Group by criteria | items.group_by(&:type) |
| partition | Split into two groups | items.partition(&:valid?) |
| each_with_object | Build result object | items.each_with_object({}) |
| flat_map | Map and flatten | items.flat_map(&:items) |
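A compact, self-contained demonstration of the less common methods in this table (the order data is illustrative):

```ruby
orders = [
  { id: 1, status: 'completed', items: %w[widget gadget] },
  { id: 2, status: 'pending', items: %w[widget] },
  { id: 3, status: 'completed', items: %w[gizmo] }
]

# partition splits a collection into matches and non-matches in one pass.
done, open = orders.partition { |o| o[:status] == 'completed' }

# flat_map maps then flattens one level: all item names across orders.
all_items = orders.flat_map { |o| o[:items] }

# each_with_object builds an accumulator without reassigning it in the block.
by_status = orders.each_with_object(Hash.new(0)) { |o, counts| counts[o[:status]] += 1 }
```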
Format Parsing Libraries
| Format | Library | Key Methods |
|---|---|---|
| JSON | json (stdlib) | JSON.parse, JSON.generate |
| YAML | yaml (stdlib) | YAML.safe_load, YAML.dump |
| CSV | csv (stdlib) | CSV.parse, CSV.foreach |
| XML | rexml (stdlib) | REXML::Document.new |
| XML | nokogiri (gem) | Nokogiri::XML, Nokogiri::HTML |
| MessagePack | msgpack (gem) | MessagePack.pack, MessagePack.unpack |
| TOML | toml-rb (gem) | TomlRB.load_file |
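For YAML, safe_load is the explicit, version-independent choice for untrusted input: historically YAML.load could instantiate arbitrary Ruby objects, and since Psych 4 (Ruby 3.1) plain load delegates to safe loading anyway. A short round trip with the stdlib:

```ruby
require 'yaml'

# safe_load restricts deserialization to plain types (String, Integer,
# Float, Array, Hash, ...) instead of arbitrary Ruby objects.
yaml = "name: Alice\nage: 30\n"
parsed = YAML.safe_load(yaml)
# => {"name"=>"Alice", "age"=>30}

# dump completes the round trip back to YAML text.
round_trip = YAML.safe_load(YAML.dump(parsed))
```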
Type Conversion Methods
| Conversion | Method | Notes |
|---|---|---|
| String to Integer | to_i | Returns 0 for non-numeric |
| String to Float | to_f | Returns 0.0 for non-numeric |
| String to Symbol | to_sym | Creates symbol from string |
| Symbol to String | to_s | Converts symbol to string |
| String to Boolean | custom | Use comparison or case |
| String to Date | Date.parse | Requires date library |
| String to Time | Time.parse | Flexible parsing |
| String to Time ISO | Time.iso8601 | Strict ISO format |
| Integer to String | to_s | Basic string conversion |
| Hash to JSON | to_json | Requires json library |
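The lenient-versus-strict distinction in this table is worth making explicit: to_i hides malformed input behind a zero, while Kernel#Integer raises, which a small wrapper can turn into a nil signal (a sketch):

```ruby
# Lenient: to_i never raises; non-numeric input silently becomes 0.
lenient = 'N/A'.to_i # => 0

# Strict: Kernel#Integer raises on malformed input; wrapping it turns
# the failure into an explicit nil instead of a misleading zero.
def strict_int(value)
  Integer(value)
rescue ArgumentError, TypeError
  nil
end

strict_int('42')  # => 42
strict_int('N/A') # => nil
```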
Validation Patterns
| Pattern | Implementation | Use Case |
|---|---|---|
| Required field | value.nil? | Ensure field exists |
| Type check | value.is_a?(Integer) | Verify data type |
| Format match | value.match?(/regex/) | Validate format |
| Range check | (min..max).cover?(value) | Numeric ranges |
| Enum check | allowed.include?(value) | Limited values |
| Custom validation | validator.call(value) | Complex rules |
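These patterns compose naturally when each row is expressed as a callable; a sketch (the rule names and the 18..120 range are illustrative):

```ruby
# Each table row expressed as a callable, so rules compose into a schema.
VALIDATORS = {
  required: ->(v) { !v.nil? },
  type:     ->(v) { v.is_a?(Integer) },
  format:   ->(v) { v.to_s.match?(/@/) },
  range:    ->(v) { (18..120).cover?(v) }
}.freeze

# A field is valid when every named rule accepts its value.
def valid_field?(value, rules)
  rules.all? { |rule| VALIDATORS.fetch(rule).call(value) }
end

valid_field?(42, [:required, :type, :range]) # => true
valid_field?(nil, [:required])               # => false
```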
Error Handling Strategies
| Strategy | Approach | Example |
|---|---|---|
| Fail fast | Raise on first error | validate! method |
| Error collection | Collect all errors | errors array |
| Fallback values | Provide defaults | value or default |
| Silent skip | Ignore invalid items | compact after map |
| Partial results | Return valid subset | filter successes |
| Retry logic | Retry on transient errors | retry 3 times |
Performance Techniques
| Technique | Benefit | Trade-off |
|---|---|---|
| Lazy evaluation | Memory efficient | Deferred computation |
| Batch processing | Higher throughput | Latency per item |
| Caching | Avoid recomputation | Memory usage |
| Streaming | Handle large files | Complexity |
| Parallel processing | Use multiple cores | Coordination overhead |
| Index lookups | Fast access | Build time |
Common Transformation Patterns
| Pattern | Purpose | Implementation |
|---|---|---|
| Map-Reduce | Aggregate data | map then reduce |
| Filter-Map | Transform subset | select then map |
| Flatten | Denormalize nested data | flat_map or flatten |
| Group-Aggregate | Summary by category | group_by then map values |
| Join | Combine datasets | merge or zip |
| Pivot | Reshape data | Custom grouping |
| Normalize | Flatten nested structures | Recursive traversal |
| Denormalize | Embed related data | Hash merging |
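The Group-Aggregate and Pivot rows combine group_by with a per-group reduction; a compact sketch over illustrative sales data:

```ruby
sales = [
  { region: 'east', month: 'Jan', amount: 100 },
  { region: 'east', month: 'Feb', amount: 150 },
  { region: 'west', month: 'Jan', amount: 200 }
]

# Group-Aggregate: one total per region.
totals = sales.group_by { |s| s[:region] }
              .transform_values { |rows| rows.sum { |r| r[:amount] } }
# => {"east"=>250, "west"=>200}

# Pivot: regions become rows, months become columns.
pivot = sales.group_by { |s| s[:region] }.transform_values do |rows|
  rows.to_h { |r| [r[:month], r[:amount]] }
end
# => {"east"=>{"Jan"=>100, "Feb"=>150}, "west"=>{"Jan"=>200}}
```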