
CrackedRuby

Object Serialization

Converting Ruby objects to and from various data formats for storage, transmission, and interoperability.


Overview

Object serialization transforms Ruby objects into portable formats that can be stored, transmitted, or reconstructed later. Ruby provides multiple serialization mechanisms, each optimized for different scenarios and compatibility requirements.

The Marshal module handles Ruby's native binary serialization format, preserving object structure and Ruby-specific data types with maximum fidelity. The JSON module converts objects to JavaScript Object Notation, prioritizing cross-language compatibility and human readability. The YAML module serializes to YAML format, balancing readability with support for complex data structures.

user = { name: "Alice", age: 30, roles: ["admin", "user"] }

# Native Ruby serialization
marshal_data = Marshal.dump(user)
# => "\x04\b{\bI\"\tname\x06:\x06EFI\"\nAlice\x06;\x00F..."

# JSON serialization  
json_data = JSON.dump(user)
# => "{\"name\":\"Alice\",\"age\":30,\"roles\":[\"admin\",\"user\"]}"

# YAML serialization
yaml_data = YAML.dump(user)
# => "---\n:name: Alice\n:age: 30\n:roles:\n- admin\n- user\n"

Each format handles object reconstruction through corresponding load methods. Marshal.load reconstructs the exact Ruby object structure, while JSON.parse and YAML.load create new objects with equivalent data. (Note that since Psych 4, bundled with Ruby 3.1, YAML.load applies safe-loading rules by default, so documents containing symbol keys need permitted_classes: [Symbol] or YAML.unsafe_load.)

Marshal.load(marshal_data)
# => {:name=>"Alice", :age=>30, :roles=>["admin", "user"]}

JSON.parse(json_data)
# => {"name"=>"Alice", "age"=>30, "roles"=>["admin", "user"]}

YAML.load(yaml_data)
# => {:name=>"Alice", :age=>30, :roles=>["admin", "user"]}

Custom objects require additional considerations. Marshal preserves class information and instance variables automatically. JSON and YAML require explicit conversion logic through to_json and to_yaml methods, or custom serialization handlers.
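
Marshal also exposes per-class hooks, marshal_dump and marshal_load, for objects that hold non-serializable state. A minimal sketch (the Session class and its fields are hypothetical):

```ruby
class Session
  attr_reader :user_id, :token

  def initialize(user_id, token)
    @user_id = user_id
    @token = token
    @on_expire = proc { }  # a Proc -- Marshal cannot serialize these
  end

  # Marshal calls marshal_dump to get the state worth persisting
  def marshal_dump
    [@user_id, @token]
  end

  # ...and marshal_load on a freshly allocated instance to restore it
  def marshal_load(state)
    @user_id, @token = state
    @on_expire = proc { }  # rebuild the non-serializable part
  end
end

restored = Marshal.load(Marshal.dump(Session.new(42, "abc123")))
restored.user_id  # => 42
restored.token    # => "abc123"
```

Without the hooks, Marshal.dump would raise TypeError on the stored Proc; with them, only the listed fields travel through the dump.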

Basic Usage

Marshal provides the most complete serialization for Ruby objects, handling complex data structures, custom classes, and maintaining object relationships including circular references.

class User
  attr_accessor :name, :email, :created_at
  
  def initialize(name, email)
    @name = name
    @email = email
    @created_at = Time.now
  end
end

user = User.new("Bob", "bob@example.com")
user_data = Marshal.dump(user)
restored_user = Marshal.load(user_data)

restored_user.name
# => "Bob"
restored_user.created_at.class
# => Time

JSON serialization requires converting objects to Hash representations. The JSON module automatically handles basic Ruby types like strings, numbers, arrays, and hashes.

# Hash serialization
data = { users: [{ id: 1, active: true }, { id: 2, active: false }] }
json_string = JSON.generate(data)
# => "{\"users\":[{\"id\":1,\"active\":true},{\"id\":2,\"active\":false}]}"

parsed_data = JSON.parse(json_string)
# => {"users"=>[{"id"=>1, "active"=>true}, {"id"=>2, "active"=>false}]}

# Array serialization
numbers = [1, 2.5, -10, 1000]
JSON.generate(numbers)
# => "[1,2.5,-10,1000]"

Custom objects need explicit JSON conversion logic. Define to_json methods or use the object's hash representation.

class Product
  attr_accessor :name, :price, :category
  
  def initialize(name, price, category)
    @name = name
    @price = price
    @category = category
  end
  
  def to_json(*args)
    {
      name: @name,
      price: @price,
      category: @category
    }.to_json(*args)
  end
end

product = Product.new("Laptop", 999.99, "Electronics")
JSON.generate(product)
# => "{\"name\":\"Laptop\",\"price\":999.99,\"category\":\"Electronics\"}"

YAML handles more complex Ruby data types than JSON, including symbols, dates, and ranges. The syntax remains human-readable while supporting nested structures.

config = {
  database: {
    host: "localhost",
    port: 5432,
    credentials: {
      username: :admin,
      password: "secret123"
    }
  },
  features: {
    enabled: true,
    beta_range: (1..10),
    timeout: 30
  }
}

yaml_output = YAML.dump(config)
puts yaml_output
# ---
# :database:
#   :host: localhost
#   :port: 5432
#   :credentials:
#     :username: :admin
#     :password: secret123
# :features:
#   :enabled: true
#   :beta_range: 1..10
#   :timeout: 30

YAML.load(yaml_output)[:features][:beta_range]
# => 1..10

File-based serialization patterns handle persistence scenarios. Each format provides convenient file I/O methods.

# Marshal file operations
File.open("user_data.marshal", "wb") { |f| Marshal.dump(user, f) }
loaded_user = File.open("user_data.marshal", "rb") { |f| Marshal.load(f) }

# JSON file operations  
File.write("config.json", JSON.pretty_generate(data))
loaded_data = JSON.parse(File.read("config.json"))

# YAML file operations
File.write("settings.yml", YAML.dump(config))
loaded_config = YAML.load_file("settings.yml")

Error Handling & Debugging

Serialization operations encounter various error conditions that require specific handling strategies. Marshal operations can fail due to unsupported objects, corrupted data, or class loading issues.

# Handling unsupported objects
proc_object = proc { puts "Hello" }

begin
  Marshal.dump(proc_object)
rescue TypeError => e
  puts "Cannot serialize: #{e.message}"
  # => Cannot serialize: no _dump_data is defined for class Proc
end

# Handling corrupted marshal data
corrupted_data = "invalid marshal data"

begin
  Marshal.load(corrupted_data)
rescue TypeError, ArgumentError => e
  # Garbage input fails the format check with TypeError ("incompatible
  # marshal file format"); data truncated after a valid header raises
  # ArgumentError ("marshal data too short")
  puts "Marshal error: #{e.message}"
end

JSON parsing errors occur with malformed data, unsupported types, or encoding issues. The parser provides detailed error information including position data.

# Handling malformed JSON
malformed_json = '{"name": "Alice", "age":}'

begin
  JSON.parse(malformed_json)
rescue JSON::ParserError => e
  puts "JSON parsing failed: #{e.message}"
  # => JSON parsing failed: unexpected token at '}'
  puts "Error occurred around position: #{e.to_s.scan(/\d+/).first}"
end

# Handling encoding issues
binary_data = "\x80\x81\x82".force_encoding("UTF-8")

begin
  JSON.generate({ data: binary_data })
rescue JSON::GeneratorError, Encoding::UndefinedConversionError => e
  # Invalid UTF-8 raises JSON::GeneratorError; binary (ASCII-8BIT)
  # strings raise Encoding::UndefinedConversionError
  puts "Encoding error: #{e.message}"
  # Handle by encoding to base64 or cleaning data
  safe_data = { data: [binary_data].pack('m0') }  # base64 encoding
  JSON.generate(safe_data)
end

YAML deserialization presents security risks when loading untrusted data. Ruby provides safe loading options to prevent code execution vulnerabilities.

# Unsafe YAML with embedded Ruby code
dangerous_yaml = <<~YAML
  ---
  - !ruby/object:OpenStruct 
    table:
      :name: "Alice"
      :command: !ruby/object:Kernel
YAML

# Safe loading approach
begin
  safe_data = YAML.safe_load(dangerous_yaml, permitted_classes: [])
rescue Psych::DisallowedClass => e
  puts "Blocked unsafe class: #{e.message}"
  # => Blocked unsafe class: Tried to load unspecified class: OpenStruct
end

# Permitted classes for controlled deserialization
permitted_classes = [Date, Time, Symbol]
safe_config = YAML.safe_load(yaml_string, permitted_classes: permitted_classes)

Circular reference detection prevents infinite loops during serialization. Marshal and YAML handle circular references automatically (YAML emits anchors and aliases, though loading them back requires aliases to be enabled), while JSON raises JSON::NestingError and requires manual detection.

# Marshal handles circular references
parent = { name: "Parent" }
child = { name: "Child", parent: parent }
parent[:child] = child

marshal_data = Marshal.dump(parent)  # Works fine
restored = Marshal.load(marshal_data)
restored[:child][:parent] == restored  # => true

# JSON requires circular reference detection
begin
  JSON.generate(parent)
rescue JSON::NestingError => e
  puts "Circular reference detected: #{e.message}"
  
  # Manual handling approach
  require 'set'  # Set must be required explicitly before Ruby 3.2

  def serialize_with_refs(obj, refs = Set.new)
    case obj
    when Hash
      if refs.include?(obj.object_id)
        return { "__circular_ref__" => obj.object_id }
      end
      refs.add(obj.object_id)
      result = obj.transform_values { |v| serialize_with_refs(v, refs) }
      refs.delete(obj.object_id)
      result
    else
      obj
    end
  end
  
  safe_parent = serialize_with_refs(parent)
  JSON.generate(safe_parent)
end

Version compatibility issues arise when deserializing data created with different Ruby versions or gem versions. Implement version checking and migration strategies.

# Version-aware serialization
module VersionedSerialization
  VERSION = "1.2"
  
  def self.dump(obj)
    versioned_data = {
      version: VERSION,
      data: obj,
      timestamp: Time.now.iso8601
    }
    Marshal.dump(versioned_data)
  end
  
  def self.load(data)
    begin
      container = Marshal.load(data)
      
      if container[:version] != VERSION
        puts "Version mismatch: #{container[:version]} vs #{VERSION}"
        # Apply migration logic here
        migrate_data(container[:data], container[:version])
      else
        container[:data]
      end
    rescue => e
      puts "Failed to load versioned data: #{e.message}"
      nil
    end
  end
  
  def self.migrate_data(data, from_version)
    case from_version
    when "1.1"
      # Apply 1.1 to 1.2 migration
      migrate_1_1_to_1_2(data)
    when "1.0"
      # Apply 1.0 to 1.2 migration
      migrate_1_0_to_1_2(data)
    else
      raise "Unknown version: #{from_version}"
    end
  end
  # `private` has no effect on `def self.` methods; hide it explicitly
  private_class_method :migrate_data
end

Performance & Memory

Serialization performance varies significantly between formats, with trade-offs between speed, size, and compatibility. Marshal provides the fastest serialization for Ruby-to-Ruby communication, while JSON offers better cross-platform performance.

require 'benchmark'

# Test data: complex nested structure
data = {
  users: (1..1000).map do |i|
    {
      id: i,
      name: "User #{i}",
      email: "user#{i}@example.com",
      metadata: {
        created_at: Time.now - rand(365) * 24 * 3600,
        preferences: {
          theme: ["light", "dark"].sample,
          notifications: rand > 0.5,
          features: (1..rand(10)).map { |f| "feature_#{f}" }
        }
      }
    }
  end
}

# Benchmark serialization speed
Benchmark.bm(15) do |x|
  x.report("Marshal dump:") { 100.times { Marshal.dump(data) } }
  x.report("JSON generate:") { 100.times { JSON.generate(data) } }
  x.report("YAML dump:") { 100.times { YAML.dump(data) } }
end

# Typical results (times vary by system):
#                      user     system      total        real
# Marshal dump:    0.050000   0.000000   0.050000 (  0.052341)
# JSON generate:   0.180000   0.010000   0.190000 (  0.191250)  
# YAML dump:       2.340000   0.020000   2.360000 (  2.387453)

Size efficiency impacts storage requirements and network transmission times. Marshal produces compact binary data, JSON creates readable but larger text, and YAML generates the most verbose output.

marshal_size = Marshal.dump(data).bytesize
json_size = JSON.generate(data).bytesize  
yaml_size = YAML.dump(data).bytesize

puts "Marshal: #{marshal_size} bytes"
puts "JSON: #{json_size} bytes (#{format('%.1f', json_size.to_f / marshal_size)}x larger)"
puts "YAML: #{yaml_size} bytes (#{format('%.1f', yaml_size.to_f / marshal_size)}x larger)"

# Example output:
# Marshal: 89432 bytes
# JSON: 142851 bytes (1.6x larger)
# YAML: 198347 bytes (2.2x larger)

Memory usage patterns differ during serialization and deserialization. Large objects can cause memory spikes, particularly with YAML processing.

# Memory monitoring during serialization
def measure_memory
  GC.start
  GC.disable
  memory_before = `ps -o rss= -p #{Process.pid}`.to_i
  yield
  memory_after = `ps -o rss= -p #{Process.pid}`.to_i
  GC.enable
  memory_after - memory_before
end

large_array = (1..100_000).map { |i| { id: i, data: "x" * 100 } }

marshal_memory = measure_memory { Marshal.dump(large_array) }
json_memory = measure_memory { JSON.generate(large_array) }
yaml_memory = measure_memory { YAML.dump(large_array) }

puts "Memory usage (KB):"
puts "Marshal: #{marshal_memory}"
puts "JSON: #{json_memory}"  
puts "YAML: #{yaml_memory}"

Streaming serialization prevents memory exhaustion when processing large datasets. Implement custom streaming for JSON arrays and YAML documents.

# Streaming JSON array serialization
class JSONStreamer
  def initialize(io)
    @io = io
    @first = true
  end
  
  def start_array
    @io.write("[")
  end
  
  def write_object(obj)
    @io.write(",") unless @first
    @io.write(JSON.generate(obj))
    @first = false
  end
  
  def end_array
    @io.write("]")
  end
end

# Usage for large dataset
File.open("large_data.json", "w") do |file|
  streamer = JSONStreamer.new(file)
  streamer.start_array
  
  (1..1_000_000).each do |i|
    record = { id: i, timestamp: Time.now.to_i }
    streamer.write_object(record)
    
    # Process in batches to control memory
    GC.start if i % 10_000 == 0
  end
  
  streamer.end_array
end
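
The same idea applies to YAML: every YAML.dump output begins with a --- document marker, so appending one document per record yields a valid multi-document stream that YAML.load_stream can replay one document at a time (the events.yml filename is illustrative):

```ruby
require 'yaml'

# Append one document per record instead of building one giant array
File.open("events.yml", "w") do |file|
  (1..3).each do |i|
    file.write(YAML.dump({ "id" => i, "event" => "event_#{i}" }))
  end
end

# Replay the stream document by document
records = []
YAML.load_stream(File.read("events.yml")) { |doc| records << doc }
records.first  # => {"id"=>1, "event"=>"event_1"}
```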

Object pooling and reuse strategies reduce garbage collection pressure during high-frequency serialization operations.

class SerializationPool
  def initialize
    @json_parsers = []
    @marshal_buffers = []
  end
  
  def with_json_parser
    parser = @json_parsers.pop || JSON
    begin
      yield parser
    ensure
      @json_parsers.push(parser) if @json_parsers.length < 10
    end
  end
  
  def with_marshal_buffer
    buffer = @marshal_buffers.pop || StringIO.new
    buffer.rewind
    buffer.truncate(0)
    
    begin
      yield buffer
    ensure
      @marshal_buffers.push(buffer) if @marshal_buffers.length < 10
    end
  end
end

# High-frequency serialization with pooling
pool = SerializationPool.new

1000.times do |i|
  data = { request_id: i, payload: "data_#{i}" }
  
  pool.with_marshal_buffer do |buffer|
    Marshal.dump(data, buffer)
    serialized = buffer.string
    # Process serialized data
  end
end

Production Patterns

Web API serialization requires consistent data formatting, error handling, and performance optimization. JSON dominates API responses due to broad client support and reasonable performance characteristics.

# Rails API serialization pattern
class UserSerializer
  def self.serialize(user, options = {})
    base_data = {
      id: user.id,
      name: user.name,
      email: user.email,
      created_at: user.created_at.iso8601
    }
    
    if options[:include_roles]
      base_data[:roles] = user.roles.map(&:name)
    end
    
    if options[:include_preferences]
      base_data[:preferences] = serialize_preferences(user.preferences)
    end
    
    base_data
  end
  
  def self.serialize_collection(users, options = {})
    {
      data: users.map { |user| serialize(user, options) },
      meta: {
        total: users.respond_to?(:total_count) ? users.total_count : users.size,
        serialized_at: Time.current.iso8601
      }
    }
  end
  
  def self.serialize_preferences(preferences)
    return nil unless preferences
    
    {
      theme: preferences[:theme] || "default",
      notifications: {
        email: preferences.dig(:notifications, :email) != false,
        push: preferences.dig(:notifications, :push) != false
      },
      privacy: {
        profile_visible: preferences.dig(:privacy, :profile_visible) != false
      }
    }
  end
  private_class_method :serialize_preferences
end

# Controller usage with error handling
class Api::UsersController < ApplicationController
  def index
    users = User.includes(:roles).page(params[:page])
    
    render json: UserSerializer.serialize_collection(
      users,
      include_roles: params[:include_roles],
      include_preferences: params[:include_preferences]
    )
  rescue => e
    render json: {
      error: "Serialization failed",
      message: e.message
    }, status: 500
  end
end

Caching strategies optimize repeated serialization operations. Implement cache invalidation based on object changes and serialization options.

class CachedSerializer
  CACHE_TTL = 1.hour
  
  def self.serialize_with_cache(object, options = {})
    cache_key = generate_cache_key(object, options)
    
    Rails.cache.fetch(cache_key, expires_in: CACHE_TTL) do
      perform_serialization(object, options)
    end
  end
  
  def self.invalidate_cache(object)
    # Clear all cached versions for this object
    pattern = "serialized:#{object.class.name}:#{object.id}:*"
    Rails.cache.delete_matched(pattern)
  end
  
  def self.generate_cache_key(object, options)
    option_hash = Digest::MD5.hexdigest(options.to_json)
    timestamp = object.respond_to?(:updated_at) ? object.updated_at.to_i : Time.current.to_i
    
    "serialized:#{object.class.name}:#{object.id}:#{timestamp}:#{option_hash}"
  end
  
  def self.perform_serialization(object, options)
    # Actual serialization logic here
    case object
    when User
      UserSerializer.serialize(object, options)
    when Product
      ProductSerializer.serialize(object, options)
    else
      raise "Unknown object type: #{object.class}"
    end
  end
  private_class_method :generate_cache_key, :perform_serialization
end

# Model integration for automatic cache invalidation
class User < ActiveRecord::Base
  after_update :invalidate_serialization_cache
  after_destroy :invalidate_serialization_cache
  
  private
  
  def invalidate_serialization_cache
    CachedSerializer.invalidate_cache(self)
  end
end

Configuration management uses YAML for environment-specific settings with validation and type coercion.

class ConfigManager
  CONFIG_PATH = Rails.root.join("config", "application.yml")
  
  def self.load_config
    raw_config = YAML.load_file(CONFIG_PATH)
    environment_config = raw_config[Rails.env] || {}
    
    validate_config(environment_config)
    coerce_types(environment_config)
  rescue Psych::SyntaxError => e
    raise "Invalid YAML configuration: #{e.message}"
  rescue Errno::ENOENT
    raise "Configuration file not found: #{CONFIG_PATH}"
  end
  
  def self.validate_config(config)
    required_keys = %w[database_url redis_url secret_key_base]
    
    missing_keys = required_keys - config.keys
    if missing_keys.any?
      raise "Missing required configuration: #{missing_keys.join(', ')}"
    end
    
    # Validate specific formats
    unless config["database_url"].start_with?("postgres://", "postgresql://")
      raise "Invalid database_url format"
    end
  end
  
  def self.coerce_types(config)
    # Convert string values to appropriate types
    config["worker_threads"] = config["worker_threads"].to_i if config["worker_threads"]
    config["enable_ssl"] = config["enable_ssl"] == "true" if config.key?("enable_ssl")
    config["timeout"] = config["timeout"].to_f if config["timeout"]
    
    # Parse complex nested values
    if config["feature_flags"].is_a?(String)
      config["feature_flags"] = config["feature_flags"].split(",").map(&:strip)
    end
    
    config
  end
end

# Application initialization
begin
  APP_CONFIG = ConfigManager.load_config
rescue => e
  puts "Configuration error: #{e.message}"
  exit 1
end

Background job serialization requires handling complex data structures and maintaining job queue compatibility.

# Sidekiq-compatible job serialization
class DataProcessingJob
  include Sidekiq::Worker
  
  def perform(serialized_data, options = {})
    # Deserialize complex data structures
    data = JSON.parse(serialized_data)
    
    case data["type"]
    when "user_export"
      process_user_export(data["user_ids"], options)
    when "report_generation"
      generate_report(data["report_config"], options)
    else
      raise "Unknown job type: #{data['type']}"
    end
  end
  
  def self.enqueue_user_export(user_ids, options = {})
    job_data = {
      type: "user_export",
      user_ids: user_ids,
      timestamp: Time.current.iso8601
    }
    
    perform_async(JSON.generate(job_data), options)
  end
  
  def self.enqueue_report_generation(report_config, options = {})
    # Sanitize config for serialization
    safe_config = sanitize_report_config(report_config)
    
    job_data = {
      type: "report_generation", 
      report_config: safe_config,
      timestamp: Time.current.iso8601
    }
    
    perform_async(JSON.generate(job_data), options)
  end
  
  def self.sanitize_report_config(config)
    # Remove non-serializable elements
    config.except(:callbacks, :lambdas).tap do |safe_config|
      # Convert dates to ISO strings
      safe_config["start_date"] = config["start_date"].iso8601 if config["start_date"].respond_to?(:iso8601)
      safe_config["end_date"] = config["end_date"].iso8601 if config["end_date"].respond_to?(:iso8601)
    end
  end
  private_class_method :sanitize_report_config
end

Common Pitfalls

Symbol and string key inconsistencies create subtle bugs when switching between serialization formats. JSON converts symbol keys to strings, while YAML preserves symbols.

original_data = { name: "Alice", :age => 30, "email" => "alice@example.com" }

# JSON converts all keys to strings
json_round_trip = JSON.parse(JSON.generate(original_data))
# => {"name"=>"Alice", "age"=>30, "email"=>"alice@example.com"}

# YAML preserves symbol keys
yaml_round_trip = YAML.load(YAML.dump(original_data))  
# => {:name=>"Alice", :age=>30, "email"=>"alice@example.com"}

# Accessing data fails due to key type changes
json_round_trip[:name]  # => nil (key is now string)
json_round_trip["name"] # => "Alice"

# Solution: normalize keys consistently
def normalize_keys(obj, symbolize: false)
  case obj
  when Hash
    method = symbolize ? :to_sym : :to_s
    obj.each_with_object({}) do |(key, value), result|
      result[key.send(method)] = normalize_keys(value, symbolize: symbolize)
    end
  when Array
    obj.map { |item| normalize_keys(item, symbolize: symbolize) }
  else
    obj
  end
end

consistent_data = normalize_keys(json_round_trip, symbolize: true)
consistent_data[:name] # => "Alice"

Time zone and date handling varies between formats, leading to data corruption during round-trips. Marshal preserves exact Time objects, while JSON loses time zone information.

# Time zone data loss in JSON
original_time = Time.new(2024, 1, 15, 14, 30, 0, "-05:00")  # EST
puts "Original: #{original_time} (#{original_time.zone})"

require 'time'  # Time.parse / Time.iso8601

# JSON loses the zone name (Time#to_json emits the plain to_s string)
json_time = JSON.parse({ timestamp: original_time }.to_json)["timestamp"]
parsed_time = Time.parse(json_time)
puts "After JSON: #{parsed_time} (#{parsed_time.zone.inspect})"
# The zone name is gone; Time.parse keeps only the numeric UTC offset,
# so parsed_time.zone returns nil

# Solution: explicit ISO 8601 formatting
safe_json_data = { timestamp: original_time.iso8601 }
json_string = JSON.generate(safe_json_data)
restored_data = JSON.parse(json_string)
restored_time = Time.iso8601(restored_data["timestamp"])
puts "ISO 8601 restored: #{restored_time} (#{restored_time.zone})"

# YAML preserves Time objects but may have compatibility issues
yaml_data = YAML.dump({ timestamp: original_time })
yaml_restored = YAML.load(yaml_data)
puts "YAML restored: #{yaml_restored[:timestamp]} (#{yaml_restored[:timestamp].zone})"

Encoding issues cause serialization failures, particularly with binary data or non-UTF-8 strings. Each format handles encoding differently.

# Binary data in different formats
binary_data = "\xFF\xFE\x00\x01".b  # Binary string
text_with_encoding = "Café".encode("ISO-8859-1")

begin
  # JSON fails with binary data
  JSON.generate({ binary: binary_data })
rescue Encoding::UndefinedConversionError => e
  puts "JSON encoding error: #{e.message}"
  
  # Solution: Base64 encoding for binary data
  require 'base64'
  json_safe = JSON.generate({ 
    binary: Base64.strict_encode64(binary_data),
    encoding: "base64"
  })
  
  # Decoding
  parsed = JSON.parse(json_safe)
  if parsed["encoding"] == "base64"
    restored_binary = Base64.strict_decode64(parsed["binary"])
    restored_binary == binary_data # => true
  end
end

# Marshal handles encodings naturally
marshal_data = Marshal.dump({ 
  binary: binary_data, 
  text: text_with_encoding 
})
restored = Marshal.load(marshal_data)
restored[:binary].encoding.name    # => "ASCII-8BIT"
restored[:text].encoding.name      # => "ISO-8859-1"

# YAML round-trips binary (ASCII-8BIT) strings by base64-encoding them
# under the !binary tag
yaml_binary = YAML.dump({ binary: binary_data })

Class loading dependencies create runtime errors when deserializing objects whose classes are not available. This commonly affects Marshal data.

# Define a class for serialization
class CustomData
  attr_accessor :value, :metadata
  
  def initialize(value, metadata = {})
    @value = value
    @metadata = metadata
  end
end

custom_obj = CustomData.new("test", { created: Time.now })
marshal_data = Marshal.dump(custom_obj)

# Simulate class not being available
Object.send(:remove_const, :CustomData)

begin
  Marshal.load(marshal_data)
rescue ArgumentError => e
  puts "Class loading error: #{e.message}"
  # => undefined class/module CustomData
end

# Solution: graceful handling with fallback
module SafeDeserialization
  def self.load_marshal(data)
    Marshal.load(data)
  rescue ArgumentError => e
    if e.message.include?("undefined class")
      puts "Warning: #{e.message}"
      # Return metadata about the object instead
      { 
        error: "class_not_found",
        message: e.message,
        data_size: data.bytesize 
      }
    else
      raise
    end
  end
end

Deeply nested or recursive data structures trigger nesting-limit errors or stack overflows during serialization. Implement depth limiting and circular reference detection.

# Create problematic recursive structure
def create_recursive_hash(depth = 1000)
  current = { level: depth }
  (depth - 1).downto(1) do |i|
    current = { level: i, child: current }
  end
  current
end

deep_structure = create_recursive_hash(5000)

# JSON enforces a default max_nesting of 100
begin
  JSON.generate(deep_structure)
rescue JSON::NestingError, SystemStackError
  # NestingError at the default limit; SystemStackError only occurs when
  # the limit is disabled with max_nesting: false
  puts "Structure too deep for JSON serialization"
end

# Solution: depth-limited serialization
class SafeSerializer
  MAX_DEPTH = 100
  
  def self.serialize(obj, max_depth: MAX_DEPTH)
    serialize_recursive(obj, 0, max_depth)
  end
  
  def self.serialize_recursive(obj, current_depth, max_depth)
    if current_depth >= max_depth
      return { __truncated: true, type: obj.class.name }
    end
    
    case obj
    when Hash
      obj.each_with_object({}) do |(key, value), result|
        result[key] = serialize_recursive(value, current_depth + 1, max_depth)
      end
    when Array
      obj.map { |item| serialize_recursive(item, current_depth + 1, max_depth) }
    else
      obj
    end
  end
  private_class_method :serialize_recursive
end

safe_data = SafeSerializer.serialize(deep_structure, max_depth: 50)
JSON.generate(safe_data)  # Works without stack overflow

Version compatibility breaks deserialization when Ruby versions or gem versions change. The Marshal wire format (4.8) has been stable since Ruby 1.8, but the internal layout of core and library classes can change between versions, making old dumps unloadable.

# Version-specific serialization wrapper
class CompatibleSerializer
  RUBY_VERSION_MAP = {
    "2.7" => :ruby_27,
    "3.0" => :ruby_30,
    "3.1" => :ruby_31,
    "3.2" => :ruby_32
  }.freeze
  
  def self.dump(obj)
    metadata = {
      ruby_version: RUBY_VERSION,
      marshal_version: Marshal::MAJOR_VERSION.to_s + "." + Marshal::MINOR_VERSION.to_s,
      timestamp: Time.now.to_i,
      serializer_version: "1.0"
    }
    
    Marshal.dump([metadata, obj])
  end
  
  def self.load(data)
    begin
      metadata, obj = Marshal.load(data)
      
      if metadata[:ruby_version] != RUBY_VERSION
        puts "Warning: Data serialized with Ruby #{metadata[:ruby_version]}, " \
             "loading with Ruby #{RUBY_VERSION}"
      end
      
      obj
    rescue TypeError, ArgumentError => e
      # Attempt fallback strategies
      load_with_fallback(data, e)
    end
  end
  
  def self.load_with_fallback(data, original_error)
    # Try loading as raw Marshal data (older format)
    begin
      Marshal.load(data)
    rescue
      # Try JSON if Marshal fails completely
      begin
        JSON.parse(data.force_encoding("UTF-8"))
      rescue
        raise original_error
      end
    end
  end
  private_class_method :load_with_fallback
end

Reference

Marshal Module

  • Marshal.dump(obj, port = nil) - Serializes obj to binary format; returns a String, or writes to port when an IO is given
  • Marshal.load(source, proc = nil) - Deserializes a String or IO back into an object; proc is called with each deserialized object
  • Marshal.restore(source) - Alias for Marshal.load

Marshal Constants:

  • Marshal::MAJOR_VERSION - Major version number (4)
  • Marshal::MINOR_VERSION - Minor version number (8; the 4.8 format has been unchanged since Ruby 1.8)

Serializable Types: all basic Ruby types; custom objects with their instance variables; classes and modules (serialized by name, so the same constant must be defined when loading)

Non-serializable Types: Proc, Method, UnboundMethod, Binding, Thread, IO, File, Dir, objects with singleton methods
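
A quick way to check which category an object falls into is to attempt a dump and rescue TypeError; marshalable? below is an illustrative helper, not part of the standard library:

```ruby
def marshalable?(obj)
  Marshal.dump(obj)
  true
rescue TypeError
  false
end

marshalable?({ name: "Alice" })  # => true
marshalable?(1..10)              # => true
marshalable?(proc { })           # => false
marshalable?($stdout)            # => false
```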

JSON Module

  • JSON.generate(obj, opts = {}) - Converts an object to a JSON string
  • JSON.dump(obj, io = nil, limit = nil) - Serializes an object, optionally writing to io, with an optional nesting limit
  • JSON.parse(source, opts = {}) - Parses a JSON string into Ruby objects
  • JSON.load(source, proc = nil, opts = {}) - Loads JSON from a String or IO, calling proc with each parsed object
  • JSON.pretty_generate(obj, opts = {}) - Generates indented, human-readable JSON

JSON Generation Options:

  • :max_nesting - Maximum nesting depth (default 100)
  • :allow_nan - Allow NaN and Infinity values
  • :indent - Indentation string for pretty printing
  • :space - Space after colon and comma
  • :object_nl - Newline after objects
  • :array_nl - Newline after arrays
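
Combining these options by hand reproduces JSON.pretty_generate; a short sketch:

```ruby
require 'json'

data = { name: "Alice", roles: ["admin"] }

pretty = JSON.generate(data, indent: "  ", space: " ",
                       object_nl: "\n", array_nl: "\n")
pretty == JSON.pretty_generate(data)  # => true

# :allow_nan permits values that strict JSON forbids
JSON.generate({ ratio: Float::NAN }, allow_nan: true)
# => "{\"ratio\":NaN}"
```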

JSON Parsing Options:

  • :symbolize_names - Convert keys to symbols
  • :create_additions - Enable JSON additions
  • :object_class - Class to create for JSON objects (default Hash)
  • :array_class - Class to create for JSON arrays (default Array)
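
:symbolize_names and :object_class are the two most commonly tuned parsing options; OpenStruct works as an :object_class because the parser populates containers through []=:

```ruby
require 'json'
require 'ostruct'

json = '{"name": "Alice", "scores": [1, 2]}'

JSON.parse(json, symbolize_names: true)
# => {:name=>"Alice", :scores=>[1, 2]}

result = JSON.parse(json, object_class: OpenStruct)
result.name  # => "Alice"
```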

YAML Module

  • YAML.dump(obj, io = nil) - Serializes an object to YAML; returns a String, or writes to io when given
  • YAML.load(yaml, filename: nil) - Deserializes YAML to a Ruby object (safe by default since Psych 4)
  • YAML.safe_load(yaml, permitted_classes: [], aliases: false) - Loads YAML while restricting which classes may be instantiated
  • YAML.load_file(filename) - Loads YAML from a file
  • YAML.dump_stream(*objects) - Serializes multiple objects as separate documents in one stream

YAML Safe Loading Options:

  • :permitted_classes - Array of allowed classes
  • :permitted_symbols - Array of allowed symbols
  • :aliases - Allow aliases (default false)
  • :filename - Filename for error reporting
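
Aliases are rejected by default because expanding them can be abused (the "billion laughs" attack). The exception class differs across Psych versions (Psych::BadAlias in Psych 4, Psych::AliasesNotEnabled in newer releases), so rescuing Psych::Exception is the portable choice:

```ruby
require 'yaml'

yaml_with_alias = <<~YAML
  base: &base
    timeout: 30
  copy: *base
YAML

begin
  YAML.safe_load(yaml_with_alias)
rescue Psych::Exception => e
  puts "Alias blocked: #{e.class}"
end

config = YAML.safe_load(yaml_with_alias, aliases: true)
config["copy"]["timeout"]  # => 30
```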

YAML-specific Types: Symbols, Ranges, Regular expressions, Complex numbers, Rational numbers, Sets, custom tagged types
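
A short round-trip demonstrating a few of these; symbols, ranges, and regexps come back only through full loading (YAML.unsafe_load on Psych 4+, plain YAML.load on older Psych):

```ruby
require 'yaml'

data = { id_range: 1..100, pattern: /ruby/i, tag: :release }

yaml = YAML.dump(data)
restored = YAML.unsafe_load(yaml)  # Psych 4+

restored[:id_range]  # => 1..100
restored[:pattern]   # => /ruby/i
restored[:tag]       # => :release
```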

Error Classes

  • TypeError (Marshal) - Unsupported object type for serialization
  • ArgumentError (Marshal) - Invalid marshal data or format
  • JSON::ParserError - Malformed JSON syntax
  • JSON::NestingError - Maximum nesting depth exceeded
  • JSON::GeneratorError - Object cannot be converted to JSON
  • Psych::SyntaxError - Invalid YAML syntax
  • Psych::DisallowedClass - Class not permitted in safe loading
  • Psych::BadAlias - Invalid or disallowed alias reference

Performance Characteristics

  • Marshal - fastest serialization, most compact output, Ruby-only, not human-readable
  • JSON - moderate speed and size, universal cross-platform support, human-readable
  • YAML - slowest and least compact output, widely supported, human-readable

Type Mapping

  • String - Marshal: preserved with encoding; JSON: UTF-8 string; YAML: string with encoding
  • Symbol - Marshal: preserved; JSON: converted to string; YAML: preserved
  • Integer - Marshal: all sizes preserved; JSON: number (precision limits in some consumers); YAML: integer
  • Float - Marshal: preserved; JSON: number; YAML: float
  • Array - Marshal: preserved; JSON: array; YAML: sequence
  • Hash - Marshal: preserved; JSON: object; YAML: mapping
  • Time - Marshal: preserved with zone offset; JSON: string (via to_s, or ISO 8601 with ActiveSupport); YAML: timestamp
  • Date - Marshal: preserved; JSON: string representation; YAML: date
  • Range - Marshal: preserved; JSON: not supported; YAML: preserved (full load only)
  • Regexp - Marshal: preserved; JSON: not supported; YAML: preserved (full load only)
  • nil - Marshal: preserved; JSON: null; YAML: null
  • true/false - Marshal: preserved; JSON: true/false; YAML: true/false