Overview
Custom marshaling in Ruby allows objects to define their own serialization and deserialization behavior through the marshal_dump and marshal_load methods. Ruby's Marshal module handles the core serialization process, but objects can override the default behavior by implementing these methods to control exactly what data gets serialized and how objects get reconstructed.
The Marshal module serializes Ruby objects into a binary format that can be stored or transmitted, then reconstructed later. By default, Marshal serializes all instance variables, but custom marshaling provides fine-grained control over this process.
```ruby
class BankAccount
  def initialize(number, balance)
    @account_number = number
    @balance = balance
    @created_at = Time.now
  end

  def marshal_dump
    [@account_number, @balance]
  end

  def marshal_load(data)
    @account_number, @balance = data
    @created_at = Time.now
  end
end

account = BankAccount.new("12345", 1000.0)
serialized = Marshal.dump(account)
restored = Marshal.load(serialized)
```
Custom marshaling becomes essential when objects contain non-serializable data like file handles, database connections, or complex nested structures that need special handling. Objects can also use marshaling to implement versioning strategies or optimize serialization performance.
Ruby calls marshal_dump during serialization and expects it to return a serializable object representing the essential state. During deserialization, Ruby allocates an uninitialized instance and calls marshal_load with the dumped data, allowing the object to restore its state.
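A minimal sketch that records the order in which Ruby invokes these hooks. The Traced class and its LOG constant are illustrative names, not part of any library. The sketch shows that Marshal.load never calls initialize, and that it returns the allocated object rather than whatever marshal_load happens to return.

```ruby
# Records the order of lifecycle calls so they can be inspected afterwards
class Traced
  LOG = []

  attr_reader :value

  def initialize(value)
    LOG << :initialize
    @value = value
  end

  def marshal_dump
    LOG << :marshal_dump
    @value
  end

  def marshal_load(value)
    LOG << :marshal_load
    @value = value
    :ignored # Marshal.load discards this return value
  end
end

original = Traced.new(42)
copy = Marshal.load(Marshal.dump(original))

p Traced::LOG # => [:initialize, :marshal_dump, :marshal_load]
p copy.value  # => 42
p copy.class  # => Traced
```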
Basic Usage
Implementing custom marshaling requires defining both marshal_dump and marshal_load methods. The marshal_dump method runs during Marshal.dump and should return any serializable Ruby object. The marshal_load method receives this dumped data and reconstructs the object's state.
```ruby
class Configuration
  def initialize(settings = {})
    @settings = settings
    @computed_cache = {}
    @file_handles = {}
  end

  def get(key)
    @computed_cache[key] ||= expensive_computation(@settings[key])
  end

  def marshal_dump
    @settings
  end

  def marshal_load(settings)
    @settings = settings
    @computed_cache = {}
    @file_handles = {}
  end

  private

  def expensive_computation(value)
    # Simulate expensive operation
    value.to_s.upcase
  end
end
```
The Marshal module handles object creation automatically. When loading, Ruby allocates the object without calling initialize, then immediately calls marshal_load with the dumped data. This means marshal_load must fully initialize the object state.
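As a rough mental model (not how Marshal is actually implemented, since it works on a binary stream), loading a custom-marshaled object resembles calling allocate and then marshal_load by hand. Point is a hypothetical class used only for illustration; the sketch shows that an allocated object has no instance variables until marshal_load sets them.

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def marshal_dump
    [@x, @y]
  end

  def marshal_load(data)
    @x, @y = data
  end
end

# Manual approximation of what Marshal.load does for this class
blank = Point.allocate
p blank.instance_variables # => [] (initialize never ran)
blank.marshal_load([3, 4])
p [blank.x, blank.y]       # => [3, 4]

# The real round trip ends in the same state
via_marshal = Marshal.load(Marshal.dump(Point.new(3, 4)))
p [via_marshal.x, via_marshal.y] # => [3, 4]
```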
For objects that need to serialize complex nested data, marshal_dump can return arrays, hashes, or any combination of serializable objects:
```ruby
class DocumentStore
  def initialize
    @documents = {}
    @metadata = {}
    @indexes = {}
  end

  def add_document(id, content, tags = [])
    @documents[id] = content
    @metadata[id] = { tags: tags, created: Time.now }
    rebuild_indexes
  end

  def marshal_dump
    {
      documents: @documents,
      metadata: @metadata.transform_values { |meta| meta.dup },
      version: "1.0"
    }
  end

  def marshal_load(data)
    @documents = data[:documents] || {}
    @metadata = data[:metadata] || {}
    @indexes = {}

    # Handle version migration
    if data[:version] != "1.0"
      migrate_from_version(data[:version])
    end

    rebuild_indexes
  end

  private

  def rebuild_indexes
    @indexes = @metadata.each_with_object({}) do |(id, meta), idx|
      meta[:tags].each { |tag| (idx[tag] ||= []) << id }
    end
  end

  def migrate_from_version(version)
    # Handle older data formats
  end
end
```
Objects can also serialize references to other marshalable objects. Ruby handles object references automatically, maintaining identity relationships during the marshal/unmarshal cycle:
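```ruby
class Node
  attr_accessor :value, :children, :parent

  def initialize(value)
    @value = value
    @children = []
    @parent = nil
  end

  def add_child(child)
    child.parent = self
    @children << child
  end

  # Dump value and children only; parent links are rebuilt on load
  def marshal_dump
    [@value, @children]
  end

  # Children are fully loaded before this runs, so they can be re-linked here
  def marshal_load(data)
    @value, @children = data
    @parent = nil
    @children.each { |child| child.parent = self }
  end
end

root = Node.new("root")
child1 = Node.new("child1")
child2 = Node.new("child2")
root.add_child(child1)
root.add_child(child2)

# Parent-child relationships are rebuilt by marshal_load
restored_tree = Marshal.load(Marshal.dump(root))
restored_tree.children.map(&:value)                       # => ["child1", "child2"]
restored_tree.children.first.parent.equal?(restored_tree) # => true
```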
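Identity tracking applies within a single Marshal.dump call. The sketch below uses only plain hashes: a shared object dumped once stays one object after loading, while the same object dumped in separate calls comes back as independent copies. This matters for any design that calls Marshal.dump on pieces of a structure independently.

```ruby
shared = { theme: "dark" }
users = [{ prefs: shared }, { prefs: shared }]

# One dump call: the shared hash is written once and linked on load
together = Marshal.load(Marshal.dump(users))
p together[0][:prefs].equal?(together[1][:prefs]) # => true

# Separate dump calls: each load builds its own copy
first = Marshal.load(Marshal.dump(users[0]))
second = Marshal.load(Marshal.dump(users[1]))
p first[:prefs].equal?(second[:prefs])            # => false
```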
Performance & Memory
Custom marshaling provides significant opportunities for performance optimization, particularly when dealing with large objects or objects containing computed data. The choice of what to serialize directly impacts both marshaling speed and the size of the serialized output.
Excluding computed or cached data from marshaling reduces both serialization time and memory usage:
```ruby
class DataProcessor
  def initialize(raw_data)
    @raw_data = raw_data
    @processed_cache = nil
    @statistics = nil
    @temp_files = []
  end

  def processed_data
    @processed_cache ||= expensive_processing(@raw_data)
  end

  def statistics
    @statistics ||= calculate_statistics(processed_data)
  end

  def marshal_dump
    # Only serialize the essential raw data
    # Computed caches will be rebuilt on demand
    @raw_data
  end

  def marshal_load(raw_data)
    @raw_data = raw_data
    @processed_cache = nil
    @statistics = nil
    @temp_files = []
  end

  private

  def expensive_processing(data)
    # Simulate expensive computation
    data.map(&:to_f).sort.reverse
  end

  def calculate_statistics(data)
    {
      mean: data.sum / data.size,
      max: data.max,
      min: data.min
    }
  end
end
```
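A quick way to confirm that excluding derived data pays off is to compare dump sizes. The two classes below are hypothetical: both hold the same raw array and a derived cache, but only CacheAware excludes the cache from its dump and rebuilds it lazily after loading, in the same spirit as DataProcessor.

```ruby
# Default marshaling: both the raw data and the derived cache are written
class CacheNaive
  def initialize(raw)
    @raw = raw
    @cache = raw.map { |n| n * 2.5 }
  end
end

# Custom marshaling: only the raw data is written; the cache is rebuilt lazily
class CacheAware
  def initialize(raw)
    @raw = raw
    @cache = nil
  end

  def cache
    @cache ||= @raw.map { |n| n * 2.5 }
  end

  def marshal_dump
    @raw
  end

  def marshal_load(raw)
    @raw = raw
    @cache = nil
  end
end

raw = (1..1_000).to_a
naive = CacheNaive.new(raw)
aware = CacheAware.new(raw)
aware.cache # populate the cache before dumping

naive_size = Marshal.dump(naive).bytesize
aware_size = Marshal.dump(aware).bytesize
puts "default: #{naive_size} bytes, custom: #{aware_size} bytes"

restored = Marshal.load(Marshal.dump(aware))
```

The actual savings depend on how large the excluded data is relative to the state that must be kept.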
For objects with large amounts of data, custom marshaling can implement compression or alternative encoding strategies:
```ruby
require 'zlib'

class CompressedData
  attr_reader :raw_data

  def initialize(data)
    @raw_data = data
  end

  def marshal_dump
    # Serialize the payload with Marshal, then compress the resulting bytes
    serialized = Marshal.dump(@raw_data)
    compressed = Zlib::Deflate.deflate(serialized)
    {
      compressed_data: compressed,
      original_size: serialized.bytesize,
      compression_ratio: compressed.bytesize.to_f / serialized.bytesize
    }
  end

  def marshal_load(dump_data)
    # Decompress, then restore the payload with Marshal instead of eval
    decompressed = Zlib::Inflate.inflate(dump_data[:compressed_data])
    @raw_data = Marshal.load(decompressed)
    @original_size = dump_data[:original_size]
    @compression_ratio = dump_data[:compression_ratio]
  end

  # Only populated on objects restored via Marshal.load
  def compression_info
    "Original: #{@original_size} bytes, Ratio: #{@compression_ratio}"
  end
end
```
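An alternative, when the goal is simply a smaller byte stream, is to compress the complete output of Marshal.dump instead of compressing inside each marshal_dump. The helper names compressed_dump and compressed_load are hypothetical; only the Marshal and Zlib calls are standard library.

```ruby
require 'zlib'

# Compress the complete Marshal stream for any dumpable object
def compressed_dump(obj)
  Zlib::Deflate.deflate(Marshal.dump(obj))
end

# Inflate, then unmarshal; only use on trusted data
def compressed_load(bytes)
  Marshal.load(Zlib::Inflate.inflate(bytes))
end

data = { log: "GET /index.html 200\n" * 500 }
plain = Marshal.dump(data)
packed = compressed_dump(data)
puts "plain: #{plain.bytesize} bytes, compressed: #{packed.bytesize} bytes"
```

This keeps compression logic out of individual classes, at the cost of compressing everything in the stream rather than only the large fields.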
Memory usage optimization becomes critical when marshaling large object graphs. Custom marshaling can implement strategies to break large structures into smaller, independently marshaled pieces:
```ruby
class PartitionedDataset
  LAZY = :lazy_load_pending

  def initialize(partition_size = 1000)
    @partitions = {}
    @partition_size = partition_size
    @record_count = 0
  end

  def add_record(record)
    # Records fill partition 0 first, then partition 1, and so on
    partition_key = @record_count / @partition_size
    @partitions[partition_key] ||= []
    get_partition(partition_key) << record
    @record_count += 1
  end

  def marshal_dump
    # Serialize the partition structure, not the records themselves;
    # the records are assumed to be persisted to external storage separately
    {
      partition_keys: @partitions.keys,
      partition_size: @partition_size,
      total_records: @record_count
    }
  end

  def marshal_load(data)
    @partition_size = data[:partition_size]
    @record_count = data[:total_records]
    # Mark every partition as pending; its records are fetched on first access
    @partitions = data[:partition_keys].to_h { |key| [key, LAZY] }
  end

  def get_partition(key)
    return @partitions[key] unless @partitions[key] == LAZY

    @partitions[key] = load_partition_from_storage(key)
  end

  private

  def load_partition_from_storage(key)
    # Implementation would load from external storage
    []
  end
end
```
Each object that implements custom marshaling adds a Ruby method call on both dump and load, and Marshal itself recurses once per nesting level. Deeply nested structures can therefore benefit from a flattened serialization format produced by a single marshal_dump call:
```ruby
class OptimizedTree
  attr_reader :root

  def initialize(root_value = nil)
    @root = root_value ? TreeNode.new(root_value) : nil
  end

  def marshal_dump
    return nil unless @root

    # Flatten the tree into [value, parent_index] pairs in depth-first order
    nodes = []
    stack = [[@root, nil]] # [node, parent_index]
    while stack.any?
      node, parent_idx = stack.pop
      current_idx = nodes.size
      nodes << [node.value, parent_idx]
      # Push children in reverse so they are popped in their original order
      node.children.reverse_each do |child|
        stack << [child, current_idx]
      end
    end
    nodes
  end

  def marshal_load(nodes)
    @root = nil
    return unless nodes

    # Rebuild the tree; a parent always precedes its children in the array
    node_objects = nodes.map { |value, _| TreeNode.new(value) }
    nodes.each_with_index do |(_, parent_idx), idx|
      if parent_idx
        node_objects[parent_idx].add_child(node_objects[idx])
      else
        @root = node_objects[idx]
      end
    end
  end

  class TreeNode
    attr_reader :value, :children

    def initialize(value)
      @value = value
      @children = []
    end

    def add_child(child)
      @children << child
    end
  end
end
```
Error Handling & Debugging
Custom marshaling introduces several categories of errors that require specific handling strategies. The most common issues involve unmarshallable objects, version compatibility problems, and corruption of the marshaled data stream.
Marshal.dump raises TypeError when it reaches an object it cannot serialize, including anything inside the value returned by marshal_dump. Common culprits are Proc objects, IO objects such as open files, and objects with singleton methods:
```ruby
require 'json'

class ServiceConnection
  def initialize(api_key)
    @api_key = api_key
    @connection = build_connection
    @request_proc = proc { |data| format_request(data) }
  end

  # Marshal raises TypeError while serializing the value that marshal_dump
  # returns, i.e. after this method has finished, so a begin/rescue in here
  # cannot catch it. Return only serializable state instead.
  def marshal_dump
    [@api_key]
  end

  def marshal_load(data)
    @api_key, = data
    # Rebuild non-serializable resources
    @connection = build_connection
    @request_proc = proc { |payload| format_request(payload) }
    # Validate restored state
    validate_connection_state
  end

  private

  def build_connection
    # Placeholder for an HTTP client or similar
    { api_key: @api_key, status: :connected }
  end

  def format_request(data)
    data.to_json
  end

  def validate_connection_state
    unless @connection && @api_key
      raise StandardError, "Invalid connection state after unmarshaling"
    end
  end
end
```
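The TypeError cases listed above are easy to reproduce. The helper dumpable? is a hypothetical name; it wraps Marshal.dump in a rescue, which is where such errors have to be caught (around the call to Marshal.dump, not inside marshal_dump).

```ruby
# Returns true when Marshal can serialize obj; reports the TypeError otherwise
def dumpable?(obj)
  Marshal.dump(obj)
  true
rescue TypeError => e
  warn "#{obj.class}: #{e.message}"
  false
end

with_singleton = Object.new
def with_singleton.greet
  "hi"
end

reader, writer = IO.pipe

results = {
  hash: dumpable?({ a: [1, 2.0, "x"] }),
  proc: dumpable?(proc { 1 }),
  io: dumpable?(reader),
  singleton: dumpable?(with_singleton)
}

reader.close
writer.close
p results
```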
Version compatibility errors occur when the structure of marshaled data changes between application versions. Robust marshaling implementations include version handling and migration logic:
```ruby
class VersionedModel
  CURRENT_VERSION = 3

  def initialize(name, data = {})
    @name = name
    @data = data
    @metadata = { created_at: Time.now }
    @version = CURRENT_VERSION
  end

  def marshal_dump
    {
      version: CURRENT_VERSION,
      name: @name,
      data: @data,
      metadata: @metadata
    }
  end

  def marshal_load(dumped_data)
    # Data written before versioning was introduced has no :version key
    dumped_version = dumped_data[:version] || 1

    case dumped_version
    when 1
      migrate_from_v1(dumped_data)
    when 2
      migrate_from_v2(dumped_data)
    when CURRENT_VERSION
      @name = dumped_data[:name]
      @data = dumped_data[:data]
      @metadata = dumped_data[:metadata]
    else
      handle_unknown_version(dumped_version, dumped_data)
    end

    @version = CURRENT_VERSION
  end

  private

  def migrate_from_v1(data)
    # v1 stored everything in a single hash
    @name = data[:name]
    @data = data.reject { |k, _| [:name, :version].include?(k) }
    @metadata = { created_at: Time.now, migrated_from: 1 }
  end

  def migrate_from_v2(data)
    # v2 separated data but had a different metadata structure
    @name = data[:name]
    @data = data[:data]
    @metadata = {
      # dig tolerates a missing :metadata hash
      created_at: data.dig(:metadata, :timestamp) || Time.now,
      migrated_from: 2
    }
  end

  def handle_unknown_version(version, _data)
    raise StandardError, "Cannot unmarshal version #{version} (current: #{CURRENT_VERSION})"
  end
end
```
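Circular references need care as well. Within a single Marshal.dump call, Marshal tracks object identity, so a cycle of plain objects is written once and linked back together on load. Custom logic can still recurse forever, however, if marshal_dump builds its output by calling marshal_dump (or another serializer) on connected objects directly. One option is to store connections by a stable key and resolve them after all objects are loaded: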
```ruby
class CircularSafeNode
  attr_accessor :name, :connections

  def initialize(name)
    @name = name
    @connections = []
  end

  def connect_to(other_node)
    @connections << other_node unless @connections.include?(other_node)
    other_node.connections << self unless other_node.connections.include?(self)
  end

  def marshal_dump
    # Store connections by name to avoid circular serialization
    connection_names = @connections.map(&:name)
    [@name, connection_names]
  end

  def marshal_load(data)
    @name, @connection_names = data
    @connections = []
    # Connections are rebuilt after all nodes are loaded,
    # which requires coordination from the containing object
  end

  def resolve_connections(node_registry)
    (@connection_names || []).each do |name|
      connected_node = node_registry[name]
      @connections << connected_node if connected_node
    end
    @connection_names = nil
  end
end

class NodeGraph
  def initialize
    @nodes = {}
  end

  def add_node(node)
    @nodes[node.name] = node
  end

  def marshal_dump
    # Each node is dumped separately; names link them back together
    @nodes.values.map { |node| Marshal.dump(node) }
  end

  def marshal_load(serialized_nodes)
    @nodes = {}
    # First pass: recreate all nodes
    serialized_nodes.each do |serialized_node|
      node = Marshal.load(serialized_node)
      @nodes[node.name] = node
    end
    # Second pass: resolve connections by name
    @nodes.each_value { |node| node.resolve_connections(@nodes) }
  end
end
```
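For comparison, default marshaling copes with cycles on its own because of identity tracking. In this sketch, Person is a hypothetical class with no custom hooks; a two-object cycle is dumped and loaded without infinite recursion, and the loaded objects point at each other again.

```ruby
# A plain class with no marshal_dump, so Marshal writes its instance variables
class Person
  attr_accessor :name, :friend

  def initialize(name)
    @name = name
  end
end

alice = Person.new("Alice")
bob = Person.new("Bob")
alice.friend = bob
bob.friend = alice # cycle: alice -> bob -> alice

copy = Marshal.load(Marshal.dump(alice))
p copy.friend.name                # => "Bob"
p copy.friend.friend.equal?(copy) # => true, the cycle is rebuilt, not duplicated
```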
Data corruption detection becomes essential for production systems. Custom marshaling can implement checksums and validation to detect corrupted serialized data:
```ruby
require 'digest'

class ChecksummedData
  class DataCorruptionError < StandardError; end

  MAX_AGE_SECONDS = 86_400 # 24 hours

  def initialize(payload)
    @payload = payload
    @checksum = nil
  end

  def marshal_dump
    serialized_payload = Marshal.dump(@payload)
    checksum = Digest::SHA256.hexdigest(serialized_payload)
    {
      payload: serialized_payload,
      checksum: checksum,
      timestamp: Time.now.to_f
    }
  end

  def marshal_load(data)
    serialized_payload = data[:payload]
    expected_checksum = data[:checksum]
    timestamp = data[:timestamp]

    # Verify data integrity before unmarshaling the payload
    actual_checksum = Digest::SHA256.hexdigest(serialized_payload)
    if actual_checksum != expected_checksum
      raise DataCorruptionError,
            "Checksum mismatch: expected #{expected_checksum}, got #{actual_checksum}"
    end

    # Check for stale data
    if Time.now.to_f - timestamp > MAX_AGE_SECONDS
      warn "Loading stale marshaled data from #{Time.at(timestamp)}"
    end

    @payload = Marshal.load(serialized_payload)
    @checksum = expected_checksum
  end
end
```
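Because the checksum in ChecksummedData lives inside the marshaled stream, corruption in the outer stream surfaces as whatever error Marshal.load raises before marshal_load ever runs. A hypothetical alternative is to seal the bytes outside Marshal, so the digest is verified before any parsing. SealedMarshal and its envelope layout are illustrative assumptions. A plain SHA-256 detects accidental corruption only; anyone who can modify the data can also recompute the digest, so a keyed HMAC would be needed against deliberate tampering.

```ruby
require 'digest'

# Hypothetical envelope format: 64 hex characters of SHA-256, then Marshal bytes
module SealedMarshal
  class CorruptionError < StandardError; end

  DIGEST_LENGTH = 64

  def self.dump(obj)
    payload = Marshal.dump(obj)
    Digest::SHA256.hexdigest(payload).b + payload
  end

  def self.load(sealed)
    bytes = sealed.b # treat the input as raw bytes
    digest = bytes[0, DIGEST_LENGTH]
    payload = bytes[DIGEST_LENGTH..]
    # Verify before Marshal.load ever parses the payload
    unless payload && Digest::SHA256.hexdigest(payload) == digest
      raise CorruptionError, "checksum mismatch"
    end
    Marshal.load(payload)
  end
end

sealed = SealedMarshal.dump({ id: 7, tags: %w[a b] })
restored = SealedMarshal.load(sealed)

# Flip the bits of the last byte to simulate corruption in storage
corrupted = sealed.b
last = corrupted.bytesize - 1
corrupted.setbyte(last, corrupted.getbyte(last) ^ 0xFF)
```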
Common Pitfalls
Custom marshaling has numerous subtle pitfalls that can lead to data loss, memory leaks, or application crashes. The most dangerous pitfall involves forgetting that marshal_load runs on an uninitialized object, bypassing the normal initialize method.
```ruby
class DatabaseModel
  def initialize(id, connection_pool)
    @id = id
    @connection_pool = connection_pool
    @attributes = load_attributes_from_db
    @callbacks = setup_callbacks
  end

  # Fine: dump only the serializable state (the callbacks hold procs)
  def marshal_dump
    [@id, @attributes]
  end

  # INCORRECT: Doesn't fully restore object state
  def marshal_load(data)
    @id, @attributes = data
    # Missing: @connection_pool and @callbacks are nil!
  end

  # CORRECT: Fully initialize object state. Marshal only ever calls
  # marshal_load; the _correct suffix just shows the fix side by side.
  def marshal_load_correct(data)
    @id, @attributes = data
    @connection_pool = ConnectionPool.current # placeholder for your app's pool lookup
    @callbacks = setup_callbacks
    validate_loaded_state
  end

  private

  def load_attributes_from_db
    { name: "Example", status: "active" }
  end

  def setup_callbacks
    { before_save: proc { puts "Saving..." } }
  end

  def validate_loaded_state
    raise "Invalid state" unless @connection_pool && @callbacks
  end
end
```
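One way to catch forgotten instance variables is to compare an object's instance variables before and after a round trip. The helper round_trip_diff and the Session class are hypothetical. Because the comparison uses ==, state that is rebuilt on purpose (new procs, fresh connections) will also be reported, so treat the output as a list to review rather than a list of bugs.

```ruby
# Reports instance variables whose values differ after a Marshal round trip
def round_trip_diff(obj)
  copy = Marshal.load(Marshal.dump(obj))
  (obj.instance_variables | copy.instance_variables).reject do |ivar|
    obj.instance_variable_get(ivar) == copy.instance_variable_get(ivar)
  end
end

class Session
  def initialize(user)
    @user = user
    @permissions = [:read]
  end

  def marshal_dump
    @user
  end

  # Bug: @permissions is never restored
  def marshal_load(user)
    @user = user
  end
end

p round_trip_diff(Session.new("ana")) # => [:@permissions]
```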
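Classes that include Ruby's Singleton module create a particularly insidious problem. Singleton already defines _dump and _load so that Marshal.load returns the existing instance, but marshal_dump and marshal_load take precedence over that pair, so defining them makes Marshal.load allocate a second instance of what should be a singleton: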
```ruby
require 'singleton'

class ConfigurationSingleton
  include Singleton

  attr_reader :settings

  def initialize
    @settings = load_from_file
    @observers = []
  end

  # INCORRECT: named marshal_dump/marshal_load, these would override
  # Singleton's _dump/_load and break the singleton pattern
  def marshal_dump_incorrect
    [@settings, @observers]
  end

  def marshal_load_incorrect(data)
    @settings, @observers = data
  end

  # CORRECT: use _dump/_load so Marshal.load returns the existing instance
  def _dump(_depth)
    Marshal.dump(@settings)
  end

  def self._load(serialized)
    # Update the existing singleton rather than building a new object
    instance.tap { |config| config.update_settings(Marshal.load(serialized)) }
  end

  def update_settings(new_settings)
    @settings = new_settings
    notify_observers
  end

  private

  def load_from_file
    { debug: false, timeout: 30 }
  end

  def notify_observers
    @observers.each(&:call)
  end
end
```
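A quick check, assuming only the standard singleton library: a class that includes Singleton without defining marshal_dump keeps its identity through a round trip, while adding marshal_dump and marshal_load produces a second instance. AppSettings and BrokenSettings are hypothetical names.

```ruby
require 'singleton'

class AppSettings
  include Singleton
end

class BrokenSettings
  include Singleton

  # Takes precedence over Singleton's _dump/_load, so Marshal.load allocates
  def marshal_dump
    []
  end

  def marshal_load(_data); end
end

same = Marshal.load(Marshal.dump(AppSettings.instance))
other = Marshal.load(Marshal.dump(BrokenSettings.instance))

p same.equal?(AppSettings.instance)     # => true
p other.equal?(BrokenSettings.instance) # => false
```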
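Class and module references need care too. Marshal records a class by name, so an anonymous class cannot be dumped at all (TypeError), and loading fails with ArgumentError if the named class is not defined at load time. Storing class names explicitly lets the loading code resolve them and report missing classes clearly: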
```ruby
class PolymorphicContainer
  class MarshalingError < StandardError; end

  def initialize
    @items = []
  end

  # Assumes every item implements marshal_dump and marshal_load
  def add_item(item)
    @items << {
      object: item,
      class_name: item.class.name,
      modules: item.class.included_modules.map(&:name)
    }
  end

  # INCORRECT: each item's class is recorded inside the Marshal stream, so a
  # missing class surfaces as a generic ArgumentError from Marshal.load
  def marshal_dump_incorrect
    @items
  end

  # CORRECT: Store class names as strings alongside each item's own dump
  def marshal_dump
    @items.map do |item_data|
      {
        object_data: item_data[:object].marshal_dump,
        class_name: item_data[:class_name],
        modules: item_data[:modules]
      }
    end
  end

  def marshal_load(data)
    @items = data.map do |item_data|
      object = resolve_class(item_data[:class_name]).allocate
      object.marshal_load(item_data[:object_data])
      {
        object: object,
        class_name: item_data[:class_name],
        modules: item_data[:modules]
      }
    end
  end

  private

  # Turn a missing class into a descriptive error
  def resolve_class(class_name)
    Object.const_get(class_name)
  rescue NameError => e
    raise MarshalingError, "Cannot restore class #{class_name}: #{e.message}"
  end
end
```
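Both class-related failure modes can be reproduced with plain Marshal calls. TemporaryRecord is a hypothetical class that is removed with remove_const to simulate code that no longer exists when old data is loaded.

```ruby
# Anonymous classes have no name for Marshal to record
anonymous = Class.new
dump_error =
  begin
    Marshal.dump(anonymous.new)
    nil
  rescue TypeError => e
    e
  end
puts "dump: #{dump_error.class}: #{dump_error.message}"

# A named class that no longer exists when the data is loaded
class TemporaryRecord
  def initialize(id)
    @id = id
  end
end

bytes = Marshal.dump(TemporaryRecord.new(1))
Object.send(:remove_const, :TemporaryRecord)

load_error =
  begin
    Marshal.load(bytes)
    nil
  rescue ArgumentError => e
    e
  end
puts "load: #{load_error.class}: #{load_error.message}"
```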
File handles and IO objects cannot be marshaled, but developers often forget to exclude them from custom marshaling logic:
```ruby
require 'time' # for Time#iso8601

class LogProcessor
  def initialize(log_file_path)
    @log_file_path = log_file_path
    @file_handle = nil
    @buffer = []
    @stats = { lines_processed: 0 }
  end

  def process_line(line)
    ensure_file_open
    @file_handle.puts(processed_line(line))
    @stats[:lines_processed] += 1
  end

  # INCORRECT: Attempts to serialize file handle
  def marshal_dump_incorrect
    [@log_file_path, @file_handle, @buffer, @stats]
  end

  # CORRECT: Excludes non-serializable file handle
  def marshal_dump
    # Close file handle before serialization
    close_file if @file_handle
    [@log_file_path, @buffer, @stats]
  end

  def marshal_load(data)
    @log_file_path, @buffer, @stats = data
    @file_handle = nil # Will be reopened when needed
  end

  private

  def ensure_file_open
    return if @file_handle && !@file_handle.closed?

    @file_handle = File.open(@log_file_path, 'a')
  end

  def close_file
    @file_handle.close if @file_handle && !@file_handle.closed?
    @file_handle = nil
  end

  def processed_line(line)
    "#{Time.now.iso8601}: #{line.strip}"
  end
end
```
Thread-local variables and thread-specific state create marshaling problems because the unmarshaled object may be used in a different thread, or a different process, from the one that created it:
```ruby
class ThreadAwareProcessor
  def initialize
    @worker_id = Thread.current.object_id
    @thread_local_cache = {}
    @shared_data = {}
  end

  def process(data)
    current_worker = Thread.current.object_id
    if current_worker != @worker_id
      handle_thread_migration
    end
    get_thread_cache[data.hash] ||= expensive_computation(data)
  end

  # INCORRECT: Thread-specific data doesn't transfer
  def marshal_dump_incorrect
    [@worker_id, @thread_local_cache, @shared_data]
  end

  # CORRECT: Only serialize thread-safe data
  def marshal_dump
    [@shared_data]
  end

  def marshal_load(data)
    @shared_data, = data
    @worker_id = Thread.current.object_id
    @thread_local_cache = {}
    initialize_thread_local_state
  end

  private

  def handle_thread_migration
    # Clear thread-specific state when moving to different thread
    @thread_local_cache.clear
    @worker_id = Thread.current.object_id
    initialize_thread_local_state
  end

  def get_thread_cache
    Thread.current[:processor_cache] ||= {}
  end

  def initialize_thread_local_state
    Thread.current[:processor_cache] = {}
  end

  def expensive_computation(data)
    # Simulate expensive work
    data.to_s.chars.sum(&:ord)
  end
end
```
Reference
Core Marshaling Methods
Method | Parameters | Returns | Description |
---|---|---|---|
#marshal_dump | none | Object | Returns serializable representation of object state |
#marshal_load(data) | data (Object) | Ignored | Restores object state from marshaled data; Marshal.load returns the allocated object, not this method's return value |
Marshal.dump(obj) | obj (Object) | String | Serializes object to binary string |
Marshal.load(data) | data (String or IO) | Object | Deserializes object from binary string or stream |
Marshal Module Constants
Constant | Value | Description |
---|---|---|
Marshal::MAJOR_VERSION | 4 | Major version of marshal format |
Marshal::MINOR_VERSION | 8 | Minor version of marshal format |
Common Marshal Exceptions
Exception | Trigger | Resolution |
---|---|---|
TypeError | Unmarshallable object in dump data | Implement custom marshaling or exclude object |
ArgumentError | Corrupted or truncated marshal data, or a class/module undefined at load time | Validate data integrity, implement version handling, ensure classes are loaded |
NameError | Object.const_get on a missing class in custom loading code | Ensure class availability, handle missing classes |
Object State During Marshaling
Phase | Object State | Available Methods | Notes |
---|---|---|---|
marshal_dump call | Fully initialized | All instance methods | Object in normal state |
marshal_load call | Freshly allocated, uninitialized | All instance methods, but no instance variables are set | initialize was not called |
After marshal_load | Restored state | Depends on marshal_load implementation | Must manually initialize all required state |
Marshaling Compatibility Matrix
Ruby Object Type | Default Marshal | Custom Marshal Required | Notes |
---|---|---|---|
Basic objects (String, Numeric, Array, Hash) | ✓ | ✗ | Automatically handled |
Custom classes | ✓ | Optional | All instance variables serialized |
Objects with Proc/lambda | ✗ | ✓ | Procs cannot be marshaled |
File/IO objects | ✗ | ✓ | File handles not transferable |
Objects with singleton methods | ✗ | ✓ | Raises TypeError |
Classes including Singleton | ✓ | ✗ | Singleton's _dump/_load return the existing instance; defining marshal_dump breaks this |
Thread objects | ✗ | ✓ | Threads not transferable |
Class/Module objects | ✓ (by name) | ✗ | Anonymous classes raise TypeError; the class must be defined at load time |
Performance Considerations
Scenario | Optimization Strategy | Impact |
---|---|---|
Large object graphs | Exclude computed/cached data | Smaller output; savings scale with the size of the excluded data |
Deep nesting | Flatten to array structure | Fewer per-object hook calls and shallower recursion |
Frequent marshaling | Cache serialized forms | Avoids re-serializing unchanged objects |
Memory constraints | Implement lazy loading | Data is loaded only when first accessed |
Version Migration Patterns
```ruby
# Standard version handling template
def marshal_load(data)
  version = data[:version] || 1

  case version
  when 1
    migrate_from_v1(data)
  when CURRENT_VERSION
    restore_current_version(data)
  else
    handle_unsupported_version(version, data)
  end
end
```
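The template can be filled in as a complete class. UserProfile and its v1 layout (an unversioned hash with a :mail key) are hypothetical; the v1 case is exercised by calling marshal_load directly on an allocated object, which stands in for loading data written by an older release.

```ruby
class UserProfile
  CURRENT_VERSION = 2

  attr_reader :name, :email

  def initialize(name, email)
    @name = name
    @email = email
  end

  def marshal_dump
    { version: CURRENT_VERSION, name: @name, email: @email }
  end

  def marshal_load(data)
    version = data[:version] || 1

    case version
    when 1
      migrate_from_v1(data)
    when CURRENT_VERSION
      restore_current_version(data)
    else
      handle_unsupported_version(version, data)
    end
  end

  private

  # Hypothetical v1 format: { name: ..., mail: ... } with no :version key
  def migrate_from_v1(data)
    @name = data[:name]
    @email = data[:mail]
  end

  def restore_current_version(data)
    @name = data[:name]
    @email = data[:email]
  end

  def handle_unsupported_version(version, _data)
    raise ArgumentError, "Unsupported UserProfile version #{version}"
  end
end

# Current-format round trip
current = Marshal.load(Marshal.dump(UserProfile.new("Ana", "ana@example.com")))

# Simulate loading v1 data by calling marshal_load directly
legacy = UserProfile.allocate
legacy.marshal_load({ name: "Bo", mail: "bo@example.com" })
```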
Debugging Marshal Issues
Problem | Diagnostic Method | Solution Pattern |
---|---|---|
TypeError during dump | Inspect marshal_dump return value | Filter unmarshallable objects |
Missing state after load | Compare pre/post marshal object state | Initialize all required instance variables |
Performance issues | Profile marshal size vs. time | Optimize serialized data structure |
Version conflicts | Log version information during load | Implement migration methods |
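For the "TypeError during dump" row, it helps to know which part of an object is responsible, since Marshal's error message names only the offending class. The sketch below is a hypothetical helper, find_undumpable, that retries Marshal.dump on instance variables, array elements, and hash values until it reaches the smallest pieces that still fail. It inspects instance variables directly, so for classes that define marshal_dump it checks the raw state rather than what marshal_dump returns.

```ruby
# Returns paths to the smallest sub-objects that Marshal.dump rejects
def find_undumpable(obj, path = "root", seen = {})
  return [] if seen[obj.object_id]

  seen[obj.object_id] = true
  Marshal.dump(obj)
  [] # the whole subtree is dumpable
rescue TypeError => e
  children =
    case obj
    when Array then obj.each_with_index.map { |v, i| [v, "#{path}[#{i}]"] }
    when Hash then obj.map { |k, v| [v, "#{path}[#{k.inspect}]"] }
    else obj.instance_variables.map { |iv| [obj.instance_variable_get(iv), "#{path}.#{iv}"] }
    end
  culprits = children.flat_map { |child, child_path| find_undumpable(child, child_path, seen) }
  # If no child is to blame, this object itself cannot be dumped
  culprits.empty? ? ["#{path} (#{obj.class}: #{e.message})"] : culprits
end

class Worker
  def initialize
    @name = "w1"
    @callbacks = { on_done: proc { puts "done" } }
    @reader, @writer = IO.pipe
  end
end

problems = find_undumpable(Worker.new)
problems.each { |line| puts line }
```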
Security Considerations
Risk | Mitigation | Implementation |
---|---|---|
Arbitrary object instantiation | Never call Marshal.load on untrusted input | Use a data-only format such as JSON for external data |
Code injection via eval | Never eval unmarshaled data | Use safe parsing methods |
Resource exhaustion | Limit marshaled data size | Implement size checks |
Class pollution | Whitelist allowed classes | Validate class names before const_get |
Stale data attacks | Timestamp marshaled data | Check data age during unmarshal |
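The size-check and class-whitelist rows can be sketched as two small guards. Everything here is a hypothetical illustration: the names bounded_load and allowed_class, the 1 MiB limit, and the allowed class list are assumptions to adapt. Neither guard makes Marshal.load safe for untrusted input; they only bound the damage from data that is trusted but possibly oversized or unexpected.

```ruby
MAX_MARSHAL_BYTES = 1_048_576 # 1 MiB, an arbitrary example limit
ALLOWED_CLASS_NAMES = %w[String Hash Array].freeze

class MarshalGuardError < StandardError; end

# Refuse oversized payloads before Marshal starts allocating objects
def bounded_load(bytes)
  if bytes.bytesize > MAX_MARSHAL_BYTES
    raise MarshalGuardError, "payload of #{bytes.bytesize} bytes exceeds limit"
  end

  Marshal.load(bytes)
end

# For custom loaders that resolve classes by name before calling const_get
def allowed_class(name)
  unless ALLOWED_CLASS_NAMES.include?(name)
    raise MarshalGuardError, "class #{name} is not allowed"
  end

  Object.const_get(name)
end
```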