Overview
YAML (YAML Ain't Markup Language) processing in Ruby centers around the Psych library, which became the default YAML processor in Ruby 1.9.3. The YAML module provides the primary interface for parsing YAML strings into Ruby objects and serializing Ruby objects back to YAML format.
Ruby's YAML implementation handles scalar values (strings, numbers, booleans), sequences (arrays), and mappings (hashes). The parser automatically converts YAML types to appropriate Ruby objects: strings remain strings, integers become Integer objects, and nested structures become Hash and Array objects.
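The core API revolves around four primary methods: YAML.load for parsing YAML strings into Ruby objects, YAML.dump for converting Ruby objects to YAML strings, YAML.load_file for reading YAML files directly, and YAML.safe_load for security-conscious parsing with restricted object instantiation. Since Psych 4 (bundled with Ruby 3.1), YAML.load applies the same restrictions as YAML.safe_load by default, except that it also permits Symbol, and YAML.unsafe_load provides the older unrestricted behavior.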
require "yaml"
# Basic YAML parsing
yaml_string = "name: John\nage: 30\nskills: [Ruby, Python, JavaScript]"
data = YAML.load(yaml_string)
# => {"name"=>"John", "age"=>30, "skills"=>["Ruby", "Python", "JavaScript"]}
# Converting Ruby objects to YAML
ruby_data = { users: [{ name: "Alice", role: "admin" }, { name: "Bob", role: "user" }] }
yaml_output = YAML.dump(ruby_data)
puts yaml_output
# =>
# ---
# :users:
# - :name: Alice
#   :role: admin
# - :name: Bob
#   :role: user
The Psych parser provides additional control through the Psych::Parser and Psych::Emitter classes for streaming operations and custom YAML processing. Ruby's YAML processing supports YAML 1.1 specification features including anchors, aliases, and multi-document streams.
Basic Usage
Loading YAML data transforms text into Ruby objects through several methods depending on the data source and security requirements. YAML.load handles basic string parsing, while YAML.load_file reads directly from files without manual file operations.
# Loading from strings
config_yaml = <<~YAML
  database:
    host: localhost
    port: 5432
    credentials:
      username: app_user
      password: secure_pass
  features:
    caching: true
    logging: false
YAML
config = YAML.load(config_yaml)
db_host = config["database"]["host"] # => "localhost"
cache_enabled = config["features"]["caching"] # => true
File-based YAML loading eliminates the need for explicit file handling. The method automatically opens, reads, and closes files while applying the same parsing rules as string-based loading.
# config/application.yml
YAML.load_file("config/application.yml")
# Equivalent to: YAML.load(File.read("config/application.yml"))
# Loading with error handling
begin
  settings = YAML.load_file("config/settings.yml")
rescue Errno::ENOENT
  settings = {} # Default empty configuration
end
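The same file-loading convenience is available in a restricted form for files that may come from less trusted sources. The following is a minimal sketch, assuming a Psych version that ships YAML.safe_load_file (Psych 3.3 or later, bundled with Ruby 3.0 and newer). The helper name load_settings and the temporary file are hypothetical and only stand in for a real configuration path.

```ruby
require "yaml"
require "tempfile"

# Hypothetical helper: load a settings file with restricted types,
# falling back to an empty Hash when the file is missing or empty.
def load_settings(path)
  YAML.safe_load_file(path) || {}
rescue Errno::ENOENT
  {}
end

# A temporary file stands in for config/settings.yml in this sketch.
settings = Tempfile.create(["settings", ".yml"]) do |file|
  file.write("cache:\n  enabled: true\n  ttl: 3600\n")
  file.flush
  load_settings(file.path)
end

# A path that does not exist yields the empty default.
missing = load_settings("does/not/exist.yml")
```

Returning an empty Hash in both failure cases keeps later lookups such as settings["cache"] from raising on nil, at the cost of hiding the difference between a missing file and an empty one.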
Generating YAML output converts Ruby data structures into formatted YAML strings. The YAML.dump method handles nested objects, arrays, and complex data types while maintaining proper YAML syntax and indentation.
# Complex data structure conversion
application_config = {
  name: "MyApp",
  version: "2.1.4",
  environments: {
    development: {
      debug: true,
      database_url: "postgres://localhost/myapp_dev"
    },
    production: {
      debug: false,
      database_url: ENV["DATABASE_URL"]
    }
  },
  supported_locales: ["en", "es", "fr", "de"]
}
yaml_config = YAML.dump(application_config)
File.write("generated_config.yml", yaml_config)
Stream processing handles multiple YAML documents within a single file or string. The YAML.load_stream method returns an array containing each document as a separate Ruby object, enabling batch processing of related configurations.
# Multiple document processing
multi_doc_yaml = <<~YAML
  ---
  service: web
  replicas: 3
  ---
  service: database
  replicas: 1
  ---
  service: cache
  replicas: 2
YAML
services = YAML.load_stream(multi_doc_yaml)
services.each do |service_config|
  puts "#{service_config['service']}: #{service_config['replicas']} replicas"
end
# => web: 3 replicas
# => database: 1 replicas
# => cache: 2 replicas
YAML mappings load with string keys by default, so applications that expect symbol keys need to convert them. Psych 3.1 and later accept symbolize_names: true on YAML.load and YAML.safe_load; the approaches below convert keys after loading, which also works on older versions.
# Converting string keys to symbols
yaml_data = YAML.load("name: John\nage: 30")
# => {"name"=>"John", "age"=>30}
# Manual symbol conversion
symbolized = yaml_data.transform_keys(&:to_sym)
# => {:name=>"John", :age=>30}
# Deep symbol conversion for nested hashes
def deep_symbolize_keys(obj)
  case obj
  when Hash
    obj.transform_keys(&:to_sym).transform_values { |v| deep_symbolize_keys(v) }
  when Array
    obj.map { |item| deep_symbolize_keys(item) }
  else
    obj
  end
end
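Key conversion can also happen during parsing. The following sketch uses the symbolize_names keyword, available on YAML.load and YAML.safe_load in Psych 3.1 and later; the sample document and variable names are made up for illustration.

```ruby
require "yaml"

yaml_text = <<~YAML
  database:
    host: localhost
    replicas:
      - name: primary
      - name: standby
YAML

# symbolize_names converts mapping keys at every nesting level,
# including mappings inside sequences.
config = YAML.safe_load(yaml_text, symbolize_names: true)

host = config[:database][:host]
replica_names = config[:database][:replicas].map { |replica| replica[:name] }
```

Because only keys are converted, values such as "localhost" stay strings, and Symbol does not need to appear in permitted_classes.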
Error Handling & Debugging
YAML parsing failures generate specific exception types that indicate different categories of problems. Psych::SyntaxError occurs when YAML syntax violates formatting rules, while Psych::DisallowedClass indicates attempts to instantiate restricted object types during parsing.
# Syntax error handling
malformed_yaml = <<~YAML
  name: John
    age: 30 # Incorrect indentation
  email: john@example.com
YAML
begin
  data = YAML.load(malformed_yaml)
rescue Psych::SyntaxError => e
  puts "YAML syntax error at line #{e.line}, column #{e.column}: #{e.problem}"
  puts "Context: #{e.context}" if e.context
  # Handle gracefully or provide user feedback
end
Safe loading prevents arbitrary object instantiation during YAML parsing, addressing security vulnerabilities where malicious YAML could execute code through object deserialization. The YAML.safe_load method restricts parsing to basic Ruby types: strings, numbers, arrays, hashes, booleans, and nil. Other classes, such as Date, Time, or Symbol, must be listed in permitted_classes, and aliases must be enabled with aliases: true.
# Safe loading with permitted classes
require "date"
potentially_unsafe_yaml = <<~YAML
  created_at: 2023-12-15 10:30:00
  user_data:
    name: Alice
    permissions: [read, write, admin]
YAML
# Unsafe on Psych versions before 4 - could instantiate arbitrary objects
# data = YAML.load(potentially_unsafe_yaml)
# Safe approach with explicit permitted classes
safe_data = YAML.safe_load(
  potentially_unsafe_yaml,
  permitted_classes: [Date, Time, Symbol],
  aliases: true
)
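What the restrictions look like in practice can be sketched as follows. The documents are made-up examples; the exceptions are the ones Psych raises for a class outside the allow-list and for an alias when aliases are not enabled (rescuing Psych::BadAlias also covers its subclass Psych::AliasesNotEnabled on newer Psych versions).

```ruby
require "yaml"

# A Symbol is outside the default safe_load allow-list.
begin
  YAML.safe_load("--- :admin")
  symbol_result = :not_raised
rescue Psych::DisallowedClass => e
  symbol_result = e.class
end

# Permitting Symbol explicitly lets the same document load.
permitted_result = YAML.safe_load("--- :admin", permitted_classes: [Symbol])

# Aliases are rejected unless aliases: true is passed.
aliased_yaml = "base: &base\n  retries: 3\ncopy: *base\n"
begin
  YAML.safe_load(aliased_yaml)
  alias_rejected = false
rescue Psych::BadAlias
  alias_rejected = true
end

alias_result = YAML.safe_load(aliased_yaml, aliases: true)
```

Keeping permitted_classes as short as possible preserves most of the protection while still allowing the types a given file legitimately needs.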
Custom error handling strategies help applications gracefully handle missing files, network timeouts, or corrupted YAML data. Implementing fallback mechanisms prevents application crashes when configuration files become unavailable or malformed.
# Robust configuration loading with fallbacks
class ConfigurationLoader
  DEFAULT_CONFIG = {
    "database" => { "pool_size" => 5, "timeout" => 30 },
    "cache" => { "enabled" => true, "ttl" => 3600 },
    "logging" => { "level" => "info" }
  }.freeze
  def self.load_configuration(file_path)
    YAML.load_file(file_path)
  rescue Errno::ENOENT
    warn "Configuration file not found: #{file_path}. Using defaults."
    DEFAULT_CONFIG.dup
  rescue Psych::SyntaxError => e
    warn "Invalid YAML syntax in #{file_path}: #{e.message}"
    DEFAULT_CONFIG.dup
  rescue StandardError => e
    warn "Unexpected error loading configuration: #{e.message}"
    DEFAULT_CONFIG.dup
  end
  def self.validate_required_keys(config, required_keys)
    missing_keys = required_keys - config.keys
    unless missing_keys.empty?
      raise ArgumentError, "Missing required configuration keys: #{missing_keys.join(', ')}"
    end
    config
  end
end
# Usage with validation
config = ConfigurationLoader.load_configuration("config/app.yml")
validated_config = ConfigurationLoader.validate_required_keys(
  config,
  %w[database cache logging]
)
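Falling back to defaults only when loading fails leaves a partially filled file without the remaining default values. One possible complement, sketched here under the assumption that configuration values are nested hashes of plain data, is to merge the loaded values over the defaults. The helper deep_merge and the sample values are hypothetical.

```ruby
require "yaml"

defaults = {
  "database" => { "pool_size" => 5, "timeout" => 30 },
  "logging"  => { "level" => "info" }
}

# Hypothetical helper: recursively merge override values over base values.
# Hashes are merged key by key; any other value is replaced outright.
def deep_merge(base, override)
  base.merge(override) do |_key, base_value, override_value|
    if base_value.is_a?(Hash) && override_value.is_a?(Hash)
      deep_merge(base_value, override_value)
    else
      override_value
    end
  end
end

# The file only overrides one nested setting.
loaded = YAML.safe_load("database:\n  timeout: 60\n")
merged = deep_merge(defaults, loaded)
```

Hash#merge returns a new Hash, so defaults itself is not modified, although untouched nested hashes such as "logging" are shared between defaults and merged.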
Debugging YAML parsing issues requires understanding the parser's state and the specific location of problems. The Psych::SyntaxError exception provides line numbers, column positions, and contextual information about parsing failures.
# Detailed debugging approach
def debug_yaml_parsing(yaml_content, source_description = "YAML content")
  parsed_data = YAML.load(yaml_content)
  puts "Successfully parsed #{source_description}"
  parsed_data
rescue Psych::SyntaxError => e
  puts "YAML parsing failed in #{source_description}:"
  puts "  Line #{e.line}, Column #{e.column}"
  puts "  Problem: #{e.problem}"
  puts "  Context: #{e.context}" if e.context
  # Show problematic lines with context (e.line is 1-based)
  lines = yaml_content.lines
  start_line = [e.line - 3, 0].max
  end_line = [e.line + 2, lines.length - 1].min
  (start_line..end_line).each do |line_num|
    marker = line_num == e.line - 1 ? ">>> " : "    "
    puts "#{marker}#{line_num + 1}: #{lines[line_num]}"
  end
  nil
end
Schema validation ensures YAML content matches expected structures before processing. While Ruby doesn't include built-in YAML schema validation, custom validation logic can verify required fields, data types, and value constraints.
# Custom YAML validation
class YAMLValidator
  def self.validate_database_config(config)
    errors = []
    unless config.is_a?(Hash)
      errors << "Configuration must be a hash/mapping"
      return errors
    end
    required_fields = %w[host port database]
    required_fields.each do |field|
      unless config.key?(field)
        errors << "Missing required field: #{field}"
      end
    end
    if config["port"] && !config["port"].is_a?(Integer)
      errors << "Port must be an integer"
    end
    if config["ssl"] && ![true, false].include?(config["ssl"])
      errors << "SSL must be boolean"
    end
    errors
  end
end
# Usage
config_yaml = "host: localhost\nport: 5432\ndatabase: myapp"
config = YAML.load(config_yaml)
validation_errors = YAMLValidator.validate_database_config(config)
if validation_errors.empty?
  # Proceed with valid configuration
else
  puts "Configuration errors:"
  validation_errors.each { |error| puts "  - #{error}" }
end
Performance & Memory
Large YAML file processing requires streaming approaches to avoid loading entire files into memory simultaneously. The Psych::Parser class enables event-driven parsing that processes YAML content incrementally rather than building complete object trees in memory.
# Memory-efficient streaming parser
class YAMLStreamer < Psych::Handler
  # Expose collected results to callers
  attr_reader :results
  def initialize
    @current_data = {}
    @key_stack = []
    @results = []
  end
  def start_mapping(anchor, tag, implicit, style)
    @key_stack.push({})
  end
  def end_mapping
    completed_hash = @key_stack.pop
    if @key_stack.empty?
      # Scalars are not stored in this skeleton, so the hash is a placeholder
      @results << completed_hash
    else
      # Handle nested mappings
    end
  end
  def scalar(value, anchor, tag, plain, quoted, style)
    # Process individual scalar values without building full object
    process_scalar_value(value)
  end
  def process_scalar_value(value)
    # Custom processing logic for each scalar
    puts "Processing: #{value}" if value.length > 1000
  end
end
# Stream processing large files
def process_large_yaml_file(file_path)
  handler = YAMLStreamer.new
  parser = Psych::Parser.new(handler)
  File.open(file_path, 'r') do |file|
    # Psych::Parser#parse accepts an IO and reads from it as it parses;
    # the parser has no << method for feeding individual lines
    parser.parse(file)
  end
  handler.results
end
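To make the event-driven model concrete, the following sketch gathers simple statistics about a document without building any Hash or Array for its contents. The handler class StructureStats and the sample input are hypothetical; the callback names and signatures are those of Psych::Handler.

```ruby
require "yaml"
require "stringio"

# Hypothetical handler: counts scalars and tracks the deepest nesting level
# while the parser streams events, without materializing the document.
class StructureStats < Psych::Handler
  attr_reader :scalar_count, :max_depth

  def initialize
    super
    @scalar_count = 0
    @depth = 0
    @max_depth = 0
  end

  def start_mapping(_anchor, _tag, _implicit, _style)
    enter
  end

  def start_sequence(_anchor, _tag, _implicit, _style)
    enter
  end

  def end_mapping
    @depth -= 1
  end

  def end_sequence
    @depth -= 1
  end

  def scalar(_value, _anchor, _tag, _plain, _quoted, _style)
    @scalar_count += 1
  end

  private

  # Record entry into a nested collection.
  def enter
    @depth += 1
    @max_depth = @depth if @depth > @max_depth
  end
end

# A StringIO stands in for a large file; any IO object works the same way.
io = StringIO.new("users:\n  - name: Alice\n    roles: [admin, dev]\n  - name: Bob\n")
stats = StructureStats.new
Psych::Parser.new(stats).parse(io)
```

Mapping keys arrive as scalar events too, so scalar_count includes keys as well as values.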
Memory usage optimization involves understanding how different YAML structures consume memory during parsing and generation. Arrays with many elements create significant memory overhead compared to streaming approaches that process elements individually.
# Memory comparison example
require 'yaml'
require 'memory_profiler' # third-party gem (gem install memory_profiler)
# Generate large YAML content
large_array = (1..100_000).map { |i| { id: i, name: "Item #{i}", active: i.even? } }
yaml_content = YAML.dump(large_array)
puts "YAML content size: #{yaml_content.bytesize} bytes"
# Memory profiling
report = MemoryProfiler.report do
  YAML.load(yaml_content)
end
puts "Memory used: #{report.total_allocated_memsize} bytes"
puts "Objects created: #{report.total_allocated}"
Benchmarking shows how parsing cost varies with the method used and the content being parsed. Deeply nested structures generally take longer to parse than flat mappings of similar size, and scalars that need type conversion (numbers, booleans, dates) add work compared with plain strings. Measure with representative data rather than relying on rules of thumb.
# Performance benchmarking different YAML operations
require 'benchmark/ips' # third-party gem (gem install benchmark-ips)
simple_yaml = "name: John\nage: 30\nemail: john@example.com"
complex_yaml = <<~YAML
  users:
    - id: 1
      profile:
        name: Alice
        settings:
          notifications: true
          theme: dark
      permissions: [read, write, admin]
    - id: 2
      profile:
        name: Bob
        settings:
          notifications: false
          theme: light
      permissions: [read]
YAML
Benchmark.ips do |x|
  x.report("simple_load") { YAML.load(simple_yaml) }
  x.report("complex_load") { YAML.load(complex_yaml) }
  x.report("safe_load") { YAML.safe_load(simple_yaml) }
  x.report("dump_simple") { YAML.dump(YAML.load(simple_yaml)) }
  x.compare!
end
Caching strategies reduce repetitive parsing overhead when the same YAML content requires multiple access patterns. Implementing intelligent caching with file modification time checking prevents stale data while avoiding unnecessary parsing operations.
# YAML caching implementation
class CachedYAMLLoader
  def initialize
    @cache = {}
    @file_times = {}
  end
  def load_file(file_path)
    current_mtime = File.mtime(file_path)
    cached_time = @file_times[file_path]
    if cached_time.nil? || current_mtime > cached_time
      @cache[file_path] = YAML.load_file(file_path)
      @file_times[file_path] = current_mtime
    end
    @cache[file_path]
  end
  def clear_cache(file_path = nil)
    if file_path
      @cache.delete(file_path)
      @file_times.delete(file_path)
    else
      @cache.clear
      @file_times.clear
    end
  end
end
# Usage with automatic cache management
yaml_loader = CachedYAMLLoader.new
config = yaml_loader.load_file("config/settings.yml") # Loads and caches
config_again = yaml_loader.load_file("config/settings.yml") # Returns cached version
Optimization techniques for YAML generation focus on reducing object allocation and minimizing string operations during serialization. Pre-computing repetitive elements and avoiding unnecessary object creation during dump operations can significantly improve performance.
# Optimized YAML generation
require 'stringio'
class OptimizedYAMLGenerator
  def generate_user_list(users)
    # Psych::Emitter exposes no accessor for its output IO, so keep a
    # reference to the StringIO and build a fresh emitter per document
    output = StringIO.new
    @emitter = Psych::Emitter.new(output)
    @emitter.start_stream(Psych::Parser::UTF8)
    @emitter.start_document([], [], false)
    @emitter.start_mapping(nil, nil, true, Psych::Nodes::Mapping::BLOCK)
    @emitter.scalar("users", nil, nil, true, false, Psych::Nodes::Scalar::PLAIN)
    @emitter.start_sequence(nil, nil, true, Psych::Nodes::Sequence::BLOCK)
    users.each do |user|
      generate_user_mapping(user)
    end
    @emitter.end_sequence
    @emitter.end_mapping
    @emitter.end_document(false)
    @emitter.end_stream
    output.string
  end
  private
  def generate_user_mapping(user)
    @emitter.start_mapping(nil, nil, true, Psych::Nodes::Mapping::BLOCK)
    user.each do |key, value|
      @emitter.scalar(key.to_s, nil, nil, true, false, Psych::Nodes::Scalar::PLAIN)
      @emitter.scalar(value.to_s, nil, nil, true, false, Psych::Nodes::Scalar::PLAIN)
    end
    @emitter.end_mapping
  end
end
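Before dropping down to the emitter, it is often enough to adjust the formatting options that YAML.dump already accepts. The following sketch shows two of them, indentation and line_width; the report hash is made-up sample data.

```ruby
require "yaml"

report = {
  "title" => "Quarterly summary",
  "sections" => { "intro" => { "pages" => 2 } },
  "notes" => (["lorem ipsum"] * 30).join(" ")
}

# indentation sets the number of spaces per nesting level.
wide_indent = YAML.dump(report, indentation: 4)

# line_width: -1 asks the emitter not to wrap long scalars.
unwrapped = YAML.dump(report, line_width: -1)

# Both variants parse back to the original data.
round_trip_ok = [wide_indent, unwrapped].all? { |text| YAML.safe_load(text) == report }
```

These options change only the textual layout; the data that a later load returns is the same.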
Common Pitfalls
Indentation errors represent the most frequent YAML parsing problems, particularly when mixing spaces and tabs or using inconsistent indentation levels. YAML requires precise indentation with spaces only, and mixing indentation styles causes parsing failures that can be difficult to diagnose visually.
# Common indentation problems
problematic_yaml = <<~YAML
  database:
    host: localhost
  \tport: 5432 # Tab instead of spaces (written as \\t here) - will fail
    credentials:
      username: admin
       password: secret # Inconsistent indentation - will fail
YAML
# Correct indentation - spaces only, consistent levels
correct_yaml = <<~YAML
  database:
    host: localhost
    port: 5432
    credentials:
      username: admin
      password: secret
YAML
# Debugging indentation issues
def detect_indentation_problems(yaml_string)
  problems = []
  yaml_string.lines.each_with_index do |line, index|
    if line.match?(/\t/)
      problems << "Line #{index + 1}: Contains tab character"
    end
    leading_spaces = line[/^ */].length
    if leading_spaces > 0 && leading_spaces % 2 != 0
      problems << "Line #{index + 1}: Odd number of leading spaces (#{leading_spaces})"
    end
  end
  problems
end
Type coercion surprises occur when YAML automatically converts values to unexpected Ruby types. Numeric strings, boolean-like values, and date-formatted strings undergo automatic conversion that may not match application expectations.
# Unexpected type conversions
require "date"
tricky_yaml = <<~YAML
  version: 1.0 # Becomes Float, not String
  enabled: yes # Becomes true (boolean)
  disabled: no # Becomes false (boolean)
  phone: 555-1234 # Remains String (contains hyphen)
  zip_code: 12345 # Becomes Integer
  date_string: 2023-12-15 # Becomes Date object
  null_value: null # Becomes nil
  empty_string: "" # Remains empty String
  just_spaces: " " # Remains String with spaces
YAML
# Date is not in the default allow-list on Psych 4+, so permit it explicitly
data = YAML.safe_load(tricky_yaml, permitted_classes: [Date])
puts data["version"].class # => Float (not String!)
puts data["enabled"].class # => TrueClass (not String!)
puts data["zip_code"].class # => Integer (not String!)
# Preventing unwanted type conversion
safe_yaml = <<~YAML
  version: "1.0" # Quoted to remain String
  enabled: "yes" # Quoted to remain String
  zip_code: "12345" # Quoted to remain String
YAML
# Alternative: YAML.load has no option that turns type conversion off, but
# the parse tree exposes every scalar as its original text
raw = Psych.parse(tricky_yaml).root.children.each_slice(2).to_h { |key, value| [key.value, value.value] }
puts raw["version"].class # => String
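A quick way to spot values that were converted unexpectedly is to check the loaded types against what the application expects. The sketch below assumes every value in the mapping is meant to be a String; the helper non_string_keys and the sample keys are hypothetical.

```ruby
require "yaml"

# Unquoted values go through YAML 1.1 type resolution.
unquoted = YAML.safe_load(<<~YAML)
  country: no
  version: 1.0
  build: 1.0.3
YAML

# Quoting keeps the original text.
quoted = YAML.safe_load(<<~YAML)
  country: "no"
  version: "1.0"
YAML

# Hypothetical guard: list keys whose values did not load as Strings.
def non_string_keys(hash)
  hash.reject { |_key, value| value.is_a?(String) }.keys
end

suspicious = non_string_keys(unquoted)
```

Here country loads as false and version as a Float, while build stays a String because 1.0.3 is not a valid number.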
Symbol versus string key confusion creates difficult-to-debug issues when YAML loading produces string keys but application code expects symbol keys. This mismatch causes nil returns when accessing hash values and can be particularly problematic in configuration files.
# Key access confusion
config_yaml = "database_host: localhost\napi_key: abc123"
config = YAML.load(config_yaml)
# This works - string keys
puts config["database_host"] # => "localhost"
# This fails silently - expecting symbol keys
puts config[:database_host] # => nil (not found)
# Hybrid approach with both access methods
class FlexibleHash < Hash
  def [](key)
    # fetch distinguishes a missing key from a stored false or nil value
    fetch(key) { fetch(key.to_s) { fetch(key.to_s.to_sym, nil) } }
  end
end
# Extension to handle both key types
config = YAML.load(config_yaml)
# update fills the FlexibleHash in place, so the result keeps the subclass
flexible_config = FlexibleHash.new.update(config)
puts flexible_config[:database_host] # => "localhost" (works!)
puts flexible_config["database_host"] # => "localhost" (also works!)
Multi-document YAML files require special handling that differs from single-document parsing. Using YAML.load on multi-document content only returns the first document, silently ignoring subsequent documents and potentially causing data loss.
# Multi-document pitfall
multi_doc_content = <<~YAML
  ---
  service: web
  port: 3000
  ---
  service: database
  port: 5432
  ---
  service: cache
  port: 6379
YAML
# Wrong - only loads first document
single_doc = YAML.load(multi_doc_content)
puts single_doc # => {"service"=>"web", "port"=>3000}
# Correct - loads all documents
all_docs = YAML.load_stream(multi_doc_content)
puts all_docs.length # => 3
all_docs.each { |doc| puts "#{doc['service']}: #{doc['port']}" }
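Multi-document streams can be written as well as read. The following sketch round-trips two documents with YAML.dump_stream and YAML.load_stream, and also shows the block form of load_stream, which yields documents one at a time. The service data is made up for illustration.

```ruby
require "yaml"

services = [
  { "service" => "web", "port" => 3000 },
  { "service" => "cache", "port" => 6379 }
]

# dump_stream writes each argument as its own document.
stream_text = YAML.dump_stream(*services)

# load_stream returns one Ruby object per document.
round_tripped = YAML.load_stream(stream_text)

# With a block, each document is yielded as soon as it has been parsed.
ports = []
YAML.load_stream(stream_text) { |doc| ports << doc["port"] }
```

The block form avoids holding every document in an array when each one can be handled independently.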
Anchor and alias handling can create unexpected object sharing that leads to unintended mutations. When an alias refers directly to an anchor (key: *anchor), Ruby returns the same object for both nodes rather than independent copies. The merge key (<<: *anchor) instead copies the anchor's entries into a new hash, so its top-level keys are independent, although nested values are still shared.
# Dangerous object sharing through aliases
shared_yaml = <<~YAML
  default_settings: &defaults
    timeout: 30
    retries: 3
    logging: true
  production: *defaults
  development: *defaults
YAML
# Aliases must be enabled explicitly on Psych 4+
config = YAML.safe_load(shared_yaml, aliases: true)
# Modifying one environment affects the other!
config["production"]["timeout"] = 60
puts config["development"]["timeout"] # => 60 (not 30!)
# Safe approach - deep copy shared structures
def safe_load_with_aliases(yaml_content)
  loaded = YAML.safe_load(yaml_content, aliases: true)
  # Deep clone to prevent shared object mutations
  loaded.transform_values do |value|
    value.is_a?(Hash) ? deep_clone(value) : value
  end
end
def deep_clone(obj)
  case obj
  when Hash
    obj.transform_keys { |k| deep_clone(k) }
       .transform_values { |v| deep_clone(v) }
  when Array
    obj.map { |item| deep_clone(item) }
  else
    obj.respond_to?(:dup) ? obj.dup : obj
  end
end
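Which references are actually shared after loading can be checked with equal?, which compares object identity. The sketch below contrasts a direct alias with a merge key; the document and key names are made up for illustration.

```ruby
require "yaml"

yaml_text = <<~YAML
  defaults: &defaults
    timeout: 30
    hosts: [a.example.com]
  direct: *defaults
  merged:
    <<: *defaults
    name: merged
YAML

config = YAML.safe_load(yaml_text, aliases: true)

# A direct alias is the very same Hash object as the anchor.
direct_shared = config["direct"].equal?(config["defaults"])

# A merge key builds a new Hash from the anchor's entries...
merged_shared = config["merged"].equal?(config["defaults"])

# ...but nested values inside it still point at the anchor's objects.
nested_shared = config["merged"]["hosts"].equal?(config["defaults"]["hosts"])
```

In this sketch, assigning a new timeout on merged leaves defaults alone, while appending to merged's hosts array also changes the array seen through defaults and direct.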
Encoding issues arise when YAML files contain non-ASCII characters or when the file encoding doesn't match Ruby's expectations. This particularly affects applications processing YAML files created on different operating systems or containing internationalized content.
# Encoding handling
def safe_yaml_load_with_encoding(file_path, fallback_encoding: 'ISO-8859-1')
  # Read as UTF-8, dropping a byte order mark if present
  content = File.read(file_path, mode: 'r:bom|utf-8')
  # File.read does not validate bytes, so check explicitly instead of
  # waiting for an exception
  unless content.valid_encoding?
    # Assumption: non-UTF-8 files use fallback_encoding; adjust per source
    content = content.force_encoding(fallback_encoding).encode('UTF-8')
  end
  YAML.load(content)
end
Reference
Core Methods
Method | Parameters | Returns | Description |
---|---|---|---|
YAML.load(yaml) | yaml (String) | Object | Parses YAML string into Ruby object |
YAML.load_file(path) | path (String/Pathname) | Object | Loads and parses YAML file |
YAML.safe_load(yaml, **opts) | yaml (String), options (Hash) | Object | Secure parsing with restricted classes |
YAML.dump(object, **opts) | object (Object), options (Hash) | String | Converts Ruby object to YAML string |
YAML.load_stream(yaml) | yaml (String) | Array | Parses multi-document YAML into array |
Safe Loading Options
Option | Type | Default | Description |
---|---|---|---|
permitted_classes | Array | [] | Classes allowed during deserialization |
permitted_symbols | Array | [] | Symbols allowed during parsing |
aliases | Boolean | false | Whether to allow YAML aliases |
filename | String | nil | Filename for error reporting |
Dump Options
Option | Type | Default | Description |
---|---|---|---|
line_width | Integer | 0 | Maximum line width (0 uses the emitter's default wrap of about 80 characters; -1 disables wrapping) |
indentation | Integer | 2 | Number of spaces for indentation |
canonical | Boolean | false | Use canonical YAML format |
header | Boolean | false | Write the %YAML version directive at the start of the document |
Exception Hierarchy
Exception | Parent | Triggered By |
---|---|---|
Psych::Exception | RuntimeError | Base class for Psych errors |
Psych::SyntaxError | Psych::Exception | Invalid YAML syntax |
Psych::DisallowedClass | Psych::Exception | Restricted class instantiation |
Psych::BadAlias | Psych::Exception | Invalid alias reference |
Psych::AliasesNotEnabled | Psych::BadAlias | Aliases used when disabled |
YAML Data Type Mapping
YAML Value | Ruby Type | Notes |
---|---|---|
string | String | Unquoted strings |
"quoted" | String | Quoted strings |
123 | Integer | Numeric literals |
1.23 | Float | Decimal numbers |
true, yes, on | TrueClass | Boolean true values |
false, no, off | FalseClass | Boolean false values |
null, ~ | NilClass | Null values |
2023-12-15 | Date | ISO date format |
2023-12-15 10:30:00 | Time | ISO datetime format |
Psych Parser Events
Event Method | Parameters | Purpose |
---|---|---|
start_document | version, tag_directives, implicit | Document start |
end_document | implicit | Document end |
start_mapping | anchor, tag, implicit, style | Hash/mapping start |
end_mapping | (none) | Hash/mapping end |
start_sequence | anchor, tag, implicit, style | Array/sequence start |
end_sequence | (none) | Array/sequence end |
scalar | value, anchor, tag, plain, quoted, style | Individual value |
alias | anchor | Alias reference |
Common YAML Patterns
Pattern | YAML Syntax | Ruby Result |
---|---|---|
Simple mapping | key: value | {"key" => "value"} |
Nested mapping | parent:\n  child: value | {"parent" => {"child" => "value"}} |
Sequence | - item1\n- item2 | ["item1", "item2"] |
Mixed structure | items:\n  - name: first | {"items" => [{"name" => "first"}]} |
Multi-line string | text: >\n  line1\n  line2 | {"text" => "line1 line2\n"} |
Literal string | text: \|\n  line1\n  line2 | {"text" => "line1\nline2\n"} |
Anchor/Alias | default: &def\n  val: 1\nother:\n  <<: *def | Entries of default merged into other ({"val" => 1}) |
File Extension Conventions
Extension | Usage | Content Type |
---|---|---|
.yml | Common short extension | Configuration, data |
.yaml | Alternative extension | Same as .yml |
.config | Application config (project-specific, not a YAML convention) | YAML configuration |
.settings | User settings (project-specific, not a YAML convention) | YAML preferences |