Overview
Error recovery in Ruby centers on the exception system, which provides structured ways to handle, propagate, and recover from runtime errors. Ruby's exception hierarchy starts with `Exception` at the root, with `StandardError` serving as the base class for most application-level exceptions that should be rescued.
The core recovery mechanisms are `rescue` clauses for catching exceptions, `ensure` blocks for cleanup code, `retry` for re-attempting failed operations, and `raise` for generating or propagating exceptions. Ruby's exception objects carry information about the error, including the message, backtrace, and cause chain.
# Basic exception structure
begin
  risky_operation
rescue StandardError => e
  handle_error(e)
ensure
  cleanup_resources
end
Ruby automatically rescues `StandardError` and its subclasses when no specific exception type is given. System-level exceptions such as `SystemExit`, `SignalException`, and `NoMemoryError` inherit directly from `Exception` and typically should not be rescued by application code.
# Exception hierarchy demonstration
class CustomError < StandardError; end

begin
  raise CustomError, "Something went wrong"
rescue StandardError => e
  puts "Caught: #{e.class}: #{e.message}"
  puts "Backtrace: #{e.backtrace.first}"
end
# => Caught: CustomError: Something went wrong
# => Backtrace: (irb):4:in `irb_binding'
The exception object provides access to the complete call stack through `backtrace`, the original error through `cause`, and custom data through subclassing. This information forms the foundation for effective error recovery strategies.
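A minimal sketch of how the cause chain builds: raising a new exception inside a `rescue` block automatically records the rescued exception as the new exception's `cause`.

```ruby
# Raising inside rescue links the exceptions via #cause automatically.
begin
  begin
    Integer("oops")  # raises ArgumentError
  rescue ArgumentError
    raise RuntimeError, "conversion step failed"
  end
rescue RuntimeError => e
  puts e.class        # => RuntimeError
  puts e.message      # => conversion step failed
  puts e.cause.class  # => ArgumentError
end
```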
Basic Usage
The `begin/rescue/end` block structure handles exceptions raised within the protected section. Multiple `rescue` clauses can catch different exception types, with more specific exceptions listed first.
require 'json'

def parse_data(input)
  begin
    JSON.parse(input)
  rescue JSON::ParserError => e
    puts "Invalid JSON format: #{e.message}"
    nil
  rescue StandardError => e
    puts "Unexpected error: #{e.message}"
    nil
  end
end

result = parse_data('{"invalid": json}')
# => Invalid JSON format: unexpected token at '{"invalid": json}'
# => nil
The `ensure` block executes regardless of whether an exception occurs, making it ideal for resource cleanup. File handles, network connections, and temporary resources should be released in `ensure` blocks.
def process_file(filename)
  file = nil
  begin
    file = File.open(filename, 'r')
    process_content(file.read)
  rescue Errno::ENOENT
    puts "File not found: #{filename}"
  rescue IOError => e
    puts "IO error: #{e.message}"
  ensure
    file&.close
    puts "File processing completed"
  end
end
The `retry` statement restarts execution from the beginning of the `begin` block, allowing automatic recovery from transient failures. Use retry counters to prevent infinite loops.
require 'net/http'

def fetch_with_retry(url, max_attempts = 3)
  attempts = 0
  begin
    attempts += 1
    Net::HTTP.get_response(URI(url))
  rescue Net::OpenTimeout, Net::ReadTimeout, Errno::ECONNREFUSED => e
    if attempts < max_attempts
      puts "Attempt #{attempts} failed: #{e.message}"
      sleep(2**attempts) # exponential backoff
      retry
    else
      puts "Failed after #{max_attempts} attempts"
      raise
    end
  end
end
Method-level rescue clauses provide a concise way to handle exceptions for entire methods without explicit `begin/end` blocks.
def calculate_average(numbers)
  # Integer#quo raises ZeroDivisionError on a zero divisor, unlike
  # float division, which would silently return NaN for an empty array.
  numbers.sum.quo(numbers.length).to_f
rescue ZeroDivisionError
  0.0
rescue NoMethodError
  puts "Invalid input: expected array of numbers"
  nil
end

puts calculate_average([1, 2, 3, 4, 5])  # => 3.0
puts calculate_average([])               # => 0.0
puts calculate_average(nil)              # => Invalid input: expected array of numbers
                                         # => nil
Advanced Usage
Exception chaining preserves the original error context when raising new exceptions, creating a chain of causality that aids debugging. The `cause` attribute automatically captures the currently handled exception when a new one is raised.
require 'json'

class DataProcessingError < StandardError; end
class ValidationError < StandardError; end

def process_user_data(raw_data)
  begin
    parsed_data = JSON.parse(raw_data)
  rescue JSON::ParserError
    raise DataProcessingError, "Failed to parse user data: invalid JSON format"
  end

  begin
    validate_required_fields(parsed_data)
  rescue ValidationError => e
    raise DataProcessingError, "Data validation failed: #{e.message}"
  end

  parsed_data
end

def validate_required_fields(data)
  raise ValidationError, "Missing required field: email" unless data['email']
  raise ValidationError, "Missing required field: name" unless data['name']
end
# Exception chaining in action
begin
  process_user_data('{"name": "John"}')
rescue DataProcessingError => e
  puts "Main error: #{e.message}"
  puts "Caused by: #{e.cause.class}: #{e.cause.message}"
end
# => Main error: Data validation failed: Missing required field: email
# => Caused by: ValidationError: Missing required field: email
Custom exception classes can carry additional context and behavior specific to application domains. Include relevant data and implement custom formatting methods.
require 'time' # for Time#iso8601

class APIError < StandardError
  attr_reader :status_code, :response_body, :request_url

  def initialize(message, status_code: nil, response_body: nil, request_url: nil)
    super(message)
    @status_code = status_code
    @response_body = response_body
    @request_url = request_url
  end

  def to_h
    {
      message: message,
      status_code: status_code,
      response_body: response_body,
      request_url: request_url,
      timestamp: Time.now.iso8601
    }
  end
end

class RateLimitError < APIError
  attr_reader :retry_after

  def initialize(message, retry_after:, **opts)
    super(message, **opts)
    @retry_after = retry_after
  end
end
# Usage with rich error context
def api_request(url)
  response = make_http_request(url)

  case response.code
  when '429'
    raise RateLimitError.new(
      "Rate limit exceeded",
      status_code: 429,
      retry_after: response['Retry-After'].to_i,
      request_url: url,
      response_body: response.body
    )
  when '500'
    raise APIError.new(
      "Server error",
      status_code: 500,
      request_url: url,
      response_body: response.body
    )
  end

  response
end
Dynamic exception handling uses case statements and exception matching to handle different error types with varying strategies.
class ErrorHandler
  def self.handle(exception, context = {})
    case exception
    when Net::OpenTimeout, Net::ReadTimeout, Errno::ECONNREFUSED
      handle_network_error(exception, context)
    when JSON::ParserError
      handle_parse_error(exception, context)
    when RateLimitError
      handle_rate_limit(exception, context)
    when APIError
      handle_api_error(exception, context)
    else
      handle_unknown_error(exception, context)
    end
  end

  # NOTE: a bare `private` has no effect on `def self.` methods;
  # private_class_method (below) marks them private instead.
  def self.handle_network_error(exception, context)
    {
      strategy: :retry,
      delay: 5,
      max_attempts: 3,
      error_type: :network,
      message: "Network connectivity issue: #{exception.message}"
    }
  end

  def self.handle_rate_limit(exception, context)
    {
      strategy: :delay_retry,
      delay: exception.retry_after || 60,
      max_attempts: 1,
      error_type: :rate_limit,
      message: "Rate limited, retry after #{exception.retry_after} seconds"
    }
  end

  def self.handle_unknown_error(exception, context)
    {
      strategy: :fail,
      error_type: :unknown,
      message: "Unhandled error: #{exception.class}: #{exception.message}",
      context: context
    }
  end

  private_class_method :handle_network_error, :handle_rate_limit,
                       :handle_unknown_error
end
Error Handling & Debugging
Exception introspection provides detailed information for debugging and monitoring. The backtrace, cause chain, and custom error data create a complete picture of failure scenarios.
class DebugError < StandardError
  attr_reader :debug_info

  def initialize(message, debug_info = {})
    super(message)
    @debug_info = debug_info.merge(
      timestamp: Time.now,
      thread_id: Thread.current.object_id,
      process_id: Process.pid
    )
  end

  def detailed_message
    <<~MESSAGE
      #{message}
      Debug Information:
      #{debug_info.map { |k, v| "  #{k}: #{v}" }.join("\n")}
      Call Stack:
      #{backtrace.first(5).map { |line| "  #{line}" }.join("\n")}
    MESSAGE
  end
end
def risky_operation(data)
  raise DebugError.new(
    "Operation failed",
    input_size: data.size,
    input_type: data.class,
    memory_usage: `ps -o rss= -p #{Process.pid}`.to_i,
    environment: ENV['RAILS_ENV'] || 'development'
  )
end

begin
  risky_operation([1, 2, 3])
rescue DebugError => e
  puts e.detailed_message
end
Structured error logging captures exception details in a format suitable for log aggregation and analysis. Include correlation IDs and request context.
require 'logger'
require 'json'
require 'time' # for Time#iso8601

class StructuredLogger
  def initialize(logger = Logger.new(STDOUT))
    @logger = logger
    @logger.formatter = proc do |severity, datetime, _progname, msg|
      JSON.generate(
        timestamp: datetime.iso8601,
        level: severity,
        message: msg
      ) + "\n"
    end
  end

  def log_exception(exception, context = {})
    error_data = {
      error_class: exception.class.name,
      error_message: exception.message,
      backtrace: exception.backtrace&.first(10),
      cause_chain: build_cause_chain(exception),
      context: context
    }
    @logger.error(error_data)
  end

  private

  def build_cause_chain(exception, chain = [])
    return chain if exception.nil? || chain.size > 10
    chain << {
      class: exception.class.name,
      message: exception.message,
      location: exception.backtrace&.first
    }
    build_cause_chain(exception.cause, chain)
  end
end
# Usage in error handling
logger = StructuredLogger.new

begin
  complex_operation
rescue StandardError => e
  logger.log_exception(e, {
    user_id: current_user&.id,
    request_id: request.headers['X-Request-ID'],
    endpoint: "#{request.method} #{request.path}",
    parameters: filtered_params
  })
  render json: { error: "Internal server error" }, status: 500
end
Error recovery state machines track failure patterns and adjust recovery strategies based on historical behavior.
class CircuitBreaker
  attr_reader :state, :failure_count, :last_failure_time

  CLOSED = :closed
  OPEN = :open
  HALF_OPEN = :half_open

  def initialize(failure_threshold: 5, timeout: 30)
    @failure_threshold = failure_threshold
    @timeout = timeout
    @failure_count = 0
    @state = CLOSED
    @last_failure_time = nil
  end

  def call(&block)
    case state
    when CLOSED
      execute_and_track(&block)
    when OPEN
      check_timeout_and_execute(&block)
    when HALF_OPEN
      test_execution(&block)
    end
  end

  private

  def execute_and_track(&block)
    result = block.call
    reset_failures
    result
  rescue StandardError
    record_failure
    raise
  end

  def check_timeout_and_execute(&block)
    if timeout_exceeded?
      @state = HALF_OPEN
      test_execution(&block)
    else
      raise CircuitBreakerOpenError, "Circuit breaker is open"
    end
  end

  def test_execution(&block)
    result = block.call
    @state = CLOSED
    reset_failures
    result
  rescue StandardError
    @state = OPEN
    record_failure
    raise
  end

  def record_failure
    @failure_count += 1
    @last_failure_time = Time.now
    @state = OPEN if @failure_count >= @failure_threshold
  end

  def reset_failures
    @failure_count = 0
    @last_failure_time = nil
  end

  def timeout_exceeded?
    @last_failure_time && (Time.now - @last_failure_time) > @timeout
  end
end
class CircuitBreakerOpenError < StandardError; end
Production Patterns
Production error recovery requires monitoring, alerting, and graceful degradation strategies. Implement health checks and fallback mechanisms for critical systems.
class ServiceHealthMonitor
  def initialize(services)
    @services = services
    @health_status = {}
    @alert_thresholds = {
      error_rate: 0.05,    # 5% error rate
      response_time: 5.0,  # 5 second response time
      availability: 0.95   # 95% availability
    }
  end

  def check_health
    @services.each do |service_name, service|
      begin
        start_time = Time.now
        service.health_check
        response_time = Time.now - start_time
        record_success(service_name, response_time)
      rescue StandardError => e
        record_failure(service_name, e)
        trigger_alert(service_name, e) if critical_failure?(service_name)
      end
    end
    generate_health_report
  end

  private

  def record_success(service_name, response_time)
    status = @health_status[service_name] ||= {
      total_requests: 0,
      successful_requests: 0,
      total_response_time: 0.0,
      last_success: nil,
      last_error: nil
    }
    status[:total_requests] += 1
    status[:successful_requests] += 1
    status[:total_response_time] += response_time
    status[:last_success] = Time.now
  end

  def record_failure(service_name, exception)
    status = @health_status[service_name] ||= {
      total_requests: 0,
      successful_requests: 0,
      total_response_time: 0.0,
      last_success: nil,
      last_error: nil
    }
    status[:total_requests] += 1
    status[:last_error] = {
      exception: exception.class.name,
      message: exception.message,
      timestamp: Time.now
    }
  end

  def critical_failure?(service_name)
    status = @health_status[service_name]
    return false unless status && status[:total_requests] > 0
    calculate_error_rate(service_name) > @alert_thresholds[:error_rate]
  end

  def calculate_error_rate(service_name)
    status = @health_status[service_name]
    1.0 - (status[:successful_requests].to_f / status[:total_requests])
  end

  def trigger_alert(service_name, exception)
    # AlertSystem is assumed to be provided by the surrounding application
    AlertSystem.send_alert(
      severity: :critical,
      service: service_name,
      message: "Service health check failed: #{exception.message}",
      error_rate: calculate_error_rate(service_name),
      timestamp: Time.now
    )
  end
end
Graceful degradation patterns maintain service availability when dependencies fail by implementing fallback mechanisms and cached responses.
require 'time' # for Time#iso8601

class ResilientDataService
  def initialize(primary_source, cache, fallback_source = nil)
    @primary_source = primary_source
    @cache = cache
    @fallback_source = fallback_source
    @circuit_breaker = CircuitBreaker.new(failure_threshold: 3, timeout: 60)
  end

  def fetch_data(key, options = {})
    # Try the circuit-breaker-protected primary source first
    @circuit_breaker.call do
      data = @primary_source.fetch(key, options)
      @cache.write(key, data, expires_in: options[:cache_ttl] || 300)
      data
    end
  rescue CircuitBreakerOpenError
    fetch_degraded_data(key, options.merge(source: :circuit_breaker))
  rescue StandardError => e
    log_primary_failure(key, e, options)
    fetch_degraded_data(key, options.merge(source: :primary_error))
  end

  private

  def fetch_degraded_data(key, options)
    degradation_source = options[:source]

    # Try cache first
    cached_data = @cache.read(key)
    if cached_data
      log_cache_hit(key, degradation_source)
      return add_degradation_metadata(cached_data, :cache, degradation_source)
    end

    # Try the fallback source
    if @fallback_source
      begin
        fallback_data = @fallback_source.fetch(key, options)
        @cache.write(key, fallback_data, expires_in: 60) # short TTL for fallback data
        log_fallback_success(key, degradation_source)
        return add_degradation_metadata(fallback_data, :fallback, degradation_source)
      rescue StandardError => e
        log_fallback_failure(key, e, degradation_source)
      end
    end

    # Return stale cache if available
    stale_data = @cache.read(key, ignore_expiration: true)
    if stale_data
      log_stale_cache_usage(key, degradation_source)
      return add_degradation_metadata(stale_data, :stale_cache, degradation_source)
    end

    # Last resort: raise a service unavailable error
    raise ServiceUnavailableError.new(
      "All data sources unavailable for key: #{key}",
      degradation_source: degradation_source,
      attempted_sources: [:primary, :cache, :fallback, :stale_cache]
    )
  end

  def add_degradation_metadata(data, source_type, degradation_reason)
    if data.is_a?(Hash)
      data.merge(
        _metadata: {
          source: source_type,
          degraded: true,
          degradation_reason: degradation_reason,
          timestamp: Time.now.iso8601
        }
      )
    else
      data
    end
  end
end

class ServiceUnavailableError < StandardError
  attr_reader :degradation_source, :attempted_sources

  def initialize(message, degradation_source:, attempted_sources:)
    super(message)
    @degradation_source = degradation_source
    @attempted_sources = attempted_sources
  end
end
Error aggregation and reporting systems collect error metrics across application instances for trend analysis and proactive issue detection.
require 'digest'

class ErrorAggregator
  def initialize(storage_backend)
    @storage = storage_backend
    @aggregation_window = 300 # seconds (5 minutes)
  end

  def record_error(exception, context = {})
    error_signature = generate_error_signature(exception)
    timestamp = Time.now.to_i
    window_start = (timestamp / @aggregation_window) * @aggregation_window

    error_record = {
      signature: error_signature,
      window_start: window_start,
      exception_class: exception.class.name,
      message: exception.message,
      location: exception.backtrace&.first,
      context: context,
      count: 1,
      first_seen: timestamp,
      last_seen: timestamp
    }

    @storage.increment_error(error_signature, window_start, error_record)
  end

  def generate_report(time_range)
    errors = @storage.fetch_errors(time_range)
    {
      summary: {
        total_errors: errors.sum { |e| e[:count] },
        unique_errors: errors.size,
        error_rate: calculate_error_rate(errors, time_range),
        top_errors: errors.sort_by { |e| -e[:count] }.first(10)
      },
      trends: {
        hourly_counts: group_by_hour(errors),
        error_type_distribution: group_by_type(errors),
        location_hotspots: group_by_location(errors)
      },
      alerts: generate_alert_conditions(errors)
    }
  end

  private

  def generate_error_signature(exception)
    components = [
      exception.class.name,
      exception.message&.gsub(/\d+/, 'N')&.gsub(/['"][^'"]*['"]/, 'STRING'),
      exception.backtrace&.first&.gsub(/:\d+/, ':N')
    ].compact
    Digest::SHA256.hexdigest(components.join('|'))[0, 16]
  end

  def generate_alert_conditions(errors)
    alerts = []

    # High-frequency errors
    high_frequency = errors.select { |e| e[:count] > 100 }
    alerts += high_frequency.map do |error|
      {
        type: :high_frequency,
        signature: error[:signature],
        count: error[:count],
        message: "High frequency error: #{error[:exception_class]}"
      }
    end

    # New error types
    recent_errors = errors.select { |e| e[:first_seen] > (Time.now - 3600).to_i }
    new_errors = recent_errors.select { |e| e[:count] > 5 }
    alerts += new_errors.map do |error|
      {
        type: :new_error,
        signature: error[:signature],
        first_seen: Time.at(error[:first_seen]),
        message: "New error detected: #{error[:exception_class]}"
      }
    end

    alerts
  end
end
Common Pitfalls
Exception handling in Ruby contains several gotchas that can lead to unexpected behavior or hidden bugs. Understanding these patterns prevents common mistakes.
Rescuing `Exception` instead of `StandardError` catches system-level exceptions that should typically propagate. This can prevent proper program termination and mask critical system issues.
# Problematic - catches system signals and exit attempts
begin
  dangerous_operation
rescue Exception => e # DON'T DO THIS
  log_error(e)
  # System signals like Ctrl+C are now trapped
end

# Correct - catches application errors only
begin
  dangerous_operation
rescue StandardError => e # DO THIS
  log_error(e)
  # SystemExit, SignalException, etc. still propagate
end

# Demonstrate the problem
begin
  puts "Press Ctrl+C to interrupt"
  sleep 10
rescue Exception
  puts "Caught exit signal - this prevents clean shutdown"
  # The process cannot be stopped normally
end
Bare `rescue` clauses without an explicit exception type default to rescuing `StandardError`, but this behavior is implicit and can confuse readers. Always specify the exception type explicitly.
# Implicit and unclear
begin
  risky_operation
rescue => e # Implicitly rescues StandardError
  handle_error(e)
end

# Explicit and clear
begin
  risky_operation
rescue StandardError => e # Explicitly rescues StandardError
  handle_error(e)
end
The `retry` statement without limiting conditions creates infinite loops when exceptions persist. Always implement retry limits and backoff strategies.
# Dangerous - infinite retry loop
def unreliable_network_call
  begin
    make_http_request
  rescue Net::ReadTimeout
    puts "Retrying..."
    retry # Will retry forever if timeouts persist
  end
end

# Safe - limited retries with backoff
class NetworkError < StandardError; end

def reliable_network_call
  max_attempts = 3
  attempts = 0
  begin
    attempts += 1
    make_http_request
  rescue Net::ReadTimeout => e
    if attempts < max_attempts
      backoff_time = attempts * 2
      puts "Attempt #{attempts} failed, retrying in #{backoff_time} seconds"
      sleep(backoff_time)
      retry
    else
      puts "Failed after #{max_attempts} attempts"
      raise NetworkError, "Request failed after #{max_attempts} attempts: #{e.message}"
    end
  end
end
Exception swallowing occurs when rescue blocks handle errors but don't provide meaningful recovery or reporting. This hides problems and makes debugging difficult.
# Problematic - swallows exceptions silently
def process_items(items)
  items.map do |item|
    begin
      expensive_processing(item)
    rescue StandardError
      nil # Problem: error information is lost
    end
  end.compact
end

# Better - logs errors and provides context
class ProcessingError < StandardError; end

def process_items(items)
  results = []
  errors = []

  items.each_with_index do |item, index|
    begin
      results << expensive_processing(item)
    rescue StandardError => e
      error_context = {
        item_index: index,
        item_data: item.inspect,
        error: e.message,
        backtrace: e.backtrace.first(3)
      }
      errors << error_context
      logger.error("Item processing failed: #{error_context}")
      # Decide whether to continue or fail fast
      raise ProcessingError, "Too many failures" if errors.size > items.size * 0.1
    end
  end

  {
    results: results,
    errors: errors,
    success_rate: results.size.to_f / items.size
  }
end
Resource cleanup in ensure blocks can fail if the resource is in an unexpected state. Always check resource state before cleanup operations.
# Problematic - assumes the file handle is valid
def process_file(filename)
  file = File.open(filename, 'r')
  begin
    process_content(file.read)
  ensure
    file.close # May raise if the file is already closed
  end
end

# Safer - checks resource state before cleanup
def process_file(filename)
  file = nil
  begin
    file = File.open(filename, 'r')
    process_content(file.read)
  rescue Errno::ENOENT
    puts "File not found: #{filename}"
  rescue IOError => e
    puts "File IO error: #{e.message}"
  ensure
    # Safe cleanup with state checking
    if file && !file.closed?
      begin
        file.close
      rescue IOError
        # Already closed or invalid - ignore
      end
    end
  end
end
Custom exception classes that inherit from `Exception` instead of `StandardError` are not caught by standard rescue clauses, so application-specific errors silently bypass them.
# Wrong inheritance hierarchy
class MyCustomError < Exception; end # DON'T DO THIS

begin
  raise MyCustomError, "Something wrong"
rescue StandardError => e
  puts "Won't catch MyCustomError"
end
# MyCustomError propagates uncaught

# Correct inheritance hierarchy (shown in a fresh context; reopening a
# class with a different superclass raises TypeError)
class MyCustomError < StandardError; end # DO THIS

begin
  raise MyCustomError, "Something wrong"
rescue StandardError => e
  puts "Correctly catches MyCustomError: #{e.message}"
end
# => Correctly catches MyCustomError: Something wrong
Reference
Core Exception Classes
Class | Inheritance | Description |
---|---|---|
Exception | Object | Root of exception hierarchy |
SystemExit | Exception | Raised by exit calls |
SignalException | Exception | System signal interrupts |
Interrupt | SignalException | User interrupt (Ctrl+C) |
StandardError | Exception | Base for application exceptions |
RuntimeError | StandardError | Default for raise without class |
NoMethodError | NameError | Undefined method called |
ArgumentError | StandardError | Wrong number/type of arguments |
TypeError | StandardError | Type conversion errors |
SystemCallError | StandardError | System call failures |
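As the table notes, `raise` with only a message string creates a `RuntimeError`; a short check confirms the default:

```ruby
# raise without an explicit class defaults to RuntimeError.
begin
  raise "plain message"
rescue StandardError => e
  puts e.class    # => RuntimeError
  puts e.message  # => plain message
end
```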
Exception Methods
Method | Returns | Description |
---|---|---|
#message | String | Exception message |
#backtrace | Array | Call stack trace |
#backtrace_locations | Array of Thread::Backtrace::Location | Structured backtrace |
#cause | Exception or nil | Previous exception in chain |
#full_message | String | Formatted message with backtrace |
#set_backtrace(bt) | Array | Set custom backtrace |
#to_s | String | String representation |
#inspect | String | Detailed string representation |
Rescue Syntax Patterns
Pattern | Scope | Usage |
---|---|---|
begin...rescue...end | Block | Explicit exception handling |
expression rescue value | Inline | Single-line rescue with fallback value |
def method; ...; rescue; ...; end | Method | Method-level exception handling |
rescue Type => var | Variable binding | Capture exception instance |
rescue Type1, Type2 | Multiple types | Handle multiple exception types |
rescue | Bare rescue | Catches StandardError (implicit) |
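The inline pattern from the table deserves a concrete example, since it appears nowhere else in this section: the expression's value is replaced by the fallback when a StandardError is raised.

```ruby
# Inline rescue modifier: falls back to 8080 when parsing fails.
port = Integer("not-a-number") rescue 8080
puts port  # => 8080
```

Use this form sparingly; it silently swallows any StandardError, which is the same pitfall described under exception swallowing above.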
Error Recovery Strategies
Strategy | Use Case | Implementation Pattern |
---|---|---|
Retry with Backoff | Transient failures | retry with attempt counting and delays |
Circuit Breaker | Cascading failures | State machine tracking failure rates |
Graceful Degradation | Service dependencies | Fallback to cache, alternate sources |
Fail Fast | Invalid input | Early validation and immediate failure |
Bulkhead Pattern | Resource isolation | Separate error handling per resource type |
Timeout Pattern | Hanging operations | Time-bounded operations with cleanup |
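A minimal sketch of the timeout pattern from the table, using the standard library's `timeout` module (the `slow_operation` method is a stand-in for any hanging call):

```ruby
require 'timeout'

# Timeout.timeout bounds the operation; Timeout::Error is a
# StandardError subclass, so an ordinary rescue catches it.
def slow_operation
  sleep 2
  "data"
end

result = begin
  Timeout.timeout(0.1) { slow_operation }
rescue Timeout::Error
  :timed_out
end
puts result  # => timed_out
```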
Exception Handling Anti-patterns
Anti-pattern | Problem | Solution |
---|---|---|
rescue Exception | Catches system exceptions | Use rescue StandardError |
Bare retry | Infinite retry loops | Add attempt limits and backoff |
Silent error swallowing | Lost debugging information | Log errors with context |
Generic error messages | Poor debugging experience | Include relevant context data |
Resource leaks | Files/connections not closed | Use ensure blocks for cleanup |
Inheriting from Exception | Bypasses standard rescue | Inherit from StandardError |
Error Context Data
Context Type | Include | Example |
---|---|---|
Request Context | User ID, request ID, endpoint | { user_id: 123, request_id: "abc-123", path: "/api/users" } |
System Context | Process ID, thread ID, memory usage | { pid: 12345, thread: 67890, memory_mb: 256 } |
Application Context | Feature flags, environment, version | { env: "production", version: "1.2.3", feature_x: true } |
Error Context | Input data, operation step, retry count | { input_size: 1000, step: "validation", attempt: 2 } |
Structured Error Logging Format
{
  timestamp: "2025-09-01T10:30:00Z",
  level: "ERROR",
  logger: "MyApp::UserService",
  message: "User validation failed",
  error: {
    class: "ValidationError",
    message: "Email format invalid",
    backtrace: ["app/services/user_service.rb:42:in `validate_email'"],
    cause_chain: [
      { class: "RegexpError", message: "Invalid pattern", location: "..." }
    ]
  },
  context: {
    user_id: 12345,
    request_id: "req-abc123",
    endpoint: "POST /users",
    input_data: { email: "invalid-email" }
  }
}