Overview
Ruby provides two essential libraries for structured data validation and standardized date-time operations: xmlschema for XML Schema Definition (XSD) validation and iso8601 for parsing and formatting ISO 8601 date-time strings.
The xmlschema library validates XML documents against XSD schemas, ensuring data conformity and structural integrity. Ruby implements schema validation through the Xmlschema::Schema class, which parses XSD files and validates XML documents. The library supports complex schema features including type restrictions, element constraints, and namespace handling.
The iso8601 library handles ISO 8601 date-time string parsing and formatting operations. Ruby processes ISO 8601 strings through the ISO8601 module, which converts standardized date-time representations into Ruby Time and Date objects. The library manages time zones, durations, intervals, and various ISO 8601 formats.
require 'xmlschema'
require 'iso8601'
# XML schema validation
schema = Xmlschema::Schema.from_uri('schema.xsd')
schema.valid?('<root><element>value</element></root>')
# ISO 8601 parsing
datetime = ISO8601::DateTime.new('2024-01-15T14:30:00Z')
datetime.to_time
# => 2024-01-15 14:30:00 UTC
Both libraries handle data validation and transformation tasks that applications encounter when processing external data sources, API responses, and configuration files.
Basic Usage
XML schema validation begins with loading a schema definition from a file or URI. The Xmlschema::Schema.from_uri method parses XSD files and creates validation objects.
require 'xmlschema'
# Load schema from file
schema = Xmlschema::Schema.from_uri('products.xsd')
# Validate XML string
xml_data = <<~XML
  <product>
    <id>123</id>
    <name>Widget</name>
    <price>29.99</price>
  </product>
XML
result = schema.valid?(xml_data)
# => true or false
Schema validation processes XML documents and returns boolean results. The valid? method performs complete schema validation including element structure, data types, and constraints.
# Detailed validation with error information
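begin
  schema.validate(xml_data)
  puts "Valid XML document"
rescue Xmlschema::SchemaError => e
  puts "Validation failed: #{e.message}"
  # location is not among the error attributes listed in the Reference section, so guard the call
  puts "Error location: #{e.location}" if e.respond_to?(:location)
end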
ISO 8601 parsing handles various standardized date-time formats. The ISO8601::DateTime.new constructor accepts ISO 8601 strings and creates Ruby objects for manipulation.
require 'iso8601'
# Parse basic date-time
dt = ISO8601::DateTime.new('2024-03-15T10:30:00')
dt.year # => 2024
dt.month # => 3
dt.hour # => 10
# Parse with timezone
dt_tz = ISO8601::DateTime.new('2024-03-15T10:30:00+02:00')
dt_tz.zone # => "+02:00"
dt_tz.to_time.utc
# => 2024-03-15 08:30:00 UTC
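For comparison, the offset-to-UTC conversion above can be reproduced with Ruby's standard library alone. This is a minimal sketch that uses Time.iso8601 from the time stdlib rather than the iso8601 gem; normalize_to_utc is a hypothetical helper name introduced here for illustration.

```ruby
require 'time'

# Hypothetical helper: parses an ISO 8601 string with the stdlib parser
# and re-serializes the same instant in UTC.
def normalize_to_utc(iso_string)
  Time.iso8601(iso_string).getutc.iso8601
end

puts normalize_to_utc('2024-03-15T10:30:00+02:00')        # 2024-03-15T08:30:00Z
puts Time.iso8601('2024-03-15T10:30:00+02:00').utc_offset # 7200 (seconds east of UTC)
```

getutc returns a new Time instead of mutating the receiver, which keeps the parsed value available in its original offset.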
Duration parsing extracts time periods from ISO 8601 duration strings. The ISO8601::Duration class handles period calculations and conversions.
# Parse duration
duration = ISO8601::Duration.new('P1Y2M3DT4H5M6S')
duration.years # => 1
duration.months # => 2
duration.days # => 3
duration.hours # => 4
duration.minutes # => 5
duration.seconds # => 6
# Convert to seconds
duration.to_seconds # => 36993906 (approximate)
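The second count is approximate because years and months have no fixed length. As a worked example, the figure 36993906 is what the arithmetic gives under the assumption of 365-day years and 30-day months. The sketch below is plain Ruby, independent of the gem; DURATION_PATTERN, FACTORS, and approximate_seconds are hypothetical names, and the factors the gem itself applies may differ.

```ruby
# Whole-number designators only; fractions and signs are out of scope for this sketch.
DURATION_PATTERN = /
  \AP
  (?:(?<years>\d+)Y)?
  (?:(?<months>\d+)M)?
  (?:(?<days>\d+)D)?
  (?:T
    (?:(?<hours>\d+)H)?
    (?:(?<minutes>\d+)M)?
    (?:(?<seconds>\d+)S)?
  )?
  \z
/x

# Assumed fixed factors: 365-day years and 30-day months (a simplification).
FACTORS = {
  years: 365 * 86_400, months: 30 * 86_400, days: 86_400,
  hours: 3_600, minutes: 60, seconds: 1
}.freeze

def approximate_seconds(duration_string)
  match = DURATION_PATTERN.match(duration_string)
  raise ArgumentError, "unrecognized duration: #{duration_string}" unless match

  # Missing components are nil, and nil.to_i is 0.
  FACTORS.sum { |name, factor| match[name].to_i * factor }
end

puts approximate_seconds('P1Y2M3DT4H5M6S') # 36993906
```

The breakdown is 31536000 + 5184000 + 259200 + 14400 + 300 + 6 seconds.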
Date-only and time-only parsing handles partial ISO 8601 representations. The library converts these to appropriate Ruby objects.
# Date-only parsing
date = ISO8601::Date.new('2024-03-15')
date.to_date # => #<Date: 2024-03-15>
# Time-only parsing
time = ISO8601::Time.new('14:30:00')
time.hour # => 14
time.minute # => 30
Error Handling & Debugging
XML schema validation generates specific exceptions for different validation failure types. The Xmlschema::SchemaError hierarchy provides detailed error information including element paths and constraint violations.
require 'xmlschema'
schema = Xmlschema::Schema.from_uri('strict_schema.xsd')
invalid_xml = <<~XML
  <product>
    <id>not_a_number</id>
    <missing_required_field/>
  </product>
XML
begin
  schema.validate(invalid_xml)
rescue Xmlschema::SchemaError => e
  # Detailed error analysis
  puts "Error type: #{e.class}"
  puts "Message: #{e.message}"
  puts "Element path: #{e.path}" if e.respond_to?(:path)
  puts "Expected type: #{e.expected_type}" if e.respond_to?(:expected_type)
  puts "Actual value: #{e.actual_value}" if e.respond_to?(:actual_value)
end
Schema loading errors occur when XSD files contain syntax errors or reference unavailable resources. The library raises Xmlschema::SchemaParseError for schema definition problems.
begin
  schema = Xmlschema::Schema.from_uri('malformed_schema.xsd')
rescue Xmlschema::SchemaParseError => e
  puts "Schema parsing failed: #{e.message}"
  puts "Line number: #{e.line_number}" if e.respond_to?(:line_number)
  puts "Column: #{e.column}" if e.respond_to?(:column)
rescue Errno::ENOENT
  puts "Schema file not found"
rescue URI::InvalidURIError
  puts "Invalid schema URI"
end
Namespace validation failures require careful error handling since XML documents may contain multiple namespace declarations. Schema validation tracks namespace context during validation.
xml_with_namespaces = <<~XML
  <root xmlns="http://example.com/ns1" xmlns:ns2="http://example.com/ns2">
    <element>value</element>
    <ns2:other_element>other_value</ns2:other_element>
  </root>
XML
begin
  schema.validate(xml_with_namespaces)
rescue Xmlschema::NamespaceError => e
  puts "Namespace validation failed"
  puts "Expected namespace: #{e.expected_namespace}"
  puts "Actual namespace: #{e.actual_namespace}"
  puts "Element name: #{e.element_name}"
end
ISO 8601 parsing errors occur when strings deviate from standard formats. The ISO8601::Errors::UnknownPattern exception indicates format recognition failures.
require 'iso8601'
invalid_formats = [
  '2024-13-45',  # Invalid month/day
  '25:99:99',    # Invalid time
  'P1Y2M3DT25H', # Duration string passed to a date-time parser
  '2024-02-30',  # Invalid date (Feb 30)
  'T10:30:00'    # Missing date component
]
invalid_formats.each do |format|
  begin
    ISO8601::DateTime.new(format)
  rescue ISO8601::Errors::UnknownPattern => e
    puts "Invalid format '#{format}': #{e.message}"
  rescue ISO8601::Errors::RangeError => e
    puts "Value out of range '#{format}': #{e.message}"
  end
end
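The two rescue branches reflect two different failure modes: a string whose shape is not recognized, and a string with the right shape but impossible values. A minimal stdlib-only sketch of that distinction for calendar dates follows; DATE_PATTERN and classify_date are hypothetical names, and the gem's own checks are more extensive than this.

```ruby
require 'date'

DATE_PATTERN = /\A(\d{4})-(\d{2})-(\d{2})\z/

# Returns :ok, :unknown_pattern, or :out_of_range for a YYYY-MM-DD string.
def classify_date(string)
  match = DATE_PATTERN.match(string)
  return :unknown_pattern unless match

  year, month, day = match.captures.map(&:to_i)
  # Date.valid_date? knows month lengths and leap years.
  Date.valid_date?(year, month, day) ? :ok : :out_of_range
end

%w[2024-03-15 2024-13-45 2024-02-30 T10:30:00].each do |input|
  puts "#{input}: #{classify_date(input)}"
end
```

Here 2024-13-45 and 2024-02-30 match the pattern but fail the calendar check, while T10:30:00 fails the pattern itself.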
Duration parsing handles edge cases where duration components exceed normal ranges or contain conflicting information.
problematic_durations = [
  'P400D',  # Large day count
  'PT25H',  # Hours exceeding 24
  'P1Y13M', # Months exceeding 12
  'P-1Y'    # Negative periods
]
problematic_durations.each do |duration_str|
  begin
    duration = ISO8601::Duration.new(duration_str)
    puts "Parsed: #{duration_str} -> #{duration.to_seconds} seconds"
  rescue ISO8601::Errors::UnknownPattern => e
    puts "Parse error for '#{duration_str}': #{e.message}"
  rescue StandardError => e
    puts "Unexpected error for '#{duration_str}': #{e.class} - #{e.message}"
  end
end
Performance & Memory
XML schema validation performance depends on schema complexity and document size. Large schemas with extensive type hierarchies require more processing time and memory allocation.
require 'benchmark'
require 'xmlschema'
# Performance comparison: simple vs complex schemas
simple_schema = Xmlschema::Schema.from_uri('simple.xsd')
complex_schema = Xmlschema::Schema.from_uri('complex_with_imports.xsd')
test_xml = File.read('large_document.xml')
Benchmark.bm(20) do |x|
  x.report('Simple schema:') do
    1000.times { simple_schema.valid?(test_xml) }
  end
  x.report('Complex schema:') do
    1000.times { complex_schema.valid?(test_xml) }
  end
end
# Memory usage monitoring
require 'objspace'
before = ObjectSpace.count_objects
100.times { complex_schema.validate(test_xml) }
after = ObjectSpace.count_objects
puts "Objects created: #{after[:TOTAL] - before[:TOTAL]}"
puts "String objects: #{after[:T_STRING] - before[:T_STRING]}"
Schema caching reduces repeated parsing overhead when validating multiple documents against the same schema. The library maintains internal caches for parsed schemas.
# Efficient batch validation
schema = Xmlschema::Schema.from_uri('product_schema.xsd')
xml_files = Dir.glob('data/*.xml')
results = {}
# Process files in batches to manage memory
xml_files.each_slice(100) do |batch|
  batch.each do |file|
    xml_content = File.read(file)
    begin
      schema.validate(xml_content)
      results[file] = :valid
    rescue Xmlschema::SchemaError
      results[file] = :invalid
    end
  end
  # Force garbage collection between batches
  GC.start
end
puts "Validation results: #{results.values.tally}"
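Reusing one parsed schema per file can be made explicit with a small memoizing cache. The sketch below is a hypothetical helper: SchemaCache is not part of xmlschema, the block passed to it stands in for a call such as Xmlschema::Schema.from_uri, and a plain string stands in for the parsed schema so the example runs without the gem.

```ruby
# Hypothetical helper: memoizes whatever the loader block returns, per path.
class SchemaCache
  def initialize(&loader)
    @loader = loader
    @schemas = {}
    @lock = Mutex.new
  end

  # Returns the cached schema for path, calling the loader only on first use.
  def fetch(path)
    @lock.synchronize { @schemas[path] ||= @loader.call(path) }
  end
end

load_count = 0
cache = SchemaCache.new do |path|
  load_count += 1
  "parsed:#{path}" # stand-in for an expensive schema parse
end

3.times { cache.fetch('product_schema.xsd') }
puts load_count # 1
```

The mutex keeps the loader from running twice for the same path when several threads validate concurrently.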
ISO 8601 parsing performance varies significantly between different format complexities. Simple date parsing operates faster than full date-time with timezone parsing.
require 'benchmark'
require 'iso8601'
formats = {
  'Date only' => '2024-03-15',
  'DateTime' => '2024-03-15T10:30:00',
  'DateTime with TZ' => '2024-03-15T10:30:00+02:00',
  'Duration simple' => 'P1Y',
  'Duration complex' => 'P1Y2M3DT4H5M6.123S'
}
Benchmark.bm(20) do |x|
  formats.each do |label, format|
    x.report(label) do
      10000.times do
        case format
        when /^P/
          ISO8601::Duration.new(format)
        else
          ISO8601::DateTime.new(format)
        end
      end
    end
  end
end
Large-scale date processing benefits from batching and from reusing results instead of recomputing them. The example below processes the strings in slices so that intermediate arrays stay small.
# Efficient bulk date processing
date_strings = Array.new(10000) do
  format('2024-%02d-%02dT%02d:%02d:%02dZ',
         rand(1..12), rand(1..28), rand(0..23), rand(0..59), rand(0..59))
end
# Memory-efficient processing
parsed_dates = []
date_strings.each_slice(1000) do |batch|
  batch_results = batch.map do |date_str|
    ISO8601::DateTime.new(date_str).to_time
  end
  parsed_dates.concat(batch_results)
  # Periodic memory cleanup
  GC.start if parsed_dates.size % 5000 == 0
end
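When the same timestamp strings recur often, one way to reuse results is to memoize the parse per distinct string. This is a minimal sketch under that assumption: build_parse_cache is a hypothetical helper, and Time.iso8601 from the stdlib stands in for the gem's parser so the example runs on its own.

```ruby
require 'time'

# Hypothetical cache: each distinct string is parsed once and the Time is reused.
def build_parse_cache
  Hash.new { |cache, string| cache[string] = Time.iso8601(string) }
end

cache = build_parse_cache
inputs = ['2024-03-15T10:30:00Z', '2024-03-15T10:30:00Z', '2024-06-01T00:00:00Z']
times = inputs.map { |string| cache[string] }

puts cache.size                # 2 distinct strings were parsed
puts times[0].equal?(times[1]) # true, the same Time object is returned
```

The hash grows with the number of distinct inputs, so this only pays off when strings repeat; for mostly unique input, plain batching is the simpler choice.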
Production Patterns
Web applications commonly integrate XML schema validation for API request validation and data import processing. Rails applications can incorporate xmlschema validation into model validation chains.
# Rails model with XML schema validation
class ProductImport < ApplicationRecord
  validate :validate_xml_structure
  private
  def validate_xml_structure
    return unless xml_data.present?
    schema = load_product_schema
    begin
      schema.validate(xml_data)
    rescue Xmlschema::SchemaError => e
      errors.add(:xml_data, "Invalid XML structure: #{e.message}")
    end
  end
  def load_product_schema
    # Rails.root.join returns a Pathname; convert it to the String that from_uri is documented to accept
    @schema ||= Xmlschema::Schema.from_uri(
      Rails.root.join('config', 'schemas', 'product.xsd').to_s
    )
  end
end
# Background job for bulk XML processing
class XmlValidationJob < ApplicationJob
  def perform(file_path, schema_name)
    schema = load_schema(schema_name)
    File.open(file_path) do |file|
      xml_content = file.read
      begin
        schema.validate(xml_content)
        update_validation_status(file_path, :valid)
      rescue Xmlschema::SchemaError => e
        log_validation_error(file_path, e)
        update_validation_status(file_path, :invalid)
      end
    end
  end
  private
  def load_schema(name)
    Rails.cache.fetch("xml_schema_#{name}", expires_in: 1.hour) do
      Xmlschema::Schema.from_uri("schemas/#{name}.xsd")
    end
  end
end
API controllers handle XML validation errors gracefully and provide meaningful error responses to clients.
class ApiController < ApplicationController
  def create_product
    xml_data = request.body.read
    begin
      product_schema.validate(xml_data)
      product = parse_and_create_product(xml_data)
      render json: { status: 'created', id: product.id }
    rescue Xmlschema::SchemaError => e
      render json: {
        error: 'Invalid XML structure',
        details: e.message,
        location: extract_error_location(e)
      }, status: :unprocessable_entity
    end
  end
  private
  def product_schema
    # Rails.root.join returns a Pathname; convert it to the String that from_uri is documented to accept
    @product_schema ||= Xmlschema::Schema.from_uri(
      Rails.root.join('config', 'schemas', 'product_v2.xsd').to_s
    )
  end
  def extract_error_location(error)
    return nil unless error.respond_to?(:path)
    error.path.join(' > ')
  end
end
ISO 8601 handling in production environments requires timezone-aware processing and consistent formatting across application layers.
# Service object for date/time processing
class DateTimeService
  def self.parse_api_datetime(iso_string)
    datetime = ISO8601::DateTime.new(iso_string)
    # Convert to application timezone
    time = datetime.to_time
    time.in_time_zone(Rails.application.config.time_zone)
  rescue ISO8601::Errors::UnknownPattern, ISO8601::Errors::RangeError => e
    # RangeError covers well-formed strings with impossible values, such as Feb 30
    Rails.logger.warn("Invalid datetime format: #{iso_string} - #{e.message}")
    nil
  end
  def self.format_for_api(time)
    time.utc.iso8601
  end
  def self.parse_duration(duration_string)
    ISO8601::Duration.new(duration_string)
  rescue ISO8601::Errors::UnknownPattern
    nil
  end
end
# API serializer with consistent ISO 8601 formatting
class EventSerializer < ActiveModel::Serializer
  attributes :id, :name, :start_time, :end_time, :duration
  def start_time
    DateTimeService.format_for_api(object.start_time)
  end
  def end_time
    DateTimeService.format_for_api(object.end_time)
  end
  def duration
    return nil unless object.end_time && object.start_time
    duration_seconds = object.end_time - object.start_time
    "PT#{duration_seconds.to_i}S"
  end
end
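The serializer above emits every duration as a bare second count, for example PT5400S, which is valid but hard to read. A hedged alternative is to split the seconds into day, hour, minute, and second designators. The sketch below is plain Ruby; seconds_to_iso8601_duration is a hypothetical helper, and it deliberately avoids years and months because their length is not fixed.

```ruby
# Formats a non-negative number of seconds as an ISO 8601 duration,
# using only day, hour, minute, and second designators.
def seconds_to_iso8601_duration(total_seconds)
  raise ArgumentError, 'duration must be non-negative' if total_seconds.negative?

  days, remainder = total_seconds.to_i.divmod(86_400)
  hours, remainder = remainder.divmod(3_600)
  minutes, seconds = remainder.divmod(60)

  date_part = days.positive? ? "#{days}D" : ''
  time_part = [[hours, 'H'], [minutes, 'M'], [seconds, 'S']]
              .select { |value, _unit| value.positive? }
              .map { |value, unit| "#{value}#{unit}" }
              .join
  # A zero-length duration still needs one component.
  time_part = '0S' if date_part.empty? && time_part.empty?

  time_part.empty? ? "P#{date_part}" : "P#{date_part}T#{time_part}"
end

puts seconds_to_iso8601_duration(5_400)  # PT1H30M
puts seconds_to_iso8601_duration(90_061) # P1DT1H1M1S
puts seconds_to_iso8601_duration(0)      # PT0S
```

Fractional seconds are truncated by to_i, matching the behavior of the serializer's own to_i call.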
Monitoring and logging capture validation failures and performance metrics for production troubleshooting.
# Custom metrics collector
class ValidationMetrics
  def self.record_schema_validation(schema_name, success, duration)
    StatsD.timing("xml_validation.#{schema_name}.duration", duration)
    StatsD.increment("xml_validation.#{schema_name}.#{success ? 'success' : 'failure'}")
  end
  def self.record_iso8601_parsing(format_type, success)
    StatsD.increment("iso8601_parsing.#{format_type}.#{success ? 'success' : 'failure'}")
  end
end
# Instrumented validation service
class ValidationService
  def validate_xml(xml_data, schema_name)
    start_time = Time.current
    begin
      # load_schema is assumed to be defined elsewhere, for example as in XmlValidationJob
      schema = load_schema(schema_name)
      schema.validate(xml_data)
      ValidationMetrics.record_schema_validation(
        schema_name, true, (Time.current - start_time) * 1000
      )
      { valid: true }
    rescue Xmlschema::SchemaError => e
      ValidationMetrics.record_schema_validation(
        schema_name, false, (Time.current - start_time) * 1000
      )
      Rails.logger.error("XML validation failed for schema #{schema_name}: #{e.message}")
      { valid: false, error: e.message }
    end
  end
end
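The service above measures elapsed time by subtracting wall-clock timestamps, which can be skewed if the system clock is adjusted mid-request. A common alternative is the monotonic clock. The sketch below uses Process.clock_gettime from core Ruby; measure_ms is a hypothetical helper name.

```ruby
# Runs the block and returns [result, elapsed_milliseconds].
# The monotonic clock is unaffected by wall-clock adjustments.
def measure_ms
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000
  [result, elapsed]
end

result, elapsed_ms = measure_ms { (1..1_000).sum }
puts result          # 500500
puts elapsed_ms >= 0 # true
```

In the service, the block would wrap the schema.validate call and the elapsed value would be passed to the metrics recorder.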
Reference
Xmlschema Classes and Methods
Class | Description |
---|---|
Xmlschema::Schema | Main schema validation class |
Xmlschema::SchemaError | Base validation error class |
Xmlschema::SchemaParseError | Schema parsing error class |
Xmlschema::NamespaceError | Namespace validation error class |
Schema Methods
Method | Parameters | Returns | Description |
---|---|---|---|
Schema.from_uri(uri) | uri (String, URI) | Schema | Load schema from file or URI |
Schema.from_document(doc) | doc (Nokogiri::XML::Document) | Schema | Create schema from parsed document |
#valid?(xml) | xml (String, Document) | Boolean | Check if XML is valid |
#validate(xml) | xml (String, Document) | nil or raises | Validate XML, raise on error |
#namespace_uri | None | String | Get schema target namespace |
Schema Error Attributes
Attribute | Type | Description |
---|---|---|
#message | String | Human-readable error description |
#path | Array<String> | Element path where error occurred |
#expected_type | String | Expected schema type |
#actual_value | String | Actual value that failed validation |
#line_number | Integer | Line number in XML document |
#column | Integer | Column number in XML document |
ISO8601 Classes and Methods
Class | Description |
---|---|
ISO8601::DateTime | Date-time parsing and manipulation |
ISO8601::Date | Date-only parsing |
ISO8601::Time | Time-only parsing |
ISO8601::Duration | Duration parsing and calculation |
ISO8601::TimeInterval | Time interval representation |
DateTime Methods
Method | Parameters | Returns | Description |
---|---|---|---|
DateTime.new(string) | string (String) | DateTime | Parse ISO 8601 date-time string |
#to_time | None | Time | Convert to Ruby Time object |
#to_date | None | Date | Convert to Ruby Date object |
#year | None | Integer | Extract year component |
#month | None | Integer | Extract month component (1-12) |
#day | None | Integer | Extract day component |
#hour | None | Integer | Extract hour component (0-23) |
#minute | None | Integer | Extract minute component (0-59) |
#second | None | Float | Extract second component with decimals |
#zone | None | String | Extract timezone offset |
Duration Methods
Method | Parameters | Returns | Description |
---|---|---|---|
Duration.new(string) | string (String) | Duration | Parse ISO 8601 duration string |
#to_seconds | None | Float | Convert to total seconds |
#years | None | Integer | Extract years component |
#months | None | Integer | Extract months component |
#days | None | Integer | Extract days component |
#hours | None | Integer | Extract hours component |
#minutes | None | Integer | Extract minutes component |
#seconds | None | Float | Extract seconds component |
Error Classes
Error Class | Description |
---|---|
ISO8601::Errors::UnknownPattern | Invalid format string |
ISO8601::Errors::RangeError | Value outside valid range |
ISO8601::Errors::TypeError | Incorrect argument type |
Common ISO 8601 Format Patterns
Pattern | Example | Description |
---|---|---|
YYYY-MM-DD | 2024-03-15 | Calendar date |
YYYY-MM-DDTHH:MM:SS | 2024-03-15T14:30:00 | Date and time |
YYYY-MM-DDTHH:MM:SSZ | 2024-03-15T14:30:00Z | UTC date and time |
YYYY-MM-DDTHH:MM:SS±HH:MM | 2024-03-15T14:30:00+02:00 | Date and time with timezone |
PnYnMnDTnHnMnS | P1Y2M3DT4H5M6S | Duration |
PT30M | PT30M | 30-minute duration |
P7D | P7D | 7-day duration |
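The patterns in the table can be told apart by shape alone. As a rough sketch, the regular expressions below classify a string into one of the rows above. ISO8601_SHAPES and iso8601_shape are hypothetical names, and the expressions check structure only, not value ranges, so they are not a substitute for the library's parser.

```ruby
# Simplified patterns for the rows above; they check shape only, not value ranges.
ISO8601_SHAPES = {
  date: /\A\d{4}-\d{2}-\d{2}\z/,
  date_time: /\A\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\z/,
  date_time_utc: /\A\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z\z/,
  date_time_offset: /\A\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2}\z/,
  # Requires at least one component after P, and at least one after T.
  duration: /\AP(?=.)(?:\d+Y)?(?:\d+M)?(?:\d+D)?(?:T(?=\d)(?:\d+H)?(?:\d+M)?(?:\d+S)?)?\z/
}.freeze

# Returns the name of the first matching shape, or :unknown.
def iso8601_shape(string)
  ISO8601_SHAPES.each { |name, pattern| return name if pattern.match?(string) }
  :unknown
end

puts iso8601_shape('2024-03-15')                # date
puts iso8601_shape('2024-03-15T14:30:00+02:00') # date_time_offset
puts iso8601_shape('PT30M')                     # duration
puts iso8601_shape('P')                         # unknown
```

A dispatcher like this could decide whether a string goes to a date-time parser or a duration parser, in the same way the benchmark earlier branches on a leading P.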