xmlschema and iso8601

Documentation for XML Schema validation and ISO 8601 date/time handling in Ruby applications.

Overview

This page covers two libraries for structured data validation and standardized date-time handling in Ruby: xmlschema for XML Schema Definition (XSD) validation and iso8601 for parsing and formatting ISO 8601 date-time strings. Both are loaded with require rather than being part of the core language.

The xmlschema library validates XML documents against XSD schemas, checking that each document conforms to the declared structure. Validation goes through the Xmlschema::Schema class, which parses an XSD file and then validates XML documents against it. The library supports schema features including type restrictions, element constraints, and namespace handling.

The iso8601 library parses and formats ISO 8601 date-time strings. Its ISO8601 module converts standardized date-time representations into objects that can be turned into Ruby Time and Date values, and it covers time zones, durations, intervals, and the main ISO 8601 formats.

require 'xmlschema'
require 'iso8601'

# XML schema validation
schema = Xmlschema::Schema.from_uri('schema.xsd')
schema.valid?('<root><element>value</element></root>')

# ISO 8601 parsing
datetime = ISO8601::DateTime.new('2024-01-15T14:30:00Z')
datetime.to_time
# => 2024-01-15 14:30:00 UTC

Both libraries handle data validation and transformation tasks that applications encounter when processing external data sources, API responses, and configuration files.
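For comparison, Ruby's standard library handles the common ISO 8601 cases on its own: after require 'time', Time.iso8601 parses a string and Time#iso8601 (also available as Time#xmlschema) formats one. The sketch below uses only the standard library, not the iso8601 or xmlschema libraries described above.

```ruby
require 'time'

# Parse an ISO 8601 / XML Schema dateTime string into a Time.
parsed = Time.iso8601('2024-01-15T14:30:00Z')

# Format it back. Time#xmlschema is an alias of Time#iso8601; the optional
# argument is the number of fractional-second digits.
round_trip  = parsed.iso8601       # => "2024-01-15T14:30:00Z"
with_millis = parsed.xmlschema(3)  # => "2024-01-15T14:30:00.000Z"

puts round_trip
puts with_millis
```

This may be enough when only well-formed timestamps are exchanged; durations and intervals are outside what Time covers.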

Basic Usage

XML schema validation begins with loading a schema definition from a file or URI. The Xmlschema::Schema.from_uri method parses XSD files and creates validation objects.

require 'xmlschema'

# Load schema from file
schema = Xmlschema::Schema.from_uri('products.xsd')

# Validate XML string
xml_data = <<~XML
  <product>
    <id>123</id>
    <name>Widget</name>
    <price>29.99</price>
  </product>
XML

result = schema.valid?(xml_data)
# => true or false

Schema validation processes XML documents and returns boolean results. The valid? method performs complete schema validation including element structure, data types, and constraints.

# Detailed validation with error information
begin
  schema.validate(xml_data)
  puts "Valid XML document"
rescue Xmlschema::SchemaError => e
  puts "Validation failed: #{e.message}"
  puts "Error path: #{e.path}" if e.respond_to?(:path)
end

ISO 8601 parsing handles various standardized date-time formats. The ISO8601::DateTime.new constructor accepts ISO 8601 strings and creates Ruby objects for manipulation.

require 'iso8601'

# Parse basic date-time
dt = ISO8601::DateTime.new('2024-03-15T10:30:00')
dt.year    # => 2024
dt.month   # => 3
dt.hour    # => 10

# Parse with timezone
dt_tz = ISO8601::DateTime.new('2024-03-15T10:30:00+02:00')
dt_tz.zone # => "+02:00"
dt_tz.to_time.utc
# => 2024-03-15 08:30:00 UTC
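The offset arithmetic in the example above can be checked with the standard library alone. This is a small sketch, assuming only require 'time'; utc_and_offset is a hypothetical helper name, not part of either library.

```ruby
require 'time'

# Return the UTC instant and the original offset (in seconds) for an
# ISO 8601 string that carries a zone designator.
def utc_and_offset(iso_string)
  local = Time.iso8601(iso_string)
  [local.getutc, local.utc_offset]
end

utc, offset = utc_and_offset('2024-03-15T10:30:00+02:00')
puts utc.iso8601 # => 2024-03-15T08:30:00Z
puts offset      # => 7200
```

Keeping the offset alongside the UTC instant preserves what the sender wrote, which a bare UTC conversion discards.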

Duration parsing extracts time periods from ISO 8601 duration strings. The ISO8601::Duration class handles period calculations and conversions.

# Parse duration
duration = ISO8601::Duration.new('P1Y2M3DT4H5M6S')
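# Components may be returned as unit objects rather than bare numbers;
# to_i yields the plain value either way
duration.years.to_i   # => 1
duration.months.to_i  # => 2
duration.days.to_i    # => 3
duration.hours.to_i   # => 4
duration.minutes.to_i # => 5
duration.seconds.to_i # => 6

# Convert to seconds
# The total depends on the year and month lengths assumed; with a 365-day
# year and 30-day months it comes to 36993906
duration.to_seconds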
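Converting a duration to seconds only has a single answer once a year and month length are fixed. The sketch below works the arithmetic in plain Ruby under the assumption of a 365-day year and 30-day months; SECONDS_PER, DURATION_PATTERN and duration_to_seconds are hypothetical names, and the factors are not taken from the iso8601 library.

```ruby
# Conversion factors in seconds. The year and month lengths are assumptions.
SECONDS_PER = {
  years: 365 * 86_400,
  months: 30 * 86_400,
  days: 86_400,
  hours: 3_600,
  minutes: 60,
  seconds: 1
}.freeze

# Integer components only; weeks, fractions and signs are not handled.
DURATION_PATTERN = /
  \AP
  (?:(?<years>\d+)Y)?
  (?:(?<months>\d+)M)?
  (?:(?<days>\d+)D)?
  (?:T
    (?:(?<hours>\d+)H)?
    (?:(?<minutes>\d+)M)?
    (?:(?<seconds>\d+)S)?
  )?
  \z
/x

def duration_to_seconds(string)
  match = DURATION_PATTERN.match(string)
  raise ArgumentError, "not a supported duration: #{string.inspect}" unless match

  # Missing components are nil, and nil.to_i is 0.
  SECONDS_PER.sum { |unit, factor| match[unit].to_i * factor }
end

puts duration_to_seconds('P1Y2M3DT4H5M6S') # => 36993906
puts duration_to_seconds('PT25H')          # => 90000
```

A library that uses average year or month lengths will return a different total for the same string, which is why such figures are best treated as approximate.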

Date-only and time-only parsing handles partial ISO 8601 representations. The library converts these to appropriate Ruby objects.

# Date-only parsing
date = ISO8601::Date.new('2024-03-15')
date.to_date # => #<Date: 2024-03-15>

# Time-only parsing  
time = ISO8601::Time.new('14:30:00')
time.hour   # => 14
time.minute # => 30
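ISO 8601 allows the same calendar day to be written in several ways. As a standard-library illustration (using Date.iso8601 from require 'date' rather than the iso8601 library), the three strings below all resolve to the same date.

```ruby
require 'date'

# Three ISO 8601 spellings of the same calendar day.
extended = Date.iso8601('2024-03-15')  # extended calendar date
basic    = Date.iso8601('20240315')    # basic calendar date, no separators
week     = Date.iso8601('2024-W11-5')  # week date: week 11, day 5 (Friday)

puts [extended, basic, week].map(&:iso8601).uniq.length # => 1
puts extended.iso8601                                   # => 2024-03-15
```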

Error Handling & Debugging

XML schema validation generates specific exceptions for different validation failure types. The Xmlschema::SchemaError hierarchy provides detailed error information including element paths and constraint violations.

require 'xmlschema'

schema = Xmlschema::Schema.from_uri('strict_schema.xsd')

invalid_xml = <<~XML
  <product>
    <id>not_a_number</id>
    <missing_required_field/>
  </product>
XML

begin
  schema.validate(invalid_xml)
rescue Xmlschema::SchemaError => e
  # Detailed error analysis
  puts "Error type: #{e.class}"
  puts "Message: #{e.message}"
  puts "Element path: #{e.path}" if e.respond_to?(:path)
  puts "Expected type: #{e.expected_type}" if e.respond_to?(:expected_type)
  puts "Actual value: #{e.actual_value}" if e.respond_to?(:actual_value)
end
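The respond_to? guards above can be folded into one helper that turns any exception into a hash suitable for logging. This is a sketch under stated assumptions: DemoValidationError is a hypothetical stand-in for a validation error that carries a path, and error_details is an illustrative name, not part of the xmlschema API.

```ruby
# Hypothetical stand-in for a validation error carrying extra context.
class DemoValidationError < StandardError
  attr_reader :path

  def initialize(message, path:)
    super(message)
    @path = path
  end
end

# Optional attributes that are copied only when the error exposes them.
OPTIONAL_FIELDS = %i[path expected_type actual_value].freeze

def error_details(error)
  details = { type: error.class.name, message: error.message }
  OPTIONAL_FIELDS.each do |field|
    details[field] = error.public_send(field) if error.respond_to?(field)
  end
  details
end

rich  = error_details(DemoValidationError.new('bad id', path: %w[product id]))
plain = error_details(StandardError.new('boom'))

puts rich[:path].join(' > ') # => product > id
puts plain.key?(:path)       # => false
```

Because the helper only probes for attributes, it keeps working if a particular error class exposes fewer fields than expected.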

Schema loading errors occur when XSD files contain syntax errors or reference unavailable resources. The library raises Xmlschema::SchemaParseError for schema definition problems.

begin
  schema = Xmlschema::Schema.from_uri('malformed_schema.xsd')
rescue Xmlschema::SchemaParseError => e
  puts "Schema parsing failed: #{e.message}"
  puts "Line number: #{e.line_number}" if e.respond_to?(:line_number)
  puts "Column: #{e.column}" if e.respond_to?(:column)
rescue Errno::ENOENT
  puts "Schema file not found"
rescue URI::InvalidURIError
  puts "Invalid schema URI"
end

Namespace validation failures require careful error handling since XML documents may contain multiple namespace declarations. Schema validation tracks namespace context during validation.

xml_with_namespaces = <<~XML
  <root xmlns="http://example.com/ns1" xmlns:ns2="http://example.com/ns2">
    <element>value</element>
    <ns2:other_element>other_value</ns2:other_element>
  </root>
XML

begin
  schema.validate(xml_with_namespaces)
rescue Xmlschema::NamespaceError => e
  puts "Namespace validation failed"
  puts "Expected namespace: #{e.expected_namespace}"
  puts "Actual namespace: #{e.actual_namespace}"
  puts "Element name: #{e.element_name}"
end

ISO 8601 parsing errors occur when strings deviate from standard formats. The ISO8601::Errors::UnknownPattern exception indicates format recognition failures.

require 'iso8601'

invalid_formats = [
  '2024-13-45',      # Invalid month/day
  '25:99:99',        # Invalid time
  'P1Y2M3DT25H',     # A duration, not a date-time
  '2024-02-30',      # Invalid date (Feb 30)
  'T10:30:00'        # Missing date component
]

invalid_formats.each do |format|
  begin
    ISO8601::DateTime.new(format)
  rescue ISO8601::Errors::UnknownPattern => e
    puts "Invalid format '#{format}': #{e.message}"
  rescue ISO8601::Errors::RangeError => e
    puts "Value out of range '#{format}': #{e.message}"
  end
end
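The two rescue branches above separate strings with an unrecognized shape from strings whose shape is fine but whose values are impossible. The sketch below shows that distinction for calendar dates only, using the standard library; classify_date and DATE_SHAPE are hypothetical names, and this is not a description of how the iso8601 library is implemented.

```ruby
require 'date'

# Shape check for an extended calendar date: YYYY-MM-DD.
DATE_SHAPE = /\A(\d{4})-(\d{2})-(\d{2})\z/

# Returns :ok, :unknown_pattern, or :out_of_range for a calendar-date string.
def classify_date(string)
  match = DATE_SHAPE.match(string)
  return :unknown_pattern unless match

  year, month, day = match.captures.map(&:to_i)
  Date.valid_date?(year, month, day) ? :ok : :out_of_range
end

puts classify_date('2024-03-15') # => ok
puts classify_date('2024-02-30') # => out_of_range
puts classify_date('T10:30:00')  # => unknown_pattern
```

Reporting the two cases separately tells a client whether to fix the format or the value.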

Duration parsing handles edge cases where duration components exceed normal ranges or contain conflicting information.

problematic_durations = [
  'P400D',           # Large day count
  'PT25H',           # Hours exceeding 24
  'P1Y13M',          # Months exceeding 12
  'P-1Y'             # Negative periods
]

problematic_durations.each do |duration_str|
  begin
    duration = ISO8601::Duration.new(duration_str)
    puts "Parsed: #{duration_str} -> #{duration.to_seconds} seconds"
  rescue ISO8601::Errors::UnknownPattern => e
    puts "Parse error for '#{duration_str}': #{e.message}"
  rescue StandardError => e
    puts "Unexpected error for '#{duration_str}': #{e.class} - #{e.message}"
  end
end

Performance & Memory

XML schema validation performance depends on schema complexity and document size. Large schemas with extensive type hierarchies require more processing time and memory allocation.

require 'benchmark'
require 'xmlschema'

# Performance comparison: simple vs complex schemas
simple_schema = Xmlschema::Schema.from_uri('simple.xsd')
complex_schema = Xmlschema::Schema.from_uri('complex_with_imports.xsd')

test_xml = File.read('large_document.xml')

Benchmark.bm(20) do |x|
  x.report('Simple schema:') do
    1000.times { simple_schema.valid?(test_xml) }
  end
  
  x.report('Complex schema:') do
    1000.times { complex_schema.valid?(test_xml) }
  end
end

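# Allocation monitoring
# GC.stat(:total_allocated_objects) only ever increases, so the difference
# is the number of objects allocated while validating
allocated_before = GC.stat(:total_allocated_objects)
strings_before = ObjectSpace.count_objects[:T_STRING]
100.times { complex_schema.validate(test_xml) }
allocated_after = GC.stat(:total_allocated_objects)
strings_after = ObjectSpace.count_objects[:T_STRING]

puts "Objects allocated: #{allocated_after - allocated_before}"
# Live-object counts move with garbage collection, so treat this as a rough signal
puts "Change in live strings: #{strings_after - strings_before}"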

Loading a schema once and reusing the resulting object avoids repeated parsing overhead when validating many documents against the same schema.

# Efficient batch validation
schema = Xmlschema::Schema.from_uri('product_schema.xsd')

xml_files = Dir.glob('data/*.xml')
results = {}

# Process files in batches to manage memory
xml_files.each_slice(100) do |batch|
  batch.each do |file|
    xml_content = File.read(file)
    begin
      schema.validate(xml_content)
      results[file] = :valid
    rescue Xmlschema::SchemaError
      results[file] = :invalid
    end
  end
  
  # Force garbage collection between batches
  GC.start
end

puts "Validation results: #{results.values.tally}"
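When several schemas are in play, the load-once idea can be extended to a cache keyed by schema name. This is a minimal sketch in plain Ruby: load_schema_from_disk is a hypothetical loader that returns a placeholder string instead of a real schema object, and LOAD_COUNTS exists only to show how often the loader runs.

```ruby
# Records how many times each schema name was actually loaded.
LOAD_COUNTS = Hash.new(0)

# Hypothetical loader standing in for an expensive schema parse.
def load_schema_from_disk(name)
  LOAD_COUNTS[name] += 1
  "parsed schema for #{name}" # placeholder for a real schema object
end

# Memoize by name: the block runs only on the first lookup of each key.
SCHEMA_CACHE = Hash.new { |cache, name| cache[name] = load_schema_from_disk(name) }

3.times { SCHEMA_CACHE['product'] }
SCHEMA_CACHE['order']

puts LOAD_COUNTS['product'] # => 1
puts LOAD_COUNTS['order']   # => 1
```

A Hash with a default block is not synchronized, so a multi-threaded process would need a Mutex around the lookup or would have to tolerate an occasional duplicate load.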

ISO 8601 parsing cost can vary with format complexity: a date-only string generally has less to parse than a full date-time with a zone offset. The benchmark below measures the difference instead of assuming it.

require 'benchmark'
require 'iso8601'

formats = {
  'Date only' => '2024-03-15',
  'DateTime' => '2024-03-15T10:30:00',
  'DateTime with TZ' => '2024-03-15T10:30:00+02:00',
  'Duration simple' => 'P1Y',
  'Duration complex' => 'P1Y2M3DT4H5M6.123S'
}

Benchmark.bm(20) do |x|
  formats.each do |label, format|
    x.report(label) do
      10000.times do
        case format
        when /^P/
          ISO8601::Duration.new(format)
        else
          ISO8601::DateTime.new(format)
        end
      end
    end
  end
end

Large-scale date processing benefits from batch operations. Working through the input in fixed-size slices keeps intermediate arrays small.

# Efficient bulk date processing
date_strings = Array.new(10000) do
  format('2024-%02d-%02dT%02d:%02d:%02dZ',
         rand(1..12), rand(1..28), rand(0..23), rand(0..59), rand(0..59))
end

# Memory-efficient processing
parsed_dates = []
date_strings.each_slice(1000) do |batch|
  batch_results = batch.map do |date_str|
    ISO8601::DateTime.new(date_str).to_time
  end
  parsed_dates.concat(batch_results)
  
  # Periodic memory cleanup
  GC.start if parsed_dates.size % 5000 == 0
end

Production Patterns

Web applications commonly integrate XML schema validation for API request validation and data import processing. Rails applications can incorporate xmlschema validation into model validation chains.

# Rails model with XML schema validation
class ProductImport < ApplicationRecord
  validate :validate_xml_structure
  
  private
  
  def validate_xml_structure
    return unless xml_data.present?
    
    schema = load_product_schema
    begin
      schema.validate(xml_data)
    rescue Xmlschema::SchemaError => e
      errors.add(:xml_data, "Invalid XML structure: #{e.message}")
    end
  end
  
  def load_product_schema
    @schema ||= Xmlschema::Schema.from_uri(
      Rails.root.join('config', 'schemas', 'product.xsd').to_s
    )
  end
end

# Background job for bulk XML processing
class XmlValidationJob < ApplicationJob
  def perform(file_path, schema_name)
    schema = load_schema(schema_name)
    
    xml_content = File.read(file_path)

    begin
      schema.validate(xml_content)
      update_validation_status(file_path, :valid)
    rescue Xmlschema::SchemaError => e
      log_validation_error(file_path, e)
      update_validation_status(file_path, :invalid)
    end
  end
  
  private
  
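  # Rails.cache serializes values for most stores, so this only works if the
  # schema object can be marshaled; otherwise memoize it in-process instead
  def load_schema(name)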
    Rails.cache.fetch("xml_schema_#{name}", expires_in: 1.hour) do
      Xmlschema::Schema.from_uri("schemas/#{name}.xsd")
    end
  end
end

API controllers handle XML validation errors gracefully and provide meaningful error responses to clients.

class ApiController < ApplicationController
  def create_product
    xml_data = request.body.read
    
    begin
      product_schema.validate(xml_data)
      product = parse_and_create_product(xml_data)
      render json: { status: 'created', id: product.id }
    rescue Xmlschema::SchemaError => e
      render json: {
        error: 'Invalid XML structure',
        details: e.message,
        location: extract_error_location(e)
      }, status: :unprocessable_entity
    end
  end
  
  private
  
  def product_schema
    @product_schema ||= Xmlschema::Schema.from_uri(
      Rails.root.join('config', 'schemas', 'product_v2.xsd').to_s
    )
  end
  
  def extract_error_location(error)
    return nil unless error.respond_to?(:path)
    error.path.join(' > ')
  end
end

ISO 8601 handling in production environments requires timezone-aware processing and consistent formatting across application layers.

# Service object for date/time processing
class DateTimeService
  def self.parse_api_datetime(iso_string)
    datetime = ISO8601::DateTime.new(iso_string)
    
    # Convert to application timezone
    time = datetime.to_time
    time.in_time_zone(Rails.application.config.time_zone)
  rescue ISO8601::Errors::UnknownPattern, ISO8601::Errors::RangeError => e
    Rails.logger.warn("Invalid datetime: #{iso_string} - #{e.message}")
    nil
  end
  
  def self.format_for_api(time)
    time.utc.iso8601
  end
  
  def self.parse_duration(duration_string)
    ISO8601::Duration.new(duration_string)
  rescue ISO8601::Errors::UnknownPattern
    nil
  end
end

# API serializer with consistent ISO 8601 formatting
class EventSerializer < ActiveModel::Serializer
  attributes :id, :name, :start_time, :end_time, :duration
  
  def start_time
    DateTimeService.format_for_api(object.start_time)
  end
  
  def end_time  
    DateTimeService.format_for_api(object.end_time)
  end
  
  def duration
    return nil unless object.end_time && object.start_time
    
    duration_seconds = object.end_time - object.start_time
    "PT#{duration_seconds.to_i}S"
  end
end
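A duration such as PT5400S is valid ISO 8601 but harder to read than PT1H30M. The sketch below is one possible formatter in plain Ruby; iso8601_duration is a hypothetical helper, and it deliberately stops at days because months and years have no fixed length in seconds.

```ruby
# Format a non-negative number of seconds as an ISO 8601 duration using
# days, hours, minutes and seconds. Fractional seconds are truncated.
def iso8601_duration(total_seconds)
  remaining = Integer(total_seconds)
  raise ArgumentError, 'duration must be non-negative' if remaining.negative?
  return 'PT0S' if remaining.zero?

  days, remaining = remaining.divmod(86_400)
  hours, remaining = remaining.divmod(3_600)
  minutes, seconds = remaining.divmod(60)

  date_part = days.positive? ? "#{days}D" : ''
  time_part = [[hours, 'H'], [minutes, 'M'], [seconds, 'S']]
              .select { |value, _unit| value.positive? }
              .map { |value, unit| "#{value}#{unit}" }
              .join

  # The T separator appears only when there is a time component.
  time_part.empty? ? "P#{date_part}" : "P#{date_part}T#{time_part}"
end

puts iso8601_duration(5_400)  # => PT1H30M
puts iso8601_duration(90_061) # => P1DT1H1M1S
```

Subtracting two Time values yields a Float, which the helper accepts and truncates to whole seconds.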

Monitoring and logging capture validation failures and performance metrics for production troubleshooting.

# Custom metrics collector
class ValidationMetrics
  def self.record_schema_validation(schema_name, success, duration)
    StatsD.timing("xml_validation.#{schema_name}.duration", duration)
    StatsD.increment("xml_validation.#{schema_name}.#{success ? 'success' : 'failure'}")
  end
  
  def self.record_iso8601_parsing(format_type, success)
    StatsD.increment("iso8601_parsing.#{format_type}.#{success ? 'success' : 'failure'}")
  end
end

# Instrumented validation service
class ValidationService
  def validate_xml(xml_data, schema_name)
    start_time = Time.current
    
    begin
      schema = load_schema(schema_name)
      schema.validate(xml_data)
      
      ValidationMetrics.record_schema_validation(
        schema_name, true, (Time.current - start_time) * 1000
      )
      
      { valid: true }
    rescue Xmlschema::SchemaError => e
      ValidationMetrics.record_schema_validation(
        schema_name, false, (Time.current - start_time) * 1000
      )
      
      Rails.logger.error("XML validation failed for schema #{schema_name}: #{e.message}")
      { valid: false, error: e.message }
    end
  end
end
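Elapsed-time metrics are more robust when taken from a monotonic clock, which is unaffected by wall-clock adjustments. The helper below is a framework-free sketch; measure_ms is a hypothetical name, and the block stands in for a call such as a schema validation.

```ruby
# Run the block and return [result, elapsed_milliseconds].
def measure_ms
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000
  [result, elapsed]
end

outcome, elapsed_ms = measure_ms { :valid }
puts "#{outcome} in #{elapsed_ms.round(3)} ms"
```

Returning the block's result together with the timing lets the caller record the metric on both the success path and the failure path without duplicating the clock arithmetic.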

Reference

Xmlschema Classes and Methods

| Class | Description |
| --- | --- |
| Xmlschema::Schema | Main schema validation class |
| Xmlschema::SchemaError | Base validation error class |
| Xmlschema::SchemaParseError | Schema parsing error class |
| Xmlschema::NamespaceError | Namespace validation error class |

Schema Methods

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| Schema.from_uri(uri) | uri (String, URI) | Schema | Load schema from file or URI |
| Schema.from_document(doc) | doc (Nokogiri::XML::Document) | Schema | Create schema from parsed document |
| #valid?(xml) | xml (String, Document) | Boolean | Check if XML is valid |
| #validate(xml) | xml (String, Document) | nil or raises | Validate XML, raise on error |
| #namespace_uri | None | String | Get schema target namespace |

Schema Error Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| #message | String | Human-readable error description |
| #path | Array<String> | Element path where error occurred |
| #expected_type | String | Expected schema type |
| #actual_value | String | Actual value that failed validation |
| #line_number | Integer | Line number in XML document |
| #column | Integer | Column number in XML document |

ISO8601 Classes and Methods

| Class | Description |
| --- | --- |
| ISO8601::DateTime | Date-time parsing and manipulation |
| ISO8601::Date | Date-only parsing |
| ISO8601::Time | Time-only parsing |
| ISO8601::Duration | Duration parsing and calculation |
| ISO8601::TimeInterval | Time interval representation |

DateTime Methods

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| DateTime.new(string) | string (String) | DateTime | Parse ISO 8601 date-time string |
| #to_time | None | Time | Convert to Ruby Time object |
| #to_date | None | Date | Convert to Ruby Date object |
| #year | None | Integer | Extract year component |
| #month | None | Integer | Extract month component (1-12) |
| #day | None | Integer | Extract day component |
| #hour | None | Integer | Extract hour component (0-23) |
| #minute | None | Integer | Extract minute component (0-59) |
| #second | None | Float | Extract second component with decimals |
| #zone | None | String | Extract timezone offset |

Duration Methods

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| Duration.new(string) | string (String) | Duration | Parse ISO 8601 duration string |
| #to_seconds | None | Float | Convert to total seconds |
| #years | None | Integer | Extract years component |
| #months | None | Integer | Extract months component |
| #days | None | Integer | Extract days component |
| #hours | None | Integer | Extract hours component |
| #minutes | None | Integer | Extract minutes component |
| #seconds | None | Float | Extract seconds component |

Error Classes

| Error Class | Description |
| --- | --- |
| ISO8601::Errors::UnknownPattern | Invalid format string |
| ISO8601::Errors::RangeError | Value outside valid range |
| ISO8601::Errors::TypeError | Incorrect argument type |

Common ISO 8601 Format Patterns

| Pattern | Example | Description |
| --- | --- | --- |
| YYYY-MM-DD | 2024-03-15 | Calendar date |
| YYYY-MM-DDTHH:MM:SS | 2024-03-15T14:30:00 | Date and time |
| YYYY-MM-DDTHH:MM:SSZ | 2024-03-15T14:30:00Z | UTC date and time |
| YYYY-MM-DDTHH:MM:SS±HH:MM | 2024-03-15T14:30:00+02:00 | Date and time with timezone |
| PnYnMnDTnHnMnS | P1Y2M3DT4H5M6S | Duration |
| PT30M | PT30M | 30-minute duration |
| P7D | P7D | 7-day duration |