CrackedRuby

Service-Oriented Architecture (SOA)

Overview

Service-Oriented Architecture (SOA) structures applications as collections of loosely coupled services that communicate over a network. Each service encapsulates a specific business capability and exposes that functionality through a well-defined interface, independent of the underlying implementation technology.

SOA emerged in the late 1990s as organizations struggled with monolithic applications that became increasingly difficult to maintain and scale. The architecture promotes reusability by allowing different applications to consume the same services, reducing duplication across an organization. A customer validation service, for example, can serve web applications, mobile apps, and batch processing systems without modification.

The architecture operates on the principle that services remain independent units of functionality. When an e-commerce application needs to process a payment, it calls a payment service through a defined contract. The calling application does not need to know whether the payment service uses a SQL database, connects to external payment processors, or runs on a specific server. This separation enables teams to develop, deploy, and scale services independently.

SOA differs from microservices in scope and granularity. SOA services typically represent larger business domains with multiple operations exposed through a single service interface. A customer service in SOA might include operations for creating customers, updating addresses, validating credit, and retrieving purchase history. Microservices generally focus on smaller, single-purpose capabilities.

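# SOA service example - Customer service with multiple operations
# (Customer is assumed to be an ActiveRecord model; CreditValidator and
# publish_event are assumed helpers not shown here)
class CustomerService
  def create_customer(data)
    # Handle customer creation
    customer = Customer.create!(data)
    # ActiveRecord models expose #attributes rather than #to_h
    publish_event('customer.created', customer.attributes)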
    customer
  end
  
  def update_address(customer_id, address_data)
    # Address update logic
    customer = Customer.find(customer_id)
    customer.update_address!(address_data)
    customer
  end
  
  def validate_credit(customer_id)
    # Credit validation logic
    CreditValidator.check(customer_id)
  end
end

The architecture introduces complexity through network communication, service discovery, and distributed data management. Applications must handle network failures, implement service versioning, and coordinate transactions across multiple services. Organizations adopt SOA when the benefits of service reusability, independent scalability, and technology flexibility outweigh these operational challenges.

Key Principles

Service Contract: Each service defines a formal contract specifying operations, input parameters, output formats, and error conditions. The contract serves as an agreement between service providers and consumers. Changes to contracts require versioning strategies to prevent breaking existing consumers. Contracts typically use standards like WSDL for SOAP services or OpenAPI specifications for REST APIs.

Loose Coupling: Services minimize dependencies on other services' internal implementations. A service consumer should only depend on the service contract, not on implementation details like database schemas or internal class structures. This principle allows teams to refactor service internals without affecting consumers. Message-based communication patterns promote loose coupling by eliminating direct service-to-service calls.

Service Abstraction: Services hide implementation details from consumers. A payment processing service exposes operations like process_payment and refund_payment without revealing whether it uses Stripe, PayPal, or an internal payment system. This abstraction enables service providers to swap implementations or optimize internal logic without consumer impact.
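To make the abstraction concrete, here is a minimal Ruby sketch in which consumers call `process_payment` while the provider behind it is injected. `StripeAdapter` and `InternalLedgerAdapter` are hypothetical stand-ins, not the real Stripe SDK or any PayPal client; their `charge` method and return shape are assumptions made for illustration.

```ruby
# Hypothetical provider adapters: names and return shapes are illustrative,
# not the API of any real payment SDK.
class StripeAdapter
  def charge(amount_cents, token)
    { amount_cents: amount_cents, reference: "st_#{token}" }
  end
end

class InternalLedgerAdapter
  def charge(amount_cents, token)
    { amount_cents: amount_cents, reference: "lg_#{token}" }
  end
end

# Consumers depend only on process_payment and its response keys;
# which provider runs underneath is a private detail of the service.
class PaymentService
  def initialize(provider)
    @provider = provider
  end

  def process_payment(amount_cents:, token:)
    raise ArgumentError, 'amount must be positive' unless amount_cents.positive?

    result = @provider.charge(amount_cents, token)
    # Only contract-level fields leave the service
    { status: 'succeeded', amount_cents: result[:amount_cents], reference: result[:reference] }
  end
end
```

Swapping providers changes only the object passed to `PaymentService.new`; the response keys that consumers rely on stay the same.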

Service Reusability: Services provide functionality that multiple applications can consume. An address validation service can serve web applications, mobile apps, and batch processing systems. Designing for reusability requires careful consideration of service granularity and interface design to meet diverse consumer needs without creating overly generic, difficult-to-use APIs.

Service Autonomy: Services control their own behavior and data. Each service owns its database schema and business logic without external systems directly manipulating its data. A customer service manages the customer database exclusively. Other services interact with customer data only through the customer service API, maintaining data integrity and enabling independent service evolution.

Service Statelessness: Services avoid maintaining client-specific state between requests. Each request contains all information needed to process it. Stateless services scale horizontally more easily since any service instance can handle any request. Session state, when required, moves to external stores like Redis or client-side tokens rather than service memory.
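One way to move session state out of service memory is a signed token that the client sends with each request. The sketch below assumes a shared secret (`SESSION_SECRET` is a hypothetical variable name) available to every instance, so any instance can verify any request. It uses only the standard library's `openssl` and `json`. Hex encoding keeps it dependency-free, whereas real tokens such as JWTs typically use Base64url, and a production token would also carry an expiry claim.

```ruby
require 'openssl'
require 'json'

# Minimal signed session token: the claims travel with the client, and an
# HMAC signature lets any service instance detect tampering.
module SessionToken
  SECRET = ENV.fetch('SESSION_SECRET', 'dev-only-secret') # hypothetical setting

  def self.issue(claims)
    payload = JSON.generate(claims).unpack1('H*') # hex-encode the JSON
    "#{payload}.#{sign(payload)}"
  end

  # Returns the claims hash, or nil if the token is malformed or tampered with
  def self.verify(token)
    payload, signature = token.to_s.split('.', 2)
    return nil unless payload && signature
    # Constant-time comparison avoids leaking signature prefixes via timing
    return nil unless OpenSSL.secure_compare(sign(payload), signature)

    JSON.parse([payload].pack('H*').force_encoding(Encoding::UTF_8))
  end

  def self.sign(payload)
    OpenSSL::HMAC.hexdigest('SHA256', SECRET, payload)
  end
end
```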

Service Discoverability: Services register with a service registry, enabling clients to locate services dynamically at runtime. Service registries like Consul, Eureka, or ZooKeeper maintain service locations, health status, and metadata. Clients query the registry to find available service instances rather than hardcoding service addresses.

# Service registry interaction example (in-memory, for illustration)
require 'socket'

class ServiceRegistry
  def initialize
    @registry = Hash.new { |hash, name| hash[name] = [] }
  end

  def register(service_name, host, port, metadata = {})
    @registry[service_name] << {
      host: host,
      port: port,
      metadata: metadata,
      registered_at: Time.now
    }
  end

  def discover(service_name)
    # fetch avoids creating empty entries for unknown service names
    @registry.fetch(service_name, []).select { |instance| healthy?(instance) }
  end

  def healthy?(instance)
    # Minimal TCP reachability check; connect_timeout requires Ruby 3.0+
    TCPSocket.new(instance[:host], instance[:port], connect_timeout: 1).close
    true
  rescue StandardError
    false
  end
end

Service Composability: Services combine to form higher-level business processes. An order processing workflow might invoke inventory, payment, and shipping services in sequence. Service orchestration manages these combinations, coordinating calls and handling failures. Choreography patterns distribute coordination logic across services through event-driven communication.
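The difference between orchestration and choreography lies in where the workflow lives. In the choreography sketch below, an in-process `EventBus` (a hypothetical stand-in for a broker such as RabbitMQ or Kafka) delivers events, and each handler plays the role of one service that reacts to the previous step and emits the next event. No single component knows the whole sequence. An orchestrator would instead call inventory, payment, and shipping in order from one place.

```ruby
# Minimal in-process event bus standing in for a message broker.
class EventBus
  def initialize
    @subscribers = Hash.new { |hash, event| hash[event] = [] }
  end

  def subscribe(event_name, &handler)
    @subscribers[event_name] << handler
  end

  def publish(event_name, payload)
    @subscribers[event_name].each { |handler| handler.call(payload) }
  end
end

bus = EventBus.new
steps = []

# Each "service" reacts to one event and publishes the next;
# the overall workflow emerges from these subscriptions.
bus.subscribe('order.placed') do |order|
  steps << "inventory reserved for #{order[:id]}"
  bus.publish('inventory.reserved', order)
end

bus.subscribe('inventory.reserved') do |order|
  steps << "payment charged for #{order[:id]}"
  bus.publish('payment.charged', order)
end

bus.subscribe('payment.charged') do |order|
  steps << "shipment created for #{order[:id]}"
end

bus.publish('order.placed', { id: 'A1' })
puts steps
```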

Design Considerations

Granularity Decisions: Service granularity significantly impacts architecture complexity and performance. Coarse-grained services encapsulate larger business domains, reducing network overhead and simplifying service management but potentially limiting reusability and independent scaling. Fine-grained services enable precise scaling and reuse but increase operational complexity and inter-service communication costs.

An e-commerce platform choosing between a single order service or separate services for cart management, checkout, and fulfillment exemplifies this trade-off. A unified order service simplifies deployment and reduces network calls but prevents independent scaling of high-volume cart operations from lower-volume fulfillment processes. The decision depends on traffic patterns, team structure, and operational capabilities.

Communication Protocol Selection: SOAP provides strong typing, built-in error handling, and comprehensive WS-* standards for security and transactions but introduces XML parsing overhead and complex tooling requirements. REST offers simplicity and widespread adoption with HTTP semantics but lacks standardized approaches for complex operations, transactions, and event notification.

Message queues like RabbitMQ or Apache Kafka enable asynchronous communication, improving resilience and enabling event-driven architectures. Queue-based communication decouples services temporally—consumers need not be available when producers send messages. This pattern suits workflows tolerating eventual consistency but complicates request-response scenarios requiring immediate results.

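# Protocol comparison - REST endpoint
class OrdersController < ApplicationController
  def create
    order = OrderService.create_order(order_params.to_h)
    render json: order, status: :created
  rescue OrderService::ValidationError => e
    render json: { error: e.message }, status: :unprocessable_entity
  end

  private

  # Strong parameters: only whitelisted attributes reach the service
  def order_params
    params.require(:order).permit(:customer_id, items: [:product_id, :quantity])
  end
end

# Message queue approach
class OrderProcessor
  # queue is any publisher exposing publish(routing_key, payload) (assumed)
  def initialize(queue)
    @queue = queue
  end

  def process(message)
    order_data = JSON.parse(message.payload)
    order = OrderService.create_order(order_data)

    # Publish event for downstream services, then acknowledge the message
    @queue.publish('order.created', order.to_json)
    message.ack
  rescue => e
    # Handle async errors differently: no caller is waiting for a response
    ErrorNotifier.report(e)
    # Return to queue for retry; create_order must be idempotent, because a
    # retry after a failed publish would otherwise create the order twice
    message.nack
  end
end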

Data Management Strategies: Shared database approaches violate service autonomy but simplify queries spanning multiple services. Database-per-service patterns enforce service boundaries and enable technology diversity but complicate cross-service queries and transactions. Implementing reports requiring customer, order, and inventory data becomes challenging when each service owns its database.

The saga pattern addresses distributed transactions by breaking them into local transactions with compensating actions for rollback. An order placement saga might reserve inventory, charge payment, and create shipment records as separate local transactions. If payment fails, the saga executes a compensating transaction to release the reserved inventory; if shipment creation fails, it also refunds the payment. This approach gives up the atomicity and isolation of ACID transactions across services, since other transactions can observe intermediate states, in exchange for availability and partition tolerance.

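Service Versioning Approaches: URI versioning (/api/v1/orders, /api/v2/orders) clearly segregates versions but proliferates endpoints and complicates routing. Header-based versioning, either through a custom header (such as API-Version: 2) or through a vendor media type in the Accept header (Accept: application/vnd.company.v2+json), keeps URIs stable but makes versions less discoverable. Media-type versioning builds on HTTP content negotiation, letting one endpoint serve several representations during a gradual migration, but it requires consumers to send the correct Accept header.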

Breaking changes require maintaining multiple service versions simultaneously. A payment service changing its charge API from accepting card details to requiring payment tokens must support both approaches during a transition period. The duration depends on how quickly consumers can migrate—internal consumers might migrate within weeks while external API consumers might require months of notice.
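During such a transition period, the provider can dispatch on the requested version. This sketch parses the `application/vnd.company.vN+json` media type from the Accept header, defaults to v1 when no version is given, and routes to a card-details handler or a token handler. `ChargeEndpoint`, its field names, and the deprecation message are hypothetical.

```ruby
# Serves two versions of a charge operation side by side (hypothetical names).
class ChargeEndpoint
  VERSION_PATTERN = %r{application/vnd\.company\.v(\d+)\+json}
  DEFAULT_VERSION = 1

  def self.requested_version(accept_header)
    match = VERSION_PATTERN.match(accept_header.to_s)
    match ? match[1].to_i : DEFAULT_VERSION
  end

  def self.handle(accept_header, params)
    case requested_version(accept_header)
    when 1 then charge_with_card(params)  # legacy: raw card details
    when 2 then charge_with_token(params) # new: tokenized payment method
    else { status: 406, error: 'Unsupported API version' }
    end
  end

  def self.charge_with_card(params)
    return { status: 422, error: 'card_number required' } unless params[:card_number]

    { status: 201, method: 'card', warning: 'v1 is deprecated; migrate to v2' }
  end

  def self.charge_with_token(params)
    return { status: 422, error: 'payment_token required' } unless params[:payment_token]

    { status: 201, method: 'token' }
  end
end
```

Defaulting unversioned requests to v1 keeps existing consumers working until they opt in to v2.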

Failure Mode Planning: Services must handle partial failures gracefully. When a recommendation service fails, an e-commerce site should display products without recommendations rather than showing error pages. Circuit breakers prevent cascading failures by stopping calls to failing services after a threshold of failures. Bulkheads isolate failures by limiting resources allocated to each service dependency.

# Circuit breaker pattern (not thread-safe; guard state with a Mutex
# when shared across threads)
class CircuitOpenError < StandardError; end

class CircuitBreaker
  FAILURE_THRESHOLD = 5
  TIMEOUT_DURATION = 60 # seconds the circuit stays open before a trial call

  attr_reader :state

  def initialize(service_name)
    @service_name = service_name
    @failures = 0
    @last_failure_time = nil
    @state = :closed
  end

  def call(&block)
    if @state == :open
      if Time.now - @last_failure_time < TIMEOUT_DURATION
        raise CircuitOpenError, "Circuit open for #{@service_name}"
      end

      @state = :half_open # let a trial request through
    end

    attempt_call(&block)
  end

  private

  def attempt_call
    result = yield
    reset_failures
    result
  rescue StandardError
    record_failure
    raise
  end

  def record_failure
    @failures += 1
    @last_failure_time = Time.now
    # A failed trial call reopens the circuit immediately
    @state = :open if @state == :half_open || @failures >= FAILURE_THRESHOLD
  end

  def reset_failures
    @failures = 0
    @state = :closed
  end
end

Implementation Approaches

SOAP-Based SOA: SOAP services use XML for message formatting and WSDL for service contracts. The approach provides strong typing through XML Schema, standardized error handling with SOAP faults, and extensive WS-* specifications covering security (WS-Security), transactions (WS-AtomicTransaction), and reliable messaging (WS-ReliableMessaging). SOAP excels in enterprise environments requiring formal contracts and complex transaction coordination but introduces significant overhead for simple request-response scenarios.

The WSDL-first approach defines service contracts before implementation, generating code from WSDL specifications. This strategy ensures contract stability and enables contract-based testing but requires WSDL expertise and can produce awkward APIs if the contract poorly maps to the implementation language. Code-first approaches generate WSDL from implementation but risk producing contracts tightly coupled to implementation details.

REST-Based SOA: REST services expose resources through HTTP operations (GET, POST, PUT, DELETE) with JSON or XML representations. REST leverages HTTP infrastructure—caching, authentication, content negotiation—reducing the need for custom protocols. The approach suits public APIs and simple service interactions but lacks standardization for complex scenarios like bulk operations, transactions, and event subscription.

Richardson Maturity Model levels indicate REST sophistication. Level 0 uses HTTP as a transport tunnel. Level 1 introduces resource URIs. Level 2 adds HTTP verbs and status codes. Level 3 implements hypermedia controls (HATEOAS), embedding navigation links in responses to guide clients through available operations. Most REST APIs operate at Level 2.
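As an illustration of Level 3, the sketch below builds an order representation whose links depend on the order's state, loosely modeled on HAL's `_links` convention. The relation names (`pay`, `cancel`, `shipment`), the URI layout, and the `method` hint are assumptions, not part of any standard.

```ruby
require 'json'

# Available actions depend on resource state and are advertised in the
# response, so clients follow links instead of hardcoding URIs.
def order_representation(order)
  links = { self: { href: "/orders/#{order[:id]}", method: 'GET' } }

  case order[:status]
  when 'pending'
    links[:pay] = { href: "/orders/#{order[:id]}/payment", method: 'POST' }
    links[:cancel] = { href: "/orders/#{order[:id]}", method: 'DELETE' }
  when 'paid'
    links[:shipment] = { href: "/orders/#{order[:id]}/shipment", method: 'GET' }
  end

  order.merge(_links: links)
end

puts JSON.pretty_generate(order_representation({ id: 42, status: 'pending', total: '19.99' }))
```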

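# REST service implementation with Sinatra (Customer is assumed to be an
# ActiveRecord model)
require 'sinatra/base'
require 'json'

class CustomerService < Sinatra::Base
  before do
    content_type :json
  end

  helpers do
    # Halts with 404 instead of letting RecordNotFound become a 500
    def find_customer!
      Customer.find(params[:id])
    rescue ActiveRecord::RecordNotFound
      halt 404, { error: 'Customer not found' }.to_json
    end

    # Halts with 400 when the request body is not valid JSON
    def json_body
      JSON.parse(request.body.read)
    rescue JSON::ParserError
      halt 400, { error: 'Malformed JSON body' }.to_json
    end
  end

  get '/customers/:id' do
    find_customer!.to_json
  end

  post '/customers' do
    # Production code should whitelist attributes before mass assignment
    customer = Customer.create!(json_body)
    status 201
    customer.to_json
  rescue ActiveRecord::RecordInvalid => e
    status 422
    { error: e.message }.to_json
  end

  put '/customers/:id' do
    customer = find_customer!
    customer.update!(json_body)
    customer.to_json
  rescue ActiveRecord::RecordInvalid => e
    status 422
    { error: e.message }.to_json
  end

  delete '/customers/:id' do
    find_customer!.destroy
    status 204
  end
end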

Event-Driven SOA: Services communicate through events rather than direct calls. A service publishes events when significant state changes occur; interested services subscribe to relevant events. This approach decouples services temporally and reduces dependencies but introduces eventual consistency and complicates understanding system behavior through distributed event flows.

Event sourcing stores events as the primary data model rather than current state. An order service stores order placement, payment, and shipment events rather than final order state. Current state derives from replaying events. This pattern provides complete audit trails and enables temporal queries but increases storage requirements and complicates simple queries.
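A minimal sketch of the replay idea: the event list is the source of truth, and current state is a fold over it. Passing `until_time` replays only the events up to that moment, which is how a temporal query ("what did this order look like yesterday?") falls out of the model. The event types and fields are illustrative assumptions, and events are assumed to be stored in append order.

```ruby
# Events are immutable facts; state is derived by replaying them in order.
OrderEvent = Struct.new(:type, :data, :occurred_at)

class OrderProjection
  def self.replay(events, until_time: nil)
    events
      .select { |event| until_time.nil? || event.occurred_at <= until_time }
      .reduce({ status: nil, paid_cents: 0 }) { |state, event| apply(state, event) }
  end

  def self.apply(state, event)
    case event.type
    when :order_placed
      state.merge(status: 'placed', items: event.data[:items])
    when :payment_received
      state.merge(status: 'paid', paid_cents: state[:paid_cents] + event.data[:amount_cents])
    when :order_shipped
      state.merge(status: 'shipped', tracking: event.data[:tracking])
    else
      state # unknown event types are ignored so older readers tolerate new ones
    end
  end
end
```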

Enterprise Service Bus (ESB): An ESB centralizes integration logic, routing messages between services, transforming data formats, and managing service orchestration. The bus handles protocol translation, enabling SOAP services to communicate with REST endpoints or message queues. ESBs simplify adding new services by connecting them to the bus rather than point-to-point integrations but create a single point of failure and potential performance bottleneck.

Smart endpoints with dumb pipes invert this model. Services contain business logic and integration requirements while the bus provides only message routing. This approach reduces bus complexity and coupling but distributes integration logic across services, potentially causing duplication.

Ruby Implementation

Service Implementation with Grape: Grape provides a DSL for building REST APIs with automatic parameter validation, versioning support, and content negotiation. The framework suits service implementation when requirements include parameter coercion, entity presentation, and API versioning.

require 'grape'
require 'grape_entity' # provided by the grape-entity gem

# Entities come first: OrderEntity references OrderItemEntity when its class
# body is evaluated
class OrderItemEntity < Grape::Entity
  expose :product_id
  expose :quantity
end

class OrderEntity < Grape::Entity
  expose :id
  expose :customer_id
  expose :total_amount
  expose :status
  expose :created_at
  expose :items, using: OrderItemEntity
end

class OrderAPI < Grape::API
  version 'v1', using: :header, vendor: 'company'
  format :json
  prefix :api

  # Error handlers are conventionally declared before the endpoints they cover
  rescue_from ActiveRecord::RecordNotFound do
    error!({ error: 'Resource not found' }, 404)
  end

  rescue_from Grape::Exceptions::ValidationErrors do |e|
    error!({ errors: e.full_messages }, 422)
  end

  resource :orders do
    desc 'Create a new order'
    params do
      requires :customer_id, type: Integer
      requires :items, type: Array do
        requires :product_id, type: Integer
        requires :quantity, type: Integer
      end
      optional :shipping_address, type: Hash do
        requires :street, type: String
        requires :city, type: String
        requires :postal_code, type: String
      end
    end
    post do
      order = OrderService.create_order(
        customer_id: params[:customer_id],
        items: params[:items],
        shipping_address: params[:shipping_address]
      )
      present order, with: OrderEntity
    end

    desc 'Retrieve order by ID'
    params do
      requires :id, type: Integer
    end
    route_param :id do
      get do
        order = Order.find(params[:id])
        present order, with: OrderEntity
      end
    end
  end
end

Service Client Implementation: HTTP client libraries like HTTParty or Faraday handle service communication. Implementing retry logic, timeout management, and circuit breakers requires additional patterns around basic HTTP calls.

# Faraday 2.x bundles the :json request/response middleware
# (on Faraday 1.x, also require 'faraday_middleware')
require 'faraday'

class PaymentServiceClient
  BASE_URL = ENV['PAYMENT_SERVICE_URL']
  
  def initialize
    @conn = Faraday.new(url: BASE_URL) do |f|
      f.request :json
      f.response :json, content_type: /\bjson$/
      f.adapter Faraday.default_adapter
      f.options.timeout = 5
      f.options.open_timeout = 2
    end
  end
  
  def process_payment(payment_data)
    response = @conn.post('/payments') do |req|
      req.body = payment_data
      req.headers['X-API-Key'] = ENV['PAYMENT_API_KEY']
    end
    
    case response.status
    when 200..299
      response.body
    when 422
      raise ValidationError, response.body['error']
    when 500..599
      raise ServiceError, "Payment service error: #{response.status}"
    else
      raise UnexpectedError, "Unexpected status: #{response.status}"
    end
  rescue Faraday::TimeoutError
    raise ServiceTimeout, "Payment service timeout"
  rescue Faraday::ConnectionFailed => e
    raise ServiceUnavailable, "Payment service unavailable: #{e.message}"
  end
  
  class ValidationError < StandardError; end
  class ServiceError < StandardError; end
  class ServiceTimeout < StandardError; end
  class ServiceUnavailable < StandardError; end
  class UnexpectedError < StandardError; end
end
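Retry logic can wrap a client like the one above. The sketch below retries only errors classified as transient, waiting with exponential backoff and full jitter between attempts. The `Retry` module, its parameters, and `TransientError` are hypothetical; in practice `retry_on` would list errors such as `PaymentServiceClient::ServiceTimeout` and `PaymentServiceClient::ServiceUnavailable`. Retrying a `POST` like `process_payment` is only safe if the payment service deduplicates requests, for example with an idempotency key.

```ruby
# Retries transient failures with exponential backoff and full jitter.
module Retry
  class TransientError < StandardError; end

  def self.with_backoff(attempts: 3, base_delay: 0.1, max_delay: 2.0,
                        retry_on: [TransientError], sleeper: ->(seconds) { sleep(seconds) })
    attempt = 0
    begin
      attempt += 1
      yield attempt
    rescue *retry_on
      raise if attempt >= attempts # out of attempts: surface the last error

      # Full jitter: random delay in [0, min(max_delay, base_delay * 2^(attempt - 1))]
      cap = [max_delay, base_delay * (2**(attempt - 1))].min
      sleeper.call(rand * cap)
      retry
    end
  end
end
```

For example, `Retry.with_backoff(retry_on: [PaymentServiceClient::ServiceTimeout]) { client.process_payment(data) }`. The `sleeper` parameter lets tests record delays instead of sleeping; non-matching errors such as `ValidationError` propagate on the first attempt.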

Message Queue Integration with Bunny: Bunny provides a Ruby client for RabbitMQ, enabling asynchronous service communication through message queues.

require 'bunny'
require 'json'
require 'logger'
require 'time' # Time#iso8601

class OrderEventPublisher
  def initialize
    @connection = Bunny.new(ENV['RABBITMQ_URL'])
    @connection.start
    @channel = @connection.create_channel
    @exchange = @channel.topic('orders', durable: true)
  end

  def publish_order_created(order)
    message = {
      event_type: 'order.created',
      order_id: order.id,
      customer_id: order.customer_id,
      total_amount: order.total_amount,
      timestamp: Time.now.iso8601
    }

    @exchange.publish(
      message.to_json,
      routing_key: 'order.created',
      persistent: true,
      content_type: 'application/json'
    )
  end

  def close
    @channel.close
    @connection.close
  end
end

class InventoryService
  def initialize(logger: Logger.new($stdout))
    @logger = logger
  end

  def start_consuming
    connection = Bunny.new(ENV['RABBITMQ_URL'])
    connection.start
    channel = connection.create_channel
    channel.prefetch(10) # cap unacknowledged messages held by this consumer
    exchange = channel.topic('orders', durable: true)
    queue = channel.queue('inventory.orders', durable: true)
    queue.bind(exchange, routing_key: 'order.created')

    queue.subscribe(manual_ack: true, block: true) do |delivery_info, _properties, body|
      process_order_event(JSON.parse(body))
      channel.ack(delivery_info.delivery_tag)
    rescue JSON::ParserError => e
      # A malformed message will never succeed: reject without requeueing
      # (it goes to a dead-letter exchange if one is configured)
      @logger.error("Discarding malformed order event: #{e.message}")
      channel.reject(delivery_info.delivery_tag, false)
    rescue StandardError => e
      @logger.error("Failed to process order event: #{e.message}")
      channel.nack(delivery_info.delivery_tag, false, true)
    end
  end

  def process_order_event(event)
    # find_or_create_by! keeps a redelivered event from reserving inventory twice
    InventoryReservation.find_or_create_by!(order_id: event['order_id'])
  end
end

Service Discovery with Consul: Consul provides service registration and discovery. Ruby services register themselves on startup and query Consul to locate other services.

require 'diplomat'

class ServiceDiscovery
  def register_service(name, host, port, tags = [])
    Diplomat::Service.register(
      Name: name,
      Address: host,
      Port: port,
      Tags: tags,
      Check: {
        HTTP: "http://#{host}:#{port}/health",
        Interval: "10s"
      }
    )
  end
  
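  def discover_service(name)
    # The health endpoint with passing: true returns only instances whose
    # checks pass (the catalog endpoint does not filter by health)
    entries = Diplomat::Health.service(name, passing: true)
    entries.map do |entry|
      service = entry.Service
      {
        # An empty service address means "use the node's address"
        host: service['Address'].to_s.empty? ? entry.Node['Address'] : service['Address'],
        port: service['Port'],
        tags: service['Tags']
      }
    end
  rescue Diplomat::KeyNotFound
    []
  end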
  
  def deregister_service(service_id)
    Diplomat::Service.deregister(service_id)
  end
end

# Usage in service initialization
class OrderService
  def self.start
    discovery = ServiceDiscovery.new
    discovery.register_service(
      'order-service',
      ENV['SERVICE_HOST'],
      ENV['SERVICE_PORT'].to_i,
      ['v1', 'production']
    )
    
    at_exit do
      discovery.deregister_service('order-service')
    end
  end
end

Integration & Interoperability

API Gateway Pattern: An API gateway provides a single entry point for clients, routing requests to appropriate backend services. The gateway handles cross-cutting concerns like authentication, rate limiting, request logging, and response caching. This pattern simplifies client implementation by consolidating multiple service endpoints behind a unified API but introduces a potential bottleneck and single point of failure.

Gateways transform backend service APIs into client-friendly interfaces. Mobile clients might receive simplified JSON responses while web clients get detailed representations. The gateway aggregates data from multiple services, reducing round trips. A product details page requiring data from catalog, pricing, inventory, and review services makes one gateway call instead of four service calls.

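# API Gateway implementation (catalog_client, pricing_client, inventory_client,
# valid_api_key?, and rate_limiter are assumed helpers not shown here)
require 'sinatra/base'
require 'json'

class APIGateway < Sinatra::Base
  # Sinatra creates a new instance per request, so breakers live at class
  # level to keep failure counts across requests; one breaker per backend
  # keeps a failing pricing service from blocking calls to inventory
  BREAKERS = %i[catalog pricing inventory].to_h { |name| [name, CircuitBreaker.new(name)] }.freeze

  before do
    content_type :json
    authenticate_request
    check_rate_limit
  end

  get '/products/:id' do
    product_id = params[:id]

    # Aggregate data from multiple services (called sequentially here)
    product_data = fetch_with_fallback(:catalog) do
      catalog_client.get_product(product_id)
    end

    pricing_data = fetch_with_fallback(:pricing, default: { price: nil }) do
      pricing_client.get_price(product_id)
    end

    inventory_data = fetch_with_fallback(:inventory, default: { in_stock: false }) do
      inventory_client.check_stock(product_id)
    end

    {
      product: product_data,
      pricing: pricing_data,
      inventory: inventory_data
    }.to_json
  end

  private

  def fetch_with_fallback(service, default: nil, &block)
    BREAKERS.fetch(service).call(&block)
  rescue StandardError => e
    # Covers CircuitOpenError as well as timeouts raised by the client
    logger.warn("#{service} call failed (#{e.class}), using fallback")
    default
  end

  def authenticate_request
    halt 401, { error: 'Unauthorized' }.to_json unless valid_api_key?
  end

  def check_rate_limit
    client_id = request.env['HTTP_X_CLIENT_ID']
    if rate_limiter.exceeded?(client_id)
      halt 429, { error: 'Rate limit exceeded' }.to_json
    end
  end
end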

Service Mesh Architecture: Service meshes like Istio or Linkerd handle service-to-service communication through sidecar proxies deployed alongside each service instance. The mesh provides traffic management, security, and observability without requiring application code changes. Proxies intercept all network traffic, implementing retry logic, circuit breakers, load balancing, and mutual TLS authentication.

Service meshes standardize communication patterns across services written in different languages. A Ruby service and a Go service both benefit from mesh-provided circuit breakers without implementing language-specific libraries. The mesh collects distributed tracing data and metrics automatically, providing visibility into request flows across services.

Contract Testing: Contract tests verify that service providers implement the contracts that consumers expect. With Pact, consumer tests run against a mock provider and record the expected interactions in a pact file; provider verification then replays those interactions against the real provider. This approach catches integration issues early without requiring all services to run simultaneously.

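# Pact consumer test
require 'pact/consumer/rspec'

# Declares the mock provider that the customer_service helper refers to
Pact.service_consumer 'Customer Service Client' do
  has_pact_with 'Customer Service' do
    mock_service :customer_service do
      port 1234
    end
  end
end

RSpec.describe 'Customer Service Client', pact: true do
  let(:customer_client) { CustomerServiceClient.new('localhost', 1234) }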
  
  describe 'get customer' do
    before do
      customer_service.given('customer 123 exists')
        .upon_receiving('a request for customer 123')
        .with(
          method: :get,
          path: '/customers/123',
          headers: { 'Accept' => 'application/json' }
        )
        .will_respond_with(
          status: 200,
          headers: { 'Content-Type' => 'application/json' },
          body: {
            id: 123,
            name: 'Jane Smith',
            email: 'jane@example.com'
          }
        )
    end
    
    it 'returns customer data' do
      customer = customer_client.get_customer(123)
      expect(customer.id).to eq(123)
      expect(customer.name).to eq('Jane Smith')
    end
  end
end

# Pact provider test
Pact.provider_states_for 'Customer Service Client' do
  provider_state 'customer 123 exists' do
    set_up do
      Customer.create!(id: 123, name: 'Jane Smith', email: 'jane@example.com')
    end
    
    tear_down do
      Customer.destroy_all
    end
  end
end

Data Synchronization Patterns: Services maintaining local copies of data from other services require synchronization mechanisms. Change Data Capture (CDC) streams database changes to interested services. Debezium monitors database transaction logs, publishing change events to Kafka topics. Services subscribe to relevant topics, updating local data stores.

Event-driven synchronization publishes domain events when data changes. A customer service publishes customer.updated events containing changed fields. Downstream services maintain read models by consuming these events. This approach provides more semantic information than CDC but requires publishers to identify which changes warrant events.
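A downstream read model has to cope with duplicate and out-of-order deliveries from at-least-once brokers. This sketch assumes each `customer.updated` event carries the customer ID, a monotonically increasing `version`, and only the changed fields; events whose version is not newer than the stored row are ignored. The class and the event shape are hypothetical.

```ruby
# Local read model maintained from customer.updated events (hypothetical shape:
# { customer_id:, version:, changes: { field => value } }).
class CustomerReadModel
  def initialize
    @rows = {}
  end

  # Returns true if the event was applied, false if it was stale or a duplicate
  def apply(event)
    row = @rows[event[:customer_id]] || { version: 0 }
    return false if event[:version] <= row[:version]

    # Note: with changed-fields-only events, a version gap (v1 then v3) would
    # need a full refetch from the customer service; not handled here.
    @rows[event[:customer_id]] = row.merge(event[:changes]).merge(version: event[:version])
    true
  end

  def find(customer_id)
    @rows[customer_id]
  end
end
```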

API Versioning in Service Ecosystems: Multiple services with interdependencies complicate versioning. A customer service v2 changing response formats affects order service, invoice service, and analytics service. Coordinating migrations across services requires planning.

Parallel version deployment runs v1 and v2 simultaneously, routing consumers to the appropriate version. This approach enables gradual consumer migration but roughly doubles the resources dedicated to the service and complicates deployment. The expand-contract pattern introduces changes in three phases: expand the API to support both old and new behavior, migrate consumers, then contract by removing the old behavior.
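A sketch of the expand phase for a hypothetical change that splits a customer's `name` into `first_name` and `last_name`. While expanded, the provider accepts either request shape and returns both response shapes; switching the serializer to the `:contract` phase drops the legacy field once no consumer reads it. All names here are illustrative.

```ruby
# Expand-contract for a hypothetical name -> first_name/last_name split.
class CustomerSerializer
  def initialize(phase: :expand)
    @phase = phase
  end

  def serialize(customer)
    body = { id: customer[:id], first_name: customer[:first_name], last_name: customer[:last_name] }
    # Legacy field kept only until the contract phase
    body[:name] = "#{customer[:first_name]} #{customer[:last_name]}" if @phase == :expand
    body
  end

  # Accept both the old and the new request shape while consumers migrate
  def self.parse_input(params)
    return params.slice(:first_name, :last_name) if params.key?(:first_name)

    first, last = params.fetch(:name).split(' ', 2)
    { first_name: first, last_name: last }
  end
end
```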

Common Pitfalls

Chatty Service Communication: Fine-grained services requiring multiple network calls for single operations create latency and failure scenarios. A product details page calling catalog service, pricing service, inventory service, review service, and recommendation service sequentially accumulates network latency. Each call introduces potential failure points. The gateway pattern addresses this through backend aggregation, making parallel calls to services and combining results.

Services exposing overly granular operations force clients to orchestrate multiple calls. A user profile update requiring separate calls for name, email, phone, and address changes creates inefficient interactions. Designing coarser-grained operations that handle related changes together reduces round trips.
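Where a gateway calls its backends one after another, total latency is the sum of the calls; issuing them concurrently makes it track the slowest call instead. This sketch runs each call in its own thread, gives every call a fallback, and enforces one overall deadline. The `aggregate` helper is hypothetical and the lambdas stand in for real service clients, which should also enforce their own timeouts, since a thread that misses the deadline keeps running in the background.

```ruby
# calls: { name => [callable, fallback] }; returns { name => result or fallback }
def aggregate(calls, timeout: 1.0)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  threads = calls.to_h do |name, (fetch, fallback)|
    # Any StandardError inside a call resolves to that service's fallback
    [name, Thread.new { fetch.call rescue fallback }]
  end

  threads.to_h do |name, thread|
    remaining = [deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC), 0].max
    # Thread#join returns nil if the thread has not finished within `remaining`
    [name, thread.join(remaining) ? thread.value : calls[name][1]]
  end
end
```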

Distributed Data Management Failures: Attempting to maintain data consistency across services through distributed transactions rarely succeeds. Two-phase commit protocols require all participants to remain available throughout the transaction, violating availability requirements for distributed systems. Network partitions prevent reaching consensus, causing transactions to block.

The saga pattern with compensating transactions handles distributed workflows without distributed transactions. Each service executes local transactions, publishing events on completion. Subsequent services react to those events. Failures trigger compensating transactions that semantically undo completed operations. An order placement saga reserves inventory, charges payment, and creates a shipment. If payment fails, a compensating transaction releases the reserved inventory; if shipment creation fails, compensations also refund the payment.

# Saga pattern implementation (orchestrated variant). inventory_service,
# payment_service, and calculate_total are assumed collaborators.
class SagaFailedError < StandardError; end

class OrderSaga
  def execute(order_params)
    inventory_result = nil
    payment_result = nil

    # Step 1: Reserve inventory
    inventory_result = inventory_service.reserve(order_params[:items])

    # Step 2: Charge payment
    payment_result = payment_service.charge(
      customer_id: order_params[:customer_id],
      amount: calculate_total(order_params[:items])
    )

    # Step 3: Create order
    Order.create!(
      customer_id: order_params[:customer_id],
      payment_id: payment_result[:payment_id],
      inventory_reservation_id: inventory_result[:reservation_id]
    )
  rescue StandardError => e
    # Compensate completed steps in reverse order; a nil result means the
    # step never completed and needs no compensation
    compensate { payment_service.refund(payment_result[:payment_id]) } if payment_result
    compensate { inventory_service.release(inventory_result[:reservation_id]) } if inventory_result

    raise SagaFailedError, "Order saga failed: #{e.message}"
  end

  private

  # A failed compensation must not prevent the remaining ones from running
  def compensate
    yield
  rescue StandardError => e
    warn "Saga compensation failed: #{e.message}" # in practice, retry or alert
  end
end

Insufficient Service Boundaries: Services sharing databases or directly accessing each other's data stores violate service autonomy. An order service querying the customer database directly creates tight coupling, preventing the customer service from changing its schema independently. Services must communicate through defined interfaces, never through direct database access.

Shared libraries containing business logic create coupling across services. When the shared library changes, all dependent services require redeployment. Domain logic belongs within services, not shared libraries. Shared libraries should contain only truly generic utilities unrelated to specific business domains.

Overlooking Network Unreliability: Assuming network calls either succeed or fail cleanly leads to data inconsistencies. Networks exhibit partial failures: a request might time out after the service has already processed it, so a client retry creates a duplicate operation. Idempotency keys prevent duplicate processing. Before executing an operation, the service checks whether it has already processed a request with the same idempotency key.
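A minimal sketch of the server side of idempotency keys, which clients commonly send in an `Idempotency-Key` request header: the first request with a key runs the operation and stores its result, and a retry with the same key gets the stored result instead of a second charge. The in-memory `Hash` guarded by a `Mutex` is an assumption standing in for a shared, persistent store with expiry; holding the lock while the operation runs also serializes all calls, which a real store would avoid with per-key locking.

```ruby
# Stores the result of the first execution per idempotency key.
class IdempotencyStore
  def initialize
    @results = {}
    @lock = Mutex.new
  end

  def fetch_or_execute(key)
    @lock.synchronize do
      # A retry with a known key returns the original result unchanged
      return @results[key] if @results.key?(key)

      @results[key] = yield
    end
  end
end
```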

Services that do not set timeouts can block indefinitely waiting for responses, consuming threads and preventing failure detection. Every network call needs an explicit timeout appropriate for the operation: for example, read operations might time out after 1 second while write operations allow 5 seconds.
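With the standard library's `Net::HTTP`, these budgets are set per connection through `open_timeout` and `read_timeout`. The sketch below mirrors the 1-second read and 5-second write figures; the `TIMEOUTS` table, the `build_http` helper, and the service URLs are hypothetical, and the right values depend on each operation's latency profile.

```ruby
require 'net/http'
require 'uri'

# Per-operation timeout budgets in seconds (illustrative values)
TIMEOUTS = {
  read:  { open_timeout: 0.5, read_timeout: 1 },
  write: { open_timeout: 0.5, read_timeout: 5 }
}.freeze

# Builds a client with explicit timeouts; no connection is opened here.
def build_http(url, operation)
  uri = URI(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = uri.scheme == 'https'
  TIMEOUTS.fetch(operation).each { |setting, seconds| http.public_send("#{setting}=", seconds) }
  http
end

# Usage: build_http(url, :read).request(Net::HTTP::Get.new('/stock/42'))
# raises Net::OpenTimeout or Net::ReadTimeout when a budget is exceeded.
```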

Inadequate Service Documentation: Services without comprehensive documentation force consumers to read implementation code or experiment with endpoints to understand behavior. OpenAPI specifications document REST APIs, describing endpoints, parameters, response formats, and error codes. Maintaining specifications alongside code prevents documentation drift.

Service contracts must specify error scenarios. A payment service returning HTTP 400 might indicate invalid card numbers, expired cards, insufficient funds, or fraud detection. Detailed error codes and messages enable clients to handle errors appropriately.
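
One way to make error scenarios explicit is to carry a catalogue of machine-readable codes in the response body. This sketch assumes a JSON body shape of `{ error: { code, message, details } }`. Every error keeps the single HTTP 400 from the example above, and the client branches on `code` instead. The constant, the helper names, and the codes themselves are hypothetical.

```ruby
require 'json'

# Hypothetical catalogue: one HTTP status, several distinct failure causes.
PAYMENT_ERROR_MESSAGES = {
  invalid_card_number: 'Card number failed validation',
  card_expired: 'Card has expired',
  insufficient_funds: 'Insufficient funds for this amount',
  fraud_suspected: 'Payment blocked by fraud screening'
}.freeze

# Server side: returns [status, json_body]. Unknown codes raise KeyError, so
# only documented errors can appear in responses.
def payment_error_response(code, details = {})
  body = {
    error: {
      code: code.to_s,
      message: PAYMENT_ERROR_MESSAGES.fetch(code),
      details: details
    }
  }
  [400, JSON.generate(body)]
end

# Client side: branch on the documented code rather than the bare status.
def next_step_for(json_body)
  case JSON.parse(json_body).dig('error', 'code')
  when 'invalid_card_number', 'card_expired' then :ask_for_new_card
  when 'insufficient_funds' then :offer_other_payment_method
  when 'fraud_suspected' then :contact_support
  else :show_generic_failure
  end
end
```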

Monitoring and Observability Gaps: Distributed systems obscure failures. A slow database query in one service cascades through dependent services, causing timeout errors elsewhere. Distributed tracing tracks requests across services, correlating logs and metrics. Each request receives a unique trace ID propagated through service calls.

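# Distributed tracing with correlation IDs
require 'securerandom'
require 'json'

class ApplicationController < ActionController::API
  before_action :set_correlation_id

  private

  # Reuse the caller's ID when present so a single ID follows the request across services
  def set_correlation_id
    request_id = request.headers['X-Request-ID'] || SecureRandom.uuid
    Thread.current[:correlation_id] = request_id
    response.headers['X-Request-ID'] = request_id
  end
end

class ServiceClient
  # conn: a configured Faraday::Connection; logger: any Logger-compatible object
  def initialize(conn, logger: Rails.logger)
    @conn = conn
    @logger = logger
  end

  def call_service(endpoint, data)
    correlation_id = Thread.current[:correlation_id]
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

    response = @conn.post(endpoint) do |req|
      req.body = data
      # Propagate the correlation ID so downstream logs can be joined to this request
      req.headers['X-Request-ID'] = correlation_id if correlation_id
    end

    # Measure duration locally; Faraday's response env does not expose timing
    duration_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000

    # Logger#info takes a single message, so structured fields are serialized as JSON
    @logger.info({
      message: 'Service call completed',
      correlation_id: correlation_id,
      endpoint: endpoint,
      duration_ms: duration_ms.round(1),
      status: response.status
    }.to_json)

    response.body
  end
end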

Services need health check endpoints returning service status and dependency health. A service reporting healthy while its database is unreachable misleads load balancers and monitoring systems. Health checks verify critical dependencies before reporting healthy status.
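
The following is a minimal sketch of a dependency-aware health check. `HealthCheck` and its `register`/`report` methods are hypothetical. Marking a dependency as non-critical, so that it is reported but does not fail the check, is a design assumption. In practice, each probe should also carry a short timeout so that the health endpoint itself cannot hang.

```ruby
require 'json'

# Hypothetical health-check aggregator. Each dependency registers a probe
# block that returns truthy when reachable; a failing critical dependency
# makes the whole report unhealthy (HTTP 503).
class HealthCheck
  Dependency = Struct.new(:name, :probe, :critical)

  def initialize
    @dependencies = []
  end

  def register(name, critical: true, &probe)
    @dependencies << Dependency.new(name, probe, critical)
    self
  end

  # Returns [http_status, json_body] suitable for a /health endpoint.
  def report
    results = @dependencies.to_h do |dep|
      healthy = begin
        !!dep.probe.call
      rescue StandardError
        false # a probe that raises counts as unhealthy
      end
      [dep.name, { healthy: healthy, critical: dep.critical }]
    end

    ok = results.values.none? { |r| r[:critical] && !r[:healthy] }
    body = { status: ok ? 'healthy' : 'unhealthy', dependencies: results }
    [ok ? 200 : 503, JSON.generate(body)]
  end
end

# Example wiring in a Rails app (probe bodies are assumptions):
#   HEALTH = HealthCheck.new
#     .register(:database) { ActiveRecord::Base.connection.active? }
#     .register(:cache, critical: false) { Rails.cache.write('health', 1) }
```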

Cascading Failures: A cascading failure occurs when the failure of one service spreads to the services that depend on it. Without circuit breakers, callers keep sending requests to a failed dependency. This exhausts their own thread pools and denies the dependency a chance to recover. A circuit breaker stops calls to a failing service for a period, allowing it to recover without continued traffic.
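
The following sketch shows the closed, open, and half-open states from the Circuit Breaker States table in the Reference section. It is a simplified, single-process illustration under assumed defaults: five consecutive failures open the circuit, and the reset timeout is 30 seconds. `CircuitBreaker` and `CircuitOpenError` are hypothetical names, not a specific gem's API.

```ruby
class CircuitOpenError < StandardError; end

# Hypothetical circuit breaker; the clock is injectable for testing.
class CircuitBreaker
  attr_reader :state

  def initialize(failure_threshold: 5, reset_timeout: 30,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @failure_threshold = failure_threshold
    @reset_timeout = reset_timeout
    @clock = clock
    @state = :closed
    @failures = 0
    @opened_at = nil
    @mutex = Mutex.new
  end

  def call
    before_call!
    begin
      result = yield
    rescue StandardError
      record_failure
      raise
    end
    record_success
    result
  end

  private

  # Fails fast while open; after reset_timeout, lets exactly one trial through.
  def before_call!
    @mutex.synchronize do
      case @state
      when :open
        raise CircuitOpenError, 'circuit open' if @clock.call - @opened_at < @reset_timeout
        @state = :half_open # this caller becomes the single trial request
      when :half_open
        raise CircuitOpenError, 'trial request in progress'
      end
    end
  end

  def record_success
    @mutex.synchronize do
      @failures = 0
      @state = :closed
    end
  end

  # A failed trial reopens immediately; otherwise open after the threshold.
  def record_failure
    @mutex.synchronize do
      @failures += 1
      if @state == :half_open || @failures >= @failure_threshold
        @state = :open
        @opened_at = @clock.call
      end
    end
  end
end
```

Callers wrap each dependency call in `breaker.call { ... }`. While the breaker is open, they receive `CircuitOpenError` immediately and can serve a fallback instead of waiting for a timeout.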

Bulkhead patterns limit resource allocation per dependency. A service calling payment, shipping, and inventory services allocates separate thread pools for each dependency. If the payment service becomes slow, it exhausts only its allocated threads, preventing payment issues from blocking shipping and inventory operations.
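
Ruby services often run on a fixed pool of request threads. The sketch below therefore approximates a bulkhead with a per-dependency concurrency limit, a counting semaphore, rather than literal separate thread pools. When all of a dependency's slots are busy, new calls to that dependency fail fast with `BulkheadFullError`, while other dependencies keep their own capacity. The class, the error, and the limits are hypothetical.

```ruby
class BulkheadFullError < StandardError; end

# Hypothetical per-dependency concurrency limit (counting semaphore).
class Bulkhead
  def initialize(name, max_concurrent:)
    @name = name
    @max_concurrent = max_concurrent
    @in_use = 0
    @mutex = Mutex.new
  end

  # Runs the block if a slot is free; otherwise fails fast instead of queueing,
  # so a slow dependency cannot absorb every request thread.
  def call
    acquire!
    begin
      yield
    ensure
      release
    end
  end

  def available
    @mutex.synchronize { @max_concurrent - @in_use }
  end

  private

  def acquire!
    @mutex.synchronize do
      raise BulkheadFullError, "#{@name} bulkhead full" if @in_use >= @max_concurrent

      @in_use += 1
    end
  end

  def release
    @mutex.synchronize { @in_use -= 1 }
  end
end

# Hypothetical limits: one bulkhead per downstream dependency.
BULKHEADS = {
  payment: Bulkhead.new(:payment, max_concurrent: 10),
  shipping: Bulkhead.new(:shipping, max_concurrent: 5),
  inventory: Bulkhead.new(:inventory, max_concurrent: 5)
}.freeze
```

A payment call would then run inside `BULKHEADS[:payment].call { ... }`. A slow payment service can occupy at most 10 concurrent slots.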

Reference

SOA Design Principles

| Principle | Description | Implementation Focus |
|---|---|---|
| Service Contract | Formal agreement specifying operations and data formats | WSDL, OpenAPI specifications, schema definitions |
| Loose Coupling | Minimal dependencies on implementation details | Message-based communication, interface abstractions |
| Service Abstraction | Implementation details hidden from consumers | Facade patterns, API gateways |
| Service Reusability | Design for consumption by multiple applications | Generic interfaces, parameterized operations |
| Service Autonomy | Independent control over behavior and data | Database per service, internal data ownership |
| Service Statelessness | No client-specific state kept between requests | Token-based authentication, external session stores |
| Service Discoverability | Runtime service location through registries | Consul, Eureka, ZooKeeper integration |
| Service Composability | Combination into higher-level processes | Orchestration engines, choreography patterns |

Communication Pattern Comparison

| Pattern | Coupling | Latency | Complexity | Use Cases |
|---|---|---|---|---|
| Synchronous REST | Medium | Low-Medium | Low | Request-response scenarios, external APIs |
| Synchronous SOAP | Medium-High | Medium | High | Enterprise integration, transactional operations |
| Message Queues | Low | Medium-High | Medium | Asynchronous workflows, event distribution |
| Event Streaming | Low | Medium | Medium-High | Real-time data pipelines, audit trails |
| gRPC | Medium | Low | Medium | Internal service communication, high throughput |

Data Management Strategies

| Strategy | Consistency | Complexity | Query Capability | Autonomy |
|---|---|---|---|---|
| Shared Database | Strong | Low | High | Low |
| Database per Service | Eventual | Medium | Limited | High |
| Event Sourcing | Eventual | High | Medium | High |
| CQRS with Projections | Eventual | High | High | High |
| API Composition | Depends | Medium | Medium | High |

Service Integration Patterns

| Pattern | Description | Benefits | Drawbacks |
|---|---|---|---|
| API Gateway | Single entry point for client requests | Simplified clients, centralized concerns | Potential bottleneck, single point of failure |
| Service Mesh | Infrastructure layer handling communication | Language agnostic, automatic observability | Operational complexity, resource overhead |
| Backend for Frontend | Specialized backends per client type | Client-optimized APIs, reduced coupling | Potential duplication, more services |
| Saga Pattern | Distributed transaction coordination | No distributed locks, improved availability | Complex error handling, eventual consistency |
| Circuit Breaker | Prevent cascading failures | Improved resilience, faster failure detection | Configuration tuning, false positives |

Error Handling Strategies

| Error Type | HTTP Status | Handling Approach | Client Action |
|---|---|---|---|
| Validation Error | 422 | Return detailed validation messages | Fix input and retry |
| Resource Not Found | 404 | Return error with resource identifier | Verify the resource identifier |
| Authorization Failure | 403 | Return permission error | Verify credentials and access |
| Service Unavailable | 503 | Return Retry-After header | Retry with exponential backoff |
| Timeout | 504 | Log partial completion state | Retry with the same idempotency key |
| Rate Limit Exceeded | 429 | Return limit reset time | Wait and retry |
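
The retry-oriented client actions in this table (backoff on 503, waiting on 429, and retrying on 504) can be combined in one helper. This sketch rests on several assumptions. A hypothetical `RetryableError` carries any `Retry-After` value, already parsed into seconds. Delays use exponential backoff with full jitter. The limits are illustrative. Retries of non-idempotent operations should reuse the original idempotency key.

```ruby
# Hypothetical error a client raises for retryable responses (503, 429, 504).
class RetryableError < StandardError
  attr_reader :retry_after

  def initialize(message = 'retryable failure', retry_after: nil)
    super(message)
    @retry_after = retry_after # seconds, already parsed from a Retry-After header
  end
end

# Retries the block, passing the attempt number. The sleeper is injectable so
# the logic can be tested without real waiting.
def with_retries(max_attempts: 4, base_delay: 0.5, max_delay: 30.0,
                 sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue RetryableError => e
    raise if attempt >= max_attempts

    # Exponential backoff (0.5 s, 1 s, 2 s, ...) capped at max_delay, with full
    # jitter so many clients do not retry in lockstep.
    backoff = [base_delay * (2**(attempt - 1)), max_delay].min
    delay = e.retry_after || rand * backoff # the server's hint wins when present
    sleeper.call(delay)
    retry
  end
end
```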

Versioning Strategies

| Strategy | Implementation | Pros | Cons |
|---|---|---|---|
| URI Versioning | /api/v1/resource, /api/v2/resource | Clear, explicit, cacheable | Proliferates endpoints |
| Header Versioning | Custom header, e.g. API-Version: 2 | Clean URIs, flexible | Less discoverable |
| Query Parameter | /api/resource?version=2 | Simple, backward compatible | Caches must key on the query string; less explicit than a path segment |
| Content Negotiation | Accept: application/vnd.company.v2+json | RESTful, flexible | Client complexity |

Service Discovery Configuration

| Component | Purpose | Examples |
|---|---|---|
| Service Registry | Store service locations and metadata | Consul, Eureka, ZooKeeper |
| Health Checks | Monitor service availability | HTTP endpoints, TCP checks |
| Load Balancing | Distribute requests across instances | Client-side, proxy-based |
| DNS Integration | Map service names to IP addresses | Consul DNS, CoreDNS |

Circuit Breaker States

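| State | Behavior | Transition Condition |
|---|---|---|
| Closed | Requests pass through; failures are counted | Failure threshold reached (moves to Open) |
| Open | Requests fail immediately | Timeout period expires (moves to Half-Open) |
| Half-Open | A single trial request is attempted | Trial succeeds (moves to Closed) or fails (moves back to Open) |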

Monitoring Metrics

| Metric | Description | Alerting Threshold |
|---|---|---|
| Request Rate | Requests per second | Unusual deviation from baseline |
| Error Rate | Percentage of failed requests | Above 1-5%, depending on the service |
| Latency Percentiles | P50, P95, P99 response times | P95 exceeds SLA |
| Saturation | Resource utilization percentage | Above 80% |
| Circuit Breaker State | Open/closed status | Open for longer than expected |
| Queue Depth | Number of pending messages | Exceeds planned capacity |
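
Percentile alerting can be sketched with the nearest-rank method: sort the samples and take the value at rank ceil(p / 100 * n). The `percentile` and `latency_alert?` helpers and the SLA parameter are hypothetical. Production systems usually compute percentiles from histograms in their metrics backend rather than from raw samples.

```ruby
# Nearest-rank percentile: the smallest sample such that at least p percent
# of all samples are less than or equal to it.
def percentile(samples, p)
  raise ArgumentError, 'no samples' if samples.empty?
  raise ArgumentError, 'p must be in (0, 100]' unless p > 0 && p <= 100

  sorted = samples.sort
  rank = (p * sorted.size / 100.0).ceil
  sorted[rank - 1]
end

# Hypothetical alert rule from the table: fire when P95 latency exceeds the SLA.
def latency_alert?(latencies_ms, sla_p95_ms:)
  percentile(latencies_ms, 95) > sla_p95_ms
end
```

For latencies of 1 through 100 ms, `percentile(latencies, 95)` returns 95, so an SLA target of 90 ms triggers the alert.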