Overview
Service-Oriented Architecture (SOA) structures applications as collections of loosely coupled services that communicate over a network. Each service encapsulates a specific business capability and exposes that functionality through a well-defined interface, independent of the underlying implementation technology.
SOA emerged in the late 1990s as organizations struggled with monolithic applications that became increasingly difficult to maintain and scale. The architecture promotes reusability by allowing different applications to consume the same services, reducing duplication across an organization. A customer validation service, for example, can serve web applications, mobile apps, and batch processing systems without modification.
The architecture operates on the principle that services remain independent units of functionality. When an e-commerce application needs to process a payment, it calls a payment service through a defined contract. The calling application does not need to know whether the payment service uses a SQL database, connects to external payment processors, or runs on a specific server. This separation enables teams to develop, deploy, and scale services independently.
SOA differs from microservices in scope and granularity. SOA services typically represent larger business domains with multiple operations exposed through a single service interface. A customer service in SOA might include operations for creating customers, updating addresses, validating credit, and retrieving purchase history. Microservices generally focus on smaller, single-purpose capabilities.
# SOA service example - Customer service with multiple operations
class CustomerService
def create_customer(data)
# Handle customer creation
customer = Customer.create!(data)
publish_event('customer.created', customer.to_h)
customer
end
def update_address(customer_id, address_data)
# Address update logic
customer = Customer.find(customer_id)
customer.update_address!(address_data)
customer
end
def validate_credit(customer_id)
# Credit validation logic
CreditValidator.check(customer_id)
end
end
The architecture introduces complexity through network communication, service discovery, and distributed data management. Applications must handle network failures, implement service versioning, and coordinate transactions across multiple services. Organizations adopt SOA when the benefits of service reusability, independent scalability, and technology flexibility outweigh these operational challenges.
Key Principles
Service Contract: Each service defines a formal contract specifying operations, input parameters, output formats, and error conditions. The contract serves as an agreement between service providers and consumers. Changes to contracts require versioning strategies to prevent breaking existing consumers. Contracts typically use standards like WSDL for SOAP services or OpenAPI specifications for REST APIs.
Loose Coupling: Services minimize dependencies on other services' internal implementations. A service consumer should only depend on the service contract, not on implementation details like database schemas or internal class structures. This principle allows teams to refactor service internals without affecting consumers. Message-based communication patterns promote loose coupling by eliminating direct service-to-service calls.
Service Abstraction: Services hide implementation details from consumers. A payment processing service exposes operations like process_payment and refund_payment without revealing whether it uses Stripe, PayPal, or an internal payment system. This abstraction enables service providers to swap implementations or optimize internal logic without consumer impact.
Service Reusability: Services provide functionality that multiple applications can consume. An address validation service can serve web applications, mobile apps, and batch processing systems. Designing for reusability requires careful consideration of service granularity and interface design to meet diverse consumer needs without creating overly generic, difficult-to-use APIs.
Service Autonomy: Services control their own behavior and data. Each service owns its database schema and business logic without external systems directly manipulating its data. A customer service manages the customer database exclusively. Other services interact with customer data only through the customer service API, maintaining data integrity and enabling independent service evolution.
Service Statelessness: Services avoid maintaining client-specific state between requests. Each request contains all information needed to process it. Stateless services scale horizontally more easily since any service instance can handle any request. Session state, when required, moves to external stores like Redis or client-side tokens rather than service memory.
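One way to keep session data out of service memory is a signed, client-held token. The sketch below is illustrative only: `SessionToken`, the `SESSION_SECRET` variable, and the token layout are assumptions rather than part of any particular framework, and production systems would normally rely on an established format such as JWT through a maintained library.

```ruby
require 'openssl'
require 'json'

# Hypothetical helper: packs session claims into a signed token that the
# client sends with every request, so any service instance can verify it.
module SessionToken
  SECRET = ENV.fetch('SESSION_SECRET', 'dev-only-secret') # assumed config name

  def self.issue(claims)
    # URL-safe Base64 of the JSON claims, followed by an HMAC signature
    payload = [JSON.generate(claims)].pack('m0').tr('+/', '-_')
    "#{payload}.#{sign(payload)}"
  end

  # Returns the claims hash, or nil when the token is malformed or tampered with.
  def self.verify(token)
    payload, signature = token.to_s.split('.', 2)
    return nil unless payload && signature
    return nil unless secure_compare(sign(payload), signature)

    JSON.parse(payload.tr('-_', '+/').unpack1('m0'))
  rescue ArgumentError, JSON::ParserError
    nil
  end

  def self.sign(payload)
    OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), SECRET, payload)
  end

  # Constant-time comparison to avoid leaking signature prefixes via timing
  def self.secure_compare(left, right)
    return false unless left.bytesize == right.bytesize

    left.bytes.zip(right.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
  end
end

token = SessionToken.issue('user_id' => 42, 'role' => 'customer')
claims = SessionToken.verify(token) # any instance holding SECRET can do this
puts claims.inspect
```

Because the token carries its own state and proof of integrity, no instance needs to remember the session between requests.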
Service Discoverability: Services register with a service registry, enabling clients to locate services dynamically at runtime. Service registries like Consul, Eureka, or ZooKeeper maintain service locations, health status, and metadata. Clients query the registry to find available service instances rather than hardcoding service addresses.
# Service registry interaction example
require 'socket'

class ServiceRegistry
  def initialize
    # service name => list of registered instances
    @registry = Hash.new { |hash, name| hash[name] = [] }
  end

  def register(service_name, host, port, metadata = {})
    @registry[service_name] << {
      host: host,
      port: port,
      metadata: metadata,
      registered_at: Time.now
    }
  end

  def discover(service_name)
    # fetch avoids creating empty entries for unknown service names
    @registry.fetch(service_name, []).select { |instance| healthy?(instance) }
  end

  def healthy?(instance)
    # Health check: the instance is healthy if a TCP connection succeeds quickly
    Socket.tcp(instance[:host], instance[:port], connect_timeout: 1) { true }
  rescue StandardError
    false
  end
end
Service Composability: Services combine to form higher-level business processes. An order processing workflow might invoke inventory, payment, and shipping services in sequence. Service orchestration manages these combinations, coordinating calls and handling failures. Choreography patterns distribute coordination logic across services through event-driven communication.
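The choreography style can be sketched with a tiny in-process event bus standing in for a real broker. `EventBus`, `wire_order_workflow`, and the event names are hypothetical; the point is only that no central coordinator exists and each service reacts to the previous step's event.

```ruby
# Minimal in-process stand-in for a message broker (hypothetical EventBus).
class EventBus
  def initialize
    @subscribers = Hash.new { |hash, event_name| hash[event_name] = [] }
  end

  def subscribe(event_name, &handler)
    @subscribers[event_name] << handler
  end

  def publish(event_name, payload)
    @subscribers[event_name].each { |handler| handler.call(payload) }
  end
end

# Each "service" subscribes to one event and emits the next one.
# No component knows the whole workflow.
def wire_order_workflow(bus, log)
  bus.subscribe('order.placed') do |order|
    log << "inventory reserved for order #{order[:id]}"
    bus.publish('inventory.reserved', order)
  end

  bus.subscribe('inventory.reserved') do |order|
    log << "payment charged for order #{order[:id]}"
    bus.publish('payment.charged', order)
  end

  bus.subscribe('payment.charged') do |order|
    log << "shipment scheduled for order #{order[:id]}"
  end
end

bus = EventBus.new
log = []
wire_order_workflow(bus, log)
bus.publish('order.placed', { id: 1001 })
puts log
```

An orchestrated version would instead place the three calls in one coordinating method, which makes the flow easier to read at the cost of a central dependency.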
Design Considerations
Granularity Decisions: Service granularity significantly impacts architecture complexity and performance. Coarse-grained services encapsulate larger business domains, reducing network overhead and simplifying service management but potentially limiting reusability and independent scaling. Fine-grained services enable precise scaling and reuse but increase operational complexity and inter-service communication costs.
An e-commerce platform choosing between a single order service or separate services for cart management, checkout, and fulfillment exemplifies this trade-off. A unified order service simplifies deployment and reduces network calls but prevents independent scaling of high-volume cart operations from lower-volume fulfillment processes. The decision depends on traffic patterns, team structure, and operational capabilities.
Communication Protocol Selection: SOAP provides strong typing, built-in error handling, and comprehensive WS-* standards for security and transactions but introduces XML parsing overhead and complex tooling requirements. REST offers simplicity and widespread adoption with HTTP semantics but lacks standardized approaches for complex operations, transactions, and event notification.
Message queues like RabbitMQ or Apache Kafka enable asynchronous communication, improving resilience and enabling event-driven architectures. Queue-based communication decouples services temporally—consumers need not be available when producers send messages. This pattern suits workflows tolerating eventual consistency but complicates request-response scenarios requiring immediate results.
# Protocol comparison - REST endpoint
class OrdersController < ApplicationController
def create
order = OrderService.create_order(params[:order])
render json: order, status: :created
rescue OrderService::ValidationError => e
render json: { error: e.message }, status: :unprocessable_entity
end
end
# Message queue approach
class OrderProcessor
def process(message)
order_data = JSON.parse(message.payload)
order = OrderService.create_order(order_data)
# Publish event for downstream services
queue.publish('order.created', order.to_json)
rescue => e
# Handle async errors differently
ErrorNotifier.report(e)
message.nack # Return to queue for retry
end
end
Data Management Strategies: Shared database approaches violate service autonomy but simplify queries spanning multiple services. Database-per-service patterns enforce service boundaries and enable technology diversity but complicate cross-service queries and transactions. Implementing reports requiring customer, order, and inventory data becomes challenging when each service owns its database.
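The saga pattern addresses distributed transactions by breaking them into local transactions with compensating actions for rollback. An order placement saga might reserve inventory, charge payment, and create shipment records as separate local transactions. If payment fails, the saga executes a compensating transaction to release the reserved inventory; if shipment creation fails, it refunds the payment and then releases the inventory. This approach gives up the atomicity and isolation of a single ACID transaction: intermediate states are visible to other requests and the system is only eventually consistent, but no service holds locks while waiting on another.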
Service Versioning Approaches: URI versioning (/api/v1/orders, /api/v2/orders) clearly segregates versions but proliferates endpoints and complicates routing. Header-based versioning (Accept: application/vnd.company.v2+json) keeps URIs stable but makes versions less discoverable. Content negotiation through media types enables gradual migration but requires consumer sophistication.
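Header-based versioning needs a small amount of parsing on the server side. The following sketch assumes the vendor media type format shown above (`application/vnd.company.v2+json`); the `ApiVersion` module and its default of version 1 are illustrative choices, not a standard.

```ruby
# Hypothetical helper: extracts the API version from a vendor media type.
module ApiVersion
  VENDOR_PATTERN = %r{\Aapplication/vnd\.company\.v(\d+)\+json\z}
  DEFAULT = 1 # assumed fallback when the client does not ask for a version

  def self.from_accept(accept_header)
    accept_header.to_s.split(',').each do |media_range|
      # Drop parameters such as ";q=0.9" before matching
      media_type = media_range.split(';').first.to_s.strip
      match = VENDOR_PATTERN.match(media_type)
      return match[1].to_i if match
    end
    DEFAULT
  end
end

puts ApiVersion.from_accept('application/vnd.company.v2+json')
puts ApiVersion.from_accept('application/json')
```

A router can then dispatch on the returned integer while the URI stays the same for every version.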
Breaking changes require maintaining multiple service versions simultaneously. A payment service changing its charge API from accepting card details to requiring payment tokens must support both approaches during a transition period. The duration depends on how quickly consumers can migrate—internal consumers might migrate within weeks while external API consumers might require months of notice.
Failure Mode Planning: Services must handle partial failures gracefully. When a recommendation service fails, an e-commerce site should display products without recommendations rather than showing error pages. Circuit breakers prevent cascading failures by stopping calls to failing services after a threshold of failures. Bulkheads isolate failures by limiting resources allocated to each service dependency.
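# Circuit breaker pattern
# Raised when the breaker is open and calls are being rejected
class CircuitOpenError < StandardError; end

# Note: this sketch is not thread-safe; guard state changes with a Mutex
# when one breaker is shared across threads.
class CircuitBreaker
  FAILURE_THRESHOLD = 5  # consecutive failures before the circuit opens
  TIMEOUT_DURATION = 60  # seconds to wait before allowing a trial call

  def initialize(service_name)
    @service_name = service_name
    @failures = 0
    @last_failure_time = nil
    @state = :closed
  end

  def call(&block)
    if @state == :open
      # Reject immediately until the cool-down period has elapsed
      if Time.now - @last_failure_time < TIMEOUT_DURATION
        raise CircuitOpenError, "#{@service_name} circuit is open"
      end

      # Cool-down elapsed: let a trial call through
      @state = :half_open
    end
    attempt_call(&block)
  end

  private

  def attempt_call
    result = yield
    reset_failures
    result
  rescue StandardError
    record_failure
    raise
  end

  def record_failure
    @failures += 1
    @last_failure_time = Time.now
    # The count is only reset on success, so a failed trial call in the
    # half-open state reopens the circuit immediately
    @state = :open if @failures >= FAILURE_THRESHOLD
  end

  def reset_failures
    @failures = 0
    @state = :closed
  end
end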
Implementation Approaches
SOAP-Based SOA: SOAP services use XML for message formatting and WSDL for service contracts. The approach provides strong typing through XML Schema, standardized error handling with SOAP faults, and extensive WS-* specifications covering security (WS-Security), transactions (WS-AtomicTransaction), and reliable messaging (WS-ReliableMessaging). SOAP excels in enterprise environments requiring formal contracts and complex transaction coordination but introduces significant overhead for simple request-response scenarios.
The WSDL-first approach defines service contracts before implementation, generating code from WSDL specifications. This strategy ensures contract stability and enables contract-based testing but requires WSDL expertise and can produce awkward APIs if the contract poorly maps to the implementation language. Code-first approaches generate WSDL from implementation but risk producing contracts tightly coupled to implementation details.
REST-Based SOA: REST services expose resources through HTTP operations (GET, POST, PUT, DELETE) with JSON or XML representations. REST leverages HTTP infrastructure—caching, authentication, content negotiation—reducing the need for custom protocols. The approach suits public APIs and simple service interactions but lacks standardization for complex scenarios like bulk operations, transactions, and event subscription.
Richardson Maturity Model levels indicate REST sophistication. Level 0 uses HTTP as a transport tunnel. Level 1 introduces resource URIs. Level 2 adds HTTP verbs and status codes. Level 3 implements hypermedia controls (HATEOAS), embedding navigation links in responses to guide clients through available operations. Most REST APIs operate at Level 2.
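A Level 3 response can be sketched as a representation whose links depend on the resource state. The `order_representation` helper, the `_links` key, and the link names below are assumptions for illustration (loosely modeled on HAL-style responses), not a fixed standard.

```ruby
require 'json'

# Hypothetical representation builder: the links offered depend on the order
# status, so clients discover valid operations instead of hardcoding them.
def order_representation(order)
  base = "/orders/#{order[:id]}"
  links = { self: { href: base, method: 'GET' } }

  case order[:status]
  when 'pending'
    links[:payment] = { href: "#{base}/payment", method: 'POST' }
    links[:cancel]  = { href: base, method: 'DELETE' }
  when 'paid'
    links[:shipment] = { href: "#{base}/shipment", method: 'GET' }
  end

  { id: order[:id], status: order[:status], _links: links }
end

puts JSON.pretty_generate(order_representation({ id: 7, status: 'pending' }))
```

A paid order no longer advertises the cancel link, so a client that follows links rather than constructing URIs never attempts an operation the server would reject.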
# REST service implementation with Sinatra
require 'sinatra/base'
require 'json'
class CustomerService < Sinatra::Base
before do
content_type :json
end
get '/customers/:id' do
customer = Customer.find(params[:id])
customer.to_json
rescue ActiveRecord::RecordNotFound
status 404
{ error: 'Customer not found' }.to_json
end
post '/customers' do
data = JSON.parse(request.body.read)
customer = Customer.create!(data)
status 201
customer.to_json
rescue ActiveRecord::RecordInvalid => e
status 422
{ error: e.message }.to_json
end
put '/customers/:id' do
customer = Customer.find(params[:id])
data = JSON.parse(request.body.read)
customer.update!(data)
customer.to_json
end
delete '/customers/:id' do
Customer.find(params[:id]).destroy
status 204
end
end
Event-Driven SOA: Services communicate through events rather than direct calls. A service publishes events when significant state changes occur; interested services subscribe to relevant events. This approach decouples services temporally and reduces dependencies but introduces eventual consistency and complicates understanding system behavior through distributed event flows.
Event sourcing stores events as the primary data model rather than current state. An order service stores order placement, payment, and shipment events rather than final order state. Current state derives from replaying events. This pattern provides complete audit trails and enables temporal queries but increases storage requirements and complicates simple queries.
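Deriving state by replay amounts to a fold over the event list. The sketch below uses hypothetical event names and an in-memory array in place of an event store.

```ruby
# Events are the source of truth; current state is a fold over them.
OrderEvent = Struct.new(:type, :data)

class OrderProjection
  INITIAL_STATE = { status: :new, total: 0, paid: false, shipped: false }.freeze

  def self.replay(events)
    events.reduce(INITIAL_STATE) { |state, event| apply(state, event) }
  end

  def self.apply(state, event)
    case event.type
    when :order_placed     then state.merge(status: :placed, total: event.data[:total])
    when :payment_received then state.merge(status: :paid, paid: true)
    when :order_shipped    then state.merge(status: :shipped, shipped: true)
    else state # unknown event types are ignored so older readers keep working
    end
  end
end

events = [
  OrderEvent.new(:order_placed, { total: 120 }),
  OrderEvent.new(:payment_received, { amount: 120 }),
  OrderEvent.new(:order_shipped, { carrier: 'ACME' })
]

puts OrderProjection.replay(events).inspect          # current state
puts OrderProjection.replay(events.first(2)).inspect # state before shipment
```

Replaying a prefix of the log is what makes temporal queries possible: the second call answers "what did this order look like before it shipped?" without any extra bookkeeping.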
Enterprise Service Bus (ESB): An ESB centralizes integration logic, routing messages between services, transforming data formats, and managing service orchestration. The bus handles protocol translation, enabling SOAP services to communicate with REST endpoints or message queues. ESBs simplify adding new services by connecting them to the bus rather than point-to-point integrations but create a single point of failure and potential performance bottleneck.
Smart endpoints with dumb pipes invert this model. Services contain business logic and integration requirements while the bus provides only message routing. This approach reduces bus complexity and coupling but distributes integration logic across services, potentially causing duplication.
Ruby Implementation
Service Implementation with Grape: Grape provides a DSL for building REST APIs with automatic parameter validation, versioning support, and content negotiation. The framework suits service implementation when requirements include parameter coercion, entity presentation, and API versioning.
require 'grape'
require 'grape-entity' # provides Grape::Entity, used by the entity classes below
class OrderAPI < Grape::API
version 'v1', using: :header, vendor: 'company'
format :json
prefix :api
resource :orders do
desc 'Create a new order'
params do
requires :customer_id, type: Integer
requires :items, type: Array do
requires :product_id, type: Integer
requires :quantity, type: Integer
end
optional :shipping_address, type: Hash do
requires :street, type: String
requires :city, type: String
requires :postal_code, type: String
end
end
post do
order = OrderService.create_order(
customer_id: params[:customer_id],
items: params[:items],
shipping_address: params[:shipping_address]
)
present order, with: OrderEntity
end
desc 'Retrieve order by ID'
params do
requires :id, type: Integer
end
route_param :id do
get do
order = Order.find(params[:id])
present order, with: OrderEntity
end
end
end
rescue_from ActiveRecord::RecordNotFound do
error!({ error: 'Resource not found' }, 404)
end
rescue_from Grape::Exceptions::ValidationErrors do |e|
error!({ errors: e.full_messages }, 422)
end
end
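# OrderItemEntity must exist before OrderEntity references it.
# Its fields are assumed from the item parameters declared above.
class OrderItemEntity < Grape::Entity
expose :product_id
expose :quantity
end
class OrderEntity < Grape::Entity
expose :id
expose :customer_id
expose :total_amount
expose :status
expose :created_at
expose :items, using: OrderItemEntity
end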
Service Client Implementation: HTTP client libraries like HTTParty or Faraday handle service communication. Implementing retry logic, timeout management, and circuit breakers requires additional patterns around basic HTTP calls.
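require 'faraday'
# On Faraday 1.x, also require 'faraday_middleware' for the :json middleware;
# Faraday 2.x ships it in core and the faraday_middleware gem is deprecated.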
class PaymentServiceClient
BASE_URL = ENV['PAYMENT_SERVICE_URL']
def initialize
@conn = Faraday.new(url: BASE_URL) do |f|
f.request :json
f.response :json, content_type: /\bjson$/
f.adapter Faraday.default_adapter
f.options.timeout = 5
f.options.open_timeout = 2
end
end
def process_payment(payment_data)
response = @conn.post('/payments') do |req|
req.body = payment_data
req.headers['X-API-Key'] = ENV['PAYMENT_API_KEY']
end
case response.status
when 200..299
response.body
when 422
raise ValidationError, response.body['error']
when 500..599
raise ServiceError, "Payment service error: #{response.status}"
else
raise UnexpectedError, "Unexpected status: #{response.status}"
end
rescue Faraday::TimeoutError
raise ServiceTimeout, "Payment service timeout"
rescue Faraday::ConnectionFailed => e
raise ServiceUnavailable, "Payment service unavailable: #{e.message}"
end
class ValidationError < StandardError; end
class ServiceError < StandardError; end
class ServiceTimeout < StandardError; end
class ServiceUnavailable < StandardError; end
class UnexpectedError < StandardError; end
end
Message Queue Integration with Bunny: Bunny provides a Ruby client for RabbitMQ, enabling asynchronous service communication through message queues.
require 'bunny'
require 'json'
class OrderEventPublisher
def initialize
@connection = Bunny.new(ENV['RABBITMQ_URL'])
@connection.start
@channel = @connection.create_channel
@exchange = @channel.topic('orders', durable: true)
end
def publish_order_created(order)
message = {
event_type: 'order.created',
order_id: order.id,
customer_id: order.customer_id,
total_amount: order.total_amount,
timestamp: Time.now.iso8601
}
@exchange.publish(
message.to_json,
routing_key: 'order.created',
persistent: true,
content_type: 'application/json'
)
end
def close
@channel.close
@connection.close
end
end
class InventoryService
def start_consuming
connection = Bunny.new(ENV['RABBITMQ_URL'])
connection.start
channel = connection.create_channel
exchange = channel.topic('orders', durable: true)
queue = channel.queue('inventory.orders', durable: true)
queue.bind(exchange, routing_key: 'order.created')
queue.subscribe(manual_ack: true, block: true) do |delivery_info, properties, body|
process_order_event(JSON.parse(body))
channel.ack(delivery_info.delivery_tag)
rescue => e
logger.error("Failed to process order event: #{e.message}")
channel.nack(delivery_info.delivery_tag, false, true)
end
end
def process_order_event(event)
order_id = event['order_id']
# Reserve inventory for order
InventoryReservation.create!(order_id: order_id)
end
end
Service Discovery with Consul: Consul provides service registration and discovery. Ruby services register themselves on startup and query Consul to locate other services.
require 'diplomat'
class ServiceDiscovery
def register_service(name, host, port, tags = [])
Diplomat::Service.register(
Name: name,
Address: host,
Port: port,
Tags: tags,
Check: {
HTTP: "http://#{host}:#{port}/health",
Interval: "10s"
}
)
end
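def discover_service(name)
# :all returns every registered instance; the default scope returns only the first.
# Restricting results to instances with passing checks goes through Consul's
# health endpoint (Diplomat::Health), which is not shown here.
services = Diplomat::Service.get(name, :all)
services.map do |service|
{
host: service.ServiceAddress,
port: service.ServicePort,
tags: service.ServiceTags
}
end
rescue Diplomat::KeyNotFound
[]
end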
def deregister_service(service_id)
Diplomat::Service.deregister(service_id)
end
end
# Usage in service initialization
class OrderService
def self.start
discovery = ServiceDiscovery.new
discovery.register_service(
'order-service',
ENV['SERVICE_HOST'],
ENV['SERVICE_PORT'].to_i,
['v1', 'production']
)
at_exit do
discovery.deregister_service('order-service')
end
end
end
Integration & Interoperability
API Gateway Pattern: An API gateway provides a single entry point for clients, routing requests to appropriate backend services. The gateway handles cross-cutting concerns like authentication, rate limiting, request logging, and response caching. This pattern simplifies client implementation by consolidating multiple service endpoints behind a unified API but introduces a potential bottleneck and single point of failure.
Gateways transform backend service APIs into client-friendly interfaces. Mobile clients might receive simplified JSON responses while web clients get detailed representations. The gateway aggregates data from multiple services, reducing round trips. A product details page requiring data from catalog, pricing, inventory, and review services makes one gateway call instead of four service calls.
# API Gateway implementation
class APIGateway < Sinatra::Base
before do
authenticate_request
check_rate_limit
end
get '/products/:id' do
product_id = params[:id]
# Aggregate data from multiple services
product_data = fetch_with_fallback do
catalog_client.get_product(product_id)
end
pricing_data = fetch_with_fallback(default: { price: nil }) do
pricing_client.get_price(product_id)
end
inventory_data = fetch_with_fallback(default: { in_stock: false }) do
inventory_client.check_stock(product_id)
end
{
product: product_data,
pricing: pricing_data,
inventory: inventory_data
}.to_json
end
private
def fetch_with_fallback(default: nil, &block)
circuit_breaker(&block)
rescue CircuitOpenError, ServiceTimeout
logger.warn("Service call failed, using fallback")
default
end
def authenticate_request
halt 401, { error: 'Unauthorized' }.to_json unless valid_api_key?
end
def check_rate_limit
client_id = request.env['HTTP_X_CLIENT_ID']
if rate_limiter.exceeded?(client_id)
halt 429, { error: 'Rate limit exceeded' }.to_json
end
end
end
Service Mesh Architecture: Service meshes like Istio or Linkerd handle service-to-service communication through sidecar proxies deployed alongside each service instance. The mesh provides traffic management, security, and observability without requiring application code changes. Proxies intercept all network traffic, implementing retry logic, circuit breakers, load balancing, and mutual TLS authentication.
Service meshes standardize communication patterns across services written in different languages. A Ruby service and a Go service both benefit from mesh-provided circuit breakers without implementing language-specific libraries. The mesh collects distributed tracing data and metrics automatically, providing visibility into request flows across services.
Contract Testing: Contract tests verify that service providers implement the contracts that consumers expect. Pact generates contract tests from consumer expectations, then replays those expectations against the provider. This approach catches integration issues early without requiring running all services simultaneously.
# Pact consumer test
require 'pact/consumer/rspec'

# Declares the mock provider; this defines the customer_service helper used below.
# The provider name 'Customer Service' is an assumed label.
Pact.service_consumer 'Customer Service Client' do
  has_pact_with 'Customer Service' do
    mock_service :customer_service do
      port 1234
    end
  end
end

RSpec.describe 'Customer Service Client', pact: true do
  let(:customer_client) { CustomerServiceClient.new('localhost', 1234) }

  describe 'get customer' do
    before do
      customer_service.given('customer 123 exists')
        .upon_receiving('a request for customer 123')
        .with(
          method: :get,
          path: '/customers/123',
          headers: { 'Accept' => 'application/json' }
        )
        .will_respond_with(
          status: 200,
          headers: { 'Content-Type' => 'application/json' },
          body: {
            id: 123,
            name: 'Jane Smith',
            email: 'jane@example.com'
          }
        )
    end

    it 'returns customer data' do
      customer = customer_client.get_customer(123)
      expect(customer.id).to eq(123)
      expect(customer.name).to eq('Jane Smith')
    end
  end
end
# Pact provider test
Pact.provider_states_for 'Customer Service Client' do
provider_state 'customer 123 exists' do
set_up do
Customer.create!(id: 123, name: 'Jane Smith', email: 'jane@example.com')
end
tear_down do
Customer.destroy_all
end
end
end
Data Synchronization Patterns: Services maintaining local copies of data from other services require synchronization mechanisms. Change Data Capture (CDC) streams database changes to interested services. Debezium monitors database transaction logs, publishing change events to Kafka topics. Services subscribe to relevant topics, updating local data stores.
Event-driven synchronization publishes domain events when data changes. A customer service publishes customer.updated events containing changed fields. Downstream services maintain read models by consuming these events. This approach provides more semantic information than CDC but requires publishers to identify which changes warrant events.
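A downstream read model fed by such events can be sketched as follows. The event shape (`customer_id`, `version`, `changes`) is an assumption for illustration; in particular, the per-customer `version` number is one possible way to discard stale or duplicated deliveries.

```ruby
# Hypothetical read model kept by a downstream service from customer.updated events.
class CustomerReadModel
  def initialize
    @customers = {}
    @versions = Hash.new(0) # last applied version per customer
  end

  # Applies only the changed fields. Returns false for stale or duplicate events.
  def handle(event)
    id = event.fetch(:customer_id)
    return false if event.fetch(:version) <= @versions[id]

    @customers[id] = (@customers[id] || {}).merge(event.fetch(:changes))
    @versions[id] = event[:version]
    true
  end

  def find(id)
    @customers[id]
  end
end

model = CustomerReadModel.new
model.handle({ customer_id: 1, version: 1, changes: { name: 'Jane Smith', email: 'jane@example.com' } })
model.handle({ customer_id: 1, version: 2, changes: { email: 'jane.smith@example.com' } })
model.handle({ customer_id: 1, version: 1, changes: { email: 'old@example.com' } }) # redelivery, ignored
puts model.find(1).inspect
```

Skipping events at or below the last applied version keeps the local copy correct when the broker redelivers a message or delivers out of order, at the cost of dropping late events rather than merging them.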
API Versioning in Service Ecosystems: Multiple services with interdependencies complicate versioning. A customer service v2 that changes response formats affects the order service, the invoice service, and the analytics service. Coordinating such a migration means identifying every consumer, agreeing on a transition window, and confirming that each consumer has moved before the old version is retired.
Parallel version deployment runs v1 and v2 simultaneously, routing consumers to appropriate versions. This approach enables gradual consumer migration but doubles resource requirements and complicates deployment. The expand-contract pattern introduces changes in three phases: expand the API to support both old and new behavior, migrate consumers, then contract by removing old behavior.
Common Pitfalls
Chatty Service Communication: Fine-grained services requiring multiple network calls for single operations create latency and failure scenarios. A product details page calling catalog service, pricing service, inventory service, review service, and recommendation service sequentially accumulates network latency. Each call introduces potential failure points. The gateway pattern addresses this through backend aggregation, making parallel calls to services and combining results.
Services exposing overly granular operations force clients to orchestrate multiple calls. A user profile update requiring separate calls for name, email, phone, and address changes creates inefficient interactions. Designing coarser-grained operations that handle related changes together reduces round trips.
Distributed Data Management Failures: Attempting to maintain data consistency across services through distributed transactions scales poorly. Two-phase commit requires every participant and the coordinator to remain available until the transaction completes, so one slow or unreachable service stalls the whole operation. During a network partition, participants that have already voted to commit block while holding locks until the coordinator becomes reachable again.
The saga pattern with compensating transactions handles distributed workflows without distributed transactions. Each service executes local transactions, publishing events on completion. Subsequent services react to events. Failures trigger compensating transactions that semantically undo completed operations. An order placement saga reserves inventory, charges payment, and creates a shipment. If payment fails, a compensating transaction releases the inventory; if shipment creation fails, compensating transactions refund the payment and release the inventory.
# Saga pattern implementation
class OrderSaga
def execute(order_params)
order = nil
inventory_reserved = false
payment_charged = false
begin
# Step 1: Reserve inventory
inventory_result = inventory_service.reserve(order_params[:items])
inventory_reserved = true
# Step 2: Charge payment
payment_result = payment_service.charge(
customer_id: order_params[:customer_id],
amount: calculate_total(order_params[:items])
)
payment_charged = true
# Step 3: Create order
order = Order.create!(
customer_id: order_params[:customer_id],
payment_id: payment_result[:payment_id],
inventory_reservation_id: inventory_result[:reservation_id]
)
order
rescue => e
# Execute compensating transactions
if payment_charged
payment_service.refund(payment_result[:payment_id])
end
if inventory_reserved
inventory_service.release(inventory_result[:reservation_id])
end
raise SagaFailedError, "Order saga failed: #{e.message}"
end
end
end
Insufficient Service Boundaries: Services sharing databases or directly accessing each other's data stores violate service autonomy. An order service querying the customer database directly creates tight coupling, preventing the customer service from changing its schema independently. Services must communicate through defined interfaces, never through direct database access.
Shared libraries containing business logic create coupling across services. When the shared library changes, all dependent services require redeployment. Domain logic belongs within services, not shared libraries. Shared libraries should contain only truly generic utilities unrelated to specific business domains.
Overlooking Network Unreliability: Assuming network calls succeed or fail cleanly leads to data inconsistencies. Networks exhibit partial failures: a request might time out after the service has already processed it, so a client retry creates a duplicate operation. Idempotency keys prevent duplicate processing. The client attaches a unique key to each logical operation, and the service checks whether it has already processed a request with that key before executing it.
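The idempotency check can be sketched with an in-memory store. `IdempotentProcessor` is a hypothetical name, and the hash stands in for durable storage; a real service would more likely use a database table or Redis with a uniqueness guarantee on the key.

```ruby
# In-memory stand-in for a durable idempotency store.
class IdempotentProcessor
  def initialize
    @results = {}
    @lock = Mutex.new
  end

  # Runs the block at most once per key; repeated requests get the stored result.
  # Holding one lock while the block runs serializes all work, which is a
  # simplification that keeps the sketch short.
  def process(idempotency_key)
    @lock.synchronize do
      return @results[idempotency_key] if @results.key?(idempotency_key)

      @results[idempotency_key] = yield
    end
  end
end

processor = IdempotentProcessor.new
charges = 0

# The client retries with the same key after a timeout; the charge runs once.
2.times do
  processor.process('order-1001-payment') do
    charges += 1
    { payment_id: "pay_#{charges}" }
  end
end

puts charges
```

The second call returns the stored result of the first, so the retry is safe even though the client never saw the original response.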
Services that fail to implement timeouts can block indefinitely waiting for responses, consuming threads and delaying failure detection. Every network call needs an explicit timeout appropriate for the operation. Read operations might time out after 1 second while write operations allow 5 seconds.
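With Ruby's standard `Net::HTTP`, per-operation timeouts can be set when the client is built. The budget table and helper names below are assumptions chosen to match the 1-second and 5-second figures above; the host in the usage line is a placeholder.

```ruby
require 'net/http'

# Hypothetical timeout budgets per operation type, in seconds.
OPERATION_TIMEOUTS = {
  read:  { open: 0.5, read: 1 },
  write: { open: 0.5, read: 5 }
}.freeze

# Builds a client with explicit timeouts; no connection is opened here.
def build_http_client(host, port, operation:)
  budget = OPERATION_TIMEOUTS.fetch(operation)
  http = Net::HTTP.new(host, port)
  http.open_timeout = budget[:open] # time allowed to establish the connection
  http.read_timeout = budget[:read] # time allowed to wait for response data
  http
end

# Returns the response body, or nil when the call exceeds its budget so the
# caller can decide between a fallback and a retry.
def get_with_timeout(http, path)
  http.get(path).body
rescue Net::OpenTimeout, Net::ReadTimeout
  nil
end

client = build_http_client('inventory.internal', 8080, operation: :read)
puts client.read_timeout
```

`KeyError` from `fetch` on an unknown operation type is deliberate: a call site that forgets to pick a budget fails loudly instead of running with no timeout.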
Inadequate Service Documentation: Services without comprehensive documentation force consumers to read implementation code or experiment with endpoints to understand behavior. OpenAPI specifications document REST APIs, describing endpoints, parameters, response formats, and error codes. Maintaining specifications alongside code prevents documentation drift.
Service contracts must specify error scenarios. A payment service returning HTTP 400 might indicate invalid card numbers, expired cards, insufficient funds, or fraud detection. Detailed error codes and messages enable clients to handle errors appropriately.
Monitoring and Observability Gaps: Distributed systems obscure failures. A slow database query in one service cascades through dependent services, causing timeout errors elsewhere. Distributed tracing tracks requests across services, correlating logs and metrics. Each request receives a unique trace ID propagated through service calls.
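# Distributed tracing with correlation IDs
require 'securerandom'

class ApplicationController < ActionController::API
  before_action :set_correlation_id

  private

  def set_correlation_id
    # Reuse the caller's ID when present so one trace spans every service
    request_id = request.headers['X-Request-ID'] || SecureRandom.uuid
    Thread.current[:correlation_id] = request_id
    response.headers['X-Request-ID'] = request_id
  end
end

class ServiceClient
  def call_service(endpoint, data)
    # Faraday responses carry no timing information, so measure around the call
    started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    response = @conn.post(endpoint) do |req|
      req.body = data
      req.headers['X-Request-ID'] = Thread.current[:correlation_id]
    end
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at
    duration_ms = (elapsed * 1000).round

    # A single string works with the standard Logger; structured loggers
    # can take these values as separate fields instead
    logger.info(
      "Service call completed correlation_id=#{Thread.current[:correlation_id]} " \
      "endpoint=#{endpoint} duration_ms=#{duration_ms} status=#{response.status}"
    )
    response.body
  end
end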
Services need health check endpoints returning service status and dependency health. A service reporting healthy while its database is unreachable misleads load balancers and monitoring systems. Health checks verify critical dependencies before reporting healthy status.
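A dependency-aware health check might look like the sketch below. `HealthCheck` and the shape of its report are hypothetical; each dependency is represented by a callable so the example runs without a real database or cache.

```ruby
# Hypothetical health endpoint logic: each dependency check is a callable that
# returns truthy when healthy; exceptions count as failures.
class HealthCheck
  def initialize(checks)
    @checks = checks
  end

  def report
    results = @checks.transform_values { |check| probe(check) }
    healthy = results.values.all? { |status| status == 'ok' }
    {
      status: healthy ? 'healthy' : 'unhealthy',
      http_status: healthy ? 200 : 503, # 503 tells load balancers to stop routing here
      dependencies: results
    }
  end

  private

  def probe(check)
    check.call ? 'ok' : 'failed'
  rescue StandardError => e
    "failed: #{e.class}"
  end
end

health = HealthCheck.new({
  database: -> { true },                            # e.g. a cheap SELECT 1
  cache:    -> { raise IOError, 'connection lost' } # simulated outage
})
puts health.report.inspect
```

Checks should stay cheap and bounded by their own timeouts, since load balancers call this endpoint frequently.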
Cascading Failures: A cascading failure occurs when a single service failure spreads to the services that depend on it. Without circuit breakers, services continue calling failed dependencies, exhausting thread pools and preventing recovery. Circuit breakers stop calls to failing services, allowing them to recover without continued traffic.
Bulkhead patterns limit resource allocation per dependency. A service calling payment, shipping, and inventory services allocates separate thread pools for each dependency. If the payment service becomes slow, it exhausts only its allocated threads, preventing payment issues from blocking shipping and inventory operations.
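The isolation idea can be sketched without real thread pools. The example below swaps in a simpler technique, a per-dependency cap on concurrent calls (a semaphore-style bulkhead), and rejects work once a dependency's slots are used up. The `Bulkhead` class and its limits are hypothetical.

```ruby
# Semaphore-style bulkhead: caps in-flight calls per dependency and rejects extras.
# This stands in for dedicated thread pools; it is a sketch, not a full implementation.
class Bulkhead
  class RejectedError < StandardError; end

  def initialize(limits)
    @limits = limits              # e.g. { payment: 2, shipping: 2 }
    @in_flight = Hash.new(0)
    @mutex = Mutex.new
  end

  # Runs the block if the dependency has a free slot; otherwise raises RejectedError.
  def run(dependency)
    acquire(dependency)
    begin
      yield
    ensure
      release(dependency)
    end
  end

  private

  def acquire(dependency)
    @mutex.synchronize do
      limit = @limits.fetch(dependency)
      if @in_flight[dependency] >= limit
        raise RejectedError, "#{dependency} bulkhead full"
      end
      @in_flight[dependency] += 1
    end
  end

  def release(dependency)
    @mutex.synchronize { @in_flight[dependency] -= 1 }
  end
end
```

If every payment slot is occupied by slow calls, further payment calls fail fast with `RejectedError`, while shipping and inventory calls still find their own slots free.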
Reference
SOA Design Principles
| Principle | Description | Implementation Focus |
|---|---|---|
| Service Contract | Formal agreement specifying operations and data formats | WSDL, OpenAPI specifications, schema definitions |
| Loose Coupling | Minimal dependencies on implementation details | Message-based communication, interface abstractions |
| Service Abstraction | Hidden implementation details from consumers | Facade patterns, API gateways |
| Service Reusability | Design for consumption by multiple applications | Generic interfaces, parameterized operations |
| Service Autonomy | Independent control over behavior and data | Database per service, internal data ownership |
| Service Statelessness | No client-specific state between requests | Token-based authentication, external session stores |
| Service Discoverability | Runtime service location through registries | Consul, Eureka, ZooKeeper integration |
| Service Composability | Combination into higher-level processes | Orchestration engines, choreography patterns |
Communication Pattern Comparison
| Pattern | Coupling | Latency | Complexity | Use Cases |
|---|---|---|---|---|
| Synchronous REST | Medium | Low-Medium | Low | Request-response scenarios, external APIs |
| Synchronous SOAP | Medium-High | Medium | High | Enterprise integration, transactional operations |
| Message Queues | Low | Medium-High | Medium | Asynchronous workflows, event distribution |
| Event Streaming | Low | Medium | Medium-High | Real-time data pipelines, audit trails |
| gRPC | Medium | Low | Medium | Internal service communication, high throughput |
Data Management Strategies
| Strategy | Consistency | Complexity | Query Capability | Autonomy |
|---|---|---|---|---|
| Shared Database | Strong | Low | High | Low |
| Database per Service | Eventual | Medium | Limited | High |
| Event Sourcing | Eventual | High | Medium | High |
| CQRS with Projections | Eventual | High | High | High |
| API Composition | Depends | Medium | Medium | High |
Service Integration Patterns
| Pattern | Description | Benefits | Drawbacks |
|---|---|---|---|
| API Gateway | Single entry point for client requests | Simplified clients, centralized concerns | Potential bottleneck, single point of failure |
| Service Mesh | Infrastructure layer handling communication | Language agnostic, automatic observability | Operational complexity, resource overhead |
| Backend for Frontend | Specialized backends per client type | Client-optimized APIs, reduced coupling | Potential duplication, more services |
| Saga Pattern | Distributed transaction coordination | No distributed locks, improved availability | Complex error handling, eventual consistency |
| Circuit Breaker | Prevent cascading failures | Improved resilience, faster failure detection | Configuration tuning, false positives |
Error Handling Strategies
| Error Type | HTTP Status | Handling Approach | Client Action |
|---|---|---|---|
| Validation Error | 422 | Return detailed validation messages | Fix input and retry |
| Resource Not Found | 404 | Return error with resource identifier | Check resource exists |
| Authorization Failure | 403 | Return permission error | Verify the caller's permissions for the resource |
| Service Unavailable | 503 | Return retry-after header | Exponential backoff retry |
| Timeout | 504 | Log partial completion state | Idempotent retry |
| Rate Limit Exceeded | 429 | Return limit reset time | Wait and retry |
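The retry rows in the table can be sketched as a small client helper. This is an illustration under stated assumptions: `with_retries`, its parameters, and the choice of retryable statuses are hypothetical, and the block is expected to return a `[status, body]` pair. A production client would also honor the Retry-After header and retry only idempotent operations.

```ruby
# Statuses from the table where waiting and retrying is a reasonable client action.
RETRYABLE_STATUSES = [429, 503, 504].freeze

# Calls the block until it returns a non-retryable status or attempts run out.
# The delay doubles after each failed attempt: base, 2x base, 4x base, ...
def with_retries(max_attempts: 4, base_delay: 0.5, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  loop do
    attempt += 1
    status, body = yield(attempt)
    return [status, body] unless RETRYABLE_STATUSES.include?(status)
    return [status, body] if attempt >= max_attempts

    sleeper.call(base_delay * (2**(attempt - 1)))
  end
end

# Usage: the first two attempts see 503, the third succeeds.
status, body = with_retries(base_delay: 0.01) do |attempt|
  attempt < 3 ? [503, nil] : [200, 'ok']
end
puts "#{status} #{body}"
```

Validation (422), not-found (404), and authorization (403) responses are returned immediately, since repeating the same request would not change the outcome.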
Versioning Strategies
| Strategy | Implementation | Pros | Cons |
|---|---|---|---|
| URI Versioning | /api/v1/resource, /api/v2/resource | Clear, explicit, cacheable | Proliferates endpoints |
| Header Versioning | Custom request header, e.g. X-API-Version: 2 | Clean URIs, flexible | Less discoverable |
| Query Parameter | /api/resource?version=2 | Simple, backward compatible | Complicates caching, version easily omitted |
| Content Negotiation | Accept: application/vnd.company.v2+json | RESTful, flexible | Client complexity |
Service Discovery Configuration
| Component | Purpose | Example Tools |
|---|---|---|
| Service Registry | Store service locations and metadata | Consul, Eureka, ZooKeeper |
| Health Checks | Monitor service availability | HTTP endpoints, TCP checks |
| Load Balancing | Distribute requests across instances | Client-side, proxy-based |
| DNS Integration | Map service names to IP addresses | Consul DNS, CoreDNS |
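To show how the registry, health check, and load-balancing rows fit together, here is a small in-memory sketch. The `ServiceRegistry` class is a hypothetical stand-in and does not reflect the API of Consul, Eureka, or ZooKeeper; it only illustrates client-side round-robin selection over healthy instances.

```ruby
# In-memory stand-in for a service registry with client-side round-robin selection.
class ServiceRegistry
  def initialize
    @instances = Hash.new { |hash, name| hash[name] = [] }
    @cursors = Hash.new(0)
  end

  # Records a new instance of the named service as healthy.
  def register(name, address)
    @instances[name] << { address: address, healthy: true }
  end

  # A failed health check would call this to take the instance out of rotation.
  def mark_unhealthy(name, address)
    @instances[name].each do |instance|
      instance[:healthy] = false if instance[:address] == address
    end
  end

  # Returns the next healthy address for the service, rotating across instances.
  def next_instance(name)
    healthy = @instances[name].select { |instance| instance[:healthy] }
    raise KeyError, "no healthy instances for #{name}" if healthy.empty?

    instance = healthy[@cursors[name] % healthy.length]
    @cursors[name] += 1
    instance[:address]
  end
end

# Usage with made-up addresses.
registry = ServiceRegistry.new
registry.register('payment', '10.0.0.1:8080')
registry.register('payment', '10.0.0.2:8080')
puts registry.next_instance('payment')
puts registry.next_instance('payment')
```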
Circuit Breaker States
| State | Behavior | Transition Condition |
|---|---|---|
| Closed | Requests pass through, failures counted | Failure threshold reached (moves to Open) |
| Open | Requests immediately fail | Timeout period expires (moves to Half-Open) |
| Half-Open | Single trial request attempted | Trial succeeds (moves to Closed) or fails (returns to Open) |
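The state transitions in the table can be sketched as a small state machine. The `CircuitBreaker` class below, its parameter names, and its defaults are assumptions for illustration; the sketch is not thread-safe, and production code would more likely rely on an existing library.

```ruby
# Minimal circuit breaker following the Closed -> Open -> Half-Open cycle.
class CircuitBreaker
  class OpenError < StandardError; end

  attr_reader :state

  # clock is injectable so the timeout can be exercised without real waiting.
  def initialize(failure_threshold: 3, reset_timeout: 30,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @failure_threshold = failure_threshold
    @reset_timeout = reset_timeout
    @clock = clock
    @state = :closed
    @failures = 0
    @opened_at = nil
  end

  def call
    if @state == :open
      # Fail fast until the timeout period has expired.
      raise OpenError, 'circuit open' if @clock.call - @opened_at < @reset_timeout

      @state = :half_open # timeout expired: let one trial request through
    end

    begin
      result = yield
    rescue StandardError
      record_failure
      raise
    end
    reset
    result
  end

  private

  def record_failure
    @failures += 1
    trip if @state == :half_open || @failures >= @failure_threshold
  end

  def trip
    @state = :open
    @opened_at = @clock.call
  end

  def reset
    @state = :closed
    @failures = 0
  end
end
```

A caller wraps each remote call in `breaker.call { ... }` and treats `CircuitBreaker::OpenError` as a signal to use a fallback instead of waiting on a dependency that is known to be failing.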
Monitoring Metrics
| Metric | Description | Alerting Threshold |
|---|---|---|
| Request Rate | Requests per second | Unusual deviation from baseline |
| Error Rate | Percentage of failed requests | Above an agreed threshold, commonly 1-5% |
| Latency Percentiles | P50, P95, P99 response times | P95 exceeds SLA |
| Saturation | Resource utilization percentage | Greater than 80% |
| Circuit Breaker State | Open/closed status | Open longer than the expected recovery window |
| Queue Depth | Pending messages count | Exceeds planned capacity |
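As a small worked example of the latency row, the sketch below computes percentiles with the nearest-rank method and compares P95 against an SLA. The helper names, the sample values, and the 500 ms SLA are made up for illustration; real monitoring systems usually compute percentiles from histograms rather than raw samples.

```ruby
# Nearest-rank percentile over a list of latency samples (milliseconds).
def percentile(samples, pct)
  raise ArgumentError, 'samples must not be empty' if samples.empty?

  sorted = samples.sort
  rank = (pct * sorted.length / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# True when the P95 latency is above the agreed SLA.
def p95_breaches_sla?(samples, sla_ms)
  percentile(samples, 95) > sla_ms
end

# Usage with made-up samples: one slow outlier dominates the tail.
latencies_ms = [120, 80, 95, 300, 110, 90, 100, 105, 85, 2000]
puts percentile(latencies_ms, 50)
puts percentile(latencies_ms, 95)
puts p95_breaches_sla?(latencies_ms, 500)
```

The median stays low while the tail percentile exposes the outlier, which is why the alerting threshold in the table targets P95 rather than the average.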