CrackedRuby

Overview

Data Vault Modeling provides a database design methodology for enterprise data warehouses that addresses limitations in traditional dimensional and normalized approaches. Dan Linstedt developed this methodology in the early 2000s to handle large-scale data warehouse environments where business requirements change frequently and data lineage matters.

The model divides data warehouse entities into three core types: Hubs store unique business keys, Links capture relationships between business entities, and Satellites hold descriptive attributes with complete temporal history. This separation creates an insert-only architecture where no data gets updated or deleted, providing complete auditability and simplifying parallel loading operations.

Data Vault differs from Kimball's dimensional modeling by avoiding aggregation and denormalization at the base layer. Unlike third normal form, it optimizes for loading speed rather than query performance, accepting query complexity in exchange for flexibility and maintainability. The methodology scales horizontally across multiple loading processes and adapts to changing business requirements without restructuring existing tables.

Hub_Customer               Link_Customer_Order            Sat_Customer
├── customer_hk            ├── customer_order_hk          ├── customer_hk
├── customer_id            ├── customer_hk                ├── load_date
├── load_date              ├── order_hk                   ├── customer_name
└── record_source          ├── load_date                  ├── email
                           └── record_source              ├── phone
                                                          └── record_source

Organizations implement Data Vault when data provenance requirements exceed what dimensional models provide, when source systems change frequently, or when parallel ETL processes need to load data without coordination. Financial services, healthcare, and telecommunications industries commonly adopt Data Vault for regulatory compliance and complex integration scenarios.

Key Principles

The Data Vault architecture separates concerns into three entity types, each serving a distinct purpose in the data warehouse structure. Hubs represent core business concepts and store only business keys with minimal metadata. Each Hub contains a surrogate hash key, the natural business key from source systems, load timestamp, and record source identifier. Hubs never contain descriptive attributes or foreign keys to other Hubs.

Links model relationships between business entities by storing the hash keys of connected Hubs. A Link represents an association, transaction, or hierarchy between two or more Hubs. Links contain their own hash key derived from the combination of Hub keys they connect, making relationship history independently trackable. Multi-way Links connect more than two Hubs when business relationships involve multiple entities simultaneously.

Satellites attach to Hubs or Links and store all descriptive attributes with complete temporal history. Each Satellite record includes the parent entity's hash key, a load date serving as part of the primary key, and all descriptive attributes valid at that point in time. When source data changes, the system inserts a new Satellite record rather than updating existing ones, creating a complete audit trail of all attribute changes over time.

Hash keys replace traditional surrogate keys throughout the Data Vault structure. The system generates hash keys by applying MD5 or SHA-256 algorithms to business key values, creating deterministic identifiers that remain consistent across ETL runs. Hash keys eliminate sequence generation bottlenecks and enable parallel loading without coordination, since multiple processes generate identical hash keys for the same business keys.

The insert-only principle prohibits updates and deletes in the Data Vault layer. All changes manifest as new inserts with later timestamps, preserving complete history without special change-tracking mechanisms. This approach simplifies ETL logic, improves loading performance, and provides inherent disaster recovery since data never gets overwritten.

Metadata fields appear in every Data Vault entity: load date timestamps record when data arrived in the warehouse, record source identifies the originating system, and applied date distinguishes when events occurred versus when they arrived. These fields enable data lineage tracking, multi-source integration, and temporal queries without complex versioning schemes.

# Hash key generation for a Hub
require 'digest'

def generate_hash_key(*business_keys)
  # Concatenate business keys with delimiter
  key_string = business_keys.map(&:to_s).join('||')
  
  # Generate MD5 hash
  Digest::MD5.hexdigest(key_string.upcase)
end

customer_hk = generate_hash_key('CUST-12345')
# => deterministic 32-character MD5 hex digest

order_hk = generate_hash_key('ORD-98765')
# => same digest on every run for the same business key

customer_order_hk = generate_hash_key('CUST-12345', 'ORD-98765')
# => Link hash key derived from both business keys

The Business Vault layer extends Raw Vault structures with calculated fields, business rules, and derived relationships. While Raw Vault mirrors source systems exactly, Business Vault adds interpretation and enrichment without modifying the raw data layer. Reference tables, code translations, and computed metrics reside in Business Vault Satellites rather than the Raw Vault.

Implementation Approaches

Data Vault implementations follow a layered architecture pattern with distinct zones serving different purposes. The Staging Area receives raw data from source systems exactly as extracted, holding data temporarily before transformation. The Raw Data Vault persists all source data in Data Vault structures without business interpretation. The Business Data Vault adds calculated fields, reference data joins, and business rule applications. Information Marts or virtualization layers present data to end users in queryable formats.

Single-tenant implementations dedicate separate schemas or database instances to each data domain, isolating customer data physically. This approach simplifies security, enables independent scaling, and supports different retention policies per domain. The trade-off involves higher administrative overhead and difficulty executing cross-domain queries without federation.

Multi-tenant architectures share physical Data Vault structures across multiple business units or customers, using tenant identifiers in record source fields. This pattern reduces infrastructure costs and simplifies cross-tenant analysis but requires careful security controls and query predicate enforcement to prevent data leakage.

# Multi-tenant Hub structure (DataVault::Hub is an illustrative mixin,
# not a published gem)
class HubCustomer
  include DataVault::Hub
  
  hash_key :customer_hk
  business_key :customer_id
  tenant_id :tenant_id
  
  validate :unique_within_tenant
  
  def unique_within_tenant
    existing = self.class.where(
      customer_id: customer_id,
      tenant_id: tenant_id
    ).where.not(customer_hk: customer_hk)
    
    errors.add(:customer_id, 'already exists in tenant') if existing.exists?
  end
end

Initial loading processes populate the Data Vault from historical source data, requiring special handling for temporal reconstruction. The ETL extracts all available history, generates hash keys consistently, and loads Hubs before Links and Satellites to satisfy referential dependencies. Initial loads often process data in parallel batches partitioned by business key ranges or source system identifiers.

Incremental loading handles ongoing changes from source systems, typically running on scheduled intervals. Change data capture (CDC) mechanisms identify modified records since the last extraction, though full table comparisons work for smaller sources. The load process generates hash keys, checks for Hub existence, creates Links for new relationships, and inserts Satellite records when attributes change.

# Incremental load process
class CustomerIncrementalLoad
  def initialize(source_connection, vault_connection)
    @source = source_connection
    @vault = vault_connection
  end
  
  def execute(last_load_time)
    # Extract changed records from source
    changed_customers = @source[:customers]
      .where { updated_at > last_load_time }
      .all
    
    changed_customers.each do |customer|
      process_customer_record(customer)
    end
  end
  
  def process_customer_record(customer)
    # Generate hash key (generate_hash_key as defined in the earlier example)
    customer_hk = generate_hash_key(customer[:customer_id])
    
    # Insert or verify Hub exists
    ensure_hub_exists(customer_hk, customer[:customer_id])
    
    # Insert Satellite if attributes changed
    insert_satellite_if_changed(customer_hk, customer)
  end
  
  def ensure_hub_exists(hash_key, business_key)
    return if @vault[:hub_customer].where(customer_hk: hash_key).any?
    
    @vault[:hub_customer].insert(
      customer_hk: hash_key,
      customer_id: business_key,
      load_date: Time.now.utc,
      record_source: 'CRM_SYSTEM'
    )
  end
  
  def insert_satellite_if_changed(hash_key, customer)
    # Get latest Satellite record
    latest = @vault[:sat_customer]
      .where(customer_hk: hash_key)
      .order(:load_date)
      .last
    
    # Check if attributes changed
    return if latest && attributes_match?(latest, customer)
    
    # Insert new Satellite record
    @vault[:sat_customer].insert(
      customer_hk: hash_key,
      load_date: Time.now.utc,
      customer_name: customer[:name],
      email: customer[:email],
      phone: customer[:phone],
      record_source: 'CRM_SYSTEM'
    )
  end
  
  def attributes_match?(sat_record, source_record)
    sat_record[:customer_name] == source_record[:name] &&
    sat_record[:email] == source_record[:email] &&
    sat_record[:phone] == source_record[:phone]
  end
end

Delta detection strategies determine which Satellites need new records. Comparison-based detection retrieves the latest Satellite record and compares all attributes, inserting only when differences exist. Hash-based detection generates a hash of all attribute values and stores it in the Satellite, comparing hashes instead of individual fields. Hash-based detection performs better for Satellites with many attributes but loses visibility into which specific attributes changed.
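The hash-based approach can be sketched in a few lines of Ruby; the attribute names below are illustrative, and sorting by column name keeps the digest independent of attribute order.

```ruby
require 'digest'

# Hash-based delta detection: digest all descriptive attributes at once,
# sorted by column name so ordering never affects the result
def hash_diff(attributes)
  normalized = attributes.sort.map { |_name, value| value.to_s.upcase.strip }.join('||')
  Digest::SHA256.hexdigest(normalized)
end

stored   = { customer_name: 'Ada Lovelace', email: 'ada@example.com', phone: '555-0100' }
incoming = { customer_name: 'Ada Lovelace', email: 'ada@example.com', phone: '555-0199' }

# One comparison replaces per-column checks; insert a new record on mismatch
hash_diff(stored) == hash_diff(incoming)  # => false (phone changed)
```

Storing the digest in a hash_diff column means the loader never re-reads individual attribute values to decide whether to insert.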

Parallel loading distributes ETL work across multiple processes without coordination requirements. Hash key determinism enables different processes to load the same entities safely, since identical business keys always generate identical hash keys. Each process handles a subset of source data partitioned by key range, source system, or functional area, loading Hubs, Links, and Satellites independently.
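A minimal sketch of that determinism: two workers hash disjoint partitions independently, and any overlap would still agree because the hash is a pure function of the business key. The bucket-by-hash partitioning scheme here is illustrative.

```ruby
require 'digest'

def hash_key(business_key)
  Digest::SHA256.hexdigest(business_key.to_s.upcase.strip)
end

keys = %w[CUST-1 CUST-2 CUST-3 CUST-4]

# Partition work by a stable bucket of each key's hash, then load in parallel
partitions = keys.group_by { |k| hash_key(k).to_i(16) % 2 }

workers = partitions.values.map do |subset|
  Thread.new { subset.to_h { |k| [k, hash_key(k)] } }
end
results = workers.map(&:value).reduce({}, :merge)

# Any process recomputing a key gets the identical hash: no coordination needed
results['CUST-1'] == hash_key('CUST-1')  # => true
```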

Ruby Implementation

Ruby's database libraries and ETL capabilities support Data Vault implementation through gems like Sequel, ActiveRecord, or custom frameworks. The Sequel gem provides database abstraction, migration management, and query building suitable for Data Vault operations.

require 'sequel'
require 'digest'

module DataVault
  class Base
    attr_reader :db
    
    def initialize(database_url)
      @db = Sequel.connect(database_url)
    end
    
    def generate_hash_key(*values)
      normalized = values.map do |v|
        v.to_s.upcase.strip
      end.join('||')
      
      Digest::SHA256.hexdigest(normalized)
    end
    
    def current_timestamp
      Time.now.utc
    end
  end
  
  class HubLoader < Base
    def load(business_key, record_source)
      hash_key = generate_hash_key(business_key)
      
      # Check if Hub already exists
      existing = @db[:hub_customer]
        .where(customer_hk: hash_key)
        .first
      
      return hash_key if existing
      
      # Insert new Hub record
      @db[:hub_customer].insert(
        customer_hk: hash_key,
        customer_id: business_key,
        load_date: current_timestamp,
        record_source: record_source
      )
      
      hash_key
    end
  end
end

Satellite loading requires temporal comparison logic to determine when attributes changed. Ruby's hash comparison operators simplify attribute comparison, while Sequel's dataset methods handle queries efficiently.

class SatelliteLoader < DataVault::Base
  def load(parent_hash_key, attributes, record_source)
    # Get most recent Satellite record
    latest = @db[:sat_customer]
      .where(customer_hk: parent_hash_key)
      .order(Sequel.desc(:load_date))
      .first
    
    # Compare attributes
    if latest && attributes_unchanged?(latest, attributes)
      return false # No insert needed
    end
    
    # Insert new Satellite record
    @db[:sat_customer].insert(
      customer_hk: parent_hash_key,
      load_date: current_timestamp,
      **attributes,
      record_source: record_source
    )
    
    true
  end
  
  def attributes_unchanged?(existing, new_attributes)
    attribute_keys = new_attributes.keys
    
    attribute_keys.all? do |key|
      existing[key] == new_attributes[key]
    end
  end
end

Link loading connects Hubs through relationship records, generating hash keys from combined parent keys.

class LinkLoader < DataVault::Base
  def load(hub_keys, record_source)
    # Generate Link hash key from all Hub keys
    link_hash_key = generate_hash_key(*hub_keys.values)
    
    # Check if Link exists
    existing = @db[:link_customer_order]
      .where(customer_order_hk: link_hash_key)
      .first
    
    return link_hash_key if existing
    
    # Insert new Link
    @db[:link_customer_order].insert(
      customer_order_hk: link_hash_key,
      customer_hk: hub_keys[:customer],
      order_hk: hub_keys[:order],
      load_date: current_timestamp,
      record_source: record_source
    )
    
    link_hash_key
  end
end

Complete ETL orchestration coordinates Hub, Link, and Satellite loading in proper sequence.

class DataVaultETL
  def initialize(database_url)
    @hub_loader = DataVault::HubLoader.new(database_url)
    @link_loader = DataVault::LinkLoader.new(database_url)
    @sat_loader = DataVault::SatelliteLoader.new(database_url)
  end
  
  def process_customer_order(customer_data, order_data)
    record_source = 'ORDER_SYSTEM'
    
    # Load Hubs first
    customer_hk = @hub_loader.load(
      customer_data[:customer_id],
      record_source
    )
    
    order_hk = @hub_loader.load(
      order_data[:order_id],
      record_source
    )
    
    # Load Link
    link_hk = @link_loader.load(
      { customer: customer_hk, order: order_hk },
      record_source
    )
    
    # Load Satellites
    @sat_loader.load(
      customer_hk,
      {
        customer_name: customer_data[:name],
        email: customer_data[:email],
        phone: customer_data[:phone]
      },
      record_source
    )
    
    @sat_loader.load(
      order_hk,
      {
        order_date: order_data[:order_date],
        total_amount: order_data[:total_amount],
        status: order_data[:status]
      },
      record_source
    )
    
    # Link Satellite if relationship has attributes
    @sat_loader.load(
      link_hk,
      {
        relationship_type: 'PRIMARY_CUSTOMER',
        established_date: order_data[:order_date]
      },
      record_source
    )
  end
end

Point-in-time (PIT) tables optimize query performance by pre-joining Satellites and materializing temporal snapshots. Ruby scripts generate and maintain PIT structures.

class PITTableBuilder < DataVault::Base
  def build_customer_pit(snapshot_date)
    @db.transaction do
      # Clear existing PIT data for date
      @db[:pit_customer]
        .where(snapshot_date: snapshot_date)
        .delete
      
      # Get all Hubs
      hubs = @db[:hub_customer].all
      
      hubs.each do |hub|
        # Find latest Satellite as of snapshot date
        satellite = @db[:sat_customer]
          .where(customer_hk: hub[:customer_hk])
          .where { load_date <= snapshot_date }
          .order(Sequel.desc(:load_date))
          .first
        
        next unless satellite
        
        # Insert PIT record
        @db[:pit_customer].insert(
          customer_hk: hub[:customer_hk],
          snapshot_date: snapshot_date,
          sat_customer_hk: satellite[:customer_hk],
          sat_load_date: satellite[:load_date],
          customer_name: satellite[:customer_name],
          email: satellite[:email],
          phone: satellite[:phone]
        )
      end
    end
  end
end

Design Considerations

Data Vault suits organizations with complex, frequently changing data integration requirements where auditability and flexibility outweigh query simplicity. Regulatory environments requiring complete data lineage and temporal reconstruction favor Data Vault's insert-only architecture. Multiple source systems feeding the same business entities benefit from Data Vault's separation of concerns and parallel loading capabilities.

Dimensional modeling (Kimball) provides better query performance and simpler end-user access patterns for stable business requirements. Star schema designs require fewer joins and work better with business intelligence tools. Organizations with mature, unchanging business processes and minimal regulatory requirements gain limited benefit from Data Vault's complexity.

The query complexity inherent in Data Vault structures necessitates additional layers. Business Vault transformations, PIT tables, or information marts translate Data Vault structures into queryable formats. Teams must maintain both the raw data layer and presentation layers, increasing development and operational overhead.

# Query complexity comparison
# Dimensional model query (simple)
def get_customer_orders_dimensional(customer_id)
  db[:fact_orders]
    .join(:dim_customer, customer_key: :customer_key)
    .where(Sequel[:dim_customer][:customer_id] => customer_id)
    .all
end

# Data Vault query (complex)
def get_customer_orders_vault(customer_id)
  db[:hub_customer]
    .join(:link_customer_order, customer_hk: Sequel[:hub_customer][:customer_hk])
    .join(:hub_order, order_hk: Sequel[:link_customer_order][:order_hk])
    .join(:sat_customer, customer_hk: Sequel[:hub_customer][:customer_hk])
    .join(:sat_order, order_hk: Sequel[:hub_order][:order_hk])
    .where(Sequel[:hub_customer][:customer_id] => customer_id)
    .where do
      Sequel[:sat_customer][:load_date] == 
        db[:sat_customer]
          .where(customer_hk: Sequel[:hub_customer][:customer_hk])
          .select { max(load_date) }
    end
    .where do
      Sequel[:sat_order][:load_date] == 
        db[:sat_order]
          .where(order_hk: Sequel[:hub_order][:order_hk])
          .select { max(load_date) }
    end
    .all
end

Hybrid approaches combine Data Vault for the core data warehouse layer with dimensional marts for reporting. The Data Vault layer handles integration, history, and audit requirements while dimensional models provide query performance. This pattern increases architectural complexity but leverages the strengths of both methodologies.

Team skill requirements differ substantially between Data Vault and dimensional modeling. Data Vault demands understanding of temporal queries, hash key generation, and multi-table join patterns. Dimensional modeling aligns better with SQL skills most developers already possess. Training overhead and ongoing maintenance complexity influence methodology selection.

Source system volatility favors Data Vault adoption. Organizations experiencing frequent source schema changes, new system integrations, or business rule modifications benefit from Data Vault's flexibility. Stable source systems with minimal change derive less value from the insert-only architecture.

Performance Considerations

Hash keys eliminate surrogate key sequence generation bottlenecks, enabling parallel loading without coordination. Multiple ETL processes generate identical hash keys for the same business keys, loading Hubs simultaneously without conflicts. This determinism enables horizontal scaling of ETL workloads across compute resources.

Join performance depends on proper indexing of hash key columns. Hash keys typically use VARCHAR(32) for MD5 or VARCHAR(64) for SHA-256, requiring more storage than integer keys but providing consistent performance. Database systems handle hash-based joins efficiently when indexes exist on all hash key columns in Hubs, Links, and Satellites.

# Migration example with proper indexes
Sequel.migration do
  change do
    create_table(:hub_customer) do
      String :customer_hk, size: 64, null: false
      String :customer_id, null: false
      DateTime :load_date, null: false
      String :record_source, size: 100, null: false
      
      primary_key [:customer_hk]
      index :customer_id
      index :load_date
    end
    
    create_table(:sat_customer) do
      String :customer_hk, size: 64, null: false
      DateTime :load_date, null: false
      String :customer_name, size: 200
      String :email, size: 200
      String :phone, size: 50
      String :record_source, size: 100, null: false
      
      primary_key [:customer_hk, :load_date]
      index :load_date
    end
  end
end

Satellite queries require temporal filtering to retrieve current state, adding complexity and potential performance issues. Point-in-time tables materialize temporal joins for common query patterns, trading storage space for query speed. PIT table generation runs periodically, snapshotting the current state of related Satellites.
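In plain Ruby, that "latest row per parent" reduction looks like this (the sample rows are illustrative); the SQL equivalent is the correlated MAX(load_date) subquery shown earlier, which PIT tables precompute.

```ruby
# Satellite history: several insert-only versions per parent hash key
rows = [
  { customer_hk: 'a1', load_date: Time.utc(2024, 1, 1), email: 'old@example.com' },
  { customer_hk: 'a1', load_date: Time.utc(2024, 6, 1), email: 'new@example.com' },
  { customer_hk: 'b2', load_date: Time.utc(2024, 3, 1), email: 'b@example.com' }
]

# Current state: keep only the newest version of each parent
current = rows.group_by { |r| r[:customer_hk] }
              .transform_values { |versions| versions.max_by { |r| r[:load_date] } }

current['a1'][:email]  # => "new@example.com"
```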

Partition strategies improve query performance and maintenance operations. Partitioning Satellites by load_date enables efficient historical purging and focuses queries on recent data. Range partitioning on date columns aligns with temporal query patterns common in Data Vault access.

# PostgreSQL partition creation
def create_partitioned_satellite
  db.run <<-SQL
    CREATE TABLE sat_customer (
      customer_hk VARCHAR(64) NOT NULL,
      load_date TIMESTAMP NOT NULL,
      customer_name VARCHAR(200),
      email VARCHAR(200),
      phone VARCHAR(50),
      record_source VARCHAR(100) NOT NULL,
      PRIMARY KEY (customer_hk, load_date)
    ) PARTITION BY RANGE (load_date);
    
    CREATE TABLE sat_customer_2024 PARTITION OF sat_customer
      FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
      
    CREATE TABLE sat_customer_2025 PARTITION OF sat_customer
      FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
  SQL
end

Materialized views aggregate frequently accessed Link and Satellite combinations, reducing join complexity at query time. The database system maintains materialized views through refresh schedules, balancing query performance against refresh overhead.
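A hedged PostgreSQL sketch of such a view, with table and view names following the earlier examples: DISTINCT ON picks the newest Satellite row per Hub, and a scheduled job refreshes the view.

```ruby
# SQL for a materialized view joining each Hub to its latest Satellite row.
# Create once, then refresh on a schedule (e.g. run the REFRESH statement
# via db.run from a cron-driven job). Names here are illustrative.
CURRENT_CUSTOMER_VIEW_SQL = <<~SQL
  CREATE MATERIALIZED VIEW mv_current_customer AS
  SELECT DISTINCT ON (h.customer_hk)
         h.customer_hk, h.customer_id,
         s.customer_name, s.email, s.phone, s.load_date
  FROM hub_customer h
  JOIN sat_customer s ON s.customer_hk = h.customer_hk
  ORDER BY h.customer_hk, s.load_date DESC;
SQL

REFRESH_SQL = 'REFRESH MATERIALIZED VIEW mv_current_customer'
```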

Bulk loading operations outperform row-by-row inserts significantly. Ruby's database libraries support bulk inserts through batch operations, reducing round trips and transaction overhead.

def bulk_load_satellites(records, batch_size: 1000)
  records.each_slice(batch_size) do |batch|
    satellite_records = batch.map do |record|
      {
        customer_hk: generate_hash_key(record[:customer_id]),
        load_date: current_timestamp,
        customer_name: record[:name],
        email: record[:email],
        phone: record[:phone],
        record_source: record[:source]
      }
    end
    
    db[:sat_customer].multi_insert(satellite_records)
  end
end

Query optimization focuses on reducing the number of Satellite table scans. Queries retrieving multiple Satellites for the same Hub benefit from PIT tables or views that pre-join related Satellites. Denormalization in the Business Vault layer addresses common query patterns without compromising Raw Vault structure.

Tools & Ecosystem

Data Vault automation tools generate DDL scripts, loading procedures, and documentation from metadata definitions. DbSchema and erwin Data Modeler support Data Vault notation and generate appropriate table structures. Automation reduces manual coding effort and enforces naming conventions consistently.

Ruby gems supporting data warehousing operations include Sequel for database access, Kiba for ETL pipelines, and ActiveRecord for ORM patterns. Sequel's dataset methods and migration system work well with Data Vault structures.

# Kiba ETL pipeline for Data Vault
require 'kiba'
require 'sequel'
require 'digest'

class DataVaultETLPipeline
  def self.build(source_db, vault_db, last_load_time)
    Kiba.parse do
      # Illustrative source class: kiba-common ships CSV and Enumerable
      # sources, so a real pipeline supplies its own SQL-backed source here
      source Kiba::Common::Sources::SQL,
        db: source_db,
        query: "SELECT * FROM customers WHERE updated_at > ?",
        params: [last_load_time]
      
      transform do |row|
        row[:customer_hk] = Digest::SHA256.hexdigest(
          row[:customer_id].to_s.upcase
        )
        row[:load_date] = Time.now.utc
        row[:record_source] = 'CRM_SYSTEM'
        row
      end
      
      destination DataVaultDestination,
        db: vault_db,
        hub_table: :hub_customer,
        sat_table: :sat_customer
    end
  end
end

class DataVaultDestination
  def initialize(db:, hub_table:, sat_table:)
    @db = db
    @hub_table = hub_table
    @sat_table = sat_table
  end
  
  def write(row)
    # Write Hub if not exists
    unless @db[@hub_table].where(customer_hk: row[:customer_hk]).any?
      @db[@hub_table].insert(
        customer_hk: row[:customer_hk],
        customer_id: row[:customer_id],
        load_date: row[:load_date],
        record_source: row[:record_source]
      )
    end
    
    # Write Satellite
    @db[@sat_table].insert(
      customer_hk: row[:customer_hk],
      load_date: row[:load_date],
      customer_name: row[:name],
      email: row[:email],
      phone: row[:phone],
      record_source: row[:record_source]
    )
  end
end

Database platforms support Data Vault implementations with varying capabilities. PostgreSQL provides robust partitioning, parallel query execution, and materialized views. Snowflake's cloud architecture handles Data Vault's join-heavy queries efficiently through automatic clustering and micro-partitions. Amazon Redshift optimizes columnar storage for analytical workloads typical in Data Vault queries.

Change data capture tools like Debezium stream source system changes to Data Vault loading processes, reducing latency between source updates and warehouse availability. CDC eliminates full table scans for incremental loading, improving efficiency for large tables.

Version control systems track Data Vault metadata definitions, migration scripts, and loading procedures. Git repositories store table definitions, enabling team collaboration and deployment automation. Database migration tools like Sequel's migration system or Flyway manage schema version progression across environments.

Reference

Entity Type Characteristics

Entity Type | Purpose | Contains | Primary Key | Updates Allowed
Hub | Store business keys | Hash key, business key, load date, record source | Hash key | No
Link | Connect Hubs | Hash key, parent hash keys, load date, record source | Hash key | No
Satellite | Store attributes | Parent hash key, load date, attributes, record source | Parent hash key + load date | No

Naming Conventions

Element | Pattern | Example
Hub table | HUB_EntityName | HUB_CUSTOMER
Link table | LNK_Entity1_Entity2 | LNK_CUSTOMER_ORDER
Satellite table | SAT_ParentEntity_AttributeGroup | SAT_CUSTOMER_PROFILE
Hash key column | entity_hk | customer_hk
Business key column | entity_id | customer_id
Load date column | load_date | load_date
Record source column | record_source | record_source

Hash Key Generation

Algorithm | Output Length | Performance | Collision Risk
MD5 | 32 characters | Fast | Low for typical datasets
SHA-1 | 40 characters | Medium | Very low
SHA-256 | 64 characters | Slower | Extremely low

Metadata Columns

Column Name | Data Type | Purpose | Required
load_date | TIMESTAMP | When record loaded to warehouse | Yes
record_source | VARCHAR | Source system identifier | Yes
applied_date | TIMESTAMP | When event occurred in source | Optional
hash_diff | VARCHAR | Hash of all attributes for change detection | Optional

Loading Sequence

Step | Entity Type | Dependencies | Parallel Safe
1 | Hubs | None | Yes
2 | Links | Hub existence | Yes with hash keys
3 | Satellites | Hub or Link existence | Yes with hash keys
4 | PIT Tables | All Satellites loaded | No

Common Link Types

Link Type | Connects | Example
Transaction Link | Entity to transaction event | Customer to Order
Hierarchy Link | Parent to child entity | Employee to Manager
Association Link | Related entities | Product to Category
Many-to-many Link | Two entities with multiple relationships | Student to Course

Query Patterns

Pattern | Use Case | Complexity
Current state | Get latest attribute values | Medium
Historical snapshot | Get state at specific date | High
Change history | Track attribute changes over time | Medium
Multi-entity join | Combine data across Links | High

Design Rules

Rule | Description | Impact of Violation
Hubs contain only business keys | No descriptive attributes in Hubs | Difficult to track key history
Satellites never reference other Satellites | Keep temporal context independent | Breaks temporal query logic
Hash keys must be deterministic | Same input always generates same hash | Loading conflicts, duplicates
Load date part of Satellite primary key | Enables multiple records per parent | Cannot track history
No updates or deletes | Insert-only architecture | Breaks audit trail
Links contain only hash keys | No attributes in Links | Use Link Satellites for attributes