Overview
B-Trees and B+ Trees are self-balancing search tree data structures that maintain sorted data and allow searches, sequential access, insertions, and deletions in logarithmic time. Unlike binary search trees where each node has at most two children, B-Trees allow nodes to have multiple children, making them particularly effective for systems that read and write large blocks of data.
Database management systems and file systems use these structures because they minimize disk I/O operations. When reading from disk, the system retrieves entire blocks of data at once. B-Trees organize data so that each node corresponds to a disk block, maximizing the amount of useful information retrieved per disk access.
A B-Tree of order m (also called the degree or branching factor) maintains these properties:
- Every node has at most m children
- Every non-leaf node (except root) has at least ⌈m/2⌉ children
- The root has at least two children if it is not a leaf
- All leaves appear at the same level
- A non-leaf node with k children contains k-1 keys
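These occupancy bounds follow directly from the properties above. A quick sketch (the helper name is illustrative):

```ruby
# Occupancy bounds implied by the B-Tree properties, for order m
def occupancy_bounds(m)
  min_children = (m / 2.0).ceil
  {
    max_children: m,
    min_children: min_children,  # non-root internal nodes
    max_keys: m - 1,
    min_keys: min_children - 1
  }
end

occupancy_bounds(5)
# => {max_children: 5, min_children: 3, max_keys: 4, min_keys: 2}
```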
B+ Trees extend B-Trees by storing all values in leaf nodes and using internal nodes purely for navigation. This modification provides several advantages for database applications, including more predictable sequential access performance and higher fanout in internal nodes.
Key Principles
B-Trees balance height against node capacity. Each node stores multiple keys and child pointers, reducing tree height compared to binary trees. For a tree with n keys and maximum m children per node, the height is O(log_m n). Larger m values produce shorter trees, but nodes become more expensive to search linearly.
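The height bound can be made concrete using the standard result that a B-Tree with minimum child count t = ⌈m/2⌉ holding n keys has at most log_t((n+1)/2) levels below the root. A rough sketch (function name is illustrative):

```ruby
# Upper bound on the number of levels for n keys and order m (sketch)
def max_levels(n, m)
  t = (m / 2.0).ceil                            # minimum child count
  1 + (Math.log((n + 1) / 2.0) / Math.log(t)).ceil
end

max_levels(1_000_000, 4)    # => 20 levels at most for a near-binary tree
max_levels(1_000_000, 128)  # => 5 levels at most
```

The same million keys that need twenty levels at order 4 fit into five levels at order 128, which is why disk-based systems favor wide nodes.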
The fundamental difference between B-Trees and B+ Trees lies in data placement. B-Trees store keys and associated values in both internal and leaf nodes. When searching for a key, the search may terminate at any level. B+ Trees store all data in leaf nodes, with internal nodes containing only keys and pointers for navigation. This separation creates a dense index structure where all values reside at the same level.
B+ Trees link leaf nodes together, forming a sorted linked list of all values. This structure enables efficient range queries. Starting from any leaf node, the system can traverse subsequent nodes without returning to the parent. Database systems exploit this property for range scans and sequential reads.
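A minimal sketch of the linked-leaf idea, using a bare Struct rather than the full tree classes (the `Leaf` type and values here are illustrative):

```ruby
# Each leaf holds sorted keys plus a pointer to the next leaf, so a range
# scan never revisits the parent after the initial descent
Leaf = Struct.new(:keys, :next_leaf)
l3 = Leaf.new([50, 60], nil)
l2 = Leaf.new([30, 40], l3)
l1 = Leaf.new([10, 20], l2)

results = []
node = l2                  # leaf located by one root-to-leaf descent
while node
  node.keys.each { |k| results << k if k <= 55 }
  node = node.next_leaf
end
results  # => [30, 40, 50]
```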
Node splitting maintains balance during insertion. When a node exceeds capacity, it splits into two nodes, and the median key moves up to the parent. This process may cascade up the tree, potentially increasing tree height by one level. The split operation preserves the ordering invariant across all nodes.
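The split can be sketched on a plain array, independent of the tree classes shown later (values are illustrative):

```ruby
# A full node's keys are divided around the median, which is promoted
keys = [3, 7, 12, 18, 25]   # full node in a tree of order 6
mid = keys.size / 2
left   = keys[0...mid]      # stays in the original node
median = keys[mid]          # moves up into the parent
right  = keys[mid + 1..-1]  # goes to the newly created sibling
# left => [3, 7], median => 12, right => [18, 25]
```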
Node merging or redistribution handles deletion. When a node falls below minimum capacity, it either merges with a sibling node or borrows keys from a sibling through the parent. These operations ensure the tree remains balanced while maintaining the minimum occupancy requirement for all nodes except the root.
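Redistribution can likewise be sketched with arrays: the underfull node borrows through the parent's separator rather than merging (values are illustrative):

```ruby
# Right node is underfull; rotate the left sibling's largest key through
# the separator held in the parent
left = [3, 7, 12]
separator = 15
right = [20]              # below minimum occupancy

right.unshift(separator)  # separator key moves down into the right node
separator = left.pop      # sibling's largest key becomes the new separator
# left => [3, 7], separator => 12, right => [15, 20]
```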
The branching factor directly affects performance characteristics. Higher branching factors reduce tree height, decreasing the number of node accesses during search operations. Database systems typically use branching factors of 100-200 for disk-based B-Trees, matching node size to disk block size. In-memory implementations may use smaller branching factors since memory access patterns differ from disk I/O.
Ruby Implementation
Ruby applications typically use B-Trees indirectly through database systems rather than implementing them directly. However, understanding how to implement these structures reveals their operational characteristics and informs database design decisions.
A basic B-Tree node in Ruby contains keys and children:
```ruby
class BTreeNode
  attr_accessor :keys, :children, :leaf

  def initialize(leaf: true)
    @keys = []
    @children = []
    @leaf = leaf
  end

  def full?(order)
    keys.size >= order - 1
  end

  def minimal?(order)
    keys.size < (order / 2.0).ceil - 1
  end
end
```
The B-Tree class manages the root and coordinates operations:
```ruby
class BTree
  attr_accessor :root, :order

  def initialize(order)
    @order = order
    @root = BTreeNode.new
  end

  def search(key, node = @root)
    i = 0
    i += 1 while i < node.keys.size && key > node.keys[i]
    return node if i < node.keys.size && key == node.keys[i]
    return nil if node.leaf
    search(key, node.children[i])
  end
end
```
Insertion requires checking for full nodes and splitting when necessary:
```ruby
class BTree
  def insert(key)
    if @root.full?(@order)
      old_root = @root
      @root = BTreeNode.new(leaf: false)
      @root.children << old_root
      split_child(@root, 0)
    end
    insert_non_full(@root, key)
  end

  private

  def split_child(parent, index)
    full_child = parent.children[index]
    new_child = BTreeNode.new(leaf: full_child.leaf)
    mid = (@order - 1) / 2
    new_child.keys = full_child.keys[mid + 1..-1]
    full_child.keys = full_child.keys[0..mid]
    unless full_child.leaf
      new_child.children = full_child.children[mid + 1..-1]
      # Left node keeps children 0..mid; taking one more would duplicate
      # a child pointer across both siblings
      full_child.children = full_child.children[0..mid]
    end
    parent.keys.insert(index, full_child.keys.pop)
    parent.children.insert(index + 1, new_child)
  end

  def insert_non_full(node, key)
    i = node.keys.size - 1
    if node.leaf
      node.keys << nil
      while i >= 0 && key < node.keys[i]
        node.keys[i + 1] = node.keys[i]
        i -= 1
      end
      node.keys[i + 1] = key
    else
      while i >= 0 && key < node.keys[i]
        i -= 1
      end
      i += 1
      if node.children[i].full?(@order)
        split_child(node, i)
        i += 1 if key > node.keys[i]
      end
      insert_non_full(node.children[i], key)
    end
  end
end
```
B+ Tree implementation requires tracking leaf linkage. The sketch below inherits insertion from BTree for brevity; a complete implementation must also override split_child so that leaf splits copy the median up rather than moving it, and maintain the next_leaf chain (see Common Pitfalls):
```ruby
class BPlusTreeNode < BTreeNode
  attr_accessor :next_leaf

  def initialize(leaf: true)
    super(leaf: leaf)
    @next_leaf = nil
  end
end

class BPlusTree < BTree
  def initialize(order)
    super(order)
    @root = BPlusTreeNode.new
  end

  def range_query(start_key, end_key)
    results = []
    node = find_leaf(start_key, @root)
    while node
      node.keys.each do |key|
        break if key > end_key
        results << key if key >= start_key
      end
      # Guard against an empty leaf (e.g. an empty tree) before comparing
      break if node.keys.empty? || node.keys.last >= end_key
      node = node.next_leaf
    end
    results
  end

  private

  def find_leaf(key, node)
    return node if node.leaf
    i = 0
    i += 1 while i < node.keys.size && key >= node.keys[i]
    find_leaf(key, node.children[i])
  end
end
```
When working with persistent storage, serialize nodes to match disk block sizes:
```ruby
class PersistentBTree
  BLOCK_SIZE = 4096

  def initialize(order, file_path)
    @order = order
    # Open read/write without truncating; create the file if it is missing
    mode = File.exist?(file_path) ? 'r+b' : 'w+b'
    @file = File.open(file_path, mode)
    @cache = {}
  end

  def read_node(block_id)
    return @cache[block_id] if @cache.key?(block_id)
    @file.seek(block_id * BLOCK_SIZE)
    data = @file.read(BLOCK_SIZE)
    node = deserialize_node(data)
    @cache[block_id] = node
    node
  end

  def write_node(block_id, node)
    @cache[block_id] = node
    @file.seek(block_id * BLOCK_SIZE)
    @file.write(serialize_node(node).ljust(BLOCK_SIZE, "\0"))
    @file.flush
  end

  private

  def serialize_node(node)
    # NOTE: object_id is only stable within a single process. A real
    # implementation would assign each node a persistent block ID and
    # store those IDs here instead.
    Marshal.dump({
      keys: node.keys,
      children_ids: node.children.map(&:object_id),
      leaf: node.leaf
    })
  end

  def deserialize_node(data)
    data = Marshal.load(data)
    node = BTreeNode.new(leaf: data[:leaf])
    node.keys = data[:keys]
    # Child links must be resolved lazily via read_node using the stored IDs
    node
  end
end
```
Performance Considerations
Search operations in B-Trees complete in O(log_m n) time where m represents the branching factor and n represents the number of keys. Larger branching factors reduce the number of levels but increase the time spent searching within each node. The optimal branching factor balances these competing factors based on the relative costs of node access versus intra-node search.
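This trade-off can be sketched numerically. Assuming binary search within each node, total key comparisons grow only slowly with m, while node accesses (the tree height, which dominates on disk) fall sharply (function name and figures are illustrative):

```ruby
# Approximate node accesses and key comparisons per search (sketch)
def search_cost(n, m)
  height = (Math.log(n) / Math.log(m)).ceil  # node accesses
  per_node = (m - 1).bit_length              # ceil(log2 m) comparisons per node
  [height, height * per_node]
end

[8, 64, 512].map { |m| search_cost(1_000_000, m) }
# => [[7, 21], [4, 24], [3, 27]]
```

Raising the order from 8 to 512 cuts node accesses from seven to three while total comparisons rise only from 21 to 27, which is why disk-based trees use large orders.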
For disk-based systems, node access dominates performance because disk I/O is orders of magnitude slower than memory operations. A typical hard drive takes 5-10 milliseconds for a random seek, while memory access takes nanoseconds. B-Trees optimize for this by maximizing the branching factor, often setting node size equal to disk block size (typically 4KB-16KB). With 4KB nodes, 8-byte keys, and 8-byte child pointers, a node can store roughly 250 entries, reducing tree height dramatically.
B+ Trees outperform B-Trees for range queries. Since all values reside in linked leaf nodes, range queries require locating the first key and then scanning subsequent leaves. B-Trees scatter values throughout the tree, requiring more random access for range scans. Database systems exploit this property by using B+ Trees for indexes where range queries are common.
Cache behavior affects in-memory B-Tree performance significantly. Modern CPUs have multiple cache levels (L1, L2, L3) with decreasing speed and increasing size. Smaller nodes fit entirely in cache, reducing cache misses. The optimal branching factor for in-memory B-Trees is typically 32-64, smaller than disk-based variants.
Insertion performance depends on tree fullness and split frequency. When nodes have spare capacity, insertions complete in O(log_m n) time. When nodes are full, splits propagate up the tree, potentially requiring multiple node modifications. Amortized insertion cost remains O(log_m n), but worst-case scenarios involve updating nodes at every level from leaf to root.
```ruby
require 'benchmark'

class BTreeBenchmark
  def self.compare_orders
    orders = [4, 8, 16, 32]
    key_count = 10_000
    orders.each do |order|
      tree = BTree.new(order)
      insert_time = Benchmark.measure do
        key_count.times { tree.insert(rand(100_000)) }
      end
      search_time = Benchmark.measure do
        1000.times { tree.search(rand(100_000)) }
      end
      puts "Order #{order}: Insert #{insert_time.real}s, Search #{search_time.real}s"
    end
  end
end
```
Deletion performance exhibits similar characteristics to insertion. Simple deletions from leaf nodes complete quickly. Deletions requiring merges or redistributions involve multiple node modifications. The amortized cost remains O(log_m n), but individual operations may require more work.
Memory usage scales with the branching factor and number of keys. Each node stores up to m-1 keys and m pointers. With n keys in the tree, approximately n/((m-1)/2) nodes exist in the worst case (minimum occupancy). Higher branching factors reduce the number of nodes but increase per-node overhead for partially filled nodes.
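A rough worst-case estimate, with every node at minimum occupancy (function name is illustrative):

```ruby
# Node count when every node holds the minimum ceil(m/2) - 1 keys (sketch)
def worst_case_nodes(n, m)
  min_keys = (m / 2.0).ceil - 1
  (n.to_f / min_keys).ceil
end

worst_case_nodes(1_000_000, 128)  # => 15874 nodes
worst_case_nodes(1_000_000, 8)    # => 333334 nodes
```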
B+ Trees use memory more efficiently for indexing. Internal nodes contain only keys and pointers, allowing higher fanout for the same node size. Leaf nodes store all actual data, which may include large values. This separation allows internal nodes to remain in cache while leaf nodes may reside on disk.
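The fanout gain can be estimated directly. Assuming a 4 KB node, 8-byte keys and pointers, and 64-byte inline values (all sizes illustrative):

```ruby
block, key_size, ptr_size, value_size = 4096, 8, 8, 64

# B-Tree internal node carries a value alongside each key
btree_fanout = block / (key_size + ptr_size + value_size) + 1

# B+ Tree internal node stores only keys and child pointers
bplus_fanout = block / (key_size + ptr_size)

[btree_fanout, bplus_fanout]  # => [52, 256]
```

Moving values out of internal nodes raises the fanout roughly fivefold here, which directly shortens the tree.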
Bulk loading performs better than sequential insertion. When building a B-Tree from sorted data, construct the tree bottom-up rather than inserting keys one at a time. This approach leaves nodes nearly full, rather than the 50-70% occupancy typical of incremental random insertion:
```ruby
def bulk_load_btree(sorted_keys, order)
  leaf_nodes = []
  sorted_keys.each_slice(order - 1) do |keys|
    node = BTreeNode.new(leaf: true)
    node.keys = keys
    leaf_nodes << node
  end
  # Note: the final leaf may fall below minimum occupancy; production bulk
  # loaders rebalance the last two nodes to restore the invariant
  build_internal_levels(leaf_nodes, order)
end

def build_internal_levels(children, order)
  return children.first if children.size == 1
  parent_nodes = []
  children.each_slice(order) do |child_group|
    parent = BTreeNode.new(leaf: false)
    parent.children = child_group
    # Each separator is the smallest leaf key reachable from the child to
    # its right, so searches for key >= separator route into that subtree
    parent.keys = child_group[1..-1].map { |c| smallest_key(c) }
    parent_nodes << parent
  end
  build_internal_levels(parent_nodes, order)
end

def smallest_key(node)
  node.leaf ? node.keys.first : smallest_key(node.children.first)
end
```
Design Considerations
B-Trees work best when the dataset exceeds available memory and random access patterns dominate. File systems use B-Trees to index directory structures because directory lookups are random and directories contain many entries. The branching factor is tuned to match disk block size, minimizing the number of disk reads per lookup.
B+ Trees are preferred for database indexes because range queries and sequential scans are common. The linked leaf structure allows efficient iteration without traversing the tree structure. Most relational databases use B+ Trees for both primary and secondary indexes.
The choice between B-Trees and B+ Trees depends on access patterns. If the application primarily performs point lookups, B-Trees may perform slightly better because values in internal nodes can be retrieved without descending to leaves. If range queries are common, B+ Trees significantly outperform B-Trees.
Hash tables provide O(1) average-case lookup but do not support efficient range queries or ordered iteration. B-Trees sacrifice some lookup speed for ordering guarantees and range query support. Use hash tables when key-value lookups dominate and ordering does not matter. Use B-Trees when ordering, range queries, or predecessor/successor operations are required.
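The contrast is easy to demonstrate with Ruby's built-in Hash and a sorted Array standing in for a B-Tree (data and ranges are illustrative):

```ruby
data = [42, 7, 19, 3, 88]

# Hash: O(1) point lookup, but a range query must scan every key
hash = data.to_h { |k| [k, true] }
from_hash = hash.keys.select { |k| (5..20).cover?(k) }.sort

# Sorted structure: locate the start with binary search, then scan forward
sorted = data.sort
lo = sorted.bsearch_index { |k| k >= 5 }
hi = sorted.bsearch_index { |k| k > 20 } || sorted.size
from_sorted = sorted[lo...hi]

from_hash == from_sorted  # => true, both [7, 19]
```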
Skip lists provide similar functionality to B-Trees with simpler implementation. Probabilistic balancing in skip lists avoids the complexity of node splitting and merging. However, B-Trees offer better cache locality and more predictable performance, making them preferable for systems where consistency matters.
Red-black trees and AVL trees maintain better balance than B-Trees with binary nodes. For in-memory data structures with relatively small datasets, these binary trees may outperform B-Trees due to simpler node structure and lower overhead. B-Trees become advantageous when node access cost is high or when the dataset exceeds cache size.
Order selection requires balancing multiple factors. Larger orders reduce tree height but increase intra-node search time and memory waste from partially filled nodes. For disk-based systems, match node size to disk block size. For in-memory systems, match node size to cache line size:
```ruby
def optimal_order_for_cache(key_size, cache_line_size = 64)
  pointer_size = 8
  keys_per_line = cache_line_size / (key_size + pointer_size)
  [keys_per_line + 1, 4].max # Minimum order of 4
end
```
Write-heavy workloads may benefit from alternative structures. B-Trees perform well for read-mostly workloads but suffer under heavy write loads due to split and merge operations. Log-structured merge trees (LSM trees) optimize for writes by buffering changes in memory and periodically merging them to disk. Choose B-Trees when reads outnumber writes or when real-time query performance is critical.
Practical Examples
A simple in-memory B-Tree for caching sorted data demonstrates basic operations:
```ruby
class SortedCache
  def initialize(order = 32)
    @tree = BTree.new(order)
    @size = 0
  end

  def add(key)
    return false if @tree.search(key)
    @tree.insert(key)
    @size += 1
    true
  end

  def contains?(key)
    !@tree.search(key).nil?
  end

  def range(start_key, end_key)
    results = []
    collect_range(@tree.root, start_key, end_key, results)
    results.sort
  end

  private

  def collect_range(node, start_key, end_key, results)
    return if node.nil?
    i = 0
    while i < node.keys.size
      if node.keys[i] >= start_key && node.keys[i] <= end_key
        results << node.keys[i]
      end
      unless node.leaf
        collect_range(node.children[i], start_key, end_key, results) if node.keys[i] >= start_key
      end
      i += 1
    end
    collect_range(node.children[i], start_key, end_key, results) unless node.leaf
  end
end

# Usage
cache = SortedCache.new
1000.times { cache.add(rand(10_000)) }
puts cache.range(100, 200)
```
A persistent B+ Tree for indexing records in a file demonstrates disk-based operations:
```ruby
class FileIndex
  Record = Struct.new(:id, :offset, :size)

  def initialize(index_file, order = 128)
    @tree = BPlusTree.new(order)
    @index_file = index_file
    load_index
  end

  def add_record(id, offset, size)
    @tree.insert(id)
    save_record_metadata(id, offset, size)
  end

  def find_record(id)
    node = @tree.search(id)
    return nil unless node
    load_record_metadata(id)
  end

  def find_records_in_range(start_id, end_id)
    ids = @tree.range_query(start_id, end_id)
    ids.map { |id| load_record_metadata(id) }
  end

  private

  def save_record_metadata(id, offset, size)
    File.open(@index_file, 'ab') do |f|
      f.write([id, offset, size].pack('Q*'))
    end
  end

  # Linear scan kept for brevity; a production index would store each
  # entry's byte offset in the tree itself
  def load_record_metadata(id)
    File.open(@index_file, 'rb') do |f|
      while (data = f.read(24))
        record_id, offset, size = data.unpack('Q*')
        return Record.new(record_id, offset, size) if record_id == id
      end
    end
    nil
  end

  def load_index
    return unless File.exist?(@index_file)
    File.open(@index_file, 'rb') do |f|
      while (data = f.read(24))
        record_id, _, _ = data.unpack('Q*')
        @tree.insert(record_id)
      end
    end
  end
end
```
A concurrent B-Tree using locking demonstrates thread-safe operations:
```ruby
class ConcurrentBTree
  def initialize(order)
    @tree = BTree.new(order)
    @lock = Mutex.new
    @node_locks = Hash.new { |h, k| h[k] = Mutex.new }
  end

  # The global mutex serializes writers; the per-node locks sketch how lock
  # coupling would let readers proceed safely during concurrent splits
  def insert(key)
    @lock.synchronize do
      path = find_insertion_path(key)
      lock_path(path)
      begin
        @tree.insert(key)
      ensure
        unlock_path(path)
      end
    end
  end

  def search(key)
    node = @tree.root
    path = []
    until node.nil?
      path << node
      @node_locks[node.object_id].lock
      i = node.keys.bsearch_index { |k| k >= key }
      if i && node.keys[i] == key
        unlock_path(path)
        return node
      end
      break if node.leaf
      node = node.children[i || node.children.size - 1]
    end
    unlock_path(path)
    nil
  end

  private

  def find_insertion_path(key)
    path = []
    node = @tree.root
    until node.leaf
      path << node
      i = node.keys.bsearch_index { |k| k > key } || node.keys.size
      node = node.children[i]
    end
    path << node
    path
  end

  def lock_path(path)
    path.each { |node| @node_locks[node.object_id].lock }
  end

  def unlock_path(path)
    path.reverse_each { |node| @node_locks[node.object_id].unlock }
  end
end
```
Common Pitfalls
Node size mismatches with block size waste I/O bandwidth. When a node is smaller than the disk block size, the system reads unused data from disk. When a node is larger than the block size, a single node access requires multiple disk reads. Calculate node size based on key size, pointer size, and target block size:
```ruby
def calculate_node_capacity(key_size, pointer_size, block_size)
  header_size = 16 # Metadata overhead
  available = block_size - header_size
  entry_size = key_size + pointer_size
  available / entry_size
end

# For 4KB blocks with 8-byte keys and pointers
capacity = calculate_node_capacity(8, 8, 4096)
# => 255 entries per node
```
Forgetting to link leaf nodes in B+ Tree implementations breaks range queries. Without proper linkage, range queries must traverse the tree structure to find each consecutive key, eliminating the primary advantage of B+ Trees:
```ruby
# Wrong: Not linking leaves
def split_leaf_wrong(node)
  new_leaf = BPlusTreeNode.new(leaf: true)
  mid = node.keys.size / 2
  new_leaf.keys = node.keys[mid..-1]
  node.keys = node.keys[0...mid]
  # Missing: new_leaf.next_leaf = node.next_leaf
  # Missing: node.next_leaf = new_leaf
end

# Correct: Maintaining leaf chain
def split_leaf_correct(node)
  new_leaf = BPlusTreeNode.new(leaf: true)
  mid = node.keys.size / 2
  new_leaf.keys = node.keys[mid..-1]
  node.keys = node.keys[0...mid]
  new_leaf.next_leaf = node.next_leaf
  node.next_leaf = new_leaf
end
```
Off-by-one errors in split and merge operations corrupt tree structure. The median key handling differs between B-Trees and B+ Trees. In B-Trees, the median moves to the parent. In B+ Trees, the median copies to the parent while remaining in the right child. Mixing these approaches causes duplicate or missing keys.
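The difference is easy to see on a plain array of leaf keys (a sketch, not tied to the classes above):

```ruby
keys = [10, 20, 30, 40, 50]
mid = keys.size / 2

# B-Tree split: the median MOVES up, leaving both children
bt_left, bt_up, bt_right = keys[0...mid], keys[mid], keys[mid + 1..-1]
# => [10, 20], 30, [40, 50]

# B+ Tree leaf split: the median key is COPIED up while its value stays
# in the right leaf
bp_left, bp_up, bp_right = keys[0...mid], keys[mid], keys[mid..-1]
# => [10, 20], 30, [30, 40, 50]
```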
Insufficient minimum occupancy checks cause cascading underflow. When a deletion makes a node fall below minimum occupancy, the code must either merge with a sibling or redistribute keys. Failing to check recursively after merges can leave parent nodes below minimum occupancy:
```ruby
# Sketch only: assumes each node tracks its parent, which the BTreeNode
# class above does not define
def delete(node, key)
  remove_key(node, key)
  # Repair this node if the removal left it below minimum occupancy
  fix_underflow(node) if node.minimal?(@order) && node != @root
  # A merge may leave the parent underfull, so walk upward and repeat
  parent = node.parent
  while parent && parent != @root && parent.minimal?(@order)
    fix_underflow(parent)
    parent = parent.parent
  end
end
```
Using linear search instead of binary search within nodes reduces performance. With large branching factors, a linear scan of each node's keys becomes a bottleneck. Use binary search for intra-node key lookup:
```ruby
# Slow: Linear search
def find_key_linear(keys, target)
  keys.each_with_index do |key, i|
    return i if key >= target
  end
  keys.size
end

# Fast: Binary search
def find_key_binary(keys, target)
  keys.bsearch_index { |key| key >= target } || keys.size
end
```
Assuming uniform distribution leads to poor order selection. Real-world data often exhibits skew. Some ranges may have dense key concentrations while others remain sparse. Static node sizes handle skewed distributions poorly. Consider adaptive structures or partitioning strategies for highly skewed datasets.
Ignoring concurrency causes race conditions in multi-threaded environments. B-Tree modifications are not atomic. Reading while another thread splits a node can return inconsistent results. Implement proper locking strategies, such as lock coupling (crabbing) to allow concurrent reads while protecting against concurrent modifications.
Reference
B-Tree vs B+ Tree Comparison
| Characteristic | B-Tree | B+ Tree |
|---|---|---|
| Data Location | Internal and leaf nodes | Leaf nodes only |
| Internal Nodes | Store keys and values | Store keys only |
| Leaf Links | None | Linked list |
| Range Query | Requires tree traversal | Sequential leaf scan |
| Node Utilization | 50% minimum for data | Higher fanout in internal nodes |
| Cache Efficiency | Lower for large values | Better for index scanning |
| Space Overhead | Lower for small datasets | Higher due to key duplication |
| Common Use Case | File systems | Database indexes |
Time Complexity
| Operation | Average Case | Worst Case | Notes |
|---|---|---|---|
| Search | O(log n) | O(log n) | Base depends on branching factor |
| Insert | O(log n) | O(log n) | May cause splits |
| Delete | O(log n) | O(log n) | May cause merges |
| Range Query (B-Tree) | O(log n + k) | O(log n + k) | k is result size, scattered nodes |
| Range Query (B+ Tree) | O(log n + k) | O(log n + k) | k is result size, sequential access |
| Minimum/Maximum | O(log n) | O(log n) | Follow leftmost/rightmost path |
| Predecessor/Successor | O(log n) | O(log n) | B+ Tree: O(1) if in same leaf |
Space Complexity
| Component | Space | Notes |
|---|---|---|
| Node Overhead | O(m) per node | m is branching factor |
| Total Space | O(n) | n is number of keys |
| Minimum Occupancy | 50% | Except root node |
| Pointers per Node | m | One more than keys |
| Leaf Chain (B+ Tree) | O(l) | l is number of leaves |
Order Selection Guidelines
| System Type | Typical Order | Reason |
|---|---|---|
| Disk-based | 100-500 | Match disk block size (4KB-16KB) |
| In-memory | 32-64 | Match CPU cache line size |
| SSD-based | 50-200 | Balance between block size and access time |
| Embedded | 8-16 | Minimize memory footprint |
Node Operations
| Operation | Steps | Complexity |
|---|---|---|
| Split Node | Create new node, distribute keys, update parent | O(m) |
| Merge Nodes | Combine keys, remove from parent, update links | O(m) |
| Redistribute | Move keys through parent | O(m) |
| Search Node | Binary search keys | O(log m) |
Implementation Checklist
| Requirement | B-Tree | B+ Tree | Description |
|---|---|---|---|
| Key ordering | Required | Required | Maintains sorted order |
| Minimum children | ceil(m/2) | ceil(m/2) | Except root |
| Maximum children | m | m | Order of tree |
| Leaf linking | Not needed | Required | For range queries |
| Value storage | All nodes | Leaves only | Internal nodes navigate |
| Root handling | Special case | Special case | May have fewer children |
| Duplicate keys | Disallowed | Disallowed | Use multimap for duplicates |
Common Configuration Parameters
| Parameter | Typical Value | Impact |
|---|---|---|
| Order | 4-500 | Higher order reduces height |
| Node Size | 4096 bytes | Match storage block size |
| Cache Size | 10-20% of index | Reduces I/O operations |
| Fill Factor | 70-90% | Space vs rebalancing frequency |
| Key Size | 8-256 bytes | Affects branching factor |
| Pointer Size | 8 bytes | 64-bit addressing |