Overview
B-Trees and B+ Trees are self-balancing search tree data structures that maintain sorted data and allow searches, sequential access, insertions, and deletions in logarithmic time. Unlike binary search trees where each node has at most two children, B-Trees allow nodes to have multiple children, making them particularly effective for systems that read and write large blocks of data.
Database management systems and file systems use these structures because they minimize disk I/O operations. When reading from disk, the system retrieves entire blocks of data at once. B-Trees organize data so that each node corresponds to a disk block, maximizing the amount of useful information retrieved per disk access.
A B-Tree of order m (also called the degree or branching factor) maintains these properties:
- Every node has at most m children
- Every non-leaf node (except root) has at least ⌈m/2⌉ children
- The root has at least two children if it is not a leaf
- All leaves appear at the same level
- A non-leaf node with k children contains k-1 keys
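These occupancy bounds follow directly from the properties above. A quick sketch (the helper name is illustrative):

```ruby
# Occupancy bounds implied by the B-Tree properties, for order m
def occupancy_bounds(m)
  min_children = (m / 2.0).ceil
  {
    max_children: m,
    min_children: min_children,  # non-root internal nodes
    max_keys: m - 1,
    min_keys: min_children - 1
  }
end

occupancy_bounds(5)
# => {max_children: 5, min_children: 3, max_keys: 4, min_keys: 2}
```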
B+ Trees extend B-Trees by storing all values in leaf nodes and using internal nodes purely for navigation. This modification provides several advantages for database applications, including more predictable sequential access performance and higher fanout in internal nodes.
Key Principles
B-Trees balance height against node capacity. Each node stores multiple keys and child pointers, reducing tree height compared to binary trees. For a tree with n keys and maximum m children per node, the height is O(log_m n). Larger m values produce shorter trees, but nodes become more expensive to search linearly.
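The height bound can be made concrete using the standard result that a B-Tree with minimum child count t = ⌈m/2⌉ holding n keys has at most log_t((n+1)/2) levels below the root. A rough sketch (function name is illustrative):

```ruby
# Upper bound on the number of levels for n keys and order m (sketch)
def max_levels(n, m)
  t = (m / 2.0).ceil                            # minimum child count
  1 + (Math.log((n + 1) / 2.0) / Math.log(t)).ceil
end

max_levels(1_000_000, 4)    # => 20 levels at most for a near-binary tree
max_levels(1_000_000, 128)  # => 5 levels at most
```

The same million keys that need twenty levels at order 4 fit into five levels at order 128, which is why disk-based systems favor wide nodes.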
The fundamental difference between B-Trees and B+ Trees lies in data placement. B-Trees store keys and associated values in both internal and leaf nodes. When searching for a key, the search may terminate at any level. B+ Trees store all data in leaf nodes, with internal nodes containing only keys and pointers for navigation. This separation creates a dense index structure where all values reside at the same level.
B+ Trees link leaf nodes together, forming a sorted linked list of all values. This structure enables efficient range queries. Starting from any leaf node, the system can traverse subsequent nodes without returning to the parent. Database systems exploit this property for range scans and sequential reads.
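A minimal sketch of the linked-leaf idea, using a bare Struct rather than the full tree classes (the `Leaf` type and values here are illustrative):

```ruby
# Each leaf holds sorted keys plus a pointer to the next leaf, so a range
# scan never revisits the parent after the initial descent
Leaf = Struct.new(:keys, :next_leaf)
l3 = Leaf.new([50, 60], nil)
l2 = Leaf.new([30, 40], l3)
l1 = Leaf.new([10, 20], l2)

results = []
node = l2                  # leaf located by one root-to-leaf descent
while node
  node.keys.each { |k| results << k if k <= 55 }
  node = node.next_leaf
end
results  # => [30, 40, 50]
```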
Node splitting maintains balance during insertion. When a node exceeds capacity, it splits into two nodes, and the median key moves up to the parent. This process may cascade up the tree, potentially increasing tree height by one level. The split operation preserves the ordering invariant across all nodes.
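The split can be sketched on a plain array, independent of the tree classes shown later (values are illustrative):

```ruby
# A full node's keys are divided around the median, which is promoted
keys = [3, 7, 12, 18, 25]   # full node in a tree of order 6
mid = keys.size / 2
left   = keys[0...mid]      # stays in the original node
median = keys[mid]          # moves up into the parent
right  = keys[mid + 1..-1]  # goes to the newly created sibling
# left => [3, 7], median => 12, right => [18, 25]
```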
Node merging or redistribution handles deletion. When a node falls below minimum capacity, it either merges with a sibling node or borrows keys from a sibling through the parent. These operations ensure the tree remains balanced while maintaining the minimum occupancy requirement for all nodes except the root.
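Redistribution can likewise be sketched with arrays: the underfull node borrows through the parent's separator rather than merging (values are illustrative):

```ruby
# Right node is underfull; rotate the left sibling's largest key through
# the separator held in the parent
left = [3, 7, 12]
separator = 15
right = [20]              # below minimum occupancy

right.unshift(separator)  # separator key moves down into the right node
separator = left.pop      # sibling's largest key becomes the new separator
# left => [3, 7], separator => 12, right => [15, 20]
```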
The branching factor directly affects performance characteristics. Higher branching factors reduce tree height, decreasing the number of node accesses during search operations. Database systems typically use branching factors of 100-200 for disk-based B-Trees, matching node size to disk block size. In-memory implementations may use smaller branching factors since memory access patterns differ from disk I/O.
Ruby Implementation
Ruby applications typically use B-Trees indirectly through database systems rather than implementing them directly. However, understanding how to implement these structures reveals their operational characteristics and informs database design decisions.
A basic B-Tree node in Ruby contains keys and children:
```ruby
class BTreeNode
  attr_accessor :keys, :children, :leaf

  def initialize(leaf: true)
    @keys = []
    @children = []
    @leaf = leaf
  end

  def full?(order)
    keys.size >= order - 1
  end

  def minimal?(order)
    keys.size < (order / 2.0).ceil - 1
  end
end
```
The B-Tree class manages the root and coordinates operations:
```ruby
class BTree
  attr_accessor :root, :order

  def initialize(order)
    @order = order
    @root = BTreeNode.new
  end

  def search(key, node = @root)
    i = 0
    i += 1 while i < node.keys.size && key > node.keys[i]
    return node if i < node.keys.size && key == node.keys[i]
    return nil if node.leaf
    search(key, node.children[i])
  end
end
```
Insertion requires checking for full nodes and splitting when necessary:
```ruby
class BTree
  def insert(key)
    if @root.full?(@order)
      old_root = @root
      @root = BTreeNode.new(leaf: false)
      @root.children << old_root
      split_child(@root, 0)
    end
    insert_non_full(@root, key)
  end

  private

  def split_child(parent, index)
    full_child = parent.children[index]
    new_child = BTreeNode.new(leaf: full_child.leaf)
    mid = (@order - 1) / 2
    new_child.keys = full_child.keys[mid + 1..-1]
    full_child.keys = full_child.keys[0..mid]
    unless full_child.leaf
      new_child.children = full_child.children[mid + 1..-1]
      # Left node keeps children 0..mid; taking one more would duplicate
      # a child pointer across both siblings
      full_child.children = full_child.children[0..mid]
    end
    parent.keys.insert(index, full_child.keys.pop)
    parent.children.insert(index + 1, new_child)
  end

  def insert_non_full(node, key)
    i = node.keys.size - 1
    if node.leaf
      node.keys << nil
      while i >= 0 && key < node.keys[i]
        node.keys[i + 1] = node.keys[i]
        i -= 1
      end
      node.keys[i + 1] = key
    else
      while i >= 0 && key < node.keys[i]
        i -= 1
      end
      i += 1
      if node.children[i].full?(@order)
        split_child(node, i)
        i += 1 if key > node.keys[i]
      end
      insert_non_full(node.children[i], key)
    end
  end
end
```
B+ Tree implementation requires tracking leaf linkage. The sketch below inherits insertion from BTree for brevity; a complete implementation must also override split_child so that leaf splits copy the median up rather than moving it, and maintain the next_leaf chain (see Common Pitfalls):
```ruby
class BPlusTreeNode < BTreeNode
  attr_accessor :next_leaf

  def initialize(leaf: true)
    super(leaf: leaf)
    @next_leaf = nil
  end
end

class BPlusTree < BTree
  def initialize(order)
    super(order)
    @root = BPlusTreeNode.new
  end

  def range_query(start_key, end_key)
    results = []
    node = find_leaf(start_key, @root)
    while node
      node.keys.each do |key|
        break if key > end_key
        results << key if key >= start_key
      end
      # Guard against an empty leaf (e.g. an empty tree) before comparing
      break if node.keys.empty? || node.keys.last >= end_key
      node = node.next_leaf
    end
    results
  end

  private

  def find_leaf(key, node)
    return node if node.leaf
    i = 0
    i += 1 while i < node.keys.size && key >= node.keys[i]
    find_leaf(key, node.children[i])
  end
end
```
When working with persistent storage, serialize nodes to match disk block sizes:
```ruby
class PersistentBTree
  BLOCK_SIZE = 4096

  def initialize(order, file_path)
    @order = order
    # Open read/write without truncating; create the file if it is missing
    mode = File.exist?(file_path) ? 'r+b' : 'w+b'
    @file = File.open(file_path, mode)
    @cache = {}
  end

  def read_node(block_id)
    return @cache[block_id] if @cache.key?(block_id)
    @file.seek(block_id * BLOCK_SIZE)
    data = @file.read(BLOCK_SIZE)
    node = deserialize_node(data)
    @cache[block_id] = node
    node
  end

  def write_node(block_id, node)
    @cache[block_id] = node
    @file.seek(block_id * BLOCK_SIZE)
    @file.write(serialize_node(node).ljust(BLOCK_SIZE, "\0"))
    @file.flush
  end

  private

  def serialize_node(node)
    # NOTE: object_id is only stable within a single process. A real
    # implementation would assign each node a persistent block ID and
    # store those IDs here instead.
    Marshal.dump({
      keys: node.keys,
      children_ids: node.children.map(&:object_id),
      leaf: node.leaf
    })
  end

  def deserialize_node(data)
    data = Marshal.load(data)
    node = BTreeNode.new(leaf: data[:leaf])
    node.keys = data[:keys]
    # Child links must be resolved lazily via read_node using the stored IDs
    node
  end
end
```
Performance Considerations
Search operations in B-Trees complete in O(log_m n) time where m represents the branching factor and n represents the number of keys. Larger branching factors reduce the number of levels but increase the time spent searching within each node. The optimal branching factor balances these competing factors based on the relative costs of node access versus intra-node search.
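This trade-off can be sketched numerically. Assuming binary search within each node, total key comparisons grow only slowly with m, while node accesses (the tree height, which dominates on disk) fall sharply (function name and figures are illustrative):

```ruby
# Approximate node accesses and key comparisons per search (sketch)
def search_cost(n, m)
  height = (Math.log(n) / Math.log(m)).ceil  # node accesses
  per_node = (m - 1).bit_length              # ceil(log2 m) comparisons per node
  [height, height * per_node]
end

[8, 64, 512].map { |m| search_cost(1_000_000, m) }
# => [[7, 21], [4, 24], [3, 27]]
```

Raising the order from 8 to 512 cuts node accesses from seven to three while total comparisons rise only from 21 to 27, which is why disk-based trees use large orders.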
For disk-based systems, node access dominates performance because disk I/O is orders of magnitude slower than memory operations. A typical hard drive takes 5-10 milliseconds for a random seek, while memory access takes nanoseconds. B-Trees optimize for this by maximizing the branching factor, often setting node size equal to disk block size (typically 4KB-16KB). With 4KB nodes, 8-byte keys, and 8-byte child pointers, a node can store roughly 250 entries, reducing tree height dramatically.
B+ Trees outperform B-Trees for range queries. Since all values reside in linked leaf nodes, range queries require locating the first key and then scanning subsequent leaves. B-Trees scatter values throughout the tree, requiring more random access for range scans. Database systems exploit this property by using B+ Trees for indexes where range queries are common.
Cache behavior affects in-memory B-Tree performance significantly. Modern CPUs have multiple cache levels (L1, L2, L3) with decreasing speed and increasing size. Smaller nodes fit entirely in cache, reducing cache misses. The optimal branching factor for in-memory B-Trees is typically 32-64, smaller than disk-based variants.
Insertion performance depends on tree fullness and split frequency. When nodes have spare capacity, insertions complete in O(log_m n) time. When nodes are full, splits propagate up the tree, potentially requiring multiple node modifications. Amortized insertion cost remains O(log_m n), but worst-case scenarios involve updating nodes at every level from leaf to root.
```ruby
require 'benchmark'

class BTreeBenchmark
  def self.compare_orders
    orders = [4, 8, 16, 32]
    key_count = 10_000
    orders.each do |order|
      tree = BTree.new(order)
      insert_time = Benchmark.measure do
        key_count.times { tree.insert(rand(100_000)) }
      end
      search_time = Benchmark.measure do
        1000.times { tree.search(rand(100_000)) }
      end
      puts "Order #{order}: Insert #{insert_time.real}s, Search #{search_time.real}s"
    end
  end
end
```
Deletion performance exhibits similar characteristics to insertion. Simple deletions from leaf nodes complete quickly. Deletions requiring merges or redistributions involve multiple node modifications. The amortized cost remains O(log_m n), but individual operations may require more work.
Memory usage scales with the branching factor and number of keys. Each node stores up to m-1 keys and m pointers. With n keys in the tree, approximately n/((m-1)/2) nodes exist in the worst case (minimum occupancy). Higher branching factors reduce the number of nodes but increase per-node overhead for partially filled nodes.
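A rough worst-case estimate, with every node at minimum occupancy (function name is illustrative):

```ruby
# Node count when every node holds the minimum ceil(m/2) - 1 keys (sketch)
def worst_case_nodes(n, m)
  min_keys = (m / 2.0).ceil - 1
  (n.to_f / min_keys).ceil
end

worst_case_nodes(1_000_000, 128)  # => 15874 nodes
worst_case_nodes(1_000_000, 8)    # => 333334 nodes
```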
B+ Trees use memory more efficiently for indexing. Internal nodes contain only keys and pointers, allowing higher fanout for the same node size. Leaf nodes store all actual data, which may include large values. This separation allows internal nodes to remain in cache while leaf nodes may reside on disk.
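The fanout gain can be estimated directly. Assuming a 4 KB node, 8-byte keys and pointers, and 64-byte inline values (all sizes illustrative):

```ruby
block, key_size, ptr_size, value_size = 4096, 8, 8, 64

# B-Tree internal node carries a value alongside each key
btree_fanout = block / (key_size + ptr_size + value_size) + 1

# B+ Tree internal node stores only keys and child pointers
bplus_fanout = block / (key_size + ptr_size)

[btree_fanout, bplus_fanout]  # => [52, 256]
```

Moving values out of internal nodes raises the fanout roughly fivefold here, which directly shortens the tree.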
Bulk loading performs better than sequential insertion. When building a B-Tree from sorted data, construct the tree bottom-up rather than inserting keys one at a time. This approach leaves nodes nearly full, rather than the 50-70% occupancy typical of incremental random insertion:
```ruby
def bulk_load_btree(sorted_keys, order)
  leaf_nodes = []
  sorted_keys.each_slice(order - 1) do |keys|
    node = BTreeNode.new(leaf: true)
    node.keys = keys
    leaf_nodes << node
  end
  # Note: the final leaf may fall below minimum occupancy; production bulk
  # loaders rebalance the last two nodes to restore the invariant
  build_internal_levels(leaf_nodes, order)
end

def build_internal_levels(children, order)
  return children.first if children.size == 1
  parent_nodes = []
  children.each_slice(order) do |child_group|
    parent = BTreeNode.new(leaf: false)
    parent.children = child_group
    # Each separator is the smallest leaf key reachable from the child to
    # its right, so searches for key >= separator route into that subtree
    parent.keys = child_group[1..-1].map { |c| smallest_key(c) }
    parent_nodes << parent
  end
  build_internal_levels(parent_nodes, order)
end

def smallest_key(node)
  node.leaf ? node.keys.first : smallest_key(node.children.first)
end
```
Design Considerations
B-Trees work best when the dataset exceeds available memory and random access patterns dominate. File systems use B-Trees to index directory structures because directory lookups are random and directories contain many entries. The branching factor is tuned to match disk block size, minimizing the number of disk reads per lookup.
B+ Trees are preferred for database indexes because range queries and sequential scans are common. The linked leaf structure allows efficient iteration without traversing the tree structure. Most relational databases use B+ Trees for both primary and secondary indexes.
The choice between B-Trees and B+ Trees depends on access patterns. If the application primarily performs point lookups, B-Trees may perform slightly better because values in internal nodes can be retrieved without descending to leaves. If range queries are common, B+ Trees significantly outperform B-Trees.
Hash tables provide O(1) average-case lookup but do not support efficient range queries or ordered iteration. B-Trees sacrifice some lookup speed for ordering guarantees and range query support. Use hash tables when key-value lookups dominate and ordering does not matter. Use B-Trees when ordering, range queries, or predecessor/successor operations are required.
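The contrast is easy to demonstrate with Ruby's built-in Hash and a sorted Array standing in for a B-Tree (data and ranges are illustrative):

```ruby
data = [42, 7, 19, 3, 88]

# Hash: O(1) point lookup, but a range query must scan every key
hash = data.to_h { |k| [k, true] }
from_hash = hash.keys.select { |k| (5..20).cover?(k) }.sort

# Sorted structure: locate the start with binary search, then scan forward
sorted = data.sort
lo = sorted.bsearch_index { |k| k >= 5 }
hi = sorted.bsearch_index { |k| k > 20 } || sorted.size
from_sorted = sorted[lo...hi]

from_hash == from_sorted  # => true, both [7, 19]
```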
Skip lists provide similar functionality to B-Trees with simpler implementation. Probabilistic balancing in skip lists avoids the complexity of node splitting and merging. However, B-Trees offer better cache locality and more predictable performance, making them preferable for systems where consistency matters.
Red-black trees and AVL trees maintain better balance than B-Trees with binary nodes. For in-memory data structures with relatively small datasets, these binary trees may outperform B-Trees due to simpler node structure and lower overhead. B-Trees become advantageous when node access cost is high or when the dataset exceeds cache size.
Order selection requires balancing multiple factors. Larger orders reduce tree height but increase intra-node search time and memory waste from partially filled nodes. For disk-based systems, match node size to disk block size. For in-memory systems, match node size to cache line size:
```ruby
def optimal_order_for_cache(key_size, cache_line_size = 64)
  pointer_size = 8
  keys_per_line = cache_line_size / (key_size + pointer_size)
  [keys_per_line + 1, 4].max # Minimum order of 4
end
```
Write-heavy workloads may benefit from alternative structures. B-Trees perform well for read-mostly workloads but suffer under heavy write loads due to split and merge operations. Log-structured merge trees (LSM trees) optimize for writes by buffering changes in memory and periodically merging them to disk. Choose B-Trees when reads outnumber writes or when real-time query performance is critical.
Practical Examples
A simple in-memory B-Tree for caching sorted data demonstrates basic operations:
```ruby
class SortedCache
  def initialize(order = 32)
    @tree = BTree.new(order)
    @size = 0
  end

  def add(key)
    return false if @tree.search(key)
    @tree.insert(key)
    @size += 1
    true
  end

  def contains?(key)
    !@tree.search(key).nil?
  end

  def range(start_key, end_key)
    results = []
    collect_range(@tree.root, start_key, end_key, results)
    results.sort
  end

  private

  def collect_range(node, start_key, end_key, results)
    return if node.nil?
    i = 0
    while i < node.keys.size
      if node.keys[i] >= start_key && node.keys[i] <= end_key
        results << node.keys[i]
      end
      unless node.leaf
        collect_range(node.children[i], start_key, end_key, results) if node.keys[i] >= start_key
      end
      i += 1
    end
    collect_range(node.children[i], start_key, end_key, results) unless node.leaf
  end
end

# Usage
cache = SortedCache.new
1000.times { cache.add(rand(10_000)) }
puts cache.range(100, 200)
```
A persistent B+ Tree for indexing records in a file demonstrates disk-based operations:
```ruby
class FileIndex
  Record = Struct.new(:id, :offset, :size)

  def initialize(index_file, order = 128)
    @tree = BPlusTree.new(order)
    @index_file = index_file
    load_index
  end

  def add_record(id, offset, size)
    @tree.insert(id)
    save_record_metadata(id, offset, size)
  end

  def find_record(id)
    node = @tree.search(id)
    return nil unless node
    load_record_metadata(id)
  end

  def find_records_in_range(start_id, end_id)
    ids = @tree.range_query(start_id, end_id)
    ids.map { |id| load_record_metadata(id) }
  end

  private

  def save_record_metadata(id, offset, size)
    File.open(@index_file, 'ab') do |f|
      f.write([id, offset, size].pack('Q*'))
    end
  end

  # Linear scan kept for brevity; a production index would store each
  # entry's byte offset in the tree itself
  def load_record_metadata(id)
    File.open(@index_file, 'rb') do |f|
      while (data = f.read(24))
        record_id, offset, size = data.unpack('Q*')
        return Record.new(record_id, offset, size) if record_id == id
      end
    end
    nil
  end

  def load_index
    return unless File.exist?(@index_file)
    File.open(@index_file, 'rb') do |f|
      while (data = f.read(24))
        record_id, _, _ = data.unpack('Q*')
        @tree.insert(record_id)
      end
    end
  end
end
```
A concurrent B-Tree using locking demonstrates thread-safe operations:
```ruby
class ConcurrentBTree
  def initialize(order)
    @tree = BTree.new(order)
    @lock = Mutex.new
    @node_locks = Hash.new { |h, k| h[k] = Mutex.new }
  end

  # The global mutex serializes writers; the per-node locks sketch how lock
  # coupling would let readers proceed safely during concurrent splits
  def insert(key)
    @lock.synchronize do
      path = find_insertion_path(key)
      lock_path(path)
      begin
        @tree.insert(key)
      ensure
        unlock_path(path)
      end
    end
  end

  def search(key)
    node = @tree.root
    path = []
    until node.nil?
      path << node
      @node_locks[node.object_id].lock
      i = node.keys.bsearch_index { |k| k >= key }
      if i && node.keys[i] == key
        unlock_path(path)
        return node
      end
      break if node.leaf
      node = node.children[i || node.children.size - 1]
    end
    unlock_path(path)
    nil
  end

  private

  def find_insertion_path(key)
    path = []
    node = @tree.root
    until node.leaf
      path << node
      i = node.keys.bsearch_index { |k| k > key } || node.keys.size
      node = node.children[i]
    end
    path << node
    path
  end

  def lock_path(path)
    path.each { |node| @node_locks[node.object_id].lock }
  end

  def unlock_path(path)
    path.reverse_each { |node| @node_locks[node.object_id].unlock }
  end
end
```
Common Pitfalls
Node size mismatches with block size waste I/O bandwidth. When a node is smaller than the disk block size, the system reads unused data from disk. When a node is larger than the block size, a single node access requires multiple disk reads. Calculate node size based on key size, pointer size, and target block size:
```ruby
def calculate_node_capacity(key_size, pointer_size, block_size)
  header_size = 16 # Metadata overhead
  available = block_size - header_size
  entry_size = key_size + pointer_size
  available / entry_size
end

# For 4KB blocks with 8-byte keys and pointers
capacity = calculate_node_capacity(8, 8, 4096)
# => 255 entries per node
```
Forgetting to link leaf nodes in B+ Tree implementations breaks range queries. Without proper linkage, range queries must traverse the tree structure to find each consecutive key, eliminating the primary advantage of B+ Trees:
```ruby
# Wrong: Not linking leaves
def split_leaf_wrong(node)
  new_leaf = BPlusTreeNode.new(leaf: true)
  mid = node.keys.size / 2
  new_leaf.keys = node.keys[mid..-1]
  node.keys = node.keys[0...mid]
  # Missing: new_leaf.next_leaf = node.next_leaf
  # Missing: node.next_leaf = new_leaf
end

# Correct: Maintaining leaf chain
def split_leaf_correct(node)
  new_leaf = BPlusTreeNode.new(leaf: true)
  mid = node.keys.size / 2
  new_leaf.keys = node.keys[mid..-1]
  node.keys = node.keys[0...mid]
  new_leaf.next_leaf = node.next_leaf
  node.next_leaf = new_leaf
end
```
Off-by-one errors in split and merge operations corrupt tree structure. The median key handling differs between B-Trees and B+ Trees. In B-Trees, the median moves to the parent. In B+ Trees, the median copies to the parent while remaining in the right child. Mixing these approaches causes duplicate or missing keys.
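The difference is easy to see on a plain array of leaf keys (a sketch, not tied to the classes above):

```ruby
keys = [10, 20, 30, 40, 50]
mid = keys.size / 2

# B-Tree split: the median MOVES up, leaving both children
bt_left, bt_up, bt_right = keys[0...mid], keys[mid], keys[mid + 1..-1]
# => [10, 20], 30, [40, 50]

# B+ Tree leaf split: the median key is COPIED up while its value stays
# in the right leaf
bp_left, bp_up, bp_right = keys[0...mid], keys[mid], keys[mid..-1]
# => [10, 20], 30, [30, 40, 50]
```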
Insufficient minimum occupancy checks cause cascading underflow. When a deletion makes a node fall below minimum occupancy, the code must either merge with a sibling or redistribute keys. Failing to check recursively after merges can leave parent nodes below minimum occupancy:
```ruby
# Sketch only: assumes each node tracks its parent, which the BTreeNode
# class above does not define
def delete(node, key)
  remove_key(node, key)
  # Repair this node if the removal left it below minimum occupancy
  fix_underflow(node) if node.minimal?(@order) && node != @root
  # A merge may leave the parent underfull, so walk upward and repeat
  parent = node.parent
  while parent && parent != @root && parent.minimal?(@order)
    fix_underflow(parent)
    parent = parent.parent
  end
end
```
Using linear search instead of binary search within nodes reduces performance. With large branching factors, a linear scan of each node's keys becomes a bottleneck. Use binary search for intra-node key lookup:
```ruby
# Slow: Linear search
def find_key_linear(keys, target)
  keys.each_with_index do |key, i|
    return i if key >= target
  end
  keys.size
end

# Fast: Binary search
def find_key_binary(keys, target)
  keys.bsearch_index { |key| key >= target } || keys.size
end
```
Assuming uniform distribution leads to poor order selection. Real-world data often exhibits skew. Some ranges may have dense key concentrations while others remain sparse. Static node sizes handle skewed distributions poorly. Consider adaptive structures or partitioning strategies for highly skewed datasets.
Ignoring concurrency causes race conditions in multi-threaded environments. B-Tree modifications are not atomic. Reading while another thread splits a node can return inconsistent results. Implement proper locking strategies, such as lock coupling (crabbing) to allow concurrent reads while protecting against concurrent modifications.
Reference
B-Tree vs B+ Tree Comparison
| Characteristic | B-Tree | B+ Tree |
|---|---|---|
| Data Location | Internal and leaf nodes | Leaf nodes only |
| Internal Nodes | Store keys and values | Store keys only |
| Leaf Links | None | Linked list |
| Range Query | Requires tree traversal | Sequential leaf scan |
| Node Utilization | 50% minimum for data | Higher fanout in internal nodes |
| Cache Efficiency | Lower for large values | Better for index scanning |
| Space Overhead | Lower for small datasets | Higher due to key duplication |
| Common Use Case | File systems | Database indexes |
Time Complexity
| Operation | Average Case | Worst Case | Notes |
|---|---|---|---|
| Search | O(log n) | O(log n) | Base depends on branching factor |
| Insert | O(log n) | O(log n) | May cause splits |
| Delete | O(log n) | O(log n) | May cause merges |
| Range Query (B-Tree) | O(log n + k) | O(log n + k) | k is result size, scattered nodes |
| Range Query (B+ Tree) | O(log n + k) | O(log n + k) | k is result size, sequential access |
| Minimum/Maximum | O(log n) | O(log n) | Follow leftmost/rightmost path |
| Predecessor/Successor | O(log n) | O(log n) | B+ Tree: O(1) if in same leaf |
Space Complexity
| Component | Space | Notes |
|---|---|---|
| Node Overhead | O(m) per node | m is branching factor |
| Total Space | O(n) | n is number of keys |
| Minimum Occupancy | 50% | Except root node |
| Pointers per Node | m | One more than keys |
| Leaf Chain (B+ Tree) | O(l) | l is number of leaves |
Order Selection Guidelines
| System Type | Typical Order | Reason |
|---|---|---|
| Disk-based | 100-500 | Match disk block size (4KB-16KB) |
| In-memory | 32-64 | Match CPU cache line size |
| SSD-based | 50-200 | Balance between block size and access time |
| Embedded | 8-16 | Minimize memory footprint |
Node Operations
| Operation | Steps | Complexity |
|---|---|---|
| Split Node | Create new node, distribute keys, update parent | O(m) |
| Merge Nodes | Combine keys, remove from parent, update links | O(m) |
| Redistribute | Move keys through parent | O(m) |
| Search Node | Binary search keys | O(log m) |
Implementation Checklist
| Requirement | B-Tree | B+ Tree | Description |
|---|---|---|---|
| Key ordering | Required | Required | Maintains sorted order |
| Minimum children | ceil(m/2) | ceil(m/2) | Except root |
| Maximum children | m | m | Order of tree |
| Leaf linking | Not needed | Required | For range queries |
| Value storage | All nodes | Leaves only | Internal nodes navigate |
| Root handling | Special case | Special case | May have fewer children |
| Duplicate keys | Disallowed | Disallowed | Use multimap for duplicates |
Common Configuration Parameters
| Parameter | Typical Value | Impact |
|---|---|---|
| Order | 4-500 | Higher order reduces height |
| Node Size | 4096 bytes | Match storage block size |
| Cache Size | 10-20% of index | Reduces I/O operations |
| Fill Factor | 70-90% | Space vs rebalancing frequency |
| Key Size | 8-256 bytes | Affects branching factor |
| Pointer Size | 8 bytes | 64-bit addressing |