Overview
A binary search tree (BST) is a node-based data structure where each node contains a value and references to at most two child nodes, designated as left and right. The defining characteristic of a BST is its ordering property: for any given node, all values in its left subtree are less than the node's value, and all values in its right subtree are greater than the node's value.
This ordering property enables efficient search operations that eliminate half of the remaining nodes at each step, similar to binary search on a sorted array. The structure emerged from the need to combine the dynamic insertion capabilities of linked lists with the efficient search characteristics of sorted arrays.
Binary search trees serve as the foundation for many advanced data structures including AVL trees, red-black trees, and B-trees. The structure appears throughout software systems: database indexes, file systems, expression parsing, and symbol tables in compilers all employ BST variants.
The basic structure consists of nodes connected through parent-child relationships. Each node stores a key value and maintains pointers to left and right children. A node with no children is a leaf node. The topmost node is the root, and the height of a tree is the number of edges on the longest path from the root to a leaf.
# Basic node structure
class Node
  attr_accessor :value, :left, :right

  def initialize(value)
    @value = value
    @left = nil
    @right = nil
  end
end

# Simple tree example
root = Node.new(10)
root.left = Node.new(5)
root.right = Node.new(15)
root.left.left = Node.new(3)
root.left.right = Node.new(7)

# Tree structure:
#        10
#       /  \
#      5    15
#     / \
#    3   7
Key Principles
The binary search tree operates on a fundamental ordering invariant: for every node N with value V, all nodes in the left subtree contain values less than V, and all nodes in the right subtree contain values greater than V. This invariant must hold for every node in the tree, not just the root.
This ordering property determines the structure of all BST operations. When searching for a value, the algorithm compares the target with the current node's value and recursively searches either the left or right subtree. The search path follows a single path from root to a leaf, never backtracking.
Insertion maintains the ordering property by finding the appropriate leaf position and adding the new node there. The algorithm starts at the root and compares values at each node, moving left for smaller values and right for larger values until finding an empty position.
Deletion requires three distinct cases. Removing a leaf node is straightforward: simply remove it. Removing a node with one child replaces the node with its child. Removing a node with two children requires finding either the inorder predecessor (maximum value in the left subtree) or the inorder successor (minimum value in the right subtree), copying that value into the node, and then deleting the predecessor or successor node from its subtree.
Tree traversal patterns extract values in specific orders. Inorder traversal (left, node, right) produces sorted output. Preorder traversal (node, left, right) processes parents before children. Postorder traversal (left, right, node) processes children before parents. Level-order traversal processes nodes level by level.
The height of a tree significantly impacts performance. A balanced tree maintains height proportional to log(n) where n is the number of nodes. A degenerate tree—essentially a linked list—has height proportional to n. The balance factor of a node is the height difference between its left and right subtrees.
BST operations depend on tree shape. A perfectly balanced tree provides O(log n) search, insertion, and deletion. An unbalanced tree degrades to O(n) performance. The distribution of insertion order determines balance—random insertions generally produce reasonably balanced trees, while sorted insertions create degenerate trees.
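The effect of insertion order can be seen directly by measuring tree height. The following sketch uses minimal free-standing helpers (`insert` and `height` here are standalone illustrations, not the class methods defined later):

```ruby
Node = Struct.new(:value, :left, :right)

# Recursive insert maintaining the ordering invariant; duplicates are dropped.
def insert(node, value)
  return Node.new(value) if node.nil?
  if value < node.value
    node.left = insert(node.left, value)
  elsif value > node.value
    node.right = insert(node.right, value)
  end
  node
end

# Height in edges; an empty subtree counts as -1.
def height(node)
  return -1 if node.nil?
  [height(node.left), height(node.right)].max + 1
end

values = (1..127).to_a
shuffled_root = values.shuffle.reduce(nil) { |root, v| insert(root, v) }
sorted_root   = values.reduce(nil) { |root, v| insert(root, v) }

height(sorted_root)    # => 126: every node has only a right child
height(shuffled_root)  # typically in the low teens; the minimum possible is 6
```

Inserting sorted values produces a chain 126 levels deep, while a shuffled order lands close to the log₂(127) ≈ 7 ideal.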
# Inorder traversal demonstrates the sorted property
def inorder(node, result = [])
  return result if node.nil?
  inorder(node.left, result)
  result << node.value
  inorder(node.right, result)
  result
end

root = Node.new(10)
root.left = Node.new(5)
root.right = Node.new(15)
root.left.left = Node.new(3)
root.left.right = Node.new(7)

inorder(root)
# => [3, 5, 7, 10, 15] - sorted output
Ruby Implementation
Ruby implements binary search trees through classes with instance variables for node values and child references. The implementation typically separates node structure from tree operations, though combined implementations also exist.
class BinarySearchTree
  attr_reader :root

  def initialize
    @root = nil
  end

  def insert(value)
    @root = insert_node(@root, value)
  end

  private

  def insert_node(node, value)
    return Node.new(value) if node.nil?
    if value < node.value
      node.left = insert_node(node.left, value)
    elsif value > node.value
      node.right = insert_node(node.right, value)
    end
    node
  end
end

tree = BinarySearchTree.new
[10, 5, 15, 3, 7, 12, 17].each { |val| tree.insert(val) }
Search operations traverse the tree following the ordering property. The recursive approach naturally expresses the algorithm's structure.
class BinarySearchTree
  def search(value)
    search_node(@root, value)
  end

  private

  def search_node(node, value)
    return nil if node.nil?
    return node if node.value == value
    if value < node.value
      search_node(node.left, value)
    else
      search_node(node.right, value)
    end
  end
end

tree.search(7)  # => Node with value 7
tree.search(20) # => nil
Iterative implementations avoid recursion overhead and stack depth limitations. The iterative search uses a loop with explicit pointer manipulation.
class BinarySearchTree
  def search_iterative(value)
    current = @root
    until current.nil?
      return current if current.value == value
      current = value < current.value ? current.left : current.right
    end
    nil
  end
end
Deletion handles the three cases through conditional logic. Finding the inorder successor requires locating the minimum value in the right subtree.
class BinarySearchTree
  def delete(value)
    @root = delete_node(@root, value)
  end

  private

  def delete_node(node, value)
    return nil if node.nil?
    if value < node.value
      node.left = delete_node(node.left, value)
    elsif value > node.value
      node.right = delete_node(node.right, value)
    else
      # Node with only one child or no child
      return node.right if node.left.nil?
      return node.left if node.right.nil?
      # Node with two children: get inorder successor
      min_node = find_min(node.right)
      node.value = min_node.value
      node.right = delete_node(node.right, min_node.value)
    end
    node
  end

  def find_min(node)
    current = node
    current = current.left until current.left.nil?
    current
  end
end
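A condensed, standalone run exercises all three deletion cases. The free-standing `insert`, `delete_node`, and `inorder` helpers here restate the logic above so the sketch runs on its own:

```ruby
Node = Struct.new(:value, :left, :right)

def insert(node, value)
  return Node.new(value) if node.nil?
  if value < node.value
    node.left = insert(node.left, value)
  elsif value > node.value
    node.right = insert(node.right, value)
  end
  node
end

def delete_node(node, value)
  return nil if node.nil?
  if value < node.value
    node.left = delete_node(node.left, value)
  elsif value > node.value
    node.right = delete_node(node.right, value)
  else
    return node.right if node.left.nil?  # covers leaf and right-only child
    return node.left if node.right.nil?  # left-only child
    successor = node.right
    successor = successor.left until successor.left.nil?
    node.value = successor.value
    node.right = delete_node(node.right, successor.value)
  end
  node
end

def inorder(node, acc = [])
  return acc if node.nil?
  inorder(node.left, acc)
  acc << node.value
  inorder(node.right, acc)
end

root = nil
[10, 5, 15, 3, 7, 12, 17].each { |v| root = insert(root, v) }
root = delete_node(root, 3)   # case 1: leaf, removed directly
root = delete_node(root, 15)  # case 3: two children (12, 17); successor 17 moves up
root = delete_node(root, 5)   # case 2: only child 7 takes its place
inorder(root)                 # => [7, 10, 12, 17]
```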
Ruby's block syntax enables expressive traversal implementations. The tree can yield values to blocks for flexible processing.
class BinarySearchTree
  def each_inorder(&block)
    traverse_inorder(@root, &block)
  end

  def each_preorder(&block)
    traverse_preorder(@root, &block)
  end

  def each_postorder(&block)
    traverse_postorder(@root, &block)
  end

  private

  def traverse_inorder(node, &block)
    return if node.nil?
    traverse_inorder(node.left, &block)
    yield node.value
    traverse_inorder(node.right, &block)
  end

  def traverse_preorder(node, &block)
    return if node.nil?
    yield node.value
    traverse_preorder(node.left, &block)
    traverse_preorder(node.right, &block)
  end

  def traverse_postorder(node, &block)
    return if node.nil?
    traverse_postorder(node.left, &block)
    traverse_postorder(node.right, &block)
    yield node.value
  end
end

tree.each_inorder { |val| puts val } # Sorted output
Including the Enumerable mixin extends the tree with Ruby's collection methods. The tree need only implement each to support Enumerable.
class BinarySearchTree
  include Enumerable

  def each(&block)
    each_inorder(&block)
  end
end

tree.map { |val| val * 2 }      # => [6, 10, 14, 20, 24, 30, 34]
tree.select { |val| val.even? } # => [10, 12]
tree.reduce(:+)                 # => 69
Performance Considerations
Binary search tree performance depends critically on tree balance. A balanced tree maintains height h ≈ log₂(n) for n nodes, providing O(log n) operations. An unbalanced tree degrades to height h ≈ n, resulting in O(n) operations equivalent to a linked list.
Search operations perform one comparison per tree level. In a balanced tree with 1,000,000 nodes, search requires approximately 20 comparisons. In a degenerate tree, search might require 1,000,000 comparisons. The best case occurs when the target is the root (O(1)), while worst case depends on tree height.
Insertion in a balanced tree examines O(log n) nodes to find the insertion point. After locating the position, insertion completes in O(1) time by adjusting pointers. The total insertion cost is O(log n) for balanced trees and O(n) for degenerate trees. Bulk insertions from sorted data create degenerate trees—randomizing insertion order improves balance.
Deletion requires finding the node (O(log n) balanced, O(n) degenerate), then performing the deletion operation. Single-child deletions execute in O(1) after finding the node. Two-child deletions require finding the inorder successor, adding another O(log n) operation. Total deletion cost matches search cost.
Traversal algorithms visit every node exactly once, performing O(n) work regardless of tree shape. Different traversal orders have identical time complexity but different memory characteristics. Recursive traversals consume stack space proportional to tree height: O(log n) for balanced trees, O(n) for degenerate trees. Iterative traversals using explicit stacks have similar space requirements.
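The explicit-stack traversal mentioned above can be sketched with a standalone helper (the `inorder_iterative` name is an illustration, not part of the classes in this article):

```ruby
Node = Struct.new(:value, :left, :right)

# Iterative inorder: push the chain of left children, then pop, visit,
# and move right. The stack grows at most to the tree's height, mirroring
# the recursive version's call-stack usage.
def inorder_iterative(root)
  result = []
  stack = []
  current = root
  until current.nil? && stack.empty?
    while current
      stack << current
      current = current.left
    end
    current = stack.pop
    result << current.value
    current = current.right
  end
  result
end

root = Node.new(10, Node.new(5, Node.new(3), Node.new(7)), Node.new(15))
inorder_iterative(root)  # => [3, 5, 7, 10, 15]
```

Unlike the recursive version, this one cannot raise SystemStackError on a degenerate tree; the heap-allocated array simply grows.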
Memory consumption includes node overhead beyond the stored values. Each node requires memory for the value plus two pointers. On 64-bit systems, this adds 16 bytes of pointer overhead per node. A tree with 1,000,000 integers (4 bytes each) consumes approximately 20MB total: 4MB for values and 16MB for pointers.
Cache locality affects real-world performance beyond theoretical complexity. Tree traversal jumps between memory locations, causing cache misses. Arrays provide better cache performance for small datasets despite worse algorithmic complexity. BST advantages emerge with larger datasets where O(log n) beats O(n) despite cache effects.
require 'benchmark'

# Performance comparison: balanced vs degenerate trees
def build_balanced(values)
  tree = BinarySearchTree.new
  values.shuffle.each { |val| tree.insert(val) }
  tree
end

def build_degenerate(values)
  tree = BinarySearchTree.new
  values.sort.each { |val| tree.insert(val) }
  tree
end

values = (1..10_000).to_a
balanced = build_balanced(values)
degenerate = build_degenerate(values)

Benchmark.bm do |x|
  x.report("balanced search:") { 1000.times { balanced.search(rand(10_000)) } }
  x.report("degenerate search:") { 1000.times { degenerate.search(rand(10_000)) } }
end
# Illustrative results (absolute times vary by machine): the balanced tree
# needs roughly 14 comparisons per search, the degenerate tree ~5,000 on
# average, a difference of several hundredfold.
# Note: recursion 10,000 levels deep risks SystemStackError; the iterative
# search avoids this.
Optimization strategies include maintaining balance through self-balancing variants (AVL trees, red-black trees), using splay trees to move frequently accessed nodes toward the root, and using iterative implementations to avoid recursion overhead and stack depth limits.
Practical Examples
A spell checker dictionary demonstrates BST search capabilities. The tree stores dictionary words for fast lookup operations.
class SpellChecker
  def initialize
    @dictionary = BinarySearchTree.new
  end

  def load_dictionary(words)
    words.each { |word| @dictionary.insert(word.downcase) }
  end

  def check_word(word)
    !@dictionary.search(word.downcase).nil?
  end

  def check_text(text)
    words = text.downcase.split(/\W+/)
    misspelled = words.reject { |word| check_word(word) }
    misspelled.empty? ? "All words correct" : "Misspelled: #{misspelled.join(', ')}"
  end
end

checker = SpellChecker.new
checker.load_dictionary(%w[the quick brown fox jumps over lazy dog])
checker.check_text("The quik brown fox")
# => "Misspelled: quik"
A file system index tracks file locations using paths as keys. The BST enables fast file lookup and alphabetical directory listings.
class FileSystemIndex
  def initialize
    @index = BinarySearchTree.new # stores paths, which compare as strings
    @metadata = {}                # path => details; raw hashes are not comparable
  end

  def add_file(path, metadata)
    @index.insert(path)
    @metadata[path] = { metadata: metadata, timestamp: Time.now }
  end

  def find_file(path)
    @index.search(path) ? @metadata[path][:metadata] : nil
  end

  def list_files_alphabetically
    files = []
    @index.each_inorder { |path| files << path }
    files
  end
end

index = FileSystemIndex.new
index.add_file("/home/user/documents/report.pdf", size: 1024)
index.add_file("/home/user/documents/data.csv", size: 2048)
index.add_file("/home/user/images/photo.jpg", size: 4096)
index.list_files_alphabetically
# => ["/home/user/documents/data.csv", "/home/user/documents/report.pdf", "/home/user/images/photo.jpg"]
A priority scheduling system manages tasks by priority using a BST where node values represent priority levels.
class TaskScheduler
  Task = Struct.new(:name, :priority, :duration) do
    # Tasks must be comparable for the BST's < and > checks to work
    include Comparable
    def <=>(other)
      priority <=> other.priority
    end
  end

  def initialize
    @tasks = BinarySearchTree.new
  end

  def add_task(name, priority, duration)
    @tasks.insert(Task.new(name, priority, duration))
  end

  def next_task
    # Find highest priority (rightmost, maximum node)
    current = @tasks.root
    return nil if current.nil?
    current = current.right until current.right.nil?
    current.value
  end

  def process_tasks
    results = []
    @tasks.each_inorder { |task| results << process_task(task) }
    results.reverse # Process highest priority first
  end

  private

  def process_task(task)
    "Processing #{task.name} (Priority: #{task.priority}, Duration: #{task.duration}s)"
  end
end

scheduler = TaskScheduler.new
scheduler.add_task("Database backup", 3, 300)
scheduler.add_task("Send email", 1, 5)
scheduler.add_task("Generate report", 2, 60)
next_up = scheduler.next_task
# => Task(name: "Database backup", priority: 3, duration: 300)
An autocomplete system suggests completions from a BST of words. An inorder traversal collects all stored words sharing a given prefix.
class Autocomplete
  def initialize
    @words = BinarySearchTree.new
  end

  def add_word(word)
    @words.insert(word.downcase)
  end

  def suggest(prefix)
    suggestions = []
    collect_matching(@words.root, prefix.downcase, suggestions)
    suggestions # inorder traversal already visits words in sorted order
  end

  private

  def collect_matching(node, prefix, suggestions)
    return if node.nil?
    collect_matching(node.left, prefix, suggestions)
    suggestions << node.value if node.value.start_with?(prefix)
    collect_matching(node.right, prefix, suggestions)
  end
end

complete = Autocomplete.new
%w[apple application apply appreciate approach app].each { |word| complete.add_word(word) }
complete.suggest("app")
# => ["app", "apple", "application", "apply", "appreciate", "approach"]
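The traversal above visits every node. Because inorder order for strings is lexicographic, subtrees that cannot contain matches can be pruned; the conditions below follow from the fact that every word starting with a prefix compares greater than or equal to it. This standalone sketch restates the idea with free-standing helpers:

```ruby
Node = Struct.new(:value, :left, :right)

def insert(node, value)
  return Node.new(value) if node.nil?
  if value < node.value
    node.left = insert(node.left, value)
  elsif value > node.value
    node.right = insert(node.right, value)
  end
  node
end

def collect_matching(node, prefix, suggestions)
  return if node.nil?
  # Words starting with prefix compare >= prefix, so a subtree of smaller
  # values cannot contain matches.
  collect_matching(node.left, prefix, suggestions) if node.value >= prefix
  suggestions << node.value if node.value.start_with?(prefix)
  # If this value exceeds prefix without sharing it, every larger value also
  # fails to share it, so the right subtree can be skipped.
  collect_matching(node.right, prefix, suggestions) if node.value < prefix || node.value.start_with?(prefix)
end

root = nil
%w[banana apple application apply appreciate approach app aardvark].each { |w| root = insert(root, w) }
suggestions = []
collect_matching(root, "app", suggestions)
suggestions  # => ["app", "apple", "application", "apply", "appreciate", "approach"]
```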
Common Patterns
Range query patterns extract all values within a specified range. The algorithm prunes subtrees that cannot contain range values.
class BinarySearchTree
  def range_query(min_val, max_val)
    results = []
    range_search(@root, min_val, max_val, results)
    results
  end

  private

  def range_search(node, min_val, max_val, results)
    return if node.nil?
    # Left subtree can contain in-range values only if node exceeds the minimum
    range_search(node.left, min_val, max_val, results) if node.value > min_val
    # Include current node if in range
    results << node.value if node.value >= min_val && node.value <= max_val
    # Right subtree can contain in-range values only if node is below the maximum
    range_search(node.right, min_val, max_val, results) if node.value < max_val
  end
end

tree = BinarySearchTree.new
[10, 5, 15, 3, 7, 12, 17, 1, 4, 6, 8].each { |val| tree.insert(val) }
tree.range_query(5, 12)
# => [5, 6, 7, 8, 10, 12]
Level-order traversal processes nodes by depth level. This pattern uses a queue to maintain processing order.
class BinarySearchTree
  def level_order
    return [] if @root.nil?
    results = []
    queue = [@root]
    until queue.empty?
      node = queue.shift
      results << node.value
      queue << node.left if node.left
      queue << node.right if node.right
    end
    results
  end
end

tree.level_order
# => [10, 5, 15, 3, 7, 12, 17, 1, 4, 6, 8]
Finding the kth smallest element combines inorder traversal with counting. The algorithm stops after processing k elements.
class BinarySearchTree
  def kth_smallest(k)
    counter = [0]  # single-element arrays act as mutable references
    result = [nil]
    find_kth(@root, k, counter, result)
    result[0]
  end

  private

  def find_kth(node, k, counter, result)
    return if node.nil? || counter[0] >= k
    find_kth(node.left, k, counter, result)
    counter[0] += 1
    result[0] = node.value if counter[0] == k
    find_kth(node.right, k, counter, result)
  end
end

tree.kth_smallest(3) # => 4 (third smallest value)
Validating BST structure verifies the ordering property. The algorithm tracks valid ranges for each subtree.
class BinarySearchTree
  def valid_bst?
    validate_node(@root, nil, nil)
  end

  private

  def validate_node(node, min_val, max_val)
    return true if node.nil?
    return false if min_val && node.value <= min_val
    return false if max_val && node.value >= max_val
    validate_node(node.left, min_val, node.value) &&
      validate_node(node.right, node.value, max_val)
  end
end
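Range tracking matters because a tree can satisfy every parent-child comparison locally while still violating the global invariant. A standalone sketch with a hand-wired tree:

```ruby
Node = Struct.new(:value, :left, :right)

# Free-standing version of the validator: each subtree must fall strictly
# inside the (min, max) range inherited from its ancestors.
def valid_bst?(node, min = nil, max = nil)
  return true if node.nil?
  return false if min && node.value <= min
  return false if max && node.value >= max
  valid_bst?(node.left, min, node.value) && valid_bst?(node.right, node.value, max)
end

valid = Node.new(10, Node.new(5, Node.new(3), Node.new(7)), Node.new(15))

# 7 passes the local check (7 < 15) but sits in 10's RIGHT subtree,
# violating the invariant relative to the root.
invalid = Node.new(10, Node.new(5), Node.new(15, Node.new(7), nil))

valid_bst?(valid)    # => true
valid_bst?(invalid)  # => false
```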
Computing tree height determines the maximum depth from root to leaf. The recursive approach compares subtree heights.
class BinarySearchTree
  def height
    calculate_height(@root)
  end

  private

  def calculate_height(node)
    return -1 if node.nil? # empty subtree counts as -1, so a single node has height 0
    left_height = calculate_height(node.left)
    right_height = calculate_height(node.right)
    [left_height, right_height].max + 1
  end
end
Design Considerations
Binary search trees trade memory overhead for search efficiency compared to arrays. Arrays consume less memory but require O(n) search time for unsorted data and O(n) insertion time for sorted arrays. BSTs provide O(log n) operations with reasonable memory overhead when balanced.
Hash tables offer O(1) average-case operations but cannot maintain sorted order or perform efficient range queries. BSTs excel at ordered operations: finding minimum/maximum values, retrieving sorted sequences, and querying ranges. Applications requiring sorted output or range operations benefit from BSTs over hash tables.
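The ordered operations that hash tables lack reduce to simple edge walks on a BST. A sketch of find-min and find-max with free-standing helpers (standalone illustrations, not the class methods defined earlier):

```ruby
Node = Struct.new(:value, :left, :right)

def insert(node, value)
  return Node.new(value) if node.nil?
  if value < node.value
    node.left = insert(node.left, value)
  elsif value > node.value
    node.right = insert(node.right, value)
  end
  node
end

# The minimum is the leftmost node; the maximum is the rightmost.
# Both walks take O(height) steps and never compare values.
def find_min(node)
  node = node.left until node.left.nil?
  node.value
end

def find_max(node)
  node = node.right until node.right.nil?
  node.value
end

root = nil
[10, 5, 15, 3, 7, 12, 17].each { |v| root = insert(root, v) }
find_min(root)  # => 3
find_max(root)  # => 17
```

A hash table holding the same values would need a full O(n) scan for either answer.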
Self-balancing tree variants (AVL trees, red-black trees) guarantee O(log n) performance regardless of insertion order. Standard BSTs risk degeneration but require simpler implementation and less memory per node. Choose self-balancing variants for adversarial input patterns or when performance guarantees are critical. Standard BSTs suffice when input approximates random distribution.
B-trees and B+ trees extend BST concepts for disk-based storage. These structures minimize disk reads by storing multiple keys per node. Database indexes typically use B+ trees rather than binary trees because disk I/O costs dominate performance.
For small datasets (under 100 elements), array linear search often outperforms BST operations due to cache locality. BST advantages emerge with larger datasets where algorithmic complexity differences overcome constant-factor costs.
Comparison operations determine BST applicability. Values must support total ordering—each pair of values has a defined less-than relationship. Objects requiring custom comparison must implement comparison methods.
class Person
  include Comparable
  attr_reader :name, :age

  def initialize(name, age)
    @name = name
    @age = age
  end

  def <=>(other)
    age <=> other.age
  end
end

tree = BinarySearchTree.new
tree.insert(Person.new("Alice", 30))
tree.insert(Person.new("Bob", 25))
tree.insert(Person.new("Charlie", 35))
Duplicate value handling requires policy decisions. Some implementations reject duplicates, others allow multiple instances. Storing duplicates in left subtrees maintains consistent behavior but slightly complicates deletion.
Thread safety demands synchronization for concurrent operations. Read-only traversals can proceed concurrently, but modifications require exclusive access. Fine-grained locking on individual nodes enables better concurrency than tree-level locks but increases complexity.
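A coarse tree-level lock is the simplest safe starting point. This standalone sketch wraps a minimal BST in a Mutex (the `ThreadSafeBST` class is a hypothetical illustration; note that on CRuby the global VM lock already serializes Ruby-level code, so the Mutex matters for logical atomicity and for other Ruby implementations):

```ruby
Node = Struct.new(:value, :left, :right)

class ThreadSafeBST
  def initialize
    @root = nil
    @lock = Mutex.new
  end

  # Every public operation takes the tree-level lock, so concurrent
  # inserts cannot interleave mid-restructure.
  def insert(value)
    @lock.synchronize { @root = insert_node(@root, value) }
  end

  def to_a
    @lock.synchronize { inorder(@root, []) }
  end

  private

  def insert_node(node, value)
    return Node.new(value) if node.nil?
    if value < node.value
      node.left = insert_node(node.left, value)
    elsif value > node.value
      node.right = insert_node(node.right, value)
    end
    node
  end

  def inorder(node, acc)
    return acc if node.nil?
    inorder(node.left, acc)
    acc << node.value
    inorder(node.right, acc)
  end
end

tree = ThreadSafeBST.new
threads = 4.times.map do |t|
  Thread.new { (t * 250...(t + 1) * 250).each { |v| tree.insert(v) } }
end
threads.each(&:join)
tree.to_a.size  # => 1000, no inserts lost
```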
Reference
Core Operations
| Operation | Average Case | Worst Case | Description |
|---|---|---|---|
| Search | O(log n) | O(n) | Find node with target value |
| Insert | O(log n) | O(n) | Add new node maintaining order |
| Delete | O(log n) | O(n) | Remove node preserving structure |
| Find Min | O(log n) | O(n) | Locate leftmost node |
| Find Max | O(log n) | O(n) | Locate rightmost node |
| Inorder | O(n) | O(n) | Traverse in sorted order |
| Preorder | O(n) | O(n) | Traverse root before children |
| Postorder | O(n) | O(n) | Traverse children before root |
Space Complexity
| Aspect | Complexity | Notes |
|---|---|---|
| Storage | O(n) | n nodes plus pointer overhead |
| Recursion Stack | O(h) | h is tree height |
| Balanced Height | O(log n) | Height proportional to log n |
| Degenerate Height | O(n) | Height equals node count |
Node Structure
| Component | Type | Purpose |
|---|---|---|
| value | Any comparable | Stored data element |
| left | Node reference | Pointer to left child |
| right | Node reference | Pointer to right child |
| parent | Node reference | Optional upward pointer |
Traversal Orders
| Traversal | Visit Order | Output Characteristic |
|---|---|---|
| Inorder | Left, Node, Right | Sorted ascending sequence |
| Reverse Inorder | Right, Node, Left | Sorted descending sequence |
| Preorder | Node, Left, Right | Root before descendants |
| Postorder | Left, Right, Node | Children before parent |
| Level-order | By depth level | Breadth-first processing |
Deletion Cases
| Case | Condition | Action |
|---|---|---|
| Leaf Node | No children | Remove directly |
| One Child | Single child present | Replace with child |
| Two Children | Both children exist | Replace with successor, delete successor |
Common Methods
| Method | Returns | Purpose |
|---|---|---|
| insert(value) | node | Add value maintaining order |
| search(value) | node or nil | Find node with value |
| delete(value) | node | Remove node with value |
| min | value | Smallest value in tree |
| max | value | Largest value in tree |
| height | integer | Maximum depth from root |
| size | integer | Total node count |
| empty? | boolean | Check if tree is empty |
Balance Metrics
| Metric | Formula | Interpretation |
|---|---|---|
| Height | max(left_height, right_height) + 1 | Longest path to leaf |
| Balance Factor | left_height - right_height | Imbalance measure |
| Perfectly Balanced | height = log₂(n) | Ideal case |
| Degenerate | height = n | Worst case |