Overview
A binary search tree (BST) is a node-based data structure where each node contains a value and references to at most two child nodes, designated as left and right. The defining characteristic of a BST is its ordering property: for any given node, all values in its left subtree are less than the node's value, and all values in its right subtree are greater than the node's value.
This ordering property enables efficient search operations that eliminate half of the remaining nodes at each step, similar to binary search on a sorted array. The structure emerged from the need to combine the dynamic insertion capabilities of linked lists with the efficient search characteristics of sorted arrays.
Binary search trees serve as the foundation for many advanced data structures including AVL trees, red-black trees, and B-trees. The structure appears throughout software systems: database indexes, file systems, expression parsing, and symbol tables in compilers all employ BST variants.
The basic structure consists of nodes connected through parent-child relationships. Each node stores a key value and maintains pointers to left and right children. A node with no children is a leaf node. The topmost node is the root, and the height of a tree is the number of edges on the longest path from the root to a leaf.
# Basic node structure
class Node
  attr_accessor :value, :left, :right

  def initialize(value)
    @value = value
    @left = nil
    @right = nil
  end
end

# Simple tree example
root = Node.new(10)
root.left = Node.new(5)
root.right = Node.new(15)
root.left.left = Node.new(3)
root.left.right = Node.new(7)

# Tree structure:
#        10
#       /  \
#      5    15
#     / \
#    3   7
Key Principles
The binary search tree operates on a fundamental ordering invariant: for every node N with value V, all nodes in the left subtree contain values less than V, and all nodes in the right subtree contain values greater than V. This invariant must hold for every node in the tree, not just the root.
This ordering property determines the structure of all BST operations. When searching for a value, the algorithm compares the target with the current node's value and recursively searches either the left or right subtree. The search path follows a single path from root to a leaf, never backtracking.
Insertion maintains the ordering property by finding the appropriate leaf position and adding the new node there. The algorithm starts at the root and compares values at each node, moving left for smaller values and right for larger values until finding an empty position.
Deletion requires three distinct cases. Removing a leaf node is straightforward: simply remove it. Removing a node with one child replaces the node with its child. Removing a node with two children requires finding either the inorder predecessor (maximum value in the left subtree) or the inorder successor (minimum value in the right subtree), copying that value into the node, and then deleting the predecessor or successor node from its subtree.
Tree traversal patterns extract values in specific orders. Inorder traversal (left, node, right) produces sorted output. Preorder traversal (node, left, right) processes parents before children. Postorder traversal (left, right, node) processes children before parents. Level-order traversal processes nodes level by level.
The height of a tree significantly impacts performance. A balanced tree maintains height proportional to log(n) where n is the number of nodes. A degenerate tree—essentially a linked list—has height proportional to n. The balance factor of a node is the height difference between its left and right subtrees.
BST operations depend on tree shape. A perfectly balanced tree provides O(log n) search, insertion, and deletion. An unbalanced tree degrades to O(n) performance. The distribution of insertion order determines balance—random insertions generally produce reasonably balanced trees, while sorted insertions create degenerate trees.
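The effect of insertion order can be seen directly by measuring tree height. The following sketch uses minimal free-standing helpers (`insert` and `height` here are standalone illustrations, not the class methods defined later):

```ruby
Node = Struct.new(:value, :left, :right)

# Recursive insert maintaining the ordering invariant; duplicates are dropped.
def insert(node, value)
  return Node.new(value) if node.nil?
  if value < node.value
    node.left = insert(node.left, value)
  elsif value > node.value
    node.right = insert(node.right, value)
  end
  node
end

# Height in edges; an empty subtree counts as -1.
def height(node)
  return -1 if node.nil?
  [height(node.left), height(node.right)].max + 1
end

values = (1..127).to_a
shuffled_root = values.shuffle.reduce(nil) { |root, v| insert(root, v) }
sorted_root   = values.reduce(nil) { |root, v| insert(root, v) }

height(sorted_root)    # => 126: every node has only a right child
height(shuffled_root)  # typically in the low teens; the minimum possible is 6
```

Inserting sorted values produces a chain 126 levels deep, while a shuffled order lands close to the log₂(127) ≈ 7 ideal.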
# Inorder traversal demonstrates the sorted property
def inorder(node, result = [])
  return result if node.nil?
  inorder(node.left, result)
  result << node.value
  inorder(node.right, result)
  result
end

root = Node.new(10)
root.left = Node.new(5)
root.right = Node.new(15)
root.left.left = Node.new(3)
root.left.right = Node.new(7)

inorder(root)
# => [3, 5, 7, 10, 15] - sorted output
Ruby Implementation
Ruby implements binary search trees through classes with instance variables for node values and child references. The implementation typically separates node structure from tree operations, though combined implementations also exist.
class BinarySearchTree
  attr_reader :root

  def initialize
    @root = nil
  end

  def insert(value)
    @root = insert_node(@root, value)
  end

  private

  def insert_node(node, value)
    return Node.new(value) if node.nil?
    if value < node.value
      node.left = insert_node(node.left, value)
    elsif value > node.value
      node.right = insert_node(node.right, value)
    end
    node
  end
end

tree = BinarySearchTree.new
[10, 5, 15, 3, 7, 12, 17].each { |val| tree.insert(val) }
Search operations traverse the tree following the ordering property. The recursive approach naturally expresses the algorithm's structure.
class BinarySearchTree
  def search(value)
    search_node(@root, value)
  end

  private

  def search_node(node, value)
    return nil if node.nil?
    return node if node.value == value
    if value < node.value
      search_node(node.left, value)
    else
      search_node(node.right, value)
    end
  end
end

tree.search(7)  # => Node with value 7
tree.search(20) # => nil
Iterative implementations avoid recursion overhead and stack depth limitations. The iterative search uses a loop with explicit pointer manipulation.
class BinarySearchTree
  def search_iterative(value)
    current = @root
    until current.nil?
      return current if current.value == value
      current = value < current.value ? current.left : current.right
    end
    nil
  end
end
Deletion handles the three cases through conditional logic. Finding the inorder successor requires locating the minimum value in the right subtree.
class BinarySearchTree
  def delete(value)
    @root = delete_node(@root, value)
  end

  private

  def delete_node(node, value)
    return nil if node.nil?
    if value < node.value
      node.left = delete_node(node.left, value)
    elsif value > node.value
      node.right = delete_node(node.right, value)
    else
      # Node with only one child or no child
      return node.right if node.left.nil?
      return node.left if node.right.nil?
      # Node with two children: get inorder successor
      min_node = find_min(node.right)
      node.value = min_node.value
      node.right = delete_node(node.right, min_node.value)
    end
    node
  end

  def find_min(node)
    current = node
    current = current.left until current.left.nil?
    current
  end
end
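A condensed, standalone run exercises all three deletion cases. The free-standing `insert`, `delete_node`, and `inorder` helpers here restate the logic above so the sketch runs on its own:

```ruby
Node = Struct.new(:value, :left, :right)

def insert(node, value)
  return Node.new(value) if node.nil?
  if value < node.value
    node.left = insert(node.left, value)
  elsif value > node.value
    node.right = insert(node.right, value)
  end
  node
end

def delete_node(node, value)
  return nil if node.nil?
  if value < node.value
    node.left = delete_node(node.left, value)
  elsif value > node.value
    node.right = delete_node(node.right, value)
  else
    return node.right if node.left.nil?  # covers leaf and right-only child
    return node.left if node.right.nil?  # left-only child
    successor = node.right
    successor = successor.left until successor.left.nil?
    node.value = successor.value
    node.right = delete_node(node.right, successor.value)
  end
  node
end

def inorder(node, acc = [])
  return acc if node.nil?
  inorder(node.left, acc)
  acc << node.value
  inorder(node.right, acc)
end

root = nil
[10, 5, 15, 3, 7, 12, 17].each { |v| root = insert(root, v) }
root = delete_node(root, 3)   # case 1: leaf, removed directly
root = delete_node(root, 15)  # case 3: two children (12, 17); successor 17 moves up
root = delete_node(root, 5)   # case 2: only child 7 takes its place
inorder(root)                 # => [7, 10, 12, 17]
```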
Ruby's block syntax enables expressive traversal implementations. The tree can yield values to blocks for flexible processing.
class BinarySearchTree
  def each_inorder(&block)
    traverse_inorder(@root, &block)
  end

  def each_preorder(&block)
    traverse_preorder(@root, &block)
  end

  def each_postorder(&block)
    traverse_postorder(@root, &block)
  end

  private

  def traverse_inorder(node, &block)
    return if node.nil?
    traverse_inorder(node.left, &block)
    yield node.value
    traverse_inorder(node.right, &block)
  end

  def traverse_preorder(node, &block)
    return if node.nil?
    yield node.value
    traverse_preorder(node.left, &block)
    traverse_preorder(node.right, &block)
  end

  def traverse_postorder(node, &block)
    return if node.nil?
    traverse_postorder(node.left, &block)
    traverse_postorder(node.right, &block)
    yield node.value
  end
end

tree.each_inorder { |val| puts val } # Sorted output
Including the Enumerable mixin extends the tree with Ruby's collection methods. The tree need only implement each to support Enumerable.
class BinarySearchTree
  include Enumerable

  def each(&block)
    each_inorder(&block)
  end
end

tree.map { |val| val * 2 }      # => [6, 10, 14, 20, 24, 30, 34]
tree.select { |val| val.even? } # => [10, 12]
tree.reduce(:+)                 # => 69
Performance Considerations
Binary search tree performance depends critically on tree balance. A balanced tree maintains height h ≈ log₂(n) for n nodes, providing O(log n) operations. An unbalanced tree degrades to height h ≈ n, resulting in O(n) operations equivalent to a linked list.
Search operations perform one comparison per tree level. In a balanced tree with 1,000,000 nodes, search requires approximately 20 comparisons. In a degenerate tree, search might require 1,000,000 comparisons. The best case occurs when the target is the root (O(1)), while worst case depends on tree height.
Insertion in a balanced tree examines O(log n) nodes to find the insertion point. After locating the position, insertion completes in O(1) time by adjusting pointers. The total insertion cost is O(log n) for balanced trees and O(n) for degenerate trees. Bulk insertions from sorted data create degenerate trees—randomizing insertion order improves balance.
Deletion requires finding the node (O(log n) balanced, O(n) degenerate), then performing the deletion operation. Single-child deletions execute in O(1) after finding the node. Two-child deletions require finding the inorder successor, adding another O(log n) operation. Total deletion cost matches search cost.
Traversal algorithms visit every node exactly once, performing O(n) work regardless of tree shape. Different traversal orders have identical time complexity but different memory characteristics. Recursive traversals consume stack space proportional to tree height: O(log n) for balanced trees, O(n) for degenerate trees. Iterative traversals using explicit stacks have similar space requirements.
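The explicit-stack traversal mentioned above can be sketched with a standalone helper (the `inorder_iterative` name is an illustration, not part of the classes in this article):

```ruby
Node = Struct.new(:value, :left, :right)

# Iterative inorder: push the chain of left children, then pop, visit,
# and move right. The stack grows at most to the tree's height, mirroring
# the recursive version's call-stack usage.
def inorder_iterative(root)
  result = []
  stack = []
  current = root
  until current.nil? && stack.empty?
    while current
      stack << current
      current = current.left
    end
    current = stack.pop
    result << current.value
    current = current.right
  end
  result
end

root = Node.new(10, Node.new(5, Node.new(3), Node.new(7)), Node.new(15))
inorder_iterative(root)  # => [3, 5, 7, 10, 15]
```

Unlike the recursive version, this one cannot raise SystemStackError on a degenerate tree; the heap-allocated array simply grows.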
Memory consumption includes node overhead beyond the stored values. Each node requires memory for the value plus two pointers. On 64-bit systems, this adds 16 bytes of pointer overhead per node. A tree with 1,000,000 integers (4 bytes each) consumes approximately 20MB total: 4MB for values and 16MB for pointers.
Cache locality affects real-world performance beyond theoretical complexity. Tree traversal jumps between memory locations, causing cache misses. Arrays provide better cache performance for small datasets despite worse algorithmic complexity. BST advantages emerge with larger datasets where O(log n) beats O(n) despite cache effects.
require 'benchmark'

# Performance comparison: balanced vs degenerate trees
def build_balanced(values)
  tree = BinarySearchTree.new
  values.shuffle.each { |val| tree.insert(val) }
  tree
end

def build_degenerate(values)
  tree = BinarySearchTree.new
  values.sort.each { |val| tree.insert(val) }
  tree
end

values = (1..10_000).to_a
balanced = build_balanced(values)
degenerate = build_degenerate(values)

Benchmark.bm do |x|
  x.report("balanced search:") { 1000.times { balanced.search(rand(10_000)) } }
  x.report("degenerate search:") { 1000.times { degenerate.search(rand(10_000)) } }
end
# Illustrative results (absolute times vary by machine): the balanced tree
# needs roughly 14 comparisons per search, the degenerate tree ~5,000 on
# average, a difference of several hundredfold.
# Note: recursion 10,000 levels deep risks SystemStackError; the iterative
# search avoids this.
Optimization strategies include maintaining balance through self-balancing variants (AVL trees, red-black trees), using splay trees to move frequently accessed nodes toward the root, and using iterative implementations to avoid recursion overhead and stack depth limits.
Practical Examples
A spell checker dictionary demonstrates BST search capabilities. The tree stores dictionary words for fast lookup operations.
class SpellChecker
  def initialize
    @dictionary = BinarySearchTree.new
  end

  def load_dictionary(words)
    words.each { |word| @dictionary.insert(word.downcase) }
  end

  def check_word(word)
    !@dictionary.search(word.downcase).nil?
  end

  def check_text(text)
    words = text.downcase.split(/\W+/)
    misspelled = words.reject { |word| check_word(word) }
    misspelled.empty? ? "All words correct" : "Misspelled: #{misspelled.join(', ')}"
  end
end

checker = SpellChecker.new
checker.load_dictionary(%w[the quick brown fox jumps over lazy dog])
checker.check_text("The quik brown fox")
# => "Misspelled: quik"
A file system index tracks file locations using paths as keys. The BST enables fast file lookup and alphabetical directory listings.
class FileSystemIndex
  def initialize
    @index = BinarySearchTree.new # stores paths, which compare as strings
    @metadata = {}                # path => details; raw hashes are not comparable
  end

  def add_file(path, metadata)
    @index.insert(path)
    @metadata[path] = { metadata: metadata, timestamp: Time.now }
  end

  def find_file(path)
    @index.search(path) ? @metadata[path][:metadata] : nil
  end

  def list_files_alphabetically
    files = []
    @index.each_inorder { |path| files << path }
    files
  end
end

index = FileSystemIndex.new
index.add_file("/home/user/documents/report.pdf", size: 1024)
index.add_file("/home/user/documents/data.csv", size: 2048)
index.add_file("/home/user/images/photo.jpg", size: 4096)
index.list_files_alphabetically
# => ["/home/user/documents/data.csv", "/home/user/documents/report.pdf", "/home/user/images/photo.jpg"]
A priority scheduling system manages tasks by priority using a BST where node values represent priority levels.
class TaskScheduler
  Task = Struct.new(:name, :priority, :duration) do
    # Tasks must be comparable for the BST's < and > checks to work
    include Comparable
    def <=>(other)
      priority <=> other.priority
    end
  end

  def initialize
    @tasks = BinarySearchTree.new
  end

  def add_task(name, priority, duration)
    @tasks.insert(Task.new(name, priority, duration))
  end

  def next_task
    # Find highest priority (rightmost, maximum node)
    current = @tasks.root
    return nil if current.nil?
    current = current.right until current.right.nil?
    current.value
  end

  def process_tasks
    results = []
    @tasks.each_inorder { |task| results << process_task(task) }
    results.reverse # Process highest priority first
  end

  private

  def process_task(task)
    "Processing #{task.name} (Priority: #{task.priority}, Duration: #{task.duration}s)"
  end
end

scheduler = TaskScheduler.new
scheduler.add_task("Database backup", 3, 300)
scheduler.add_task("Send email", 1, 5)
scheduler.add_task("Generate report", 2, 60)
next_up = scheduler.next_task
# => Task(name: "Database backup", priority: 3, duration: 300)
An autocomplete system suggests completions from a BST of words. An inorder traversal collects all stored words sharing a given prefix.
class Autocomplete
  def initialize
    @words = BinarySearchTree.new
  end

  def add_word(word)
    @words.insert(word.downcase)
  end

  def suggest(prefix)
    suggestions = []
    collect_matching(@words.root, prefix.downcase, suggestions)
    suggestions # inorder traversal already visits words in sorted order
  end

  private

  def collect_matching(node, prefix, suggestions)
    return if node.nil?
    collect_matching(node.left, prefix, suggestions)
    suggestions << node.value if node.value.start_with?(prefix)
    collect_matching(node.right, prefix, suggestions)
  end
end

complete = Autocomplete.new
%w[apple application apply appreciate approach app].each { |word| complete.add_word(word) }
complete.suggest("app")
# => ["app", "apple", "application", "apply", "appreciate", "approach"]
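The traversal above visits every node. Because inorder order for strings is lexicographic, subtrees that cannot contain matches can be pruned; the conditions below follow from the fact that every word starting with a prefix compares greater than or equal to it. This standalone sketch restates the idea with free-standing helpers:

```ruby
Node = Struct.new(:value, :left, :right)

def insert(node, value)
  return Node.new(value) if node.nil?
  if value < node.value
    node.left = insert(node.left, value)
  elsif value > node.value
    node.right = insert(node.right, value)
  end
  node
end

def collect_matching(node, prefix, suggestions)
  return if node.nil?
  # Words starting with prefix compare >= prefix, so a subtree of smaller
  # values cannot contain matches.
  collect_matching(node.left, prefix, suggestions) if node.value >= prefix
  suggestions << node.value if node.value.start_with?(prefix)
  # If this value exceeds prefix without sharing it, every larger value also
  # fails to share it, so the right subtree can be skipped.
  collect_matching(node.right, prefix, suggestions) if node.value < prefix || node.value.start_with?(prefix)
end

root = nil
%w[banana apple application apply appreciate approach app aardvark].each { |w| root = insert(root, w) }
suggestions = []
collect_matching(root, "app", suggestions)
suggestions  # => ["app", "apple", "application", "apply", "appreciate", "approach"]
```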
Common Patterns
Range query patterns extract all values within a specified range. The algorithm prunes subtrees that cannot contain range values.
class BinarySearchTree
  def range_query(min_val, max_val)
    results = []
    range_search(@root, min_val, max_val, results)
    results
  end

  private

  def range_search(node, min_val, max_val, results)
    return if node.nil?
    # Left subtree can contain in-range values only if node exceeds the minimum
    range_search(node.left, min_val, max_val, results) if node.value > min_val
    # Include current node if in range
    results << node.value if node.value >= min_val && node.value <= max_val
    # Right subtree can contain in-range values only if node is below the maximum
    range_search(node.right, min_val, max_val, results) if node.value < max_val
  end
end

tree = BinarySearchTree.new
[10, 5, 15, 3, 7, 12, 17, 1, 4, 6, 8].each { |val| tree.insert(val) }
tree.range_query(5, 12)
# => [5, 6, 7, 8, 10, 12]
Level-order traversal processes nodes by depth level. This pattern uses a queue to maintain processing order.
class BinarySearchTree
  def level_order
    return [] if @root.nil?
    results = []
    queue = [@root]
    until queue.empty?
      node = queue.shift
      results << node.value
      queue << node.left if node.left
      queue << node.right if node.right
    end
    results
  end
end

tree.level_order
# => [10, 5, 15, 3, 7, 12, 17, 1, 4, 6, 8]
Finding the kth smallest element combines inorder traversal with counting. The algorithm stops after processing k elements.
class BinarySearchTree
  def kth_smallest(k)
    counter = [0]  # single-element arrays act as mutable references
    result = [nil]
    find_kth(@root, k, counter, result)
    result[0]
  end

  private

  def find_kth(node, k, counter, result)
    return if node.nil? || counter[0] >= k
    find_kth(node.left, k, counter, result)
    counter[0] += 1
    result[0] = node.value if counter[0] == k
    find_kth(node.right, k, counter, result)
  end
end

tree.kth_smallest(3) # => 4 (third smallest value)
Validating BST structure verifies the ordering property. The algorithm tracks valid ranges for each subtree.
class BinarySearchTree
  def valid_bst?
    validate_node(@root, nil, nil)
  end

  private

  def validate_node(node, min_val, max_val)
    return true if node.nil?
    return false if min_val && node.value <= min_val
    return false if max_val && node.value >= max_val
    validate_node(node.left, min_val, node.value) &&
      validate_node(node.right, node.value, max_val)
  end
end
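Range tracking matters because a tree can satisfy every parent-child comparison locally while still violating the global invariant. A standalone sketch with a hand-wired tree:

```ruby
Node = Struct.new(:value, :left, :right)

# Free-standing version of the validator: each subtree must fall strictly
# inside the (min, max) range inherited from its ancestors.
def valid_bst?(node, min = nil, max = nil)
  return true if node.nil?
  return false if min && node.value <= min
  return false if max && node.value >= max
  valid_bst?(node.left, min, node.value) && valid_bst?(node.right, node.value, max)
end

valid = Node.new(10, Node.new(5, Node.new(3), Node.new(7)), Node.new(15))

# 7 passes the local check (7 < 15) but sits in 10's RIGHT subtree,
# violating the invariant relative to the root.
invalid = Node.new(10, Node.new(5), Node.new(15, Node.new(7), nil))

valid_bst?(valid)    # => true
valid_bst?(invalid)  # => false
```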
Computing tree height determines the maximum depth from root to leaf. The recursive approach compares subtree heights.
class BinarySearchTree
  def height
    calculate_height(@root)
  end

  private

  def calculate_height(node)
    return -1 if node.nil? # empty subtree counts as -1, so a single node has height 0
    left_height = calculate_height(node.left)
    right_height = calculate_height(node.right)
    [left_height, right_height].max + 1
  end
end
Design Considerations
Binary search trees trade memory overhead for search efficiency compared to arrays. Arrays consume less memory but require O(n) search time for unsorted data and O(n) insertion time for sorted arrays. BSTs provide O(log n) operations with reasonable memory overhead when balanced.
Hash tables offer O(1) average-case operations but cannot maintain sorted order or perform efficient range queries. BSTs excel at ordered operations: finding minimum/maximum values, retrieving sorted sequences, and querying ranges. Applications requiring sorted output or range operations benefit from BSTs over hash tables.
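The ordered operations that hash tables lack reduce to simple edge walks on a BST. A sketch of find-min and find-max with free-standing helpers (standalone illustrations, not the class methods defined earlier):

```ruby
Node = Struct.new(:value, :left, :right)

def insert(node, value)
  return Node.new(value) if node.nil?
  if value < node.value
    node.left = insert(node.left, value)
  elsif value > node.value
    node.right = insert(node.right, value)
  end
  node
end

# The minimum is the leftmost node; the maximum is the rightmost.
# Both walks take O(height) steps and never compare values.
def find_min(node)
  node = node.left until node.left.nil?
  node.value
end

def find_max(node)
  node = node.right until node.right.nil?
  node.value
end

root = nil
[10, 5, 15, 3, 7, 12, 17].each { |v| root = insert(root, v) }
find_min(root)  # => 3
find_max(root)  # => 17
```

A hash table holding the same values would need a full O(n) scan for either answer.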
Self-balancing tree variants (AVL trees, red-black trees) guarantee O(log n) performance regardless of insertion order. Standard BSTs risk degeneration but require simpler implementation and less memory per node. Choose self-balancing variants for adversarial input patterns or when performance guarantees are critical. Standard BSTs suffice when input approximates random distribution.
B-trees and B+ trees extend BST concepts for disk-based storage. These structures minimize disk reads by storing multiple keys per node. Database indexes typically use B+ trees rather than binary trees because disk I/O costs dominate performance.
For small datasets (under 100 elements), array linear search often outperforms BST operations due to cache locality. BST advantages emerge with larger datasets where algorithmic complexity differences overcome constant-factor costs.
Comparison operations determine BST applicability. Values must support total ordering—each pair of values has a defined less-than relationship. Objects requiring custom comparison must implement comparison methods.
class Person
  include Comparable
  attr_reader :name, :age

  def initialize(name, age)
    @name = name
    @age = age
  end

  def <=>(other)
    age <=> other.age
  end
end

tree = BinarySearchTree.new
tree.insert(Person.new("Alice", 30))
tree.insert(Person.new("Bob", 25))
tree.insert(Person.new("Charlie", 35))
Duplicate value handling requires policy decisions. Some implementations reject duplicates, others allow multiple instances. Storing duplicates in left subtrees maintains consistent behavior but slightly complicates deletion.
Thread safety demands synchronization for concurrent operations. Read-only traversals can proceed concurrently, but modifications require exclusive access. Fine-grained locking on individual nodes enables better concurrency than tree-level locks but increases complexity.
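A coarse tree-level lock is the simplest safe starting point. This standalone sketch wraps a minimal BST in a Mutex (the `ThreadSafeBST` class is a hypothetical illustration; note that on CRuby the global VM lock already serializes Ruby-level code, so the Mutex matters for logical atomicity and for other Ruby implementations):

```ruby
Node = Struct.new(:value, :left, :right)

class ThreadSafeBST
  def initialize
    @root = nil
    @lock = Mutex.new
  end

  # Every public operation takes the tree-level lock, so concurrent
  # inserts cannot interleave mid-restructure.
  def insert(value)
    @lock.synchronize { @root = insert_node(@root, value) }
  end

  def to_a
    @lock.synchronize { inorder(@root, []) }
  end

  private

  def insert_node(node, value)
    return Node.new(value) if node.nil?
    if value < node.value
      node.left = insert_node(node.left, value)
    elsif value > node.value
      node.right = insert_node(node.right, value)
    end
    node
  end

  def inorder(node, acc)
    return acc if node.nil?
    inorder(node.left, acc)
    acc << node.value
    inorder(node.right, acc)
  end
end

tree = ThreadSafeBST.new
threads = 4.times.map do |t|
  Thread.new { (t * 250...(t + 1) * 250).each { |v| tree.insert(v) } }
end
threads.each(&:join)
tree.to_a.size  # => 1000, no inserts lost
```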
Reference
Core Operations
| Operation | Average Case | Worst Case | Description |
|---|---|---|---|
| Search | O(log n) | O(n) | Find node with target value |
| Insert | O(log n) | O(n) | Add new node maintaining order |
| Delete | O(log n) | O(n) | Remove node preserving structure |
| Find Min | O(log n) | O(n) | Locate leftmost node |
| Find Max | O(log n) | O(n) | Locate rightmost node |
| Inorder | O(n) | O(n) | Traverse in sorted order |
| Preorder | O(n) | O(n) | Traverse root before children |
| Postorder | O(n) | O(n) | Traverse children before root |
Space Complexity
| Aspect | Complexity | Notes |
|---|---|---|
| Storage | O(n) | n nodes plus pointer overhead |
| Recursion Stack | O(h) | h is tree height |
| Balanced Height | O(log n) | Height proportional to log n |
| Degenerate Height | O(n) | Height equals node count |
Node Structure
| Component | Type | Purpose |
|---|---|---|
| value | Any comparable | Stored data element |
| left | Node reference | Pointer to left child |
| right | Node reference | Pointer to right child |
| parent | Node reference | Optional upward pointer |
Traversal Orders
| Traversal | Visit Order | Output Characteristic |
|---|---|---|
| Inorder | Left, Node, Right | Sorted ascending sequence |
| Reverse Inorder | Right, Node, Left | Sorted descending sequence |
| Preorder | Node, Left, Right | Root before descendants |
| Postorder | Left, Right, Node | Children before parent |
| Level-order | By depth level | Breadth-first processing |
Deletion Cases
| Case | Condition | Action |
|---|---|---|
| Leaf Node | No children | Remove directly |
| One Child | Single child present | Replace with child |
| Two Children | Both children exist | Replace with successor, delete successor |
Common Methods
| Method | Returns | Purpose |
|---|---|---|
| insert(value) | node | Add value maintaining order |
| search(value) | node or nil | Find node with value |
| delete(value) | node | Remove node with value |
| min | value | Smallest value in tree |
| max | value | Largest value in tree |
| height | integer | Maximum depth from root |
| size | integer | Total node count |
| empty? | boolean | Check if tree is empty |
Balance Metrics
| Metric | Formula | Interpretation |
|---|---|---|
| Height | max(left_height, right_height) + 1 | Longest path to leaf |
| Balance Factor | left_height - right_height | Imbalance measure |
| Perfectly Balanced | height = log₂(n) | Ideal case |
| Degenerate | height = n | Worst case |