Overview
A heap is a specialized tree-based data structure that satisfies the heap property. The structure represents a complete binary tree where each parent node maintains a specific ordering relationship with its children. Heaps serve as the foundation for priority queues, enable efficient sorting algorithms, and solve problems requiring repeated access to minimum or maximum elements.
The heap property defines two variants. A max-heap maintains parent nodes with values greater than or equal to their children, placing the maximum element at the root. A min-heap reverses this relationship, keeping parent values less than or equal to children, positioning the minimum element at the root. This fundamental ordering enables logarithmic-time operations for insertion and extraction.
Binary heaps use array-based storage rather than explicit node objects with pointers. This representation maps the complete binary tree structure to sequential array indices. For a node at index i, the left child occupies index 2*i + 1, the right child occupies 2*i + 2, and the parent resides at (i-1)/2 using integer division. This mathematical relationship eliminates pointer overhead and provides cache-friendly memory access patterns.
# Conceptual heap structure as array
# Index:  0   1   2   3   4   5   6
# Array: [10,  7,  9,  4,  3,  5,  8]
#
# Tree representation:
#            10
#           /  \
#          7    9
#         / \  / \
#        4  3 5   8
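The parent-child ordering in this example can be checked mechanically with the child-index formulas; a small validation helper (illustrative, not part of the implementations below) might look like:

```ruby
# Check the max-heap property: every parent must be >= each child,
# using the child-index formulas for the array representation.
def max_heap?(arr)
  arr.each_index.all? do |i|
    left = 2 * i + 1
    right = 2 * i + 2
    (left >= arr.size || arr[i] >= arr[left]) &&
      (right >= arr.size || arr[i] >= arr[right])
  end
end

max_heap?([10, 7, 9, 4, 3, 5, 8]) # => true
max_heap?([10, 7, 9, 4, 8, 5, 8]) # => false (child 8 exceeds parent 7)
```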
Heaps find application in scheduling systems where tasks require priority-based execution, graph algorithms like Dijkstra's shortest path that need efficient minimum extraction, and streaming algorithms that maintain top-k elements from continuous data. The heap structure trades complete ordering for partial ordering, accepting that siblings remain unordered while preserving the critical parent-child relationship. This design choice enables heap operations to achieve logarithmic complexity instead of the linear time required for maintaining fully sorted structures.
Key Principles
The heap property forms the structural invariant that defines heap behavior. Every node in the tree must satisfy the ordering relationship with its children. For a max-heap, the value at each node exceeds or equals all values in its subtree. For a min-heap, each node's value is less than or equal to all descendant values. This property differs from binary search trees where the ordering extends across the entire tree structure. Heaps only enforce vertical ordering along parent-child paths.
Complete binary trees provide the shape constraint that heaps require. The tree fills each level from left to right before adding a new level. All levels except the last remain completely filled. The last level fills from left to right without gaps. This structure guarantees logarithmic height relative to the number of elements, ensuring that operations traversing from root to leaf complete in O(log n) time.
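To make the logarithmic-height guarantee concrete, a one-line helper (hypothetical name) shows how slowly the height grows with the element count:

```ruby
# Height of a complete binary tree with n nodes: floor(log2(n)).
# Doubling the element count adds only one level.
def heap_height(n)
  Math.log2(n).floor
end

heap_height(7)         # => 2
heap_height(1000)      # => 9
heap_height(1_000_000) # => 19
```

A million-element heap needs at most 19 root-to-leaf comparisons per operation.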
Array representation exploits the complete tree property through mathematical index relationships. Position calculations replace pointer traversals. For zero-indexed arrays, a node at index i connects to its left child at 2*i + 1 and right child at 2*i + 2. The parent of node i sits at (i-1)/2 when using integer division. These formulas enable navigation through the tree structure without storing explicit parent or child references.
class HeapHelpers
  def self.parent_index(i)
    (i - 1) / 2
  end

  def self.left_child_index(i)
    2 * i + 1
  end

  def self.right_child_index(i)
    2 * i + 2
  end
end

# Example navigation
index = 4
parent = HeapHelpers.parent_index(index)     # => 1
left = HeapHelpers.left_child_index(index)   # => 9
right = HeapHelpers.right_child_index(index) # => 10
Heapification refers to the process of establishing or restoring the heap property. Two directional operations handle this task. Sift-up (bubble-up) moves an element toward the root by comparing it with its parent and swapping when the heap property fails. This operation handles insertion, where new elements enter at the tree bottom and rise to their correct position. Sift-down (bubble-down) moves an element toward the leaves by comparing it with its children and swapping with the appropriate child when needed. Deletion of the root element triggers sift-down after replacing the root with the last element.
The heap invariant states that at any moment during operations, the heap property holds for all nodes except those currently involved in heapification. Operations maintain this invariant by propagating changes up or down the tree until the property holds everywhere. Insertion temporarily violates the property at the new leaf position, then sift-up repairs the path to the root. Extraction replaces the root with the last leaf, violating the property at the root, then sift-down repairs the path downward.
Comparison semantics determine heap ordering behavior. The heap requires a consistent comparison function that defines the priority relationship. For numeric heaps, this function compares values directly. For object heaps, custom comparators extract the priority key or apply domain-specific ordering rules. Reversing the comparison function converts a min-heap implementation into a max-heap without changing the algorithmic structure.
Ruby Implementation
Ruby does not include a heap implementation in its standard library. The language provides the foundation for building efficient heap structures through arrays and comparison operators. A practical implementation encapsulates the array representation and exposes operations through a clean interface.
class MinHeap
  def initialize
    @heap = []
  end

  def insert(value)
    @heap << value
    sift_up(@heap.size - 1)
  end

  def extract_min
    return nil if @heap.empty?
    min = @heap[0]
    # Pop the last leaf first: assigning @heap[0] = @heap.pop would
    # re-insert the root when only one element remains
    last = @heap.pop
    unless @heap.empty?
      @heap[0] = last
      sift_down(0)
    end
    min
  end

  def peek
    @heap[0]
  end

  def size
    @heap.size
  end

  def empty?
    @heap.empty?
  end

  private

  def sift_up(index)
    return if index == 0
    parent_idx = (index - 1) / 2
    if @heap[index] < @heap[parent_idx]
      @heap[index], @heap[parent_idx] = @heap[parent_idx], @heap[index]
      sift_up(parent_idx)
    end
  end

  def sift_down(index)
    left_idx = 2 * index + 1
    right_idx = 2 * index + 2
    smallest = index
    if left_idx < @heap.size && @heap[left_idx] < @heap[smallest]
      smallest = left_idx
    end
    if right_idx < @heap.size && @heap[right_idx] < @heap[smallest]
      smallest = right_idx
    end
    if smallest != index
      @heap[index], @heap[smallest] = @heap[smallest], @heap[index]
      sift_down(smallest)
    end
  end
end

# Usage
heap = MinHeap.new
[5, 3, 7, 1, 9, 4].each { |n| heap.insert(n) }
heap.extract_min # => 1
heap.extract_min # => 3
heap.peek        # => 4
A max-heap implementation follows identical structure with reversed comparisons. Creating a base class with a comparison strategy parameter eliminates code duplication.
class Heap
  def initialize(&comparator)
    @heap = []
    @comparator = comparator || ->(a, b) { a <=> b }
  end

  def insert(value)
    @heap << value
    sift_up(@heap.size - 1)
  end

  def extract
    return nil if @heap.empty?
    result = @heap[0]
    # Pop the last leaf first so a single-element heap empties correctly
    last = @heap.pop
    unless @heap.empty?
      @heap[0] = last
      sift_down(0)
    end
    result
  end

  def peek
    @heap[0]
  end

  def size
    @heap.size
  end

  private

  def sift_up(index)
    return if index == 0
    parent_idx = (index - 1) / 2
    if compare(@heap[index], @heap[parent_idx]) < 0
      @heap[index], @heap[parent_idx] = @heap[parent_idx], @heap[index]
      sift_up(parent_idx)
    end
  end

  def sift_down(index)
    left_idx = 2 * index + 1
    right_idx = 2 * index + 2
    priority = index
    if left_idx < @heap.size && compare(@heap[left_idx], @heap[priority]) < 0
      priority = left_idx
    end
    if right_idx < @heap.size && compare(@heap[right_idx], @heap[priority]) < 0
      priority = right_idx
    end
    if priority != index
      @heap[index], @heap[priority] = @heap[priority], @heap[index]
      sift_down(priority)
    end
  end

  def compare(a, b)
    @comparator.call(a, b)
  end
end
# Min-heap (default)
min_heap = Heap.new
# Max-heap (reversed comparison)
max_heap = Heap.new { |a, b| b <=> a }
# Custom object heap
Task = Struct.new(:name, :priority)
task_heap = Heap.new { |a, b| a.priority <=> b.priority }
task_heap.insert(Task.new("Debug", 1))
task_heap.insert(Task.new("Deploy", 3))
task_heap.insert(Task.new("Review", 2))
task_heap.extract # => Task("Debug", 1)
Building a heap from an existing array uses the heapify algorithm rather than repeated insertions. This approach achieves O(n) time complexity instead of O(n log n) by working from the bottom up.
class Heap
  def self.heapify(array, &comparator)
    heap = new(&comparator)
    heap.instance_variable_set(:@heap, array.dup)
    # Start from the last parent node and sift down
    start_idx = (array.size / 2) - 1
    start_idx.downto(0) do |i|
      heap.send(:sift_down, i)
    end
    heap
  end
end
# Build heap in linear time
array = [9, 4, 7, 1, 3, 8, 5]
heap = Heap.heapify(array)
heap.extract # => 1
Iterative implementations avoid recursion overhead for deep heaps. The iterative sift operations use loops instead of recursive calls.
def sift_up_iterative(index)
  while index > 0
    parent_idx = (index - 1) / 2
    break if compare(@heap[parent_idx], @heap[index]) <= 0
    @heap[index], @heap[parent_idx] = @heap[parent_idx], @heap[index]
    index = parent_idx
  end
end

def sift_down_iterative(index)
  loop do
    left_idx = 2 * index + 1
    right_idx = 2 * index + 2
    priority = index
    if left_idx < @heap.size && compare(@heap[left_idx], @heap[priority]) < 0
      priority = left_idx
    end
    if right_idx < @heap.size && compare(@heap[right_idx], @heap[priority]) < 0
      priority = right_idx
    end
    break if priority == index
    @heap[index], @heap[priority] = @heap[priority], @heap[index]
    index = priority
  end
end
Performance Considerations
Heap operations achieve their efficiency through the logarithmic height of complete binary trees. A heap containing n elements maintains height approximately log₂(n), limiting the number of comparisons and swaps during tree traversal.
Insertion performs O(log n) comparisons in the worst case. The new element starts at the bottom level and potentially moves to the root, traversing the full height. Average case insertion requires fewer comparisons since most elements settle before reaching the top. The operation performs constant work at each level: one comparison with the parent and one potential swap.
Extraction also operates in O(log n) time. Removing the root creates a vacancy that fills with the last leaf element. This replacement violates the heap property at the root, triggering sift-down. The element compares with both children at each level, selecting the appropriate child for swapping. The path length from root to leaf bounds the number of operations.
Peek operations run in O(1) time. The heap property guarantees the extreme element always occupies the root position. Accessing array index zero directly returns this value without traversal or comparison.
Heapify transforms an arbitrary array into a valid heap in O(n) time. This linear complexity surprises developers expecting O(n log n) based on n insertions. The algorithm works backward from the last parent node, sifting down each element. Nodes near the bottom require minimal work since they move short distances. The majority of nodes sit in lower levels where sift-down costs remain low. Mathematical analysis shows the total work sums to O(n) despite individual sift-down operations costing O(log n).
def benchmark_operations(size)
  heap = Heap.new

  # Measure insertion time
  insertion_start = Time.now
  size.times { heap.insert(rand(1000)) }
  insertion_time = Time.now - insertion_start

  # Measure extraction time
  extraction_start = Time.now
  size.times { heap.extract }
  extraction_time = Time.now - extraction_start

  # Measure heapify time
  array = Array.new(size) { rand(1000) }
  heapify_start = Time.now
  Heap.heapify(array)
  heapify_time = Time.now - heapify_start

  {
    insertion: insertion_time,
    extraction: extraction_time,
    heapify: heapify_time
  }
end
Space complexity remains O(n) for storing n elements. The array representation requires no additional pointer overhead beyond the element storage. Heap operations use O(1) auxiliary space with iterative implementations. Recursive implementations consume O(log n) stack space proportional to tree height, but this remains negligible compared to element storage.
Cache efficiency benefits from array representation. Sequential array access exhibits better locality than pointer-based tree structures. Parent and child nodes often reside in nearby array positions, increasing cache hit rates during traversal operations. This advantage grows more significant for large heaps where pointer-chasing creates cache misses.
Comparison cost dominates performance for complex element types. Heaps with expensive comparison operations pay this cost at each tree level. Strategies include caching computed comparison keys or using simpler proxy values for priority. For objects with multiple fields, extracting a numeric priority once during insertion avoids repeated field access.
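A minimal sketch of that key-caching idea, with heap helpers inlined and a hypothetical compute_priority standing in for an expensive calculation (names here are illustrative, not part of the classes above):

```ruby
# Comparison-key caching: compute an expensive priority once, store
# [key, object] pairs, and let the heap compare cached integers.
def sift_up!(h, i)
  while i > 0 && h[i][0] < h[(i - 1) / 2][0]
    parent = (i - 1) / 2
    h[i], h[parent] = h[parent], h[i]
    i = parent
  end
end

def heap_push(h, key, obj)
  h << [key, obj]
  sift_up!(h, h.size - 1)
end

# Imagine this is expensive (parsing, aggregation, remote lookup, ...)
def compute_priority(record)
  record[:urgency] * 10 + record[:age]
end

heap = []
records = [{ urgency: 3, age: 1 }, { urgency: 1, age: 2 }, { urgency: 2, age: 0 }]
records.each { |r| heap_push(heap, compute_priority(r), r) }
heap.first[1][:urgency] # => 1  (smallest cached key sits at the root)
```

Each sift comparison now reads a cached integer instead of recomputing the priority at every tree level.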
Practical Examples
Priority queues represent the most common heap application. Tasks, events, or requests require processing in priority order rather than arrival order. The heap structure provides efficient access to the highest priority item.
class PriorityQueue
  Task = Struct.new(:description, :priority, :deadline) do
    def to_s
      "#{description} (priority: #{priority})"
    end
  end

  def initialize
    @heap = Heap.new { |a, b| a.priority <=> b.priority }
  end

  def enqueue(description, priority, deadline = nil)
    task = Task.new(description, priority, deadline)
    @heap.insert(task)
  end

  def dequeue
    @heap.extract
  end

  def peek_next
    @heap.peek
  end

  def empty?
    @heap.size == 0
  end
end
# Process tasks by priority
queue = PriorityQueue.new
queue.enqueue("Fix critical bug", 1)
queue.enqueue("Write documentation", 3)
queue.enqueue("Code review", 2)
queue.enqueue("Deploy hotfix", 1)
until queue.empty?
  task = queue.dequeue
  puts "Processing: #{task}"
end
# Output:
# Processing: Fix critical bug (priority: 1)
# Processing: Deploy hotfix (priority: 1)
# Processing: Code review (priority: 2)
# Processing: Write documentation (priority: 3)
Heap sort leverages the heap property to sort arrays in O(n log n) time. Building a max-heap from the array places the largest element at the root. Repeatedly extracting the maximum then yields the elements in descending order; reversing the comparator (or the result) produces ascending order.
def heap_sort(array)
  return array if array.size <= 1
  # Build a max-heap (heapify works on a copy of the input array)
  heap = Heap.heapify(array) { |a, b| b <=> a }
  # Extract elements in descending order
  sorted = []
  array.size.times do
    sorted << heap.extract
  end
  sorted
end
numbers = [64, 34, 25, 12, 22, 11, 90]
heap_sort(numbers) # => [90, 64, 34, 25, 22, 12, 11]
Finding k largest elements from a stream uses a min-heap of size k. Each incoming element compares against the heap minimum. Elements larger than the minimum replace it, maintaining the k largest values.
def k_largest_elements(stream, k)
  return [] if k <= 0
  heap = MinHeap.new
  stream.each do |value|
    if heap.size < k
      heap.insert(value)
    elsif value > heap.peek
      heap.extract_min
      heap.insert(value)
    end
  end
  result = []
  k.times { result << heap.extract_min }
  result.reverse
end
stream = [3, 7, 2, 9, 1, 8, 4, 6, 5]
k_largest_elements(stream, 3) # => [9, 8, 7]
Merging k sorted arrays efficiently uses a min-heap containing the current element from each array. Extract the minimum element, then insert the next element from that array. This approach maintains O(n log k) complexity where n represents total elements.
# Struct defined at top level: assigning a constant inside a method
# is a SyntaxError in Ruby (dynamic constant assignment)
MergeElement = Struct.new(:value, :array_index, :element_index)

def merge_k_sorted_arrays(arrays)
  heap = Heap.new { |a, b| a.value <=> b.value }
  result = []
  # Initialize heap with the first element from each array
  arrays.each_with_index do |arr, array_idx|
    next if arr.empty?
    heap.insert(MergeElement.new(arr[0], array_idx, 0))
  end
  # Extract the minimum, then add the next element from the same array
  while heap.size > 0
    element = heap.extract
    result << element.value
    array = arrays[element.array_index]
    next_idx = element.element_index + 1
    if next_idx < array.size
      heap.insert(MergeElement.new(array[next_idx], element.array_index, next_idx))
    end
  end
  result
end
arrays = [
[1, 4, 7],
[2, 5, 8],
[3, 6, 9]
]
merge_k_sorted_arrays(arrays) # => [1, 2, 3, 4, 5, 6, 7, 8, 9]
Common Patterns
Building heaps from existing data occurs frequently enough to warrant optimization. The heapify algorithm processes nodes from bottom to top, requiring fewer operations than sequential insertion.
def build_heap_efficient(array)
  # Process non-leaf nodes from last to first
  last_parent = (array.size / 2) - 1
  last_parent.downto(0) do |i|
    heapify_down(array, i, array.size)
  end
  array
end

def heapify_down(array, index, heap_size)
  loop do
    left = 2 * index + 1
    right = 2 * index + 2
    largest = index
    largest = left if left < heap_size && array[left] > array[largest]
    largest = right if right < heap_size && array[right] > array[largest]
    break if largest == index
    array[index], array[largest] = array[largest], array[index]
    index = largest
  end
end
Top-k problems appear in recommendation systems, analytics, and monitoring. Maintaining a fixed-size heap of k elements handles streaming data where storing all values proves impractical.
class TopKTracker
  def initialize(k)
    @k = k
    @heap = MinHeap.new
  end

  def add(value)
    if @heap.size < @k
      @heap.insert(value)
    elsif value > @heap.peek
      @heap.extract_min
      @heap.insert(value)
    end
  end

  def top_k
    # Extract all elements, then restore the heap
    result = []
    result << @heap.extract_min until @heap.empty?
    result.each { |v| @heap.insert(v) }
    result.reverse
  end
end
tracker = TopKTracker.new(3)
[5, 2, 8, 1, 9, 3, 7].each { |n| tracker.add(n) }
tracker.top_k # => [9, 8, 7]
Median maintenance requires two heaps: a max-heap for smaller half of elements and a min-heap for larger half. The median sits at the root of one heap or averages both roots.
class MedianTracker
  def initialize
    @lower = Heap.new { |a, b| b <=> a } # max-heap of the smaller half
    @upper = Heap.new                    # min-heap of the larger half
  end

  def add(value)
    # Add to the appropriate heap
    if @lower.size == 0 || value <= @lower.peek
      @lower.insert(value)
    else
      @upper.insert(value)
    end
    # Rebalance: @lower holds the same count as @upper, or one more
    if @lower.size > @upper.size + 1
      @upper.insert(@lower.extract)
    elsif @upper.size > @lower.size
      @lower.insert(@upper.extract)
    end
  end

  def median
    return nil if @lower.size == 0
    if @lower.size > @upper.size
      @lower.peek
    else
      (@lower.peek + @upper.peek) / 2.0
    end
  end
end
tracker = MedianTracker.new
[5, 15, 1, 3, 8, 7].each do |n|
  tracker.add(n)
  puts "After adding #{n}: median = #{tracker.median}"
end
# Output:
# After adding 5: median = 5
# After adding 15: median = 10.0
# After adding 1: median = 5
# After adding 3: median = 4.0
# After adding 8: median = 5
# After adding 7: median = 6.0
Dijkstra's shortest path algorithm uses a min-heap to efficiently select the next vertex with minimum distance. This heap-based approach outperforms linear scanning for sparse graphs.
require 'set'

# Struct hoisted out of the method: dynamic constant assignment
# inside a method body is a SyntaxError in Ruby
DijkstraNode = Struct.new(:vertex, :distance)

def dijkstra(graph, start)
  distances = Hash.new(Float::INFINITY)
  distances[start] = 0
  heap = Heap.new { |a, b| a.distance <=> b.distance }
  heap.insert(DijkstraNode.new(start, 0))
  visited = Set.new
  while heap.size > 0
    current = heap.extract
    next if visited.include?(current.vertex)
    visited.add(current.vertex)
    graph[current.vertex].each do |neighbor, weight|
      new_distance = distances[current.vertex] + weight
      if new_distance < distances[neighbor]
        distances[neighbor] = new_distance
        heap.insert(DijkstraNode.new(neighbor, new_distance))
      end
    end
  end
  distances
end
graph = {
'A' => [['B', 4], ['C', 2]],
'B' => [['C', 1], ['D', 5]],
'C' => [['D', 8], ['E', 10]],
'D' => [['E', 2]],
'E' => []
}
dijkstra(graph, 'A') # => {"A"=>0, "C"=>2, "B"=>4, "D"=>9, "E"=>11}
Common Pitfalls
Index calculations cause frequent errors in heap implementations. Off-by-one mistakes in parent or child index formulas break the tree structure. For zero-indexed arrays, the parent of node i sits at (i-1)/2, not i/2. Child indices require 2*i+1 and 2*i+2, not 2*i and 2*i+1.
# WRONG - Incorrect index calculations
def wrong_parent(i)
  i / 2 # Off by one for zero-indexed arrays
end

def wrong_left_child(i)
  2 * i # Misses the +1 offset
end

# CORRECT
def correct_parent(i)
  (i - 1) / 2
end

def correct_left_child(i)
  2 * i + 1
end
Comparison inconsistency creates invalid heaps. Changing the comparison logic after insertion breaks the heap property. All comparisons must use identical logic throughout the heap's lifetime.
# WRONG - Inconsistent comparison
heap = Heap.new { |a, b| a <=> b }
heap.insert(5)
heap.insert(3)
# Changing comparison breaks invariants
heap.instance_variable_set(:@comparator, ->(a, b) { b <=> a })
Empty heap operations require explicit handling. Extracting from an empty heap returns nil by convention, but calling peek without checking size causes confusion when nil represents a valid heap element.
# WRONG - Ambiguous nil return
heap = MinHeap.new
heap.insert(nil)
value = heap.peek # nil - was heap empty or does it contain nil?
# BETTER - Check size explicitly
if heap.size > 0
  value = heap.peek
else
  # Handle empty heap
end
Heap modification during iteration violates structural invariants. Extracting elements while iterating the internal array skips elements and produces incorrect results.
# WRONG - Modifying heap during iteration
heap = MinHeap.new
[5, 3, 7, 1].each { |n| heap.insert(n) }
heap.instance_variable_get(:@heap).each do |value|
  heap.extract_min # Modifies the array being iterated
end
# CORRECT - Extract until empty
results = []
results << heap.extract_min until heap.empty?
Sift-down direction errors occur when selecting the wrong child for swapping. For min-heaps, sift-down swaps with the smaller child. For max-heaps, swap with the larger child. Swapping with the wrong child violates the heap property.
# WRONG - Swapping with wrong child in min-heap
def wrong_sift_down(index)
  left_idx = 2 * index + 1
  right_idx = 2 * index + 2
  # Always swaps with the left child, even when the right child is smaller
  if left_idx < @heap.size && @heap[left_idx] < @heap[index]
    @heap[index], @heap[left_idx] = @heap[left_idx], @heap[index]
  end
end

# CORRECT - Find smallest among node and both children
def correct_sift_down(index)
  left_idx = 2 * index + 1
  right_idx = 2 * index + 2
  smallest = index
  smallest = left_idx if left_idx < @heap.size && @heap[left_idx] < @heap[smallest]
  smallest = right_idx if right_idx < @heap.size && @heap[right_idx] < @heap[smallest]
  if smallest != index
    @heap[index], @heap[smallest] = @heap[smallest], @heap[index]
    correct_sift_down(smallest)
  end
end
Bounds checking failures crash heap operations. Accessing array indices without verifying they fall within array bounds triggers exceptions during sift operations.
# WRONG - Missing bounds check
def unsafe_sift_down(index)
  left_idx = 2 * index + 1
  right_idx = 2 * index + 2
  # May compare against nil when the indices fall beyond the array
  if @heap[left_idx] < @heap[index]
    @heap[index], @heap[left_idx] = @heap[left_idx], @heap[index]
  end
end

# CORRECT - Check bounds before access, then pick the smaller child
def safe_sift_down(index)
  left_idx = 2 * index + 1
  right_idx = 2 * index + 2
  smallest = index
  smallest = left_idx if left_idx < @heap.size && @heap[left_idx] < @heap[smallest]
  smallest = right_idx if right_idx < @heap.size && @heap[right_idx] < @heap[smallest]
  if smallest != index
    @heap[index], @heap[smallest] = @heap[smallest], @heap[index]
    safe_sift_down(smallest)
  end
end
Assuming sorted order leads to incorrect algorithms. Heaps maintain partial ordering, not complete sorting. Iterating the underlying array does not produce sorted elements.
# WRONG - Assuming array is sorted
heap = MinHeap.new
[5, 3, 7, 1, 9].each { |n| heap.insert(n) }
sorted = heap.instance_variable_get(:@heap) # [1, 3, 7, 5, 9] - NOT sorted
# CORRECT - Extract elements to get sorted order
sorted = []
sorted << heap.extract_min until heap.empty? # [1, 3, 5, 7, 9]
Reference
Index Calculation Formulas
| Formula | Expression | Description |
|---|---|---|
| Parent | (i - 1) / 2 | Parent node index for node at index i |
| Left Child | 2 * i + 1 | Left child index for node at index i |
| Right Child | 2 * i + 2 | Right child index for node at index i |
| Last Parent | (n / 2) - 1 | Index of last non-leaf node in heap of size n |
| Height | floor(log₂(n)) | Tree height for heap with n elements |
Time Complexity
| Operation | Average Case | Worst Case | Notes |
|---|---|---|---|
| Insert | O(log n) | O(log n) | Sift-up traverses tree height |
| Extract | O(log n) | O(log n) | Sift-down traverses tree height |
| Peek | O(1) | O(1) | Direct array access at index 0 |
| Heapify | O(n) | O(n) | Build heap from arbitrary array |
| Search | O(n) | O(n) | Must examine all elements |
| Delete arbitrary | O(n) | O(n) | Find element then remove |
| Size | O(1) | O(1) | Track size as instance variable |
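The "Delete arbitrary" row can be sketched against a plain array min-heap; helper names here are illustrative, and the linear search for the element dominates the cost:

```ruby
# Arbitrary deletion from an array-backed min-heap: find the element
# (O(n)), overwrite it with the last leaf, pop, then restore the
# property downward from that position.
def sift_down!(h, i)
  loop do
    l, r = 2 * i + 1, 2 * i + 2
    m = i
    m = l if l < h.size && h[l] < h[m]
    m = r if r < h.size && h[r] < h[m]
    break if m == i
    h[i], h[m] = h[m], h[i]
    i = m
  end
end

def delete_value!(h, value)
  i = h.index(value)   # linear search dominates the O(n) cost
  return false unless i
  last = h.pop
  if i < h.size
    h[i] = last
    sift_down!(h, i)   # a full version also sifts up when last < parent
  end
  true
end

h = [1, 3, 5, 7, 9, 6] # a valid min-heap
delete_value!(h, 3)    # heap property restored without 3
```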
Space Complexity
| Structure | Space | Description |
|---|---|---|
| Heap storage | O(n) | Array holds all n elements |
| Recursive sift | O(log n) | Stack depth equals tree height |
| Iterative sift | O(1) | No additional space beyond variables |
| Heapify | O(1) | In-place construction |
Heap Property Definitions
| Heap Type | Property | Root Value |
|---|---|---|
| Min-Heap | parent ≤ children | Minimum element |
| Max-Heap | parent ≥ children | Maximum element |
| Min-Max Heap | Alternating levels | Min at root, max at second level |
| Leftist Heap | Rank of left child ≥ rank of right child | Minimum element (min-leftist) |
Common Operations
| Operation | Implementation | Use Case |
|---|---|---|
| Insert | Add to end, sift-up | Priority queue enqueue |
| Extract | Remove root, move last to root, sift-down | Priority queue dequeue |
| Peek | Return root without removal | Check next priority item |
| Replace | Extract then insert atomically | Update top priority |
| Heapify | Bottom-up sift-down from last parent | Build heap from array |
| Merge | Combine elements and rebuild | Join priority queues |
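The "Replace" row deserves a sketch: swapping the root for a new value needs only one sift-down pass, versus two passes for extract followed by insert. Helper names are illustrative and operate on a plain array min-heap so the example stands alone:

```ruby
# Replace the root in a single sift-down pass and return the old root.
def sift_down!(h, i)
  loop do
    l, r = 2 * i + 1, 2 * i + 2
    m = i
    m = l if l < h.size && h[l] < h[m]
    m = r if r < h.size && h[r] < h[m]
    break if m == i
    h[i], h[m] = h[m], h[i]
    i = m
  end
end

def heap_replace!(h, value)
  top = h[0]
  h[0] = value
  sift_down!(h, 0)
  top
end

h = [1, 3, 5, 7]    # a valid min-heap
heap_replace!(h, 4) # => 1 (old root); h is now [3, 4, 5, 7]
```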
Heap Variants
| Variant | Properties | Trade-offs |
|---|---|---|
| Binary Heap | Two children per node | Simple, cache-friendly, O(log n) operations |
| D-ary Heap | d children per node | Shorter height, more comparisons per level |
| Fibonacci Heap | Lazy operations, amortized bounds | Complex, better amortized performance |
| Binomial Heap | Collection of binomial trees | Efficient merge O(log n) |
| Pairing Heap | Simplified Fibonacci heap | Simpler implementation, good practical performance |
| Leftist Heap | Left-heavy structure | Efficient merge, pointer-based |
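The d-ary row generalizes the binary index formulas: with d children per node, child k (0-based) of node i sits at d*i + k + 1 and the parent at (i - 1) / d. A sketch of the arithmetic (helper names hypothetical):

```ruby
# Index arithmetic for a d-ary heap stored in a zero-indexed array.
def dary_child(i, k, d)
  d * i + k + 1
end

def dary_parent(i, d)
  (i - 1) / d
end

dary_child(0, 0, 4) # => 1  (first child of the root in a 4-ary heap)
dary_child(1, 3, 4) # => 8  (last child of node 1)
dary_parent(8, 4)   # => 1
dary_child(0, 0, 2) # => 1  (d = 2 reduces to the binary formula 2i + 1)
```

Larger d shortens the tree but means sift-down must compare up to d children at each level, matching the trade-off noted in the table.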