Overview
Merge Sort divides an unsorted array into smaller subarrays until each subarray contains a single element, then merges these subarrays back together in sorted order. This approach guarantees consistent performance regardless of input distribution.
The algorithm operates through two distinct phases: the division phase recursively splits the array into halves until reaching base cases of single elements, and the merge phase combines these sorted subarrays back together. During merging, the algorithm compares elements from two sorted subarrays and places them in correct order within a temporary array.
Merge Sort appears in production systems where predictable performance matters more than minimizing memory usage. Database systems use Merge Sort variants for external sorting when data exceeds available memory. Version control systems apply merge algorithms based on similar principles for combining branches. The algorithm's stability property makes it appropriate for multi-level sorting operations where the original order of equal elements must be preserved.
The algorithm requires additional memory proportional to the input size, distinguishing it from in-place sorting algorithms. This memory requirement represents the primary trade-off: guaranteed O(n log n) time complexity in exchange for O(n) space complexity.
```ruby
def merge_sort(array)
  return array if array.length <= 1

  mid = array.length / 2
  left = merge_sort(array[0...mid])
  right = merge_sort(array[mid..-1])
  merge(left, right)
end

def merge(left, right)
  result = []
  until left.empty? || right.empty?
    result << (left.first <= right.first ? left.shift : right.shift)
  end
  result + left + right
end

# Example usage
array = [38, 27, 43, 3, 9, 82, 10]
sorted = merge_sort(array)
# => [3, 9, 10, 27, 38, 43, 82]
```
Key Principles
Merge Sort implements the divide-and-conquer paradigm through three fundamental operations: divide, conquer, and combine. The divide operation splits the problem into smaller subproblems, the conquer operation solves each subproblem recursively, and the combine operation merges solutions back together.
The division phase recursively partitions the array at its midpoint. Each recursive call receives a subarray half the size of its parent's input. This continues until reaching subarrays containing zero or one element, which are inherently sorted and serve as base cases for the recursion.
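The division can be made visible with a short trace. The `trace_split` helper below is illustrative only (not part of the algorithm): it prints each subarray produced by the splitting and returns the depth at which the base case is reached.

```ruby
# Print each subarray produced by the division phase and
# return the recursion depth at which the base case is reached
def trace_split(array, depth = 0)
  puts "#{'  ' * depth}#{array.inspect}"
  return depth if array.length <= 1

  mid = array.length / 2
  [trace_split(array[0...mid], depth + 1),
   trace_split(array[mid..-1], depth + 1)].max
end

trace_split([38, 27, 43, 3])
# Prints the split tree; the returned depth is 2, matching log2(4)
```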
The merge phase performs the actual sorting work. The merge operation receives two sorted arrays and produces a single sorted array containing all elements from both inputs. The algorithm maintains two pointers, one for each input array, comparing elements at the current positions and advancing the pointer whose element was selected. This process continues until one array is exhausted, at which point the remaining elements from the other array are appended to the result.
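The two-pointer mechanism can be sketched in isolation (the helper name `merge_sorted` is illustrative, not defined elsewhere in this article):

```ruby
# Two-pointer merge of two sorted arrays into one sorted result
def merge_sorted(left, right)
  result = []
  i = j = 0
  while i < left.length && j < right.length
    if left[i] <= right[j]
      result << left[i]
      i += 1
    else
      result << right[j]
      j += 1
    end
  end
  # One input is exhausted; append the remainder of the other
  result.concat(left[i..-1]).concat(right[j..-1])
end

merge_sorted([3, 27, 38], [9, 10, 43])
# => [3, 9, 10, 27, 38, 43]
```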
Stability represents a critical property of Merge Sort. When two elements compare as equal, the algorithm preserves their original relative order. The merge operation achieves stability by selecting from the left subarray when elements are equal, maintaining the left-to-right ordering from the original array.
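Stability can be checked with a small experiment: sorting key–label pairs by key alone, equal keys keep their original left-to-right order. This is a minimal sketch; `stable_merge_sort` is an illustrative helper, not part of any class defined here.

```ruby
# Minimal stable merge sort over [key, label] pairs, comparing keys only.
def stable_merge_sort(array)
  return array if array.length <= 1

  mid = array.length / 2
  left = stable_merge_sort(array[0...mid])
  right = stable_merge_sort(array[mid..-1])
  result = []
  until left.empty? || right.empty?
    # "<=" (not "<") takes from the left subarray on ties, which
    # preserves the original relative order of equal keys
    result << (left.first[0] <= right.first[0] ? left.shift : right.shift)
  end
  result + left + right
end

pairs = [[2, :a], [1, :b], [2, :c], [1, :d]]
sorted = stable_merge_sort(pairs)
# => [[1, :b], [1, :d], [2, :a], [2, :c]]
# Equal keys keep their original order: :b before :d, :a before :c
```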
The recursion depth equals the number of times the array can be divided in half, which is log₂(n) for an array of size n. At each level of recursion, the algorithm performs O(n) work merging subarrays. The total time complexity results from multiplying the number of levels by the work per level: O(n) × O(log n) = O(n log n).
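The bound can be checked empirically with an instrumented merge (an illustrative sketch; `counting_merge_sort` is a hypothetical helper): for n = 1024 there are ⌈log₂ 1024⌉ = 10 levels, and the total comparison count never exceeds n × 10 = 10,240.

```ruby
# Merge sort that tallies comparisons into a shared counter hash,
# to check the n * ceil(log2 n) upper bound
def counting_merge_sort(array, counter)
  return array if array.length <= 1

  mid = array.length / 2
  left = counting_merge_sort(array[0...mid], counter)
  right = counting_merge_sort(array[mid..-1], counter)
  result = []
  until left.empty? || right.empty?
    counter[:comparisons] += 1
    result << (left.first <= right.first ? left.shift : right.shift)
  end
  result + left + right
end

n = 1024
counter = { comparisons: 0 }
sorted = counting_merge_sort((1..n).to_a.shuffle, counter)
levels = Math.log2(n).ceil # 10 levels for n = 1024
# counter[:comparisons] <= n * levels always holds
```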
The memory footprint consists of two components: the call stack and merge buffers. The call stack requires O(log n) space for the recursion depth. The merge operation requires O(n) space for temporary arrays holding merged results. Different implementation strategies can modify these space requirements while maintaining the core algorithm structure.
Comparison semantics determine sorting order. The merge operation relies on a comparison function that defines the ordering relationship between elements. Ruby's spaceship operator (<=>) provides three-way comparison, returning -1, 0, or 1 to indicate less than, equal to, or greater than relationships.
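A quick demonstration of the three-way result values, and of comparators built from them:

```ruby
# The spaceship operator returns -1, 0, or 1 (or nil for incomparable values)
p 1 <=> 2    # => -1
p 2 <=> 2    # => 0
p 3 <=> 2    # => 1
p "a" <=> 1  # => nil (incomparable types)

# A merge step treats "comparator returns <= 0" as "take the left element"
ascending  = ->(a, b) { a <=> b }
descending = ->(a, b) { b <=> a }
p ascending.call(1, 2)   # => -1
p descending.call(1, 2)  # => 1
```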
The algorithm's deterministic structure means the recursion tree and the pattern of element moves depend only on array size, not on the initial ordering of elements; comparison counts vary only within a narrow, size-determined band. This predictability distinguishes Merge Sort from adaptive algorithms that optimize for partially sorted inputs.
Ruby Implementation
Ruby provides multiple approaches for implementing Merge Sort, each with different trade-offs between clarity, performance, and memory usage. The standard implementation follows the classical recursive structure with explicit merge logic.
```ruby
class MergeSort
  def self.sort(array, &comparator)
    comparator ||= ->(a, b) { a <=> b }
    merge_sort_recursive(array.dup, comparator)
  end

  def self.merge_sort_recursive(array, comparator)
    return array if array.length <= 1

    mid = array.length / 2
    left = merge_sort_recursive(array[0...mid], comparator)
    right = merge_sort_recursive(array[mid..-1], comparator)
    merge(left, right, comparator)
  end

  def self.merge(left, right, comparator)
    result = []
    left_idx = 0
    right_idx = 0
    while left_idx < left.length && right_idx < right.length
      if comparator.call(left[left_idx], right[right_idx]) <= 0
        result << left[left_idx]
        left_idx += 1
      else
        result << right[right_idx]
        right_idx += 1
      end
    end
    result.concat(left[left_idx..-1]) if left_idx < left.length
    result.concat(right[right_idx..-1]) if right_idx < right.length
    result
  end

  # A bare `private` has no effect on methods defined with `def self.`;
  # class methods must be hidden with private_class_method instead.
  private_class_method :merge_sort_recursive, :merge
end

# Usage with default comparison
numbers = [64, 34, 25, 12, 22, 11, 90]
sorted = MergeSort.sort(numbers)
# => [11, 12, 22, 25, 34, 64, 90]

# Custom comparison for descending order
descending = MergeSort.sort(numbers) { |a, b| b <=> a }
# => [90, 64, 34, 25, 22, 12, 11]
```
The buffer-reusing variant sorts the array in place and reduces memory allocation by reusing a single temporary buffer across all merge operations. It still needs O(n) scratch space, but avoids allocating a new array on every merge. This approach requires passing the buffer and index boundaries through the recursion:
```ruby
class InPlaceMergeSort
  def self.sort(array)
    return array if array.length <= 1

    temp = Array.new(array.length)
    merge_sort_helper(array, temp, 0, array.length - 1)
    array
  end

  def self.merge_sort_helper(array, temp, left, right)
    return if left >= right

    mid = (left + right) / 2
    merge_sort_helper(array, temp, left, mid)
    merge_sort_helper(array, temp, mid + 1, right)
    merge_inplace(array, temp, left, mid, right)
  end

  def self.merge_inplace(array, temp, left, mid, right)
    left_idx = left
    right_idx = mid + 1
    temp_idx = left
    while left_idx <= mid && right_idx <= right
      if array[left_idx] <= array[right_idx]
        temp[temp_idx] = array[left_idx]
        left_idx += 1
      else
        temp[temp_idx] = array[right_idx]
        right_idx += 1
      end
      temp_idx += 1
    end
    while left_idx <= mid
      temp[temp_idx] = array[left_idx]
      left_idx += 1
      temp_idx += 1
    end
    while right_idx <= right
      temp[temp_idx] = array[right_idx]
      right_idx += 1
      temp_idx += 1
    end
    (left..right).each { |i| array[i] = temp[i] }
  end

  private_class_method :merge_sort_helper, :merge_inplace
end
```
Sorting custom objects requires defining comparison logic. Ruby objects can implement the <=> operator or provide comparison blocks:
```ruby
class Person
  include Comparable

  attr_reader :name, :age

  def initialize(name, age)
    @name = name
    @age = age
  end

  def <=>(other)
    age <=> other.age
  end

  def to_s
    "#{name}(#{age})"
  end
end

people = [
  Person.new("Alice", 30),
  Person.new("Bob", 25),
  Person.new("Carol", 35),
  Person.new("Dave", 25)
]

# Sort by age (default comparison)
by_age = MergeSort.sort(people)
# => [Bob(25), Dave(25), Alice(30), Carol(35)]

# Sort by name using custom comparator
by_name = MergeSort.sort(people) { |a, b| a.name <=> b.name }
# => [Alice(30), Bob(25), Carol(35), Dave(25)]
```
The iterative implementation eliminates recursion overhead by processing subarrays bottom-up. This approach merges adjacent pairs at each level before proceeding to larger groupings:
```ruby
class IterativeMergeSort
  def self.sort(array)
    return array if array.length <= 1

    result = array.dup
    temp = Array.new(array.length)
    size = 1
    while size < array.length
      left = 0
      while left < array.length
        mid = [left + size - 1, array.length - 1].min
        right = [left + 2 * size - 1, array.length - 1].min
        merge_iterative(result, temp, left, mid, right) if mid < right
        left += 2 * size
      end
      size *= 2
    end
    result
  end

  def self.merge_iterative(array, temp, left, mid, right)
    i = left
    j = mid + 1
    k = left
    while i <= mid && j <= right
      if array[i] <= array[j]
        temp[k] = array[i]
        i += 1
      else
        temp[k] = array[j]
        j += 1
      end
      k += 1
    end
    while i <= mid
      temp[k] = array[i]
      i += 1
      k += 1
    end
    while j <= right
      temp[k] = array[j]
      j += 1
      k += 1
    end
    (left..right).each { |idx| array[idx] = temp[idx] }
  end

  private_class_method :merge_iterative
end
```
Ruby's Array class provides sort and sort_by methods implemented in C for performance, but understanding Merge Sort implementation remains valuable for custom sorting requirements and educational purposes.
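For reference, the built-in methods in a short example:

```ruby
numbers = [64, 34, 25, 12, 22, 11, 90]

# Array#sort uses <=> by default, or an optional comparison block
p numbers.sort                     # => [11, 12, 22, 25, 34, 64, 90]
p numbers.sort { |a, b| b <=> a }  # => [90, 64, 34, 25, 22, 12, 11]

# Array#sort_by computes each sort key once per element
p %w[ccc a bb].sort_by(&:length)   # => ["a", "bb", "ccc"]
```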
Performance Considerations
Merge Sort exhibits O(n log n) time complexity for all input cases: best, average, and worst. This consistency distinguishes it from algorithms like Quicksort, which degrades to O(n²) on certain inputs. The predictable performance makes Merge Sort appropriate for systems requiring guaranteed response times.
The time complexity analysis breaks down into two components. The division phase splits the array log₂(n) times, creating a recursion tree with log₂(n) levels. At each level, the merge operations collectively process every element exactly once, requiring O(n) work. The total complexity equals the number of levels multiplied by work per level: O(n) × O(log n) = O(n log n).
Space complexity requires O(n) auxiliary memory for merge buffers plus O(log n) for the call stack. The auxiliary space holds temporary arrays during merging. Each merge operation needs space equal to the combined size of the subarrays being merged. At the top level of recursion, this equals the entire array size.
The comparison count ranges from n/2 × log₂(n) to n × log₂(n) comparisons. The minimum occurs when one subarray exhausts before the other at each merge, requiring fewer comparisons. The maximum occurs when arrays interleave completely, requiring a comparison for each element placement.
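Both extremes are visible in a single merge of two four-element runs (a sketch using an instrumented merge; `merge_counting` is a hypothetical helper):

```ruby
# Two-way merge that also reports how many comparisons it made
def merge_counting(left, right)
  left = left.dup
  right = right.dup
  result = []
  comparisons = 0
  until left.empty? || right.empty?
    comparisons += 1
    result << (left.first <= right.first ? left.shift : right.shift)
  end
  [result + left + right, comparisons]
end

# Best case: one run exhausts immediately -> min(|left|, |right|) comparisons
_, best = merge_counting([1, 2, 3, 4], [5, 6, 7, 8])  # 4 comparisons
# Worst case: runs interleave fully -> |left| + |right| - 1 comparisons
_, worst = merge_counting([1, 3, 5, 7], [2, 4, 6, 8]) # 7 comparisons
```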
Memory allocation patterns significantly impact performance in garbage-collected languages like Ruby. Creating new arrays for each merge operation generates substantial garbage. The in-place variant reduces allocations by reusing a single temporary buffer, decreasing garbage collection pressure:
```ruby
# Allocation-heavy approach
def merge_allocating(left, right)
  # Creates a new result array on each merge
  result = []
  until left.empty? || right.empty?
    result << (left.first <= right.first ? left.shift : right.shift)
  end
  result + left + right
end

# Reduced allocation approach
def merge_efficient(array, temp, left, mid, right)
  # Reuses a preallocated temp buffer
  # No intermediate array creation
  # Modifies array in place after merge
end
```
Cache performance affects practical runtime. Merge Sort exhibits good spatial locality during the merge phase because it accesses arrays sequentially. However, the recursive structure can degrade cache performance when subarrays exceed cache size. The iterative implementation provides better cache behavior by processing data in a more predictable pattern.
Small array optimization improves performance by switching to insertion sort for small subarrays. When subarrays reach a threshold size (typically 10-20 elements), the overhead of recursion and merging exceeds the benefit of divide-and-conquer. Insertion sort performs well on small arrays due to lower constant factors:
```ruby
def merge_sort_optimized(array)
  return insertion_sort(array) if array.length <= 15

  mid = array.length / 2
  left = merge_sort_optimized(array[0...mid])
  right = merge_sort_optimized(array[mid..-1])
  merge(left, right)
end

def insertion_sort(array)
  (1...array.length).each do |i|
    key = array[i]
    j = i - 1
    while j >= 0 && array[j] > key
      array[j + 1] = array[j]
      j -= 1
    end
    array[j + 1] = key
  end
  array
end
```
Parallel Merge Sort exploits multiple cores by sorting subarrays concurrently. Ruby's threading model limits parallelization benefits due to the Global Interpreter Lock (GIL), but external implementations or Ractor-based approaches can achieve parallelism:
```ruby
require 'concurrent' # concurrent-ruby gem

def parallel_merge_sort(array, threshold = 1000)
  return merge_sort(array) if array.length < threshold

  mid = array.length / 2
  left_future = Concurrent::Future.execute { parallel_merge_sort(array[0...mid], threshold) }
  right_future = Concurrent::Future.execute { parallel_merge_sort(array[mid..-1], threshold) }
  left = left_future.value
  right = right_future.value
  merge(left, right)
end
```
Comparison operation cost dominates runtime when sorting complex objects. Expensive comparison functions multiply the O(n log n) comparison count by the per-comparison overhead. Caching comparison keys reduces repeated computation:
# Expensive comparison repeated many times
items.sort { |a, b| expensive_key(a) <=> expensive_key(b) }
# Compute keys once using Schwartzian Transform
items.map { |item| [expensive_key(item), item] }
.sort_by(&:first)
.map(&:last)
The external sorting variant handles datasets exceeding memory by sorting chunks that fit in memory, writing them to disk, then merging the sorted chunks. This approach scales to arbitrarily large datasets while maintaining O(n log n) complexity with respect to data size.
Design Considerations
Merge Sort competes with other O(n log n) algorithms, primarily Quicksort and Heapsort. Each algorithm offers different trade-offs between time complexity guarantees, space requirements, and practical performance characteristics.
Merge Sort vs Quicksort represents the most common comparison. Quicksort typically runs faster in practice due to better cache performance and lower constant factors, despite its O(n²) worst case. Merge Sort guarantees O(n log n) in all cases and maintains stability. Choose Merge Sort when:
- Predictable performance matters more than average-case speed
- Stability is required for the sorting operation
- The dataset is linked-list-based rather than array-based
- External sorting is needed for datasets exceeding memory
Choose Quicksort when:
- Average-case performance is the priority
- In-place sorting minimizes memory usage requirements
- The implementation can use randomization to avoid worst cases
- Cache performance significantly impacts overall system speed
```ruby
# Quicksort excels with random access structures
array = (1..1_000_000).to_a.shuffle
# Quicksort: ~150ms, Merge Sort: ~200ms

# Merge Sort better for linked structures
list = LinkedList.from_array((1..1_000_000).to_a.shuffle)
# Merge Sort: ~180ms, Quicksort: ~300ms
```
Stability requirements dictate algorithm selection when sorting maintains multiple sort keys. Databases frequently perform multi-column sorts by sorting less significant columns first, then more significant columns, relying on stability to preserve prior ordering:
```ruby
# Sort employees by department, then by hire date within department.
# The second pass must be stable so the hire_date ordering survives;
# Ruby's built-in sort_by! does not guarantee stability, so a stable
# sort such as Merge Sort is used for both passes.
employees = MergeSort.sort(employees) { |a, b| a.hire_date <=> b.hire_date }
employees = MergeSort.sort(employees) { |a, b| a.department <=> b.department }
# Merge Sort maintains this property automatically
# Unstable algorithms like Quicksort or Heapsort do not
```
Memory constraints favor in-place algorithms when heap memory is limited. Embedded systems, memory-constrained environments, or very large datasets may prohibit the O(n) auxiliary space requirement. Heapsort provides O(n log n) guaranteed performance with O(1) space. In-place Merge Sort variants exist but increase complexity significantly.
Dataset characteristics influence algorithm selection. Nearly sorted data benefits from adaptive algorithms like Timsort, which combines Merge Sort and insertion sort while detecting sorted runs. Completely random data sees less benefit from adaptivity. Merge Sort maintains consistent performance regardless of initial ordering.
Linked list sorting strongly favors Merge Sort over array-based algorithms. Merge Sort requires only pointer manipulation during merging, avoiding the expensive element shifting of array-based sorts:
```ruby
class Node
  attr_accessor :value, :next

  def initialize(value)
    @value = value
    @next = nil
  end
end

def merge_sort_list(head)
  return head if head.nil? || head.next.nil?

  # Split the list into halves using slow/fast pointers
  slow = head
  fast = head.next
  while fast && fast.next
    slow = slow.next
    fast = fast.next.next
  end
  mid = slow.next
  slow.next = nil

  left = merge_sort_list(head)
  right = merge_sort_list(mid)
  merge_lists(left, right)
end

def merge_lists(left, right)
  dummy = Node.new(0)
  current = dummy
  while left && right
    if left.value <= right.value
      current.next = left
      left = left.next
    else
      current.next = right
      right = right.next
    end
    current = current.next
  end
  current.next = left || right
  dummy.next
end
```
Parallel processing opportunities differ between algorithms. Merge Sort naturally parallelizes by sorting subarrays independently. Quicksort also parallelizes well, while Heapsort's sequential nature limits parallel optimization.
Implementation complexity affects maintainability and correctness. Merge Sort implements straightforwardly with clear separation between divide and merge phases. Quicksort requires careful pivot selection and partitioning logic. Heapsort needs heap maintenance operations. Simpler implementations reduce bug introduction risk.
The natural Merge Sort variant optimizes for partially sorted input by identifying existing sorted runs rather than dividing mechanically. This approach bridges the gap between standard Merge Sort and adaptive algorithms:
```ruby
def natural_merge_sort(array)
  runs = identify_runs(array)
  while runs.length > 1
    merged_runs = []
    runs.each_slice(2) do |left, right|
      if right
        merged_runs << merge_runs(array, left, right)
      else
        merged_runs << left
      end
    end
    runs = merged_runs
  end
  array
end

def identify_runs(array)
  runs = []
  start = 0
  while start < array.length
    finish = start
    finish += 1 while finish < array.length - 1 && array[finish] <= array[finish + 1]
    runs << (start..finish)
    start = finish + 1
  end
  runs
end

# Helper (not part of the original sketch): merge two adjacent sorted
# index ranges of array in place and return the combined range
def merge_runs(array, left, right)
  merged = []
  i = left.first
  j = right.first
  while i <= left.last && j <= right.last
    if array[i] <= array[j]
      merged << array[i]
      i += 1
    else
      merged << array[j]
      j += 1
    end
  end
  merged.concat(array[i..left.last]) if i <= left.last
  merged.concat(array[j..right.last]) if j <= right.last
  array[left.first..right.last] = merged
  left.first..right.last
end
```
Practical Examples
Sorting file records demonstrates Merge Sort's stability property. When processing log files or CSV data with multiple sort keys, stability preserves secondary ordering:
```ruby
class LogEntry
  attr_reader :timestamp, :severity, :message

  def initialize(timestamp, severity, message)
    @timestamp = timestamp
    @severity = severity
    @message = message
  end

  def to_s
    "[#{timestamp}] #{severity}: #{message}"
  end
end

logs = [
  LogEntry.new("2025-01-15 10:30:00", "ERROR", "Connection timeout"),
  LogEntry.new("2025-01-15 10:30:05", "WARNING", "Slow query"),
  LogEntry.new("2025-01-15 10:30:02", "ERROR", "Invalid token"),
  LogEntry.new("2025-01-15 10:30:08", "INFO", "Request completed"),
  LogEntry.new("2025-01-15 10:30:03", "ERROR", "Database unavailable")
]

# First sort by timestamp to establish chronological order
sorted_by_time = MergeSort.sort(logs) { |a, b| a.timestamp <=> b.timestamp }

# Then sort by severity (ERROR > WARNING > INFO)
severity_order = { "ERROR" => 3, "WARNING" => 2, "INFO" => 1 }
sorted_by_severity = MergeSort.sort(sorted_by_time) do |a, b|
  severity_order[b.severity] <=> severity_order[a.severity]
end

# Result: all ERRORs appear first, chronologically ordered,
# then WARNINGs chronologically ordered,
# then INFOs chronologically ordered
```
Processing large datasets in chunks demonstrates external sorting. This approach handles files larger than available memory:
```ruby
class ExternalMergeSort
  CHUNK_SIZE = 10_000

  def self.sort_file(input_path, output_path)
    chunk_files = create_sorted_chunks(input_path)
    merge_chunks(chunk_files, output_path)
    cleanup_chunks(chunk_files)
  end

  def self.create_sorted_chunks(input_path)
    chunk_files = []
    chunk_index = 0
    # File.foreach avoids leaking a handle, unlike a blockless File.open
    File.foreach(input_path).each_slice(CHUNK_SIZE) do |chunk|
      sorted_chunk = MergeSort.sort(chunk.map(&:to_i))
      chunk_file = "chunk_#{chunk_index}.tmp"
      File.open(chunk_file, 'w') do |f|
        sorted_chunk.each { |value| f.puts(value) }
      end
      chunk_files << chunk_file
      chunk_index += 1
    end
    chunk_files
  end

  def self.merge_chunks(chunk_files, output_path)
    readers = chunk_files.map { |f| File.open(f, 'r') }
    current_values = readers.map { |r| r.gets&.to_i }
    File.open(output_path, 'w') do |output|
      until current_values.all?(&:nil?)
        # Find the minimum value at the current front of each chunk
        min_idx = current_values.each_with_index
                                .reject { |val, _| val.nil? }
                                .min_by { |val, _| val }
                                &.last
        break unless min_idx

        output.puts(current_values[min_idx])
        current_values[min_idx] = readers[min_idx].gets&.to_i
      end
    end
    readers.each(&:close)
  end

  def self.cleanup_chunks(chunk_files)
    chunk_files.each { |f| File.delete(f) if File.exist?(f) }
  end

  private_class_method :create_sorted_chunks, :merge_chunks, :cleanup_chunks
end

# Usage
ExternalMergeSort.sort_file('large_unsorted.txt', 'sorted_output.txt')
```
Merging sorted sequences appears frequently in data processing pipelines. Database query engines merge sorted index scans, and log aggregation systems merge time-ordered streams:
```ruby
class SortedStreamMerger
  include Enumerable # provides to_a, map, etc. on top of each

  def initialize(*streams)
    @streams = streams.map(&:each)
    @current_values = @streams.map { |s| next_value(s) }
  end

  def each
    return enum_for(:each) unless block_given?

    until @current_values.all?(&:nil?)
      min_idx = @current_values.each_with_index
                               .reject { |val, _| val.nil? }
                               .min_by { |val, _| val }
                               &.last
      break unless min_idx

      yield @current_values[min_idx]
      advance_stream(min_idx)
    end
  end

  private

  # Pull the next value from an enumerator, or nil when it is exhausted
  def next_value(enumerator)
    enumerator.next
  rescue StopIteration
    nil
  end

  def advance_stream(index)
    @current_values[index] = next_value(@streams[index])
  end
end

# Merge multiple sorted streams
stream1 = [1, 5, 9, 13].each
stream2 = [2, 6, 10, 14].each
stream3 = [3, 7, 11, 15].each

merger = SortedStreamMerger.new(stream1, stream2, stream3)
merged_result = merger.to_a
# => [1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15]
```
Common Patterns
The recursive divide-and-conquer pattern forms Merge Sort's foundation. This pattern applies to numerous algorithmic problems beyond sorting:
```ruby
def merge_sort_pattern(problem)
  # Base case: problem is trivial to solve
  return trivial_solution(problem) if problem.trivial?

  # Divide: split the problem into subproblems
  subproblems = divide(problem)

  # Conquer: solve each subproblem recursively
  subsolutions = subproblems.map { |subproblem| merge_sort_pattern(subproblem) }

  # Combine: merge subsolutions into the final solution
  combine(subsolutions)
end
```
The sentinel value pattern simplifies merge logic by eliminating boundary checks. Appending infinity values to subarrays allows the merge loop to run without checking for exhausted arrays:
```ruby
def merge_with_sentinels(left, right)
  # Works for numeric elements: Float::INFINITY compares greater than any number
  left = left + [Float::INFINITY]
  right = right + [Float::INFINITY]
  result = []
  i = j = 0
  (left.length + right.length - 2).times do
    if left[i] <= right[j]
      result << left[i]
      i += 1
    else
      result << right[j]
      j += 1
    end
  end
  result
end
```
The three-way merge pattern extends binary merging to three subarrays simultaneously. This reduces the recursion depth from log₂(n) to log₃(n), potentially improving cache performance:
```ruby
def three_way_merge_sort(array)
  return array if array.length <= 1

  # Without the lower bound of 1, a two-element array would recurse
  # on itself forever (2 / 3 == 0)
  third = [array.length / 3, 1].max
  first = three_way_merge_sort(array[0...third])
  second = three_way_merge_sort(array[third...2 * third])
  third_part = three_way_merge_sort(array[2 * third..-1])
  three_way_merge(first, second, third_part)
end

def three_way_merge(a, b, c)
  result = []
  i = j = k = 0
  while i < a.length && j < b.length && k < c.length
    if a[i] <= b[j] && a[i] <= c[k]
      result << a[i]
      i += 1
    elsif b[j] <= c[k]
      result << b[j]
      j += 1
    else
      result << c[k]
      k += 1
    end
  end
  # At least one array is exhausted; the remaining two must still be
  # merged with each other rather than appended, or ordering breaks.
  # `merge` is the two-way merge defined earlier.
  result.concat(merge(merge(a[i..-1], b[j..-1]), c[k..-1]))
  result
end
```
The buffer reuse pattern minimizes allocations by maintaining a single temporary buffer throughout the sort. This pattern trades code complexity for reduced garbage collection overhead:
```ruby
class BufferReusingMergeSort
  def self.sort(array)
    return array if array.length <= 1

    working = array.dup
    # The buffer starts as a copy so single-element base cases are
    # already present in both arrays
    buffer = array.dup
    # The initial flag is false so the final merge lands in `working`
    sort_range(working, buffer, 0, array.length - 1, false)
    working
  end

  def self.sort_range(array, buffer, left, right, use_array_as_source)
    return if left >= right

    mid = (left + right) / 2
    # Recursively sort halves with source/destination roles swapped
    sort_range(array, buffer, left, mid, !use_array_as_source)
    sort_range(array, buffer, mid + 1, right, !use_array_as_source)
    # Merge from wherever the halves were just written, into the other array
    if use_array_as_source
      merge_range(array, buffer, left, mid, right)
    else
      merge_range(buffer, array, left, mid, right)
    end
  end

  def self.merge_range(source, dest, left, mid, right)
    i = left
    j = mid + 1
    k = left
    while i <= mid && j <= right
      if source[i] <= source[j]
        dest[k] = source[i]
        i += 1
      else
        dest[k] = source[j]
        j += 1
      end
      k += 1
    end
    while i <= mid
      dest[k] = source[i]
      i += 1
      k += 1
    end
    while j <= right
      dest[k] = source[j]
      j += 1
      k += 1
    end
  end

  private_class_method :sort_range, :merge_range
end
```
The natural run detection pattern identifies existing sorted sequences before dividing, reducing unnecessary work on partially sorted input:
```ruby
def find_runs(array)
  return [[0, array.length - 1]] if array.length <= 1

  runs = []
  run_start = 0
  (1...array.length).each do |i|
    if array[i] < array[i - 1]
      runs << [run_start, i - 1]
      run_start = i
    end
  end
  runs << [run_start, array.length - 1]
  runs
end

def adaptive_merge_sort(array)
  runs = find_runs(array)
  while runs.length > 1
    new_runs = []
    runs.each_slice(2) do |run1, run2|
      if run2
        new_runs << merge_runs(array, run1, run2)
      else
        new_runs << run1
      end
    end
    runs = new_runs
  end
  array
end

# Helper (not part of the original sketch): merge two adjacent sorted
# runs, given as [start, end] index pairs, in place
def merge_runs(array, run1, run2)
  s1, e1 = run1
  s2, e2 = run2
  merged = []
  i = s1
  j = s2
  while i <= e1 && j <= e2
    if array[i] <= array[j]
      merged << array[i]
      i += 1
    else
      merged << array[j]
      j += 1
    end
  end
  merged.concat(array[i..e1]) if i <= e1
  merged.concat(array[j..e2]) if j <= e2
  array[s1..e2] = merged
  [s1, e2]
end
```
Reference
Time Complexity
| Case | Comparisons | Array Accesses | Overall |
|---|---|---|---|
| Best | n log n | n log n | O(n log n) |
| Average | n log n | n log n | O(n log n) |
| Worst | n log n | n log n | O(n log n) |
Space Complexity
| Component | Space Required | Notes |
|---|---|---|
| Auxiliary Array | O(n) | For merge operations |
| Call Stack | O(log n) | Recursion depth |
| Total | O(n) | Dominated by auxiliary array |
| Buffer-Reusing Variant | O(n) | Single buffer allocated once, reused across merges |
Algorithm Characteristics
| Property | Value | Explanation |
|---|---|---|
| Stable | Yes | Preserves relative order of equal elements |
| Adaptive | No | Same performance regardless of input order |
| Online | No | Requires entire input before starting |
| In-Place | No | Requires additional memory |
| Comparison-Based | Yes | Only uses element comparisons |
| Parallelizable | Yes | Subproblems independent |
Comparison with Other Algorithms
| Algorithm | Time (Avg) | Time (Worst) | Space | Stable | In-Place |
|---|---|---|---|---|---|
| Merge Sort | O(n log n) | O(n log n) | O(n) | Yes | No |
| Quick Sort | O(n log n) | O(n²) | O(log n) | No | Yes |
| Heap Sort | O(n log n) | O(n log n) | O(1) | No | Yes |
| Tim Sort | O(n log n) | O(n log n) | O(n) | Yes | No |
| Insertion Sort | O(n²) | O(n²) | O(1) | Yes | Yes |
Implementation Variants
| Variant | Space | Time | Use Case |
|---|---|---|---|
| Top-Down Recursive | O(n) | O(n log n) | Standard implementation |
| Bottom-Up Iterative | O(n) | O(n log n) | Avoid recursion overhead |
| In-Place | O(log n) | O(n log n) | Memory-constrained environments |
| Natural | O(n) | O(n) best, O(n log n) worst | Partially sorted data |
| Three-Way | O(n) | O(n log n), shallower recursion | Better cache locality |
| External | O(k) | O(n log n) | Data exceeds memory |
Common Operations
| Operation | Implementation | Complexity |
|---|---|---|
| Find Middle | mid = (left + right) / 2 | O(1) |
| Merge Two Arrays | Sequential comparison | O(n) |
| Split Array | Create subarrays | O(n) |
| Base Case Check | length <= 1 | O(1) |
Ruby Method Signatures
| Method | Parameters | Returns | Purpose |
|---|---|---|---|
| merge_sort | array | sorted array | Main sorting method |
| merge | left, right | merged array | Combine sorted subarrays |
| merge_sort_recursive | array, comparator | sorted array | Recursive helper |
| merge_inplace | array, temp, left, mid, right | void | In-place merge |
Performance Factors
| Factor | Impact | Mitigation |
|---|---|---|
| Array Size | Linear with merge operations | Use threshold for small arrays |
| Memory Allocation | Garbage collection overhead | Reuse buffers |
| Comparison Cost | Multiplies base complexity | Cache comparison keys |
| Cache Misses | Memory access latency | Use iterative implementation |
| Recursion Depth | Call stack overhead | Switch to iterative for deep recursion |
When to Use Merge Sort
| Scenario | Reason |
|---|---|
| Guaranteed performance needed | O(n log n) worst case |
| Sorting linked lists | No random access penalty |
| Stable sort required | Preserves equal element order |
| External sorting | Efficient with disk I/O |
| Parallel processing | Independent subproblems |
| Merging sorted sequences | Natural merge operation |
When Not to Use Merge Sort
| Scenario | Better Alternative |
|---|---|
| Memory constrained | Quick Sort or Heap Sort |
| In-place requirement | Quick Sort or Heap Sort |
| Small arrays | Insertion Sort |
| Nearly sorted data | Tim Sort or Insertion Sort |
| Average case priority | Quick Sort |