CrackedRuby - Linear and Binary Search

Overview

Linear and binary search represent two fundamental approaches to locating elements within collections. Linear search examines each element sequentially until finding the target or exhausting the collection. Binary search applies a divide-and-conquer strategy to sorted collections, repeatedly halving the search space until locating the target or determining its absence.

These algorithms form the foundation for data retrieval operations across software systems. Database query optimizers, file systems, in-memory caches, and collection libraries all depend on search algorithms to locate data efficiently. The choice between linear and binary search affects application performance, memory usage, and implementation complexity.

Linear search works on any collection structure, requiring no preprocessing or special ordering. The algorithm starts at the first element and compares each subsequent element against the target value. When a match occurs, the algorithm returns the element's position. If the search reaches the collection's end without finding the target, it indicates absence.

Binary search requires sorted input but delivers logarithmic time complexity. The algorithm compares the target against the middle element. If the target matches, the search succeeds immediately. If the target is smaller, the algorithm discards the upper half and searches the lower half. If larger, it discards the lower half and searches the upper half. This halving process continues until finding the target or reducing the search space to zero.

The fundamental difference in their approaches creates distinct performance profiles. Linear search performs O(n) comparisons in the worst case, where n represents the collection size. Binary search performs O(log n) comparisons, making it exponentially faster for large datasets. However, binary search's requirement for sorted data adds preprocessing costs that may outweigh its benefits for small collections or unsorted data.

Key Principles

Linear search implements sequential access patterns. The algorithm maintains a single pointer or index that advances through the collection one element at a time. Each iteration compares the current element against the target value. The comparison determines whether to return the current position (on match) or continue to the next element (on mismatch). The algorithm terminates when either finding the target or reaching the collection boundary.

The algorithm's simplicity makes it applicable to any iterable data structure. Linked lists, arrays, sets, and streams all support linear search without modification. The algorithm requires no knowledge of the collection's internal organization, accessing elements strictly through iteration protocols.

Binary search depends on the sorted property to eliminate portions of the search space. The algorithm maintains two pointers, left and right, defining the current search boundaries. Each iteration calculates the midpoint between these boundaries and compares the element at that position against the target.

Three outcomes are possible from each comparison. If the middle element equals the target, the search succeeds and returns the index. If the target is less than the middle element, all elements in the upper half exceed the target (due to sorting), so the algorithm updates right = mid - 1 to search only the lower half. If the target is greater than the middle element, all elements in the lower half are too small, so the algorithm updates left = mid + 1 to search only the upper half.

The search space shrinks by half with each iteration. Starting with n elements, the first comparison eliminates approximately n/2 elements. The second eliminates approximately n/4 elements. This geometric reduction continues until the search space contains zero or one element. The maximum number of iterations equals ⌈log₂(n)⌉, the ceiling of the base-2 logarithm of the collection size.

Binary search terminates under two conditions. Success occurs when the middle element equals the target. Failure occurs when left > right, indicating the search space has been exhausted without finding the target. The final position of the pointers provides information about where the target would belong if inserted into the collection.

The sorted requirement restricts binary search's applicability. Collections must maintain ordering according to a consistent comparison function. The comparison must be transitive (if a < b and b < c, then a < c) and consistent (comparing the same elements always yields the same result). Violations of these properties produce undefined behavior, potentially causing infinite loops or incorrect results.

Both algorithms exhibit different behaviors with duplicate elements. Linear search returns the first occurrence when scanning left-to-right. Binary search may return any occurrence, as the algorithm's path through the collection depends on the specific values and positions of elements. Finding the first or last occurrence with binary search requires modifications to the standard algorithm.

Ruby Implementation

Ruby provides built-in methods for both search approaches through the Enumerable and Array classes. The include? and index methods implement linear search, while bsearch and bsearch_index implement binary search variants.

The index method performs linear search, returning the position of the first matching element:

numbers = [15, 3, 42, 7, 23, 8, 19]
position = numbers.index(42)
# => 2

names = ["Alice", "Bob", "Charlie", "Diana"]
names.index("Charlie")
# => 2

# Returns nil when element not found
numbers.index(100)
# => nil

The include? method also uses linear search but returns a boolean:

numbers = [15, 3, 42, 7, 23, 8, 19]
numbers.include?(42)
# => true

numbers.include?(100)
# => false

For binary search, Ruby's bsearch method requires a sorted array and accepts a block that returns a boolean or numeric comparison result:

sorted_numbers = [3, 7, 8, 15, 19, 23, 42]

# Find mode: block returns true/false
result = sorted_numbers.bsearch { |x| x >= 19 }
# => 19

# Finds the first element where block returns true
sorted_numbers.bsearch { |x| x >= 20 }
# => 23

# Returns nil if no element satisfies condition
sorted_numbers.bsearch { |x| x >= 100 }
# => nil

The bsearch_index method returns the index instead of the element:

sorted_numbers = [3, 7, 8, 15, 19, 23, 42]

index = sorted_numbers.bsearch_index { |x| x >= 19 }
# => 4

# Returns nil when no match
sorted_numbers.bsearch_index { |x| x >= 100 }
# => nil

Custom linear search implementation demonstrates the sequential scan approach:

def linear_search(array, target)
  array.each_with_index do |element, index|
    return index if element == target
  end
  nil
end

data = [15, 3, 42, 7, 23, 8, 19]
linear_search(data, 23)
# => 4

linear_search(data, 99)
# => nil

Custom binary search implementation shows the divide-and-conquer strategy:

def binary_search(array, target)
  left = 0
  right = array.length - 1
  
  while left <= right
    mid = (left + right) / 2
    
    if array[mid] == target
      return mid
    elsif array[mid] < target
      left = mid + 1
    else
      right = mid - 1
    end
  end
  
  nil
end

sorted_data = [3, 7, 8, 15, 19, 23, 42]
binary_search(sorted_data, 19)
# => 4

binary_search(sorted_data, 99)
# => nil

Binary search with custom comparison function:

def binary_search_with_comparator(array, target, comparator)
  left = 0
  right = array.length - 1
  
  while left <= right
    mid = (left + right) / 2
    comparison = comparator.call(array[mid], target)
    
    if comparison == 0
      return mid
    elsif comparison < 0
      left = mid + 1
    else
      right = mid - 1
    end
  end
  
  nil
end

# Search case-insensitive strings
names = ["alice", "bob", "charlie", "diana"]
comparator = ->(a, b) { a.downcase <=> b.downcase }

binary_search_with_comparator(names, "CHARLIE", comparator)
# => 2

Finding the insertion point for a target value:

def binary_search_insertion_point(array, target)
  left = 0
  right = array.length - 1
  
  while left <= right
    mid = (left + right) / 2
    
    if array[mid] < target
      left = mid + 1
    else
      right = mid - 1
    end
  end
  
  left
end

sorted_data = [3, 7, 8, 15, 19, 23, 42]

# Where to insert 20 to maintain sorted order
binary_search_insertion_point(sorted_data, 20)
# => 5

# Where to insert 1 (before everything)
binary_search_insertion_point(sorted_data, 1)
# => 0

# Where to insert 50 (after everything)
binary_search_insertion_point(sorted_data, 50)
# => 7

Finding the leftmost occurrence of a target value:

def binary_search_leftmost(array, target)
  left = 0
  right = array.length - 1
  result = nil
  
  while left <= right
    mid = (left + right) / 2
    
    if array[mid] == target
      result = mid
      right = mid - 1  # Continue searching left half
    elsif array[mid] < target
      left = mid + 1
    else
      right = mid - 1
    end
  end
  
  result
end

data_with_duplicates = [3, 7, 7, 7, 19, 23, 42]
binary_search_leftmost(data_with_duplicates, 7)
# => 1 (first occurrence)

Performance Considerations

Linear search executes in O(n) time complexity for worst-case scenarios, where n equals the collection size. The algorithm must examine every element when the target appears at the end or is absent from the collection. Average-case performance is O(n/2), still linear but with a smaller constant factor. Best-case performance is O(1) when the target appears first.

Binary search achieves O(log n) time complexity. The algorithm eliminates half the remaining elements with each comparison. For a collection of 1,000,000 elements, binary search requires at most 20 comparisons (⌈log₂(1,000,000)⌉ = 20), while linear search potentially requires 1,000,000 comparisons. The performance gap widens exponentially as collection size increases.

Space complexity differs between approaches. Linear search uses O(1) auxiliary space, requiring only an index variable or iterator. Binary search also uses O(1) space in iterative implementations, maintaining only the left, right, and mid pointers. Recursive binary search implementations consume O(log n) stack space due to function call overhead.

The crossover point between algorithms depends on multiple factors. For small collections (typically under 10-20 elements), linear search often outperforms binary search despite its worse time complexity. The constant factors in binary search (calculating midpoints, pointer manipulation) exceed the simple comparison and increment operations in linear search. Additionally, if the target typically appears near the collection's start, linear search's early-exit behavior provides better average performance.

Sorting costs affect the total performance picture. Binary search requires sorted input, and sorting takes O(n log n) time for comparison-based algorithms. If searching an unsorted collection once, the combined cost of sorting and binary searching (O(n log n + log n) = O(n log n)) exceeds linear search's O(n) cost. Binary search becomes advantageous when performing multiple searches on the same dataset, amortizing the sorting cost across operations.

Cache performance favors linear search for small to medium collections. Linear search exhibits excellent spatial locality, accessing consecutive memory locations. Modern CPU caches load entire cache lines on first access, making subsequent accesses to nearby elements nearly free. Binary search jumps between distant memory locations, potentially causing cache misses at each iteration. For collections fitting in CPU cache (typically 32-64KB of data), linear search's cache-friendly access patterns may outweigh binary search's algorithmic advantage.

Branch prediction affects both algorithms differently. Modern CPUs predict branch outcomes to maintain instruction pipelines. Linear search contains a single conditional branch per iteration (element match check), which mispredicts only once per search. Binary search contains multiple branches per iteration (comparison and direction choice), creating more opportunities for misprediction. However, the logarithmic iteration count of binary search reduces the total number of branches executed.

Practical benchmarks demonstrate these trade-offs:

require 'benchmark'

def measure_search_performance(size)
  sorted_array = (1..size).to_a
  target = size - 1  # Worst case for linear, average for binary
  
  Benchmark.bm(20) do |x|
    x.report("Linear search:") do
      100_000.times { sorted_array.index(target) }
    end
    
    x.report("Binary search:") do
      100_000.times { sorted_array.bsearch { |x| x >= target } }
    end
  end
end

# Small collection (10 elements)
# Linear search often wins due to lower overhead

# Medium collection (1,000 elements)
# Binary search begins to show advantage

# Large collection (100,000 elements)
# Binary search dominates

The performance characteristics guide algorithm selection. Use linear search when: collection size is small (under 20-50 elements), data is unsorted and will be searched once, or elements cluster near the collection's start. Use binary search when: collection is large, data is already sorted or will be searched repeatedly, or worst-case performance guarantees are required.

Practical Examples

Searching a customer database by account number demonstrates linear search when data lacks indexing:

class Customer
  attr_reader :account_number, :name, :email
  
  def initialize(account_number, name, email)
    @account_number = account_number
    @name = name
    @email = email
  end
end

customers = [
  Customer.new(5017, "Alice Johnson", "alice@example.com"),
  Customer.new(3421, "Bob Smith", "bob@example.com"),
  Customer.new(8203, "Charlie Davis", "charlie@example.com"),
  Customer.new(1592, "Diana Martinez", "diana@example.com")
]

def find_customer_linear(customers, account_number)
  customers.find { |c| c.account_number == account_number }
end

customer = find_customer_linear(customers, 8203)
customer.name if customer
# => "Charlie Davis"

The same scenario optimized with binary search after sorting by account number:

# Sort customers by account number once
sorted_customers = customers.sort_by(&:account_number)
# => [1592, 3421, 5017, 8203]

def find_customer_binary(sorted_customers, account_number)
  index = sorted_customers.bsearch_index do |customer|
    customer.account_number >= account_number
  end
  
  return nil if index.nil?
  
  customer = sorted_customers[index]
  customer.account_number == account_number ? customer : nil
end

customer = find_customer_binary(sorted_customers, 5017)
customer.name if customer
# => "Alice Johnson"

Log file analysis searching for entries within a time range:

class LogEntry
  attr_reader :timestamp, :level, :message
  
  def initialize(timestamp, level, message)
    @timestamp = timestamp
    @level = level
    @message = message
  end
end

# Logs sorted by timestamp
logs = [
  LogEntry.new(Time.new(2025, 1, 15, 10, 30), "INFO", "Server started"),
  LogEntry.new(Time.new(2025, 1, 15, 10, 45), "WARN", "High memory usage"),
  LogEntry.new(Time.new(2025, 1, 15, 11, 0), "ERROR", "Connection failed"),
  LogEntry.new(Time.new(2025, 1, 15, 11, 15), "INFO", "Retry successful"),
  LogEntry.new(Time.new(2025, 1, 15, 11, 30), "INFO", "Request processed")
]

def find_logs_in_range(logs, start_time, end_time)
  # Find first log >= start_time
  start_index = logs.bsearch_index { |log| log.timestamp >= start_time }
  return [] if start_index.nil?
  
  # Find first log > end_time
  end_index = logs.bsearch_index { |log| log.timestamp > end_time }
  end_index ||= logs.length
  
  logs[start_index...end_index]
end

start = Time.new(2025, 1, 15, 10, 40)
finish = Time.new(2025, 1, 15, 11, 10)

matching_logs = find_logs_in_range(logs, start, finish)
matching_logs.map(&:message)
# => ["High memory usage", "Connection failed"]

Dictionary implementation using binary search for word lookup:

class Dictionary
  def initialize(words)
    @words = words.sort
  end
  
  def include?(word)
    index = @words.bsearch_index { |w| w >= word.downcase }
    !index.nil? && @words[index] == word.downcase
  end
  
  def suggestions(prefix)
    start_index = @words.bsearch_index { |w| w >= prefix }
    return [] if start_index.nil?
    
    results = []
    index = start_index
    
    while index < @words.length && @words[index].start_with?(prefix)
      results << @words[index]
      index += 1
    end
    
    results
  end
end

dict = Dictionary.new(["apple", "application", "apply", "banana", "band", "cat"])

dict.include?("apply")
# => true

dict.suggestions("app")
# => ["apple", "application", "apply"]

dict.suggestions("ban")
# => ["banana", "band"]

Inventory system searching for products by price range:

class Product
  attr_reader :id, :name, :price
  
  def initialize(id, name, price)
    @id = id
    @name = name
    @price = price
  end
end

products = [
  Product.new(1, "Widget", 9.99),
  Product.new(2, "Gadget", 24.99),
  Product.new(3, "Tool", 49.99),
  Product.new(4, "Device", 99.99),
  Product.new(5, "Machine", 199.99)
]

# Sort by price
products_by_price = products.sort_by(&:price)

def find_products_in_range(products, min_price, max_price)
  start_index = products.bsearch_index { |p| p.price >= min_price }
  return [] if start_index.nil?
  
  end_index = products.bsearch_index { |p| p.price > max_price }
  end_index ||= products.length
  
  products[start_index...end_index]
end

affordable_products = find_products_in_range(products_by_price, 20.0, 60.0)
affordable_products.map(&:name)
# => ["Gadget", "Tool"]

Common Patterns

The find-first pattern locates the initial occurrence in an ordered collection:

# Linear search naturally finds first occurrence
def find_first_linear(array, target)
  array.index(target)
end

# Binary search requires modification to continue left
def find_first_binary(array, target)
  left = 0
  right = array.length - 1
  result = nil
  
  while left <= right
    mid = (left + right) / 2
    
    if array[mid] == target
      result = mid
      right = mid - 1  # Keep searching left
    elsif array[mid] < target
      left = mid + 1
    else
      right = mid - 1
    end
  end
  
  result
end

data = [2, 5, 5, 5, 5, 8, 10]
find_first_binary(data, 5)
# => 1

The find-last pattern locates the final occurrence:

def find_last_binary(array, target)
  left = 0
  right = array.length - 1
  result = nil
  
  while left <= right
    mid = (left + right) / 2
    
    if array[mid] == target
      result = mid
      left = mid + 1  # Keep searching right
    elsif array[mid] < target
      left = mid + 1
    else
      right = mid - 1
    end
  end
  
  result
end

data = [2, 5, 5, 5, 5, 8, 10]
find_last_binary(data, 5)
# => 4

The count-occurrences pattern combines find-first and find-last:

def count_occurrences(array, target)
  first = find_first_binary(array, target)
  return 0 if first.nil?
  
  last = find_last_binary(array, target)
  last - first + 1
end

data = [2, 5, 5, 5, 5, 8, 10]
count_occurrences(data, 5)
# => 4

The lower-bound pattern finds the insertion point for a target value:

def lower_bound(array, target)
  left = 0
  right = array.length
  
  while left < right
    mid = (left + right) / 2
    
    if array[mid] < target
      left = mid + 1
    else
      right = mid
    end
  end
  
  left
end

data = [2, 5, 8, 10, 15]
lower_bound(data, 7)
# => 2 (position to insert 7)

The upper-bound pattern finds the position after the last occurrence:

def upper_bound(array, target)
  left = 0
  right = array.length
  
  while left < right
    mid = (left + right) / 2
    
    if array[mid] <= target
      left = mid + 1
    else
      right = mid
    end
  end
  
  left
end

data = [2, 5, 5, 5, 8, 10]
upper_bound(data, 5)
# => 4 (position after last 5)

The sentinel pattern optimizes linear search by eliminating boundary checks:

def linear_search_with_sentinel(array, target)
  original_last = array[-1]
  array[-1] = target
  
  index = 0
  index += 1 while array[index] != target
  
  array[-1] = original_last
  
  return index if index < array.length - 1 || original_last == target
  nil
end

data = [15, 3, 42, 7, 23, 8, 19]
linear_search_with_sentinel(data, 23)
# => 4

Common Pitfalls

Off-by-one errors plague binary search implementations. The most common mistake involves incorrect boundary updates:

# INCORRECT: Infinite loop when target not found
def binary_search_wrong(array, target)
  left = 0
  right = array.length - 1
  
  while left <= right
    mid = (left + right) / 2
    
    if array[mid] == target
      return mid
    elsif array[mid] < target
      left = mid  # WRONG: should be mid + 1
    else
      right = mid  # WRONG: should be mid - 1
    end
  end
  
  nil
end

# This creates infinite loop when searching for 20 in [10, 30]
# mid stays at 0, left stays at 0

Integer overflow affects midpoint calculation in languages with fixed-size integers. Ruby handles arbitrary-precision integers, but the pattern matters for cross-language understanding:

# Problematic in languages with fixed integer sizes
mid = (left + right) / 2  # Can overflow if left + right exceeds max int

# Safer alternative
mid = left + (right - left) / 2

Forgetting the sorted requirement causes incorrect results:

data = [42, 7, 23, 3, 19]  # UNSORTED

# Binary search returns unpredictable results
result = data.bsearch { |x| x >= 19 }
# May return wrong element or nil

Using the wrong comparison mode with bsearch produces unexpected results:

sorted = [3, 7, 8, 15, 19, 23, 42]

# WRONG: Returns elements, not exact match
result = sorted.bsearch { |x| x >= 19 }
# => 19 (happens to work, but finds first >= 19, not exactly 19)

# WRONG: Comparing against wrong value
result = sorted.bsearch { |x| x >= 20 }
# => 23 (finds first element >= 20, not 20 itself)

# CORRECT: For exact match, check result
target = 20
result = sorted.bsearch { |x| x >= target }
result == target ? result : nil
# => nil (correctly indicates 20 not present)

Modifying the collection during search creates undefined behavior:

# DANGEROUS: Modifying array during iteration
data = [15, 3, 42, 7, 23]
data.each_with_index do |element, index|
  if element == 42
    data.delete_at(index)  # Modifies array during iteration
  end
end

Comparing incompatible types causes exceptions:

mixed = [1, "two", 3, "four"]
mixed.sort  # Raises ArgumentError: comparison of Integer with String failed

# Binary search on unsortable data fails
mixed.bsearch { |x| x >= 2 }  # Comparison error

Assuming bsearch returns indices instead of elements:

sorted = [3, 7, 8, 15, 19, 23, 42]

# WRONG: bsearch returns element, not index
result = sorted.bsearch { |x| x >= 19 }
# => 19 (element value, not index 4)

# CORRECT: Use bsearch_index for indices
index = sorted.bsearch_index { |x| x >= 19 }
# => 4 (correct index)

Handling empty arrays incorrectly:

def binary_search_bad(array, target)
  left = 0
  right = array.length - 1  # Becomes -1 for empty array
  
  while left <= right  # Still enters loop with empty array
    mid = (left + right) / 2  # mid = -1 / 2 = -1 in Ruby
    # Accessing array[-1] gets last element if array not empty
  end
end

# CORRECT: Check for empty array
def binary_search_safe(array, target)
  return nil if array.empty?
  
  left = 0
  right = array.length - 1
  
  while left <= right
    mid = (left + right) / 2
    # Safe to proceed
  end
  
  nil
end

Infinite loops from incorrect loop conditions:

# WRONG: Uses < instead of <=
def binary_search_wrong_condition(array, target)
  left = 0
  right = array.length - 1
  
  while left < right  # WRONG: misses case when left == right
    mid = (left + right) / 2
    
    if array[mid] == target
      return mid
    elsif array[mid] < target
      left = mid + 1
    else
      right = mid - 1
    end
  end
  
  nil
end

# Fails when target is at position where left == right
data = [10]
binary_search_wrong_condition(data, 10)
# => nil (WRONG: should return 0)

Reference

Algorithm Comparison

Characteristic	Linear Search	Binary Search
Time Complexity (Worst)	O(n)	O(log n)
Time Complexity (Average)	O(n)	O(log n)
Time Complexity (Best)	O(1)	O(1)
Space Complexity	O(1)	O(1) iterative, O(log n) recursive
Requires Sorted Data	No	Yes
Works on Any Structure	Yes	Arrays/indexed collections only
Cache Performance	Excellent (sequential access)	Poor (random access)
Implementation Complexity	Simple	Moderate

Ruby Methods Reference

Method	Class	Return Type	Use Case
index	Array, Enumerable	Integer or nil	Find position with linear search
include?	Array, Enumerable	Boolean	Check existence with linear search
find	Enumerable	Element or nil	Find first matching element
find_index	Enumerable	Integer or nil	Find first matching index
bsearch	Array	Element or nil	Binary search returning element
bsearch_index	Array	Integer or nil	Binary search returning index
sort	Array, Enumerable	Sorted array	Prepare data for binary search
sort_by	Array, Enumerable	Sorted array	Sort by attribute for binary search

Binary Search Block Modes

Mode	Block Return	Behavior
Find minimum	true/false	Returns first element where block returns true
Find exact (workaround)	true/false	Check if result equals target after search
Custom comparison	-1/0/1	Use spaceship operator for exact matches

Performance Guidelines

Collection Size	Unsorted, Single Search	Sorted, Single Search	Multiple Searches
Under 10 elements	Linear search	Linear search	Linear search
10-100 elements	Linear search	Binary search	Sort once, binary search
100-10,000 elements	Linear search	Binary search	Sort once, binary search
Over 10,000 elements	Linear search	Binary search	Sort once, binary search

Common Binary Search Variations

Pattern	Purpose	Key Modification
Find first	Locate leftmost occurrence	Continue searching left after match
Find last	Locate rightmost occurrence	Continue searching right after match
Lower bound	Find insertion point	Search for first element >= target
Upper bound	Find position after last	Search for first element > target
Count occurrences	Count duplicates	Find last - find first + 1

Edge Cases Checklist

Scenario	Linear Search Behavior	Binary Search Behavior
Empty array	Returns nil immediately	Returns nil, check first
Single element	Checks one element	Checks one element
Target at start	Returns 0 immediately	Takes log(n) comparisons
Target at end	Checks all elements	Takes log(n) comparisons
Target absent	Checks all elements	Takes log(n) comparisons
Duplicate elements	Returns first occurrence	Returns any occurrence
Unsorted data	Works correctly	Undefined behavior

Linear and Binary Search