Overview
Linear and binary search represent two fundamental approaches to locating elements within collections. Linear search examines each element sequentially until finding the target or exhausting the collection. Binary search applies a divide-and-conquer strategy to sorted collections, repeatedly halving the search space until locating the target or determining its absence.
These algorithms form the foundation for data retrieval operations across software systems. Database query optimizers, file systems, in-memory caches, and collection libraries all depend on search algorithms to locate data efficiently. The choice between linear and binary search affects application performance, memory usage, and implementation complexity.
Linear search works on any collection structure, requiring no preprocessing or special ordering. The algorithm starts at the first element and compares each subsequent element against the target value. When a match occurs, the algorithm returns the element's position. If the search reaches the collection's end without finding the target, it indicates absence.
Binary search requires sorted input but delivers logarithmic time complexity. The algorithm compares the target against the middle element. If the target matches, the search succeeds immediately. If the target is smaller, the algorithm discards the upper half and searches the lower half. If larger, it discards the lower half and searches the upper half. This halving process continues until finding the target or reducing the search space to zero.
The fundamental difference in their approaches creates distinct performance profiles. Linear search performs O(n) comparisons in the worst case, where n represents the collection size. Binary search performs O(log n) comparisons, making it exponentially faster for large datasets. However, binary search's requirement for sorted data adds preprocessing costs that may outweigh its benefits for small collections or unsorted data.
Key Principles
Linear search implements sequential access patterns. The algorithm maintains a single pointer or index that advances through the collection one element at a time. Each iteration compares the current element against the target value. The comparison determines whether to return the current position (on match) or continue to the next element (on mismatch). The algorithm terminates when either finding the target or reaching the collection boundary.
The algorithm's simplicity makes it applicable to any iterable data structure. Linked lists, arrays, sets, and streams all support linear search without modification. The algorithm requires no knowledge of the collection's internal organization, accessing elements strictly through iteration protocols.
Binary search depends on the sorted property to eliminate portions of the search space. The algorithm maintains two pointers, left and right, defining the current search boundaries. Each iteration calculates the midpoint between these boundaries and compares the element at that position against the target.
Three outcomes are possible from each comparison. If the middle element equals the target, the search succeeds and returns the index. If the target is less than the middle element, all elements in the upper half exceed the target (due to sorting), so the algorithm updates right = mid - 1 to search only the lower half. If the target is greater than the middle element, all elements in the lower half are too small, so the algorithm updates left = mid + 1 to search only the upper half.
The search space shrinks by half with each iteration. Starting with n elements, the first comparison eliminates approximately n/2 elements. The second eliminates approximately n/4 elements. This geometric reduction continues until the search space contains zero or one element. The maximum number of iterations equals ⌈log₂(n)⌉, the ceiling of the base-2 logarithm of the collection size.
Binary search terminates under two conditions. Success occurs when the middle element equals the target. Failure occurs when left > right, indicating the search space has been exhausted without finding the target. The final position of the pointers provides information about where the target would belong if inserted into the collection.
The sorted requirement restricts binary search's applicability. Collections must maintain ordering according to a consistent comparison function. The comparison must be transitive (if a < b and b < c, then a < c) and consistent (comparing the same elements always yields the same result). Violations of these properties produce undefined behavior, potentially causing infinite loops or incorrect results.
Both algorithms exhibit different behaviors with duplicate elements. Linear search returns the first occurrence when scanning left-to-right. Binary search may return any occurrence, as the algorithm's path through the collection depends on the specific values and positions of elements. Finding the first or last occurrence with binary search requires modifications to the standard algorithm.
Ruby Implementation
Ruby provides built-in methods for both search approaches through the Enumerable and Array classes. The include? and index methods implement linear search, while bsearch and bsearch_index implement binary search variants.
The index method performs linear search, returning the position of the first matching element:
numbers = [15, 3, 42, 7, 23, 8, 19]
position = numbers.index(42)
# => 2
names = ["Alice", "Bob", "Charlie", "Diana"]
names.index("Charlie")
# => 2
# Returns nil when element not found
numbers.index(100)
# => nil
The include? method also uses linear search but returns a boolean:
numbers = [15, 3, 42, 7, 23, 8, 19]
numbers.include?(42)
# => true
numbers.include?(100)
# => false
For binary search, Ruby's bsearch method requires a sorted array and accepts a block that returns a boolean or numeric comparison result:
sorted_numbers = [3, 7, 8, 15, 19, 23, 42]
# Find mode: block returns true/false
result = sorted_numbers.bsearch { |x| x >= 19 }
# => 19
# Finds the first element where block returns true
sorted_numbers.bsearch { |x| x >= 20 }
# => 23
# Returns nil if no element satisfies condition
sorted_numbers.bsearch { |x| x >= 100 }
# => nil
The bsearch_index method returns the index instead of the element:
sorted_numbers = [3, 7, 8, 15, 19, 23, 42]
index = sorted_numbers.bsearch_index { |x| x >= 19 }
# => 4
# Returns nil when no match
sorted_numbers.bsearch_index { |x| x >= 100 }
# => nil
Custom linear search implementation demonstrates the sequential scan approach:
def linear_search(array, target)
array.each_with_index do |element, index|
return index if element == target
end
nil
end
data = [15, 3, 42, 7, 23, 8, 19]
linear_search(data, 23)
# => 4
linear_search(data, 99)
# => nil
Custom binary search implementation shows the divide-and-conquer strategy:
def binary_search(array, target)
left = 0
right = array.length - 1
while left <= right
mid = (left + right) / 2
if array[mid] == target
return mid
elsif array[mid] < target
left = mid + 1
else
right = mid - 1
end
end
nil
end
sorted_data = [3, 7, 8, 15, 19, 23, 42]
binary_search(sorted_data, 19)
# => 4
binary_search(sorted_data, 99)
# => nil
Binary search with custom comparison function:
def binary_search_with_comparator(array, target, comparator)
left = 0
right = array.length - 1
while left <= right
mid = (left + right) / 2
comparison = comparator.call(array[mid], target)
if comparison == 0
return mid
elsif comparison < 0
left = mid + 1
else
right = mid - 1
end
end
nil
end
# Search case-insensitive strings
names = ["alice", "bob", "charlie", "diana"]
comparator = ->(a, b) { a.downcase <=> b.downcase }
binary_search_with_comparator(names, "CHARLIE", comparator)
# => 2
Finding the insertion point for a target value:
def binary_search_insertion_point(array, target)
left = 0
right = array.length - 1
while left <= right
mid = (left + right) / 2
if array[mid] < target
left = mid + 1
else
right = mid - 1
end
end
left
end
sorted_data = [3, 7, 8, 15, 19, 23, 42]
# Where to insert 20 to maintain sorted order
binary_search_insertion_point(sorted_data, 20)
# => 5
# Where to insert 1 (before everything)
binary_search_insertion_point(sorted_data, 1)
# => 0
# Where to insert 50 (after everything)
binary_search_insertion_point(sorted_data, 50)
# => 7
Finding the leftmost occurrence of a target value:
def binary_search_leftmost(array, target)
left = 0
right = array.length - 1
result = nil
while left <= right
mid = (left + right) / 2
if array[mid] == target
result = mid
right = mid - 1 # Continue searching left half
elsif array[mid] < target
left = mid + 1
else
right = mid - 1
end
end
result
end
data_with_duplicates = [3, 7, 7, 7, 19, 23, 42]
binary_search_leftmost(data_with_duplicates, 7)
# => 1 (first occurrence)
Performance Considerations
Linear search executes in O(n) time complexity for worst-case scenarios, where n equals the collection size. The algorithm must examine every element when the target appears at the end or is absent from the collection. Average-case performance is O(n/2), still linear but with a smaller constant factor. Best-case performance is O(1) when the target appears first.
Binary search achieves O(log n) time complexity. The algorithm eliminates half the remaining elements with each comparison. For a collection of 1,000,000 elements, binary search requires at most 20 comparisons (⌈log₂(1,000,000)⌉ = 20), while linear search potentially requires 1,000,000 comparisons. The performance gap widens exponentially as collection size increases.
Space complexity differs between approaches. Linear search uses O(1) auxiliary space, requiring only an index variable or iterator. Binary search also uses O(1) space in iterative implementations, maintaining only the left, right, and mid pointers. Recursive binary search implementations consume O(log n) stack space due to function call overhead.
The crossover point between algorithms depends on multiple factors. For small collections (typically under 10-20 elements), linear search often outperforms binary search despite its worse time complexity. The constant factors in binary search (calculating midpoints, pointer manipulation) exceed the simple comparison and increment operations in linear search. Additionally, if the target typically appears near the collection's start, linear search's early-exit behavior provides better average performance.
Sorting costs affect the total performance picture. Binary search requires sorted input, and sorting takes O(n log n) time for comparison-based algorithms. If searching an unsorted collection once, the combined cost of sorting and binary searching (O(n log n + log n) = O(n log n)) exceeds linear search's O(n) cost. Binary search becomes advantageous when performing multiple searches on the same dataset, amortizing the sorting cost across operations.
Cache performance favors linear search for small to medium collections. Linear search exhibits excellent spatial locality, accessing consecutive memory locations. Modern CPU caches load entire cache lines on first access, making subsequent accesses to nearby elements nearly free. Binary search jumps between distant memory locations, potentially causing cache misses at each iteration. For collections fitting in CPU cache (typically 32-64KB of data), linear search's cache-friendly access patterns may outweigh binary search's algorithmic advantage.
Branch prediction affects both algorithms differently. Modern CPUs predict branch outcomes to maintain instruction pipelines. Linear search contains a single conditional branch per iteration (element match check), which mispredicts only once per search. Binary search contains multiple branches per iteration (comparison and direction choice), creating more opportunities for misprediction. However, the logarithmic iteration count of binary search reduces the total number of branches executed.
Practical benchmarks demonstrate these trade-offs:
require 'benchmark'
def measure_search_performance(size)
sorted_array = (1..size).to_a
target = size - 1 # Worst case for linear, average for binary
Benchmark.bm(20) do |x|
x.report("Linear search:") do
100_000.times { sorted_array.index(target) }
end
x.report("Binary search:") do
100_000.times { sorted_array.bsearch { |x| x >= target } }
end
end
end
# Small collection (10 elements)
# Linear search often wins due to lower overhead
# Medium collection (1,000 elements)
# Binary search begins to show advantage
# Large collection (100,000 elements)
# Binary search dominates
The performance characteristics guide algorithm selection. Use linear search when: collection size is small (under 20-50 elements), data is unsorted and will be searched once, or elements cluster near the collection's start. Use binary search when: collection is large, data is already sorted or will be searched repeatedly, or worst-case performance guarantees are required.
Practical Examples
Searching a customer database by account number demonstrates linear search when data lacks indexing:
class Customer
attr_reader :account_number, :name, :email
def initialize(account_number, name, email)
@account_number = account_number
@name = name
@email = email
end
end
customers = [
Customer.new(5017, "Alice Johnson", "alice@example.com"),
Customer.new(3421, "Bob Smith", "bob@example.com"),
Customer.new(8203, "Charlie Davis", "charlie@example.com"),
Customer.new(1592, "Diana Martinez", "diana@example.com")
]
def find_customer_linear(customers, account_number)
customers.find { |c| c.account_number == account_number }
end
customer = find_customer_linear(customers, 8203)
customer.name if customer
# => "Charlie Davis"
The same scenario optimized with binary search after sorting by account number:
# Sort customers by account number once
sorted_customers = customers.sort_by(&:account_number)
# => [1592, 3421, 5017, 8203]
def find_customer_binary(sorted_customers, account_number)
index = sorted_customers.bsearch_index do |customer|
customer.account_number >= account_number
end
return nil if index.nil?
customer = sorted_customers[index]
customer.account_number == account_number ? customer : nil
end
customer = find_customer_binary(sorted_customers, 5017)
customer.name if customer
# => "Alice Johnson"
Log file analysis searching for entries within a time range:
class LogEntry
attr_reader :timestamp, :level, :message
def initialize(timestamp, level, message)
@timestamp = timestamp
@level = level
@message = message
end
end
# Logs sorted by timestamp
logs = [
LogEntry.new(Time.new(2025, 1, 15, 10, 30), "INFO", "Server started"),
LogEntry.new(Time.new(2025, 1, 15, 10, 45), "WARN", "High memory usage"),
LogEntry.new(Time.new(2025, 1, 15, 11, 0), "ERROR", "Connection failed"),
LogEntry.new(Time.new(2025, 1, 15, 11, 15), "INFO", "Retry successful"),
LogEntry.new(Time.new(2025, 1, 15, 11, 30), "INFO", "Request processed")
]
def find_logs_in_range(logs, start_time, end_time)
# Find first log >= start_time
start_index = logs.bsearch_index { |log| log.timestamp >= start_time }
return [] if start_index.nil?
# Find first log > end_time
end_index = logs.bsearch_index { |log| log.timestamp > end_time }
end_index ||= logs.length
logs[start_index...end_index]
end
start = Time.new(2025, 1, 15, 10, 40)
finish = Time.new(2025, 1, 15, 11, 10)
matching_logs = find_logs_in_range(logs, start, finish)
matching_logs.map(&:message)
# => ["High memory usage", "Connection failed"]
Dictionary implementation using binary search for word lookup:
class Dictionary
def initialize(words)
@words = words.sort
end
def include?(word)
index = @words.bsearch_index { |w| w >= word.downcase }
!index.nil? && @words[index] == word.downcase
end
def suggestions(prefix)
start_index = @words.bsearch_index { |w| w >= prefix }
return [] if start_index.nil?
results = []
index = start_index
while index < @words.length && @words[index].start_with?(prefix)
results << @words[index]
index += 1
end
results
end
end
dict = Dictionary.new(["apple", "application", "apply", "banana", "band", "cat"])
dict.include?("apply")
# => true
dict.suggestions("app")
# => ["apple", "application", "apply"]
dict.suggestions("ban")
# => ["banana", "band"]
Inventory system searching for products by price range:
class Product
attr_reader :id, :name, :price
def initialize(id, name, price)
@id = id
@name = name
@price = price
end
end
products = [
Product.new(1, "Widget", 9.99),
Product.new(2, "Gadget", 24.99),
Product.new(3, "Tool", 49.99),
Product.new(4, "Device", 99.99),
Product.new(5, "Machine", 199.99)
]
# Sort by price
products_by_price = products.sort_by(&:price)
def find_products_in_range(products, min_price, max_price)
start_index = products.bsearch_index { |p| p.price >= min_price }
return [] if start_index.nil?
end_index = products.bsearch_index { |p| p.price > max_price }
end_index ||= products.length
products[start_index...end_index]
end
affordable_products = find_products_in_range(products_by_price, 20.0, 60.0)
affordable_products.map(&:name)
# => ["Gadget", "Tool"]
Common Patterns
The find-first pattern locates the initial occurrence in an ordered collection:
# Linear search naturally finds first occurrence
def find_first_linear(array, target)
array.index(target)
end
# Binary search requires modification to continue left
def find_first_binary(array, target)
left = 0
right = array.length - 1
result = nil
while left <= right
mid = (left + right) / 2
if array[mid] == target
result = mid
right = mid - 1 # Keep searching left
elsif array[mid] < target
left = mid + 1
else
right = mid - 1
end
end
result
end
data = [2, 5, 5, 5, 5, 8, 10]
find_first_binary(data, 5)
# => 1
The find-last pattern locates the final occurrence:
def find_last_binary(array, target)
left = 0
right = array.length - 1
result = nil
while left <= right
mid = (left + right) / 2
if array[mid] == target
result = mid
left = mid + 1 # Keep searching right
elsif array[mid] < target
left = mid + 1
else
right = mid - 1
end
end
result
end
data = [2, 5, 5, 5, 5, 8, 10]
find_last_binary(data, 5)
# => 4
The count-occurrences pattern combines find-first and find-last:
def count_occurrences(array, target)
first = find_first_binary(array, target)
return 0 if first.nil?
last = find_last_binary(array, target)
last - first + 1
end
data = [2, 5, 5, 5, 5, 8, 10]
count_occurrences(data, 5)
# => 4
The lower-bound pattern finds the insertion point for a target value:
def lower_bound(array, target)
left = 0
right = array.length
while left < right
mid = (left + right) / 2
if array[mid] < target
left = mid + 1
else
right = mid
end
end
left
end
data = [2, 5, 8, 10, 15]
lower_bound(data, 7)
# => 2 (position to insert 7)
The upper-bound pattern finds the position after the last occurrence:
def upper_bound(array, target)
left = 0
right = array.length
while left < right
mid = (left + right) / 2
if array[mid] <= target
left = mid + 1
else
right = mid
end
end
left
end
data = [2, 5, 5, 5, 8, 10]
upper_bound(data, 5)
# => 4 (position after last 5)
The sentinel pattern optimizes linear search by eliminating boundary checks:
def linear_search_with_sentinel(array, target)
original_last = array[-1]
array[-1] = target
index = 0
index += 1 while array[index] != target
array[-1] = original_last
return index if index < array.length - 1 || original_last == target
nil
end
data = [15, 3, 42, 7, 23, 8, 19]
linear_search_with_sentinel(data, 23)
# => 4
Common Pitfalls
Off-by-one errors plague binary search implementations. The most common mistake involves incorrect boundary updates:
# INCORRECT: Infinite loop when target not found
def binary_search_wrong(array, target)
left = 0
right = array.length - 1
while left <= right
mid = (left + right) / 2
if array[mid] == target
return mid
elsif array[mid] < target
left = mid # WRONG: should be mid + 1
else
right = mid # WRONG: should be mid - 1
end
end
nil
end
# This creates infinite loop when searching for 20 in [10, 30]
# mid stays at 0, left stays at 0
Integer overflow affects midpoint calculation in languages with fixed-size integers. Ruby handles arbitrary-precision integers, but the pattern matters for cross-language understanding:
# Problematic in languages with fixed integer sizes
mid = (left + right) / 2 # Can overflow if left + right exceeds max int
# Safer alternative
mid = left + (right - left) / 2
Forgetting the sorted requirement causes incorrect results:
data = [42, 7, 23, 3, 19] # UNSORTED
# Binary search returns unpredictable results
result = data.bsearch { |x| x >= 19 }
# May return wrong element or nil
Using the wrong comparison mode with bsearch produces unexpected results:
sorted = [3, 7, 8, 15, 19, 23, 42]
# WRONG: Returns elements, not exact match
result = sorted.bsearch { |x| x >= 19 }
# => 19 (happens to work, but finds first >= 19, not exactly 19)
# WRONG: Comparing against wrong value
result = sorted.bsearch { |x| x >= 20 }
# => 23 (finds first element >= 20, not 20 itself)
# CORRECT: For exact match, check result
target = 20
result = sorted.bsearch { |x| x >= target }
result == target ? result : nil
# => nil (correctly indicates 20 not present)
Modifying the collection during search creates undefined behavior:
# DANGEROUS: Modifying array during iteration
data = [15, 3, 42, 7, 23]
data.each_with_index do |element, index|
if element == 42
data.delete_at(index) # Modifies array during iteration
end
end
Comparing incompatible types causes exceptions:
mixed = [1, "two", 3, "four"]
mixed.sort # Raises ArgumentError: comparison of Integer with String failed
# Binary search on unsortable data fails
mixed.bsearch { |x| x >= 2 } # Comparison error
Assuming bsearch returns indices instead of elements:
sorted = [3, 7, 8, 15, 19, 23, 42]
# WRONG: bsearch returns element, not index
result = sorted.bsearch { |x| x >= 19 }
# => 19 (element value, not index 4)
# CORRECT: Use bsearch_index for indices
index = sorted.bsearch_index { |x| x >= 19 }
# => 4 (correct index)
Handling empty arrays incorrectly:
def binary_search_bad(array, target)
left = 0
right = array.length - 1 # Becomes -1 for empty array
while left <= right # Still enters loop with empty array
mid = (left + right) / 2 # mid = -1 / 2 = -1 in Ruby
# Accessing array[-1] gets last element if array not empty
end
end
# CORRECT: Check for empty array
def binary_search_safe(array, target)
return nil if array.empty?
left = 0
right = array.length - 1
while left <= right
mid = (left + right) / 2
# Safe to proceed
end
nil
end
Infinite loops from incorrect loop conditions:
# WRONG: Uses < instead of <=
def binary_search_wrong_condition(array, target)
left = 0
right = array.length - 1
while left < right # WRONG: misses case when left == right
mid = (left + right) / 2
if array[mid] == target
return mid
elsif array[mid] < target
left = mid + 1
else
right = mid - 1
end
end
nil
end
# Fails when target is at position where left == right
data = [10]
binary_search_wrong_condition(data, 10)
# => nil (WRONG: should return 0)
Reference
Algorithm Comparison
| Characteristic | Linear Search | Binary Search |
|---|---|---|
| Time Complexity (Worst) | O(n) | O(log n) |
| Time Complexity (Average) | O(n) | O(log n) |
| Time Complexity (Best) | O(1) | O(1) |
| Space Complexity | O(1) | O(1) iterative, O(log n) recursive |
| Requires Sorted Data | No | Yes |
| Works on Any Structure | Yes | Arrays/indexed collections only |
| Cache Performance | Excellent (sequential access) | Poor (random access) |
| Implementation Complexity | Simple | Moderate |
Ruby Methods Reference
| Method | Class | Return Type | Use Case |
|---|---|---|---|
| index | Array, Enumerable | Integer or nil | Find position with linear search |
| include? | Array, Enumerable | Boolean | Check existence with linear search |
| find | Enumerable | Element or nil | Find first matching element |
| find_index | Enumerable | Integer or nil | Find first matching index |
| bsearch | Array | Element or nil | Binary search returning element |
| bsearch_index | Array | Integer or nil | Binary search returning index |
| sort | Array, Enumerable | Sorted array | Prepare data for binary search |
| sort_by | Array, Enumerable | Sorted array | Sort by attribute for binary search |
Binary Search Block Modes
| Mode | Block Return | Behavior |
|---|---|---|
| Find minimum | true/false | Returns first element where block returns true |
| Find exact (workaround) | true/false | Check if result equals target after search |
| Custom comparison | -1/0/1 | Use spaceship operator for exact matches |
Performance Guidelines
| Collection Size | Unsorted, Single Search | Sorted, Single Search | Multiple Searches |
|---|---|---|---|
| Under 10 elements | Linear search | Linear search | Linear search |
| 10-100 elements | Linear search | Binary search | Sort once, binary search |
| 100-10,000 elements | Linear search | Binary search | Sort once, binary search |
| Over 10,000 elements | Linear search | Binary search | Sort once, binary search |
Common Binary Search Variations
| Pattern | Purpose | Key Modification |
|---|---|---|
| Find first | Locate leftmost occurrence | Continue searching left after match |
| Find last | Locate rightmost occurrence | Continue searching right after match |
| Lower bound | Find insertion point | Search for first element >= target |
| Upper bound | Find position after last | Search for first element > target |
| Count occurrences | Count duplicates | Find last - find first + 1 |
Edge Cases Checklist
| Scenario | Linear Search Behavior | Binary Search Behavior |
|---|---|---|
| Empty array | Returns nil immediately | Returns nil, check first |
| Single element | Checks one element | Checks one element |
| Target at start | Returns 0 immediately | Takes log(n) comparisons |
| Target at end | Checks all elements | Takes log(n) comparisons |
| Target absent | Checks all elements | Takes log(n) comparisons |
| Duplicate elements | Returns first occurrence | Returns any occurrence |
| Unsorted data | Works correctly | Undefined behavior |