Overview
Ruby provides extensive searching and filtering capabilities through the Enumerable module and specialized methods on core classes. These operations transform collections by selecting elements that match specific criteria or locate individual items based on conditions.
The primary filtering methods operate on arrays, hashes, and other enumerable objects. Array#select returns elements matching a condition, while Array#reject excludes matching elements. Array#find returns the first matching element, and Array#grep performs pattern-based searching. Hash filtering works similarly but operates on key-value pairs.
String searching uses methods like String#include? for substring detection and String#match for regular expression matching. The String#scan method extracts all pattern matches, while String#index locates pattern positions.
# Array filtering
numbers = [1, 2, 3, 4, 5, 6]
evens = numbers.select(&:even?)
# => [2, 4, 6]
# Hash filtering
users = { alice: 25, bob: 30, carol: 22 }
adults = users.select { |name, age| age >= 25 }
# => { alice: 25, bob: 30 }
# String searching
text = "Ruby programming language"
text.include?("program")
# => true
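The other string methods named above can be sketched on the same sample text. The variable names and patterns here are illustrative choices, not part of the original example.

```ruby
text = "Ruby programming language"

# String#match returns a MatchData object, or nil when nothing matches
match = text.match(/(\w+)ing/)
stem = match && match[1]          # => "programm"

# String#scan collects every non-overlapping match into an array
a_words = text.scan(/\w*a\w*/)    # => ["programming", "language"]

# String#index returns the position of the first match, or nil
position = text.index("program")  # => 5
missing = text.index("python")    # => nil
```

Because match and index return nil on failure, callers usually guard the result before using it.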
Ruby's filtering methods accept blocks for custom conditions. The &:method syntax converts a symbol to a block through Symbol#to_proc, creating concise filtering expressions when the condition is a single method call on each element. Regular expressions enable pattern-based searching across strings and collections.
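As a small illustration of the shorthand, the three calls below produce the same result; the last line applies a regular expression to a collection through grep. The sample data is made up for the example.

```ruby
numbers = [1, 2, 3, 4, 5, 6]

# An explicit block, the &:symbol shorthand, and an explicit to_proc call
with_block  = numbers.select { |n| n.even? }
with_symbol = numbers.select(&:even?)
with_proc   = numbers.select(&:even?.to_proc)
# All three => [2, 4, 6]

# A regular expression used as a filter over a collection
gems = %w[ruby rails rack sinatra]
r_names = gems.grep(/\Ar/)
# => ["ruby", "rails", "rack"]
```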
Basic Usage
Array filtering forms the foundation of collection manipulation in Ruby. The select method creates new arrays containing elements that satisfy the given condition, while reject excludes matching elements. Both methods preserve the original array and return filtered copies.
# Basic select and reject
scores = [85, 92, 78, 96, 81, 89]
passing = scores.select { |score| score >= 80 }
# => [85, 92, 96, 81, 89]
failing = scores.reject { |score| score >= 80 }
# => [78]
# Filtering by word length with an explicit block
words = ["apple", "banana", "cherry", "date"]
long_words = words.select { |word| word.length > 5 }
# => ["banana", "cherry"]
The find method returns the first element matching the condition, while find_all serves as an alias for select. When no match exists, find returns nil. The detect method provides identical functionality to find.
# Finding single elements
inventory = [
  { item: "laptop", price: 1200, stock: 5 },
  { item: "mouse", price: 25, stock: 15 },
  { item: "keyboard", price: 80, stock: 0 }
]
expensive_item = inventory.find { |product| product[:price] > 1000 }
# => { item: "laptop", price: 1200, stock: 5 }
out_of_stock = inventory.find { |product| product[:stock] == 0 }
# => { item: "keyboard", price: 80, stock: 0 }
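The nil return deserves a guard in calling code. The sketch below uses a shortened copy of the inventory data and shows the optional fallback argument that find accepts (any object responding to call); the fallback hash is a placeholder invented for the example.

```ruby
inventory = [
  { item: "laptop", price: 1200, stock: 5 },
  { item: "mouse", price: 25, stock: 15 }
]

# find returns nil when nothing matches, so check before using the result
missing = inventory.find { |product| product[:price] > 5000 }
label = missing ? missing[:item] : "none found"
# => "none found"

# The optional argument is called to produce a value when no element matches
fallback = inventory.find(-> { { item: "n/a" } }) { |product| product[:stock] == 0 }
# => { item: "n/a" }

# detect is the same method under another name
cheap = inventory.detect { |product| product[:price] < 100 }
# => { item: "mouse", price: 25, stock: 15 }
```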
Hash filtering operates on key-value pairs, allowing conditions based on keys, values, or both. The block receives two parameters representing the key and value. Filtered results maintain hash structure.
# Hash filtering by value
temperatures = {
  "New York" => 72,
  "Miami" => 89,
  "Chicago" => 65,
  "Phoenix" => 105
}
hot_cities = temperatures.select { |city, temp| temp > 80 }
# => { "Miami" => 89, "Phoenix" => 105 }
# Hash filtering by value type
settings = {
  debug_mode: true,
  log_level: "info",
  cache_enabled: false,
  timeout: 30
}
boolean_settings = settings.select { |key, value| [true, false].include?(value) }
# => { debug_mode: true, cache_enabled: false }
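A condition can also look only at the key. The following sketch reuses the settings hash; the suffix convention ("_mode", "_enabled") is an assumption made for the example.

```ruby
settings = {
  debug_mode: true,
  log_level: "info",
  cache_enabled: false,
  timeout: 30
}

# Select pairs whose key follows a naming convention; the value is ignored
flags = settings.select { |key, _value| key.to_s.end_with?("_mode", "_enabled") }
# => { debug_mode: true, cache_enabled: false }

# filter_map (Ruby 2.7+) filters and transforms in one pass and returns an array
string_keys = settings.filter_map { |key, value| key if value.is_a?(String) }
# => [:log_level]
```

Note that filter_map returns an array rather than a hash, which suits cases where only the keys or only the values are needed afterwards.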
Pattern-based searching uses the grep method with regular expressions or other pattern objects. This method filters collections based on pattern matching using the === operator.
# Pattern matching with grep
files = ["readme.txt", "config.yml", "data.json", "backup.tar.gz"]
text_files = files.grep(/\.(txt|md)$/)
# => ["readme.txt"]
# Numeric range matching
ages = [16, 21, 35, 42, 18, 29]
young_adults = ages.grep(18..25)
# => [21, 18]
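Because grep relies on ===, classes work as patterns too, grep_v inverts the match, and an optional block transforms each matching element. A brief sketch with made-up data:

```ruby
mixed = [1, "two", :three, 4.0, "five", nil]

# Class#=== tests whether the element is an instance of the class
strings = mixed.grep(String)        # => ["two", "five"]
numerics = mixed.grep(Numeric)      # => [1, 4.0]

# grep_v keeps the elements that do NOT match
non_strings = mixed.grep_v(String)  # => [1, :three, 4.0, nil]

# A block maps each match before it is collected
files = ["readme.txt", "config.yml", "notes.md"]
basenames = files.grep(/\.(txt|md)\z/) { |name| File.basename(name, ".*") }
# => ["readme", "notes"]
```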
Advanced Usage
Complex filtering scenarios require combining multiple conditions and chaining operations. Ruby's enumerable methods compose naturally, allowing sophisticated data processing pipelines. Method chaining applies successive transformations while maintaining readable code structure.
# Multi-condition filtering with chaining
sales_data = [
  { region: "North", product: "laptop", amount: 1500, quarter: "Q1" },
  { region: "South", product: "tablet", amount: 800, quarter: "Q1" },
  { region: "North", product: "phone", amount: 600, quarter: "Q2" },
  { region: "West", product: "laptop", amount: 1200, quarter: "Q2" },
  { region: "South", product: "laptop", amount: 1800, quarter: "Q1" }
]
high_value_north_laptops = sales_data
  .select { |sale| sale[:region] == "North" }
  .select { |sale| sale[:product] == "laptop" }
  .select { |sale| sale[:amount] > 1000 }
  .map { |sale| { quarter: sale[:quarter], amount: sale[:amount] } }
# => [{ quarter: "Q1", amount: 1500 }]
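Chained select calls read well but build one intermediate array per step. When that matters, the conditions can be combined into a single block, as in this sketch over a shortened copy of the sales data:

```ruby
sales_data = [
  { region: "North", product: "laptop", amount: 1500, quarter: "Q1" },
  { region: "North", product: "phone", amount: 600, quarter: "Q2" },
  { region: "South", product: "laptop", amount: 1800, quarter: "Q1" }
]

# One select with a compound condition visits each record once
# and builds a single intermediate array instead of three
matches = sales_data.select do |sale|
  sale[:region] == "North" && sale[:product] == "laptop" && sale[:amount] > 1000
end

# Hash#slice keeps only the listed keys
summary = matches.map { |sale| sale.slice(:quarter, :amount) }
# => [{ quarter: "Q1", amount: 1500 }]
```

Which form is preferable is a readability trade-off; for small collections the difference in cost is negligible.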
Custom predicate objects encapsulate complex filtering logic. Classes implementing the === method work seamlessly with grep and case statements. This approach separates filtering logic from data processing code.
# Custom predicate classes
class PriceRange
  def initialize(min, max)
    @min, @max = min, max
  end

  def ===(item)
    item.respond_to?(:price) && item.price.between?(@min, @max)
  end
end

class Product
  attr_reader :name, :price, :category

  def initialize(name, price, category)
    @name, @price, @category = name, price, category
  end
end

products = [
  Product.new("Laptop", 1200, "Electronics"),
  Product.new("Book", 25, "Education"),
  Product.new("Phone", 800, "Electronics")
]
mid_range = PriceRange.new(100, 1000)
affordable_products = products.grep(mid_range)
# => [Product with phone]
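A full class is not always necessary: Proc#=== calls the proc with the tested object, so lambdas can serve as lightweight predicates in both grep and case. The lambdas and data below are hypothetical names chosen for this sketch.

```ruby
in_stock = ->(product) { product[:stock] > 0 }
cheap    = ->(product) { product[:price] < 100 }

products = [
  { name: "laptop", price: 1200, stock: 5 },
  { name: "mouse", price: 25, stock: 15 },
  { name: "keyboard", price: 80, stock: 0 }
]

# grep calls in_stock === product for each element
available = products.grep(in_stock).map { |item| item[:name] }
# => ["laptop", "mouse"]

# case/when uses the same === protocol; the first matching branch wins
labels = products.map do |product|
  case product
  when cheap    then "budget"
  when in_stock then "available"
  else "unavailable"
  end
end
# => ["available", "budget", "budget"]
```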
Lazy evaluation optimizes filtering performance for large datasets. The lazy method creates lazy enumerators that process elements on demand rather than generating intermediate arrays.
# Lazy evaluation for memory efficiency
require 'prime' # provides Integer#prime?

def generate_numbers
  (1..Float::INFINITY).lazy
    .select(&:prime?)
    .select { |n| n.to_s.include?('7') }
    .first(10)
end
# This processes only the elements needed to find 10 results
prime_sevens = generate_numbers
# => First 10 prime numbers containing digit 7
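To make the on-demand behavior visible, this sketch counts how many elements the pipeline actually examines. The counter variable is added only for the demonstration.

```ruby
examined = 0

first_three = (1..Float::INFINITY).lazy
  .select { |n| examined += 1; n % 7 == 0 }
  .map { |n| n * n }
  .first(3)
# first_three => [49, 196, 441]
# examined    => 21
```

The infinite range is never materialized: iteration stops as soon as the third multiple of 7 (21) has passed through the pipeline.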
Nested structure filtering calls for flattening or recursion. When the nesting depth is fixed and known, flat_map combined with select is usually enough; structures of arbitrary depth need a recursive method.
# Nested hash filtering
company_data = {
  departments: {
    engineering: {
      employees: [
        { name: "Alice", salary: 95000, level: "senior" },
        { name: "Bob", salary: 75000, level: "junior" }
      ]
    },
    sales: {
      employees: [
        { name: "Carol", salary: 85000, level: "senior" },
        { name: "Dave", salary: 65000, level: "junior" }
      ]
    }
  }
}
# Extract all senior employees across departments
def extract_senior_employees(data)
  data[:departments].flat_map do |_dept_name, dept_data|
    dept_data[:employees].select { |emp| emp[:level] == "senior" }
  end
end
senior_staff = extract_senior_employees(company_data)
# => [{ name: "Alice", salary: 95000, level: "senior" },
# { name: "Carol", salary: 85000, level: "senior" }]
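For structures whose depth is not known in advance, a recursive walk is one option. The helper below, deep_select, is a hypothetical name and not a built-in method; it is a minimal sketch that collects every hash, at any depth, for which the block is truthy.

```ruby
# Hypothetical helper: walks nested hashes and arrays recursively
def deep_select(node, &predicate)
  case node
  when Hash
    found = predicate.call(node) ? [node] : []
    found + node.values.flat_map { |child| deep_select(child, &predicate) }
  when Array
    node.flat_map { |child| deep_select(child, &predicate) }
  else
    [] # scalars terminate the recursion
  end
end

org = {
  departments: {
    engineering: {
      employees: [{ name: "Alice", level: "senior" }, { name: "Bob", level: "junior" }],
      teams: { platform: { employees: [{ name: "Eve", level: "senior" }] } }
    }
  }
}

seniors = deep_select(org) { |hash| hash[:level] == "senior" }
names = seniors.map { |emp| emp[:name] }
# => ["Alice", "Eve"]
```

The else branch is the base case; without it, very deep or self-referential data would be the main risk to watch for.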
Performance & Memory
Filtering operations create new collections by default, which impacts memory usage with large datasets. Understanding when methods allocate memory versus operating in-place helps optimize performance-critical applications.
Methods like select and reject always create new arrays, while their bang variants like select! and reject! modify collections in place. In-place operations save memory but mutate original data structures.
# Memory allocation comparison
large_array = (1..1_000_000).to_a
# Creates new array - high memory usage
filtered = large_array.select(&:even?)
# Modifies original array - lower memory usage
large_array.select!(&:even?)
# Original array now contains only even numbers
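One detail of the bang variants is easy to miss: select! and reject! return nil when nothing was removed, so their return value should not be chained blindly. A short sketch:

```ruby
evens = [2, 4, 6]
result = evens.select!(&:even?)   # nothing removed
# result => nil, evens is still [2, 4, 6]

# keep_if behaves like select! but always returns the array
kept = [2, 4, 6].keep_if(&:even?)
# => [2, 4, 6]

mixed = [1, 2, 3, 4]
changed = mixed.select!(&:even?)
# changed => [2, 4], and it is the same object as mixed
```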
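Lazy evaluation pays off when a pipeline can stop early or when the source is very large or infinite. Lazy enumerators process elements on demand and avoid creating intermediate collections, but they add per-element overhead, so a pipeline that must consume every element can run slower than its eager equivalent.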
# Performance comparison: eager vs lazy
require 'benchmark'
data = (1..10_000_000).to_a
# Eager evaluation - every stage processes its whole input
eager = Benchmark.measure do
  data.select { |n| n % 1000 == 0 }
      .map { |n| n * 2 }
      .first(100)
end
# => Walks all 10 million elements and builds two intermediate arrays
# Lazy evaluation - stops once 100 results are found
lazy = Benchmark.measure do
  data.lazy
      .select { |n| n % 1000 == 0 }
      .map { |n| n * 2 }
      .first(100)
end
# => Stops after the first 100,000 elements; no intermediate arrays
puts "eager: #{eager.real}s, lazy: #{lazy.real}s"
Hash filtering performance depends on the cost of the block and the size of the hash. Hash#select visits every pair whether the block inspects the key or the value, so both forms are O(n); any difference comes from the work done inside the block. When the wanted keys are known in advance, Hash#slice or a direct lookup avoids the full scan.
# Cheap block: integer test on the key
large_hash = (1..100_000).map { |i| [i, "value_#{i}"] }.to_h
even_keys = large_hash.select { |k, v| k.even? }
# Costlier block: substring search on the value
string_values = large_hash.select { |k, v| v.include?("5") }
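Regular expression construction affects search performance in loops. Ruby compiles a regex literal without interpolation once, when the code is parsed, so moving it into a variable changes little. Patterns built at runtime, with Regexp.new or with literals containing #{...}, are rebuilt on every evaluation and should be created outside the loop. Prefer match? over match when only a boolean is needed, since it skips the MatchData allocation, and use plain string methods when possible.
texts = ["apple", "banana", "application", "grape"] * 1000
# Inefficient - builds a new Regexp on every iteration
results = texts.select { |text| text.match(Regexp.new("app")) }
# Efficient - builds the pattern once and uses match? for a boolean test
pattern = /app/
results = texts.select { |text| text.match?(pattern) }
# Most efficient for simple patterns
results = texts.select { |text| text.include?("app") }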
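When the pattern comes from runtime data, it can be assembled once before filtering. This sketch uses Regexp.union; the keyword list is invented for the example.

```ruby
keywords = ["app", "gra"]
texts = ["apple", "banana", "application", "grape"]

# Build the combined pattern once, outside the block
pattern = Regexp.union(keywords)
# pattern.source => "app|gra"

matches = texts.select { |text| text.match?(pattern) }
# => ["apple", "application", "grape"]
```

Regexp.union also escapes any special characters in string arguments, which avoids accidental metacharacters when the keywords come from user input.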
Common Pitfalls
Truthy and falsy value handling causes frequent confusion in filtering operations. Ruby considers false and nil as falsy, while all other values including zero, empty strings, and empty arrays evaluate as truthy.
# Unexpected truthy behavior
values = [0, "", [], false, nil, "text", 42]
# This keeps 0, "", and [] because they're truthy
truthy_values = values.select { |v| v }
# => [0, "", [], "text", 42]
# Explicit nil checking required for different behavior
non_nil_values = values.select { |v| !v.nil? }
# => [0, "", [], false, "text", 42]
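Two common variations are sketched below: compact as the idiomatic way to drop only nil, and an explicit "blank" test. The definition of blank used here (nil, false, or an empty container) is an assumption for the example, not a core Ruby concept.

```ruby
values = [0, "", [], false, nil, "text", 42]

# compact removes nil and nothing else
without_nil = values.compact
# => [0, "", [], false, "text", 42]

# Assumed definition of blank: falsy, or responds to empty? and is empty
non_blank = values.reject { |v| !v || (v.respond_to?(:empty?) && v.empty?) }
# => [0, "text", 42]
```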
Mutation during iteration creates unpredictable behavior. Modifying collections while filtering can skip elements or cause infinite loops. Always work with copies when modification is necessary during iteration.
# Dangerous - modifying during iteration
numbers = [1, 2, 3, 4, 5]
numbers.each do |num|
  numbers.delete(num) if num.even? # Shifts later elements left, so the next one is skipped
end
# => [1, 3, 5] here, but only because the skipped elements (3 and 5) were odd
# Safe approach - work with copy
numbers = [1, 2, 3, 4, 5]
numbers_copy = numbers.dup
numbers_copy.each { |num| numbers.delete(num) if num.even? }
# => [1, 3, 5] with predictable behavior
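In many cases the explicit loop is unnecessary, because the built-in filtering methods already handle removal safely. A brief sketch of two alternatives:

```ruby
# reject! removes matching elements in place without a manual loop
numbers = [1, 2, 3, 4, 5]
numbers.reject!(&:even?)
# numbers => [1, 3, 5]

# partition splits into two new arrays and leaves the source untouched
source = [1, 2, 3, 4, 5]
evens, odds = source.partition(&:even?)
# evens => [2, 4], odds => [1, 3, 5]
```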
Block variable naming conflicts occur when nested blocks use the same parameter names. This shadowing behavior can cause confusing results in complex filtering operations.
# Block variable shadowing problem
data = [[1, 2], [3, 4], [5, 6]]
# Confusing - 'item' used in both levels
result = data.select do |item|
  item.any? { |item| item > 3 } # Inner 'item' shadows outer
end
# Clear - different parameter names
result = data.select do |array|
  array.any? { |element| element > 3 }
end
# => [[3, 4], [5, 6]]
Regular expression anchoring mistakes lead to partial matching when full string matching is intended. The match method finds patterns anywhere in strings unless anchors specify boundaries.
# Unanchored regex matches partial strings
emails = ["user@domain.com", "not an email: x@y.z!", "test@site.org"]
pattern = /\w+@\w+\.\w+/
# Matches partial strings unexpectedly
valid_emails = emails.select { |email| email.match(pattern) }
# => All three match because the pattern only has to match somewhere in the string
# Properly anchored for full string validation
anchored_pattern = /\A\w+@\w+\.\w+\z/
valid_emails = emails.select { |email| email.match(anchored_pattern) }
# => ["user@domain.com", "test@site.org"]
Method chaining with nil values breaks execution chains. When intermediate operations return nil, subsequent method calls raise NoMethodError exceptions.
# Nil handling in method chains
users = [
  { name: "Alice", profile: { age: 30 } },
  { name: "Bob", profile: nil },
  { name: "Carol", profile: { age: 25 } }
]
# This breaks when profile is nil
young_users = users.select do |user|
  user[:profile][:age] < 28 # NoMethodError when profile is nil
end
# Safe lookup with dig; a nil age makes the block falsy, so that user is skipped
young_users = users.select do |user|
  age = user.dig(:profile, :age)
  age && age < 28
end
# => [{ name: "Carol", profile: { age: 25 } }]
Reference
Array Filtering Methods
Method | Parameters | Returns | Description |
---|---|---|---|
#select { block } | Block condition | Array | Returns elements where block returns truthy |
#select! | Block condition | Array or nil | Modifies array in place, returns nil if unchanged |
#reject { block } | Block condition | Array | Returns elements where block returns falsy |
#reject! | Block condition | Array or nil | Modifies array in place, returns nil if unchanged |
#find { block } | Block condition | Object or nil | Returns first matching element |
#find_all { block } | Block condition | Array | Alias for select |
#detect { block } | Block condition | Object or nil | Alias for find |
#grep(pattern) | Pattern object | Array | Returns elements matching pattern with === |
#grep_v(pattern) | Pattern object | Array | Returns elements not matching pattern |
Hash Filtering Methods
Method | Parameters | Returns | Description |
---|---|---|---|
`#select { \|k, v\| block }` | Block with key, value | Hash | Returns pairs where block returns truthy |
`#reject { \|k, v\| block }` | Block with key, value | Hash | Returns pairs where block returns falsy |
`#filter { \|k, v\| block }` | Block with key, value | Hash | Alias for select |
#compact | None | Hash | Removes nil values |
#slice(*keys) | Key list | Hash | Returns hash with only specified keys |
#except(*keys) | Key list | Hash | Returns hash without specified keys (Ruby 3.0+) |
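The key-based methods in the table can be sketched on a small configuration hash. The hash contents are invented for the example, and Hash#except assumes Ruby 3.0 or later.

```ruby
config = { host: "localhost", port: 5432, password: nil, debug: true }

# compact drops pairs whose value is nil
present = config.compact
# => { host: "localhost", port: 5432, debug: true }

# slice keeps only the named keys; unknown keys are ignored
connection = config.slice(:host, :port, :socket)
# => { host: "localhost", port: 5432 }

# except returns a copy without the named keys
public_view = config.except(:password, :debug)
# => { host: "localhost", port: 5432 }
```

All three return new hashes and leave config unchanged.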
String Searching Methods
Method | Parameters | Returns | Description |
---|---|---|---|
#include?(substring) | String | Boolean | Tests if string contains substring (a Regexp argument raises TypeError) |
#match(pattern) | Regexp or String, optional position | MatchData or nil | Returns match data for pattern |
#match?(pattern) | Regexp or String, optional position | Boolean | Tests if string matches pattern |
#scan(pattern) | Regexp or String | Array | Returns all pattern matches |
#index(pattern) | String or Regexp | Integer or nil | Returns position of first match |
#rindex(pattern) | String or Regexp | Integer or nil | Returns position of last match |
Enumerable Module Methods
Method | Parameters | Returns | Description |
---|---|---|---|
#find_index { block } | Block condition | Integer or nil | Returns index of first matching element |
#count { block } | Optional block | Integer | Returns count of matching elements |
#any? { block } | Optional block | Boolean | Tests if any element matches condition |
#all? { block } | Optional block | Boolean | Tests if all elements match condition |
#none? { block } | Optional block | Boolean | Tests if no elements match condition |
#one? { block } | Optional block | Boolean | Tests if exactly one element matches |
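A quick sketch of these query methods, reusing the scores array from the Basic Usage section:

```ruby
scores = [85, 92, 78, 96, 81, 89]

first_failing = scores.find_index { |s| s < 80 }  # => 2
top_count     = scores.count { |s| s >= 90 }      # => 2
any_top       = scores.any? { |s| s > 95 }        # => true
all_passing   = scores.all? { |s| s >= 80 }       # => false
none_low      = scores.none? { |s| s < 70 }       # => true
one_failing   = scores.one? { |s| s < 80 }        # => true
```

These methods answer a question about the collection without building a filtered copy, and any?, all?, and none? stop iterating as soon as the answer is known.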
Performance Characteristics
Operation | Time Complexity | Memory Usage | Notes |
---|---|---|---|
Array#select | O(n) | O(k) where k is result size | Creates new array |
Array#find | O(n) worst case, O(1) best | O(1) | Stops at first match |
Hash#select | O(n) | O(k) where k is result size | Creates new hash |
String#include? | O(n*m) worst case | O(1) | Optimized substring search, typically close to linear |
Regexp#match | O(n) typical | O(1) plus the MatchData result | Depends on pattern complexity; heavy backtracking can be far slower |
Enumerable#grep | O(n) | O(k) where k is result size | Uses === operator |
Common Pattern Objects
Pattern Type | Usage | Example |
---|---|---|
Regular Expression | String pattern matching | /[a-z]+@[a-z]+\.[a-z]+/ |
Range | Numeric or comparable ranges | 18..65, "a".."z" |
Class | Type checking | String, Integer, Hash |
Proc/Lambda | Custom conditions | ->(x) { x.even? } |
Custom Object | Complex predicates | Objects implementing === |
Error Types
Error | Cause | Solution |
---|---|---|
NoMethodError | Calling methods on nil | Use safe navigation &. |
TypeError | Wrong argument types | Validate input types |
RegexpError | Invalid regex pattern | Test patterns before use |
ArgumentError | Wrong parameter count | Check method signatures |
SystemStackError | Infinite recursion | Add base cases to recursive filters |