Searching and Filtering

A comprehensive guide to searching and filtering collections, strings, and data structures using Ruby's built-in methods and patterns.


Overview

Ruby provides extensive searching and filtering capabilities through the Enumerable module and specialized methods on core classes. These operations transform collections by selecting elements that match specific criteria or locate individual items based on conditions.

The primary filtering methods operate on arrays, hashes, and other enumerable objects. Array#select returns elements matching a condition, while Array#reject excludes matching elements. Enumerable#find returns the first matching element, and Enumerable#grep performs pattern-based searching; arrays inherit both from Enumerable. Hash filtering works similarly but operates on key-value pairs.

String searching uses methods like String#include? for substring detection and String#match for regular expression matching. The String#scan method extracts all pattern matches, while String#index locates pattern positions.

# Array filtering
numbers = [1, 2, 3, 4, 5, 6]
evens = numbers.select(&:even?)
# => [2, 4, 6]

# Hash filtering  
users = { alice: 25, bob: 30, carol: 22 }
adults = users.select { |name, age| age >= 25 }
# => { alice: 25, bob: 30 }

# String searching
text = "Ruby programming language"
text.include?("program")
# => true
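
The remaining string methods mentioned above can be sketched the same way. This is a minimal illustration; the sample text and variable names are made up for the example.

```ruby
text = "Ruby programming language, Ruby style guide"

# index and rindex return character offsets, or nil when nothing matches
first_at  = text.index("Ruby")       # => 0
second_at = text.index("Ruby", 1)    # => 27 (search starts at offset 1)
last_at   = text.rindex("Ruby")      # => 27
missing   = text.index("Python")     # => nil

# match returns a MatchData object whose captures are indexed from 1
match = text.match(/(\w+) language/)
captured = match[1]                  # => "programming"

# scan collects every match into an array
capitalized = text.scan(/\b[A-Z]\w*/)  # => ["Ruby", "Ruby"]
```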

Ruby's filtering methods accept blocks for custom conditions or symbols for method calls. The &:method syntax converts symbols to blocks, creating concise filtering expressions. Regular expressions enable pattern-based searching across strings and collections.
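
As a small sketch of the symbol shorthand, the following compares &:even? with the equivalent block and shows that & also accepts a lambda. The lambda name small is only illustrative.

```ruby
numbers = [1, 2, 3, 4, 5, 6]

# &:even? is shorthand for a block that calls even? on each element
with_symbol = numbers.select(&:even?)
with_block  = numbers.select { |n| n.even? }
# both => [2, 4, 6]

# The shorthand cannot pass arguments, so this condition needs a block
multiples_of_three = numbers.select { |n| (n % 3).zero? }
# => [3, 6]

# & also accepts a lambda, which lets a named condition be reused
small = ->(n) { n < 3 }
small_numbers = numbers.select(&small)   # => [1, 2]
large_numbers = numbers.reject(&small)   # => [3, 4, 5, 6]
```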

Basic Usage

Array filtering forms the foundation of collection manipulation in Ruby. The select method creates new arrays containing elements that satisfy the given condition, while reject excludes matching elements. Both methods preserve the original array and return filtered copies.

# Basic select and reject
scores = [85, 92, 78, 96, 81, 89]
passing = scores.select { |score| score >= 80 }
# => [85, 92, 96, 81, 89]

failing = scores.reject { |score| score >= 80 }
# => [78]

# Filtering with a block that compares string length
words = ["apple", "banana", "cherry", "date"]
long_words = words.select { |word| word.length > 5 }
# => ["banana", "cherry"]

The find method returns the first element matching the condition, while find_all serves as an alias for select. When no match exists, find returns nil. The detect method provides identical functionality to find.

# Finding single elements
inventory = [
  { item: "laptop", price: 1200, stock: 5 },
  { item: "mouse", price: 25, stock: 15 },
  { item: "keyboard", price: 80, stock: 0 }
]

expensive_item = inventory.find { |product| product[:price] > 1000 }
# => { item: "laptop", price: 1200, stock: 5 }

out_of_stock = inventory.find { |product| product[:stock] == 0 }
# => { item: "keyboard", price: 80, stock: 0 }
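
Because find returns nil when nothing matches, callers usually need a guard or a fallback. The sketch below shows both, plus find_index for locating a position. The fallback hash is a made-up placeholder.

```ruby
inventory = [
  { item: "laptop", price: 1200, stock: 5 },
  { item: "mouse", price: 25, stock: 15 },
  { item: "keyboard", price: 80, stock: 0 }
]

# No product costs more than 5000, so find returns nil
luxury = inventory.find { |product| product[:price] > 5000 }
label = luxury ? luxury[:item] : "none"   # => "none"

# find accepts a callable that supplies a default when nothing matches
fallback = -> { { item: "unknown", price: 0, stock: 0 } }
result = inventory.find(fallback) { |product| product[:price] > 5000 }
# => { item: "unknown", price: 0, stock: 0 }

# find_index returns the position of the first match instead of the element
position = inventory.find_index { |product| product[:stock] == 0 }
# => 2
```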

Hash filtering operates on key-value pairs, allowing conditions based on keys, values, or both. The block receives two parameters representing the key and value. Filtered results maintain hash structure.

# Hash filtering by value
temperatures = { 
  "New York" => 72, 
  "Miami" => 89, 
  "Chicago" => 65, 
  "Phoenix" => 105 
}

hot_cities = temperatures.select { |city, temp| temp > 80 }
# => { "Miami" => 89, "Phoenix" => 105 }

# Hash filtering by value type (keeps only boolean settings)
settings = { 
  debug_mode: true, 
  log_level: "info", 
  cache_enabled: false,
  timeout: 30 
}

boolean_settings = settings.select { |key, value| [true, false].include?(value) }
# => { debug_mode: true, cache_enabled: false }
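
A condition can also test the key alone. The sketch below filters on key names and shows that find_all, unlike select, returns an array of pairs when called on a hash. The suffix convention used here is an assumption for illustration.

```ruby
settings = {
  debug_mode: true,
  log_level: "info",
  cache_enabled: false,
  timeout: 30
}

# Condition on the key only; the underscore prefix marks the unused value
flags = settings.select { |key, _value| key.to_s.end_with?("_mode", "_enabled") }
# => { debug_mode: true, cache_enabled: false }

# reject returns the complement, still as a hash
others = settings.reject { |key, _value| key.to_s.end_with?("_mode", "_enabled") }
# => { log_level: "info", timeout: 30 }

# find_all comes from Enumerable and returns [key, value] pairs, not a hash
pairs = settings.find_all { |_key, value| value.is_a?(Integer) }
# => [[:timeout, 30]]
```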

Pattern-based searching uses the grep method with regular expressions or other pattern objects. This method filters collections based on pattern matching using the === operator.

# Pattern matching with grep
files = ["readme.txt", "config.yml", "data.json", "backup.tar.gz"]
text_files = files.grep(/\.(txt|md)$/)
# => ["readme.txt"]

# Numeric range matching
ages = [16, 21, 35, 42, 18, 29]
young_adults = ages.grep(18..25)
# => [21, 18]
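
Since grep relies on ===, classes and lambdas work as patterns too. A short sketch with a made-up mixed array:

```ruby
mixed = [1, "two", :three, 4.0, nil, "five", 6]

# Class#=== tests membership, so grep can filter by type
strings  = mixed.grep(String)     # => ["two", "five"]
numerics = mixed.grep(Numeric)    # => [1, 4.0, 6]

# grep_v keeps the elements that do not match
non_strings = mixed.grep_v(String)   # => [1, :three, 4.0, nil, 6]

# With a block, grep transforms each match before returning it
lengths = mixed.grep(String) { |s| s.length }   # => [3, 4]

# Proc#=== calls the lambda, so a lambda can act as the pattern
big = mixed.grep(Numeric).grep(->(n) { n > 3 })   # => [4.0, 6]
```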

Advanced Usage

Complex filtering scenarios require combining multiple conditions and chaining operations. Ruby's enumerable methods compose naturally, allowing sophisticated data processing pipelines. Method chaining applies successive transformations while maintaining readable code structure.

# Multi-condition filtering with chaining
sales_data = [
  { region: "North", product: "laptop", amount: 1500, quarter: "Q1" },
  { region: "South", product: "tablet", amount: 800, quarter: "Q1" },
  { region: "North", product: "phone", amount: 600, quarter: "Q2" },
  { region: "West", product: "laptop", amount: 1200, quarter: "Q2" },
  { region: "South", product: "laptop", amount: 1800, quarter: "Q1" }
]

high_value_north_laptops = sales_data
  .select { |sale| sale[:region] == "North" }
  .select { |sale| sale[:product] == "laptop" }
  .select { |sale| sale[:amount] > 1000 }
  .map { |sale| { quarter: sale[:quarter], amount: sale[:amount] } }
# => [{ quarter: "Q1", amount: 1500 }]
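
Chained select calls build an intermediate array at each step. When that matters, the conditions can be combined into one pass, as in this sketch with a trimmed-down data set. filter_map requires Ruby 2.7 or later.

```ruby
sales = [
  { region: "North", product: "laptop", amount: 1500 },
  { region: "South", product: "laptop", amount: 1800 },
  { region: "North", product: "phone",  amount: 600 }
]

# One select with combined conditions makes a single pass
north_laptops = sales.select do |sale|
  sale[:region] == "North" && sale[:product] == "laptop" && sale[:amount] > 1000
end
# => [{ region: "North", product: "laptop", amount: 1500 }]

# filter_map filters and transforms together, dropping nil and false results
north_amounts = sales.filter_map { |sale| sale[:amount] if sale[:region] == "North" }
# => [1500, 600]

# partition splits a collection into matches and non-matches
north, elsewhere = sales.partition { |sale| sale[:region] == "North" }
```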

Custom predicate objects encapsulate complex filtering logic. Classes implementing the === method work seamlessly with grep and case statements. This approach separates filtering logic from data processing code.

# Custom predicate classes
class PriceRange
  def initialize(min, max)
    @min, @max = min, max
  end
  
  def ===(item)
    item.respond_to?(:price) && item.price.between?(@min, @max)
  end
end

class Product
  attr_reader :name, :price, :category
  
  def initialize(name, price, category)
    @name, @price, @category = name, price, category
  end
end

products = [
  Product.new("Laptop", 1200, "Electronics"),
  Product.new("Book", 25, "Education"),
  Product.new("Phone", 800, "Electronics")
]

mid_range = PriceRange.new(100, 1000)
affordable_products = products.grep(mid_range)
# => [Product with phone]

Lazy evaluation optimizes filtering performance for large datasets. The lazy method creates lazy enumerators that process elements on demand rather than generating intermediate arrays.

# Lazy evaluation for memory efficiency
require 'prime'  # provides Integer#prime?

def generate_numbers
  (1..Float::INFINITY).lazy
    .select(&:prime?)
    .select { |n| n.to_s.include?('7') }
    .first(10)
end

# This processes only the elements needed to find 10 results
prime_sevens = generate_numbers
# => First 10 prime numbers containing digit 7
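
To see how little work a lazy chain performs, the sketch below counts how many elements the condition actually examines. The counter variable checked exists only for illustration.

```ruby
checked = 0

first_three = (1..Float::INFINITY).lazy
  .select { |n| checked += 1; (n % 7).zero? }   # count every element examined
  .map { |n| n * n }
  .first(3)

# first_three => [49, 196, 441]
# checked     => 21, because iteration stops at the third multiple of 7
```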

Nested structure filtering requires recursive approaches or specialized libraries. Ruby handles nested arrays and hashes through recursive methods or flattening operations combined with filtering.

# Nested hash filtering
company_data = {
  departments: {
    engineering: {
      employees: [
        { name: "Alice", salary: 95000, level: "senior" },
        { name: "Bob", salary: 75000, level: "junior" }
      ]
    },
    sales: {
      employees: [
        { name: "Carol", salary: 85000, level: "senior" },
        { name: "Dave", salary: 65000, level: "junior" }
      ]
    }
  }
}

# Extract all senior employees across departments
def extract_senior_employees(data)
  data[:departments].flat_map do |dept_name, dept_data|
    dept_data[:employees].select { |emp| emp[:level] == "senior" }
  end
end

senior_staff = extract_senior_employees(company_data)
# => [{ name: "Alice", salary: 95000, level: "senior" }, 
#     { name: "Carol", salary: 85000, level: "senior" }]
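
When the nesting depth is not known in advance, a recursive walk is one option. The helper below is a hypothetical sketch, not a built-in: deep_select visits every hash at any depth and collects those for which the block returns truthy.

```ruby
# Hypothetical helper: collects every hash, at any depth, that satisfies the block
def deep_select(node, &predicate)
  case node
  when Hash
    matches = predicate.call(node) ? [node] : []
    matches + node.values.flat_map { |child| deep_select(child, &predicate) }
  when Array
    node.flat_map { |child| deep_select(child, &predicate) }
  else
    []   # scalars cannot contain further matches
  end
end

org = {
  departments: {
    engineering: { employees: [{ name: "Alice", level: "senior" },
                               { name: "Bob", level: "junior" }] },
    sales:       { employees: [{ name: "Carol", level: "senior" }] }
  }
}

seniors = deep_select(org) { |hash| hash[:level] == "senior" }
senior_names = seniors.map { |employee| employee[:name] }
# => ["Alice", "Carol"]
```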

Performance & Memory

Filtering operations create new collections by default, which impacts memory usage with large datasets. Understanding when methods allocate memory versus operating in-place helps optimize performance-critical applications.

Methods like select and reject always create new arrays, while their bang variants like select! and reject! modify collections in place. In-place operations save memory but mutate original data structures.

# Memory allocation comparison
large_array = (1..1_000_000).to_a

# Creates new array - high memory usage
filtered = large_array.select(&:even?)

# Modifies original array - lower memory usage
large_array.select!(&:even?)
# Original array now contains only even numbers
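
One detail of the bang variants is easy to miss: select! and reject! return nil when they remove nothing, which breaks method chains. A short sketch of the difference from keep_if:

```ruby
evens = [2, 4, 6]

# Nothing is removed, so select! returns nil rather than the array
unchanged = evens.select!(&:even?)   # => nil

# keep_if performs the same filtering but always returns the array
kept = evens.keep_if(&:even?)        # => [2, 4, 6]

# When elements are removed, select! returns the modified array
mixed = [1, 2, 3, 4]
changed = mixed.select!(&:even?)     # => [2, 4]
```

Code that chains on the result, such as array.select!(&:even?).size, therefore fails with NoMethodError whenever the filter removes nothing.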

Lazy evaluation provides significant performance benefits when processing large collections or infinite sequences. Lazy enumerators process elements on demand, avoiding intermediate collection creation.

# Performance comparison: eager vs lazy
require 'benchmark'

data = (1..10_000_000).to_a

# Eager evaluation - processes all elements
Benchmark.measure do
  result = data.select { |n| n % 1000 == 0 }
              .map { |n| n * 2 }
              .first(100)
end
# => More memory and time consumed

# Lazy evaluation - processes only needed elements
Benchmark.measure do
  result = data.lazy
              .select { |n| n % 1000 == 0 }
              .map { |n| n * 2 }
              .first(100)
end
# => Less memory and time consumed

Hash filtering performance depends on the condition complexity and hash size. Hash#select visits every pair whether the block tests the key or the value, so both forms are O(n); any speed difference comes from the cost of the condition itself. When the wanted keys are known in advance, Hash#slice uses direct lookups and avoids the full scan.

# Cheap condition - one integer check per pair
large_hash = (1..100_000).map { |i| [i, "value_#{i}"] }.to_h
even_keys = large_hash.select { |k, v| k.even? }

# Costlier condition - one substring search per pair
string_values = large_hash.select { |k, v| v.include?("5") }
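
When the wanted keys are known up front, direct lookups avoid scanning the whole hash. A minimal sketch comparing the two approaches, with an arbitrary list of wanted keys:

```ruby
large_hash = (1..100_000).to_h { |i| [i, "value_#{i}"] }
wanted = [10, 500, 99_999]

# Full scan: the block runs once for every pair in the hash
by_scan = large_hash.select { |key, _value| wanted.include?(key) }

# Direct lookup: slice fetches only the requested keys
by_lookup = large_hash.slice(*wanted)

# values_at returns just the values for the requested keys
values = large_hash.values_at(*wanted)
# => ["value_10", "value_500", "value_99999"]
```

Both by_scan and by_lookup hold the same three pairs; only the amount of work differs.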

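Regular expression construction affects search performance in loops. Ruby compiles a plain regex literal once, but a literal containing interpolation, or a Regexp.new call inside the block, builds a new pattern on every iteration. Building such patterns outside the loop, using match? when no MatchData is needed, and using string methods when possible all improve performance.

# Inefficient - the interpolated regex is rebuilt on every iteration
texts = ["apple", "banana", "application", "grape"] * 1000
term = "app"
results = texts.select { |text| text.match(/#{term}/) }

# Efficient - builds the regex once; match? skips the MatchData allocation
pattern = /#{term}/
results = texts.select { |text| text.match?(pattern) }

# Most efficient for simple patterns
results = texts.select { |text| text.include?("app") }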
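
When a search term comes from a variable, a prebuilt pattern also benefits from escaping. The sketch below uses Regexp.escape and Regexp.union; the terms are made up for the example.

```ruby
texts = ["apple", "banana", "application", "grape", "a.b", "axb"]

# Regexp.escape stops characters like "." from acting as metacharacters
literal_dot = /#{Regexp.escape("a.b")}/
exact = texts.select { |text| text.match?(literal_dot) }   # => ["a.b"]

# Without escaping, "." matches any character
loose = texts.select { |text| text.match?(/a.b/) }          # => ["a.b", "axb"]

# Regexp.union builds one pattern from several alternatives
either = Regexp.union("app", "gra")
hits = texts.grep(either)   # => ["apple", "application", "grape"]
```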

Common Pitfalls

Truthy and falsy value handling causes frequent confusion in filtering operations. Ruby considers false and nil as falsy, while all other values including zero, empty strings, and empty arrays evaluate as truthy.

# Unexpected truthy behavior
values = [0, "", [], false, nil, "text", 42]

# This keeps 0, "", and [] because they're truthy
truthy_values = values.select { |v| v }
# => [0, "", [], "text", 42]

# Explicit nil checking required for different behavior
non_nil_values = values.select { |v| !v.nil? }
# => [0, "", [], false, "text", 42]
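
For the common cases, compact removes only nil, and an explicit emptiness check handles blank values. A sketch under the assumption that "blank" means nil, false, or anything that responds to empty? and is empty:

```ruby
values = [0, "", [], false, nil, "text", 42]

# compact removes nil and nothing else
without_nil = values.compact
# => [0, "", [], false, "text", 42]

# Dropping blank values needs an explicit check, since "" and [] are truthy
present = values.reject do |v|
  v.nil? || v == false || (v.respond_to?(:empty?) && v.empty?)
end
# => [0, "text", 42]
```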

Mutation during iteration creates unpredictable behavior. Modifying collections while filtering can skip elements or cause infinite loops. Always work with copies when modification is necessary during iteration.

# Dangerous - modifying during iteration
numbers = [1, 2, 3, 4, 5]
numbers.each do |num|
  numbers.delete(num) if num.even?  # Skips elements
end
# => [1, 3, 5] in this case, but 3 and 5 were never yielded to the block

# Safe approach - work with copy
numbers = [1, 2, 3, 4, 5]
numbers_copy = numbers.dup
numbers_copy.each { |num| numbers.delete(num) if num.even? }
# => [1, 3, 5] with predictable behavior
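
The skipping becomes visible when matching elements sit next to each other. The sketch below shows the wrong result, then two alternatives that avoid manual deletion.

```ruby
# Deleting inside each shifts later elements left, so neighbors are skipped
numbers = [2, 4, 6, 8]
numbers.each { |n| numbers.delete(n) if n.even? }
leftover = numbers.dup   # => [4, 8], even though every element is even

# reject! removes matching elements safely in one call
numbers = [2, 4, 6, 8, 9]
numbers.reject!(&:even?)
# numbers => [9]

# partition keeps both groups without mutating the source
evens, odds = [1, 2, 3, 4, 5].partition(&:even?)
# evens => [2, 4], odds => [1, 3, 5]
```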

Block variable naming conflicts occur when nested blocks use the same parameter names. This shadowing behavior can cause confusing results in complex filtering operations.

# Block variable shadowing problem
data = [[1, 2], [3, 4], [5, 6]]

# Confusing - 'item' used in both levels
result = data.select do |item|
  item.any? { |item| item > 3 }  # Inner 'item' shadows outer
end

# Clear - different parameter names
result = data.select do |array|
  array.any? { |element| element > 3 }
end
# => [[3, 4], [5, 6]]

Regular expression anchoring mistakes lead to partial matching when full string matching is intended. The match method finds patterns anywhere in strings unless anchors specify boundaries.

# Unanchored regex matches partial strings
emails = ["user@domain.com", "contact user@domain.com today", "test@site.org"]
pattern = /\w+@\w+\.\w+/

# Matches partial strings unexpectedly
valid_emails = emails.select { |email| email.match(pattern) }
# => All three match because the pattern doesn't require a full string match

# Properly anchored for full string validation
anchored_pattern = /\A\w+@\w+\.\w+\z/
valid_emails = emails.select { |email| email.match(anchored_pattern) }
# => ["user@domain.com", "test@site.org"]
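
A related mistake is reaching for ^ and $, which anchor to line boundaries in Ruby rather than to the whole string. A sketch with a made-up multi-line input:

```ruby
input = "user@domain.com\n<script>alert(1)</script>"

line_anchored   = /^\w+@\w+\.\w+$/     # ^ and $ match at each line boundary
string_anchored = /\A\w+@\w+\.\w+\z/   # \A and \z match only at the string's ends

# The first line alone satisfies the line-anchored pattern
accepted_by_line = input.match?(line_anchored)       # => true

# The string-anchored pattern rejects the trailing content
accepted_by_string = input.match?(string_anchored)   # => false
```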

Method chaining with nil values breaks execution chains. When intermediate operations return nil, subsequent method calls raise NoMethodError exceptions.

# Nil handling in method chains
users = [
  { name: "Alice", profile: { age: 30 } },
  { name: "Bob", profile: nil },
  { name: "Carol", profile: { age: 25 } }
]

# This breaks when profile is nil
young_users = users.select do |user|
  user[:profile][:age] < 28  # NoMethodError when profile is nil
end

# Safe lookup with dig, which returns nil when profile is nil
young_users = users.select do |user|
  age = user.dig(:profile, :age)
  age && age < 28
end
# => [{ name: "Carol", profile: { age: 25 } }]

Reference

Array Filtering Methods

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| #select { block } | Block condition | Array | Returns elements where block returns truthy |
| #select! | Block condition | Array or nil | Modifies array in place, returns nil if unchanged |
| #reject { block } | Block condition | Array | Returns elements where block returns falsy |
| #reject! | Block condition | Array or nil | Modifies array in place, returns nil if unchanged |
| #find { block } | Block condition | Object or nil | Returns first matching element |
| #find_all { block } | Block condition | Array | Alias for select |
| #detect { block } | Block condition | Object or nil | Alias for find |
| #grep(pattern) | Pattern object | Array | Returns elements matching pattern with === |
| #grep_v(pattern) | Pattern object | Array | Returns elements not matching pattern |

Hash Filtering Methods

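| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| #select { block } | Block with key, value | Hash | Returns pairs where block returns truthy |
| #reject { block } | Block with key, value | Hash | Returns pairs where block returns falsy |
| #filter { block } | Block with key, value | Hash | Alias for select |
| #compact | None | Hash | Removes nil values |
| #slice(*keys) | Key list | Hash | Returns hash with only specified keys |
| #except(*keys) | Key list | Hash | Returns hash without specified keys |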

String Searching Methods

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| #include?(substring) | String | Boolean | Tests if string contains substring; raises TypeError for a Regexp |
| #match(pattern) | Regexp, optional position | MatchData or nil | Returns match data for pattern |
| #match?(pattern) | Regexp, optional position | Boolean | Tests if string matches pattern |
| #scan(pattern) | Regexp | Array | Returns all pattern matches |
| #index(pattern) | String or Regexp | Integer or nil | Returns position of first match |
| #rindex(pattern) | String or Regexp | Integer or nil | Returns position of last match |

Enumerable Module Methods

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| #find_index { block } | Block condition | Integer or nil | Returns index of first matching element |
| #count { block } | Optional block | Integer | Returns count of matching elements |
| #any? { block } | Optional block | Boolean | Tests if any element matches condition |
| #all? { block } | Optional block | Boolean | Tests if all elements match condition |
| #none? { block } | Optional block | Boolean | Tests if no elements match condition |
| #one? { block } | Optional block | Boolean | Tests if exactly one element matches |

Performance Characteristics

| Operation | Time Complexity | Memory Usage | Notes |
| --- | --- | --- | --- |
| Array#select | O(n) | O(k) where k is result size | Creates new array |
| Array#find | O(n) worst case, O(1) best case | O(1) | Stops at first match |
| Hash#select | O(n) | O(k) where k is result size | Creates new hash |
| String#include? | O(n*m) worst case | O(1) | Optimized substring search; the worst case is rare in practice |
| Regexp#match | O(n) for simple patterns | O(1) | Depends on pattern complexity; heavy backtracking can be much slower |
| Enumerable#grep | O(n) | O(k) where k is result size | Uses === operator |

Common Pattern Objects

| Pattern Type | Usage | Example |
| --- | --- | --- |
| Regular Expression | String pattern matching | /[a-z]+@[a-z]+\.[a-z]+/ |
| Range | Numeric or comparable ranges | 18..65, "a".."z" |
| Class | Type checking | String, Integer, Hash |
| Proc/Lambda | Custom conditions | ->(x) { x.even? } |
| Custom Object | Complex predicates | Objects implementing === |

Error Types

| Error | Cause | Solution |
| --- | --- | --- |
| NoMethodError | Calling methods on nil | Use safe navigation &. |
| TypeError | Wrong argument types | Validate input types |
| RegexpError | Invalid regex pattern | Test patterns before use |
| ArgumentError | Wrong parameter count | Check method signatures |
| SystemStackError | Infinite recursion | Add base cases to recursive filters |