CrackedRuby - Searching and Filtering

Overview

Ruby provides extensive searching and filtering capabilities through the Enumerable module and specialized methods on core classes. These operations transform collections by selecting elements that match specific criteria or locate individual items based on conditions.

The primary filtering methods operate on arrays, hashes, and other enumerable objects. Array#select returns elements matching a condition, while Array#reject excludes matching elements. Enumerable#find returns the first matching element, and Enumerable#grep performs pattern-based searching. Hash filtering works similarly, but the block receives key-value pairs.

String searching uses methods like String#include? for substring detection and String#match for regular expression matching. The String#scan method extracts all pattern matches, while String#index locates pattern positions.

# Array filtering
numbers = [1, 2, 3, 4, 5, 6]
evens = numbers.select(&:even?)
# => [2, 4, 6]

# Hash filtering
users = { alice: 25, bob: 30, carol: 22 }
adults = users.select { |name, age| age >= 25 }
# => { alice: 25, bob: 30 }

# String searching
text = "Ruby programming language"
text.include?("program")
# => true
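
The other string methods mentioned above follow the same pattern. A minimal sketch, reusing the sample sentence (the variable names are illustrative only):

```ruby
text = "Ruby programming language"

# String#match returns a MatchData object, or nil when nothing matches
match = text.match(/pro(\w+)/)
suffix = match[1]
# => "gramming"

# String#scan collects every match into an array
words = text.scan(/\w+/)
# => ["Ruby", "programming", "language"]

# String#index returns the position of the first match, or nil
position = text.index("program")
# => 5
missing = text.index("python")
# => nil
```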

Ruby's filtering methods take a block that expresses the condition. The &:method shorthand passes a symbol through Symbol#to_proc, producing a block that calls that method on each element, which keeps simple filters concise. Regular expressions enable pattern-based searching across strings and collections.

Basic Usage

Array filtering forms the foundation of collection manipulation in Ruby. The select method creates new arrays containing elements that satisfy the given condition, while reject excludes matching elements. Both methods preserve the original array and return filtered copies.

# Basic select and reject
scores = [85, 92, 78, 96, 81, 89]
passing = scores.select { |score| score >= 80 }
# => [85, 92, 96, 81, 89]

failing = scores.reject { |score| score >= 80 }
# => [78]

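# Using a block with an explicit condition
words = ["apple", "banana", "cherry", "date"]
long_words = words.select { |word| word.length > 5 }
# => ["banana", "cherry"]

# Using symbol to proc when the condition is a single method call
non_empty = words.reject(&:empty?)
# => ["apple", "banana", "cherry", "date"]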

The find method returns the first element matching the condition, or nil when no match exists. The detect method is an alias for find. On arrays, find_all returns the same result as select; on hashes, find_all returns an array of [key, value] pairs, whereas Hash#select returns a hash.

# Finding single elements
inventory = [
  { item: "laptop", price: 1200, stock: 5 },
  { item: "mouse", price: 25, stock: 15 },
  { item: "keyboard", price: 80, stock: 0 }
]

expensive_item = inventory.find { |product| product[:price] > 1000 }
# => { item: "laptop", price: 1200, stock: 5 }

out_of_stock = inventory.find { |product| product[:stock] == 0 }
# => { item: "keyboard", price: 80, stock: 0 }
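
The nil return value is easy to overlook. A short sketch of the no-match case and of detect, reusing the inventory data (the `label` fallback is just one illustrative way to guard against nil):

```ruby
inventory = [
  { item: "laptop", price: 1200, stock: 5 },
  { item: "mouse", price: 25, stock: 15 },
  { item: "keyboard", price: 80, stock: 0 }
]

# No product costs more than 5000, so find returns nil
luxury_item = inventory.find { |product| product[:price] > 5000 }
# => nil

# detect is the same method under another name
cheap_item = inventory.detect { |product| product[:price] < 50 }
# => { item: "mouse", price: 25, stock: 15 }

# Guard against nil before using the result
label = luxury_item ? luxury_item[:item] : "none found"
# => "none found"
```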

Hash filtering operates on key-value pairs, allowing conditions based on keys, values, or both. The block receives two parameters representing the key and value. Filtered results maintain hash structure.

# Hash filtering by value
temperatures = {
  "New York" => 72,
  "Miami" => 89,
  "Chicago" => 65,
  "Phoenix" => 105
}

hot_cities = temperatures.select { |city, temp| temp > 80 }
# => { "Miami" => 89, "Phoenix" => 105 }

# Hash filtering by value type (keeps only the boolean settings)
settings = {
  debug_mode: true,
  log_level: "info",
  cache_enabled: false,
  timeout: 30
}

boolean_settings = settings.select { |key, value| [true, false].include?(value) }
# => { debug_mode: true, cache_enabled: false }

Pattern-based searching uses the grep method with regular expressions or other pattern objects. This method filters collections based on pattern matching using the === operator.

# Pattern matching with grep
files = ["readme.txt", "config.yml", "data.json", "backup.tar.gz"]
text_files = files.grep(/\.(txt|md)$/)
# => ["readme.txt"]

# Numeric range matching
ages = [16, 21, 35, 42, 18, 29]
young_adults = ages.grep(18..25)
# => [21, 18]
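
Because grep relies on ===, classes work as patterns too, and grep_v inverts the match. A small sketch with made-up sample data:

```ruby
mixed = [1, "two", :three, 4.0, "five", nil]

# A class matches its instances (Module#===)
strings = mixed.grep(String)
# => ["two", "five"]
numerics = mixed.grep(Numeric)
# => [1, 4.0]

# grep_v keeps the elements that do NOT match
files = ["readme.txt", "config.yml", "data.json", "backup.tar.gz"]
non_text_files = files.grep_v(/\.(txt|md)$/)
# => ["config.yml", "data.json", "backup.tar.gz"]

# An optional block transforms each matching element
shouted = mixed.grep(String) { |s| s.upcase }
# => ["TWO", "FIVE"]
```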

Advanced Usage

Complex filtering scenarios require combining multiple conditions and chaining operations. Ruby's enumerable methods compose naturally, allowing sophisticated data processing pipelines. Method chaining applies successive transformations while maintaining readable code structure.

# Multi-condition filtering with chaining
sales_data = [
  { region: "North", product: "laptop", amount: 1500, quarter: "Q1" },
  { region: "South", product: "tablet", amount: 800, quarter: "Q1" },
  { region: "North", product: "phone", amount: 600, quarter: "Q2" },
  { region: "West", product: "laptop", amount: 1200, quarter: "Q2" },
  { region: "South", product: "laptop", amount: 1800, quarter: "Q1" }
]

high_value_north_laptops = sales_data
  .select { |sale| sale[:region] == "North" }
  .select { |sale| sale[:product] == "laptop" }
  .select { |sale| sale[:amount] > 1000 }
  .map { |sale| { quarter: sale[:quarter], amount: sale[:amount] } }
# => [{ quarter: "Q1", amount: 1500 }]
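
Chained select calls read well, but each one builds an intermediate array. When that matters, the conditions can be merged into a single pass. A sketch on a trimmed copy of the sales data, assuming Ruby 2.7 or later for filter_map:

```ruby
sales_data = [
  { region: "North", product: "laptop", amount: 1500, quarter: "Q1" },
  { region: "North", product: "phone", amount: 600, quarter: "Q2" },
  { region: "South", product: "laptop", amount: 1800, quarter: "Q1" }
]

# One predicate, one pass: no intermediate arrays between conditions
north_laptops = sales_data.select do |sale|
  sale[:region] == "North" && sale[:product] == "laptop" && sale[:amount] > 1000
end

# filter_map filters and transforms together; falsy block results are dropped
summary = sales_data.filter_map do |sale|
  next unless sale[:region] == "North" && sale[:amount] > 1000
  { quarter: sale[:quarter], amount: sale[:amount] }
end
# => [{ quarter: "Q1", amount: 1500 }]
```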

Custom predicate objects encapsulate complex filtering logic. Classes implementing the === method work seamlessly with grep and case statements. This approach separates filtering logic from data processing code.

# Custom predicate classes
class PriceRange
  def initialize(min, max)
    @min, @max = min, max
  end

  def ===(item)
    item.respond_to?(:price) && item.price.between?(@min, @max)
  end
end

class Product
  attr_reader :name, :price, :category

  def initialize(name, price, category)
    @name, @price, @category = name, price, category
  end
end

products = [
  Product.new("Laptop", 1200, "Electronics"),
  Product.new("Book", 25, "Education"),
  Product.new("Phone", 800, "Electronics")
]

mid_range = PriceRange.new(100, 1000)
affordable_products = products.grep(mid_range)
# => [the Product named "Phone"; Laptop and Book fall outside 100..1000]
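
The same predicate object also works in a case statement, since when clauses call === on the pattern. A self-contained sketch; `CatalogItem` and `price_tier` are hypothetical names introduced only for this illustration:

```ruby
class PriceRange
  def initialize(min, max)
    @min, @max = min, max
  end

  # case/when and grep both call this method
  def ===(item)
    item.respond_to?(:price) && item.price.between?(@min, @max)
  end
end

# Hypothetical stand-in for the Product class
CatalogItem = Struct.new(:name, :price)

def price_tier(item)
  case item
  when PriceRange.new(0, 99) then :budget
  when PriceRange.new(100, 1000) then :mid_range
  else :premium
  end
end

price_tier(CatalogItem.new("Book", 25))
# => :budget
price_tier(CatalogItem.new("Phone", 800))
# => :mid_range
price_tier(CatalogItem.new("Laptop", 1200))
# => :premium
```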

Lazy evaluation optimizes filtering performance for large datasets. The lazy method creates lazy enumerators that process elements on demand rather than generating intermediate arrays.

# Lazy evaluation for memory efficiency
require 'prime'  # adds Integer#prime?

def generate_numbers
  (1..Float::INFINITY).lazy
    .select(&:prime?)
    .select { |n| n.to_s.include?('7') }
    .first(10)
end

# This processes only the elements needed to find 10 results
prime_sevens = generate_numbers
# => [7, 17, 37, 47, 67, 71, 73, 79, 97, 107]

Nested structure filtering combines traversal with filtering. When the shape of the data is known and fixed, flat_map or flatten followed by a filter is enough; when the nesting depth varies, a recursive method is the usual approach.

# Nested hash filtering
company_data = {
  departments: {
    engineering: {
      employees: [
        { name: "Alice", salary: 95000, level: "senior" },
        { name: "Bob", salary: 75000, level: "junior" }
      ]
    },
    sales: {
      employees: [
        { name: "Carol", salary: 85000, level: "senior" },
        { name: "Dave", salary: 65000, level: "junior" }
      ]
    }
  }
}

# Extract all senior employees across departments
def extract_senior_employees(data)
  data[:departments].values.flat_map do |dept_data|
    dept_data[:employees].select { |emp| emp[:level] == "senior" }
  end
end

senior_staff = extract_senior_employees(company_data)
# => [{ name: "Alice", salary: 95000, level: "senior" },
#     { name: "Carol", salary: 85000, level: "senior" }]
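
When the nesting depth is not known in advance, a recursive walk is one option. The following is only a sketch; `deep_select` is a hypothetical helper, not a built-in method:

```ruby
# Collects every hash in a nested structure that satisfies the predicate
def deep_select(node, &predicate)
  case node
  when Hash
    matches = predicate.call(node) ? [node] : []
    matches + node.values.flat_map { |value| deep_select(value, &predicate) }
  when Array
    node.flat_map { |value| deep_select(value, &predicate) }
  else
    []  # scalars cannot contain further matches
  end
end

data = {
  departments: {
    engineering: { employees: [{ name: "Alice", level: "senior" },
                               { name: "Bob", level: "junior" }] },
    sales: { employees: [{ name: "Carol", level: "senior" }] }
  }
}

seniors = deep_select(data) { |hash| hash[:level] == "senior" }
senior_names = seniors.map { |hash| hash[:name] }
# => ["Alice", "Carol"]
```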

Performance & Memory

Filtering operations create new collections by default, which impacts memory usage with large datasets. Understanding when methods allocate memory versus operating in-place helps optimize performance-critical applications.

Methods like select and reject always create new arrays, while their bang variants like select! and reject! modify collections in place. In-place operations save memory but mutate original data structures.

# Memory allocation comparison
large_array = (1..1_000_000).to_a

# Creates new array - high memory usage
filtered = large_array.select(&:even?)

# Modifies original array - lower memory usage
large_array.select!(&:even?)
# Original array now contains only even numbers
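
One detail of the bang variants is worth a quick illustration: select! and reject! return nil when they change nothing, so their return value should not be chained blindly. keep_if and delete_if always return the array:

```ruby
numbers = [2, 4, 6]

# Nothing is removed, so select! returns nil
unchanged = numbers.select!(&:even?)
# => nil

# keep_if performs the same filtering but always returns the receiver
kept = numbers.keep_if(&:even?)
# => [2, 4, 6]

# When elements are removed, reject! returns the modified array
changed = [1, 2, 3].reject!(&:odd?)
# => [2]
```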

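Lazy evaluation helps most when only part of the result is needed or when the source is infinite. Lazy enumerators pass elements through the whole chain one at a time, so no intermediate collections are built and processing stops as soon as enough results exist. When every element must be processed anyway, a lazy chain is often slower than an eager one because of its per-element overhead.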

# Performance comparison: eager vs lazy
require 'benchmark'

data = (1..10_000_000).to_a

# Eager evaluation - select and map run over all 10 million elements
eager_time = Benchmark.realtime do
  data.select { |n| n % 1000 == 0 }
      .map { |n| n * 2 }
      .first(100)
end

# Lazy evaluation - stops after the 100th match (element 100_000)
lazy_time = Benchmark.realtime do
  data.lazy
      .select { |n| n % 1000 == 0 }
      .map { |n| n * 2 }
      .first(100)
end
# lazy_time is typically much smaller here because only 1% of the data is visited
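
Counting block invocations shows the difference without relying on timings. A small sketch with illustrative counters:

```ruby
eager_calls = 0
lazy_calls = 0
data = (1..10_000)

# Eager: the block runs for every element before first(3) is applied
eager = data.select do |n|
  eager_calls += 1
  n % 100 == 0
end.first(3)

# Lazy: iteration stops once three matches have been produced
lazy = data.lazy.select do |n|
  lazy_calls += 1
  n % 100 == 0
end.first(3)

# eager and lazy are both [100, 200, 300]
# eager_calls => 10000, lazy_calls => 300
```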

Hash filtering performance depends on the cost of the condition and the size of the hash. Hash#select visits every pair whether the block inspects the key or the value, so both forms are O(n); what differs is the work done inside the block. Hash-table lookups only come into play with methods such as Hash#slice that receive the keys as arguments.

# Cheap predicate (integer parity check on each key)
large_hash = (1..100_000).map { |i| [i, "value_#{i}"] }.to_h
even_keys = large_hash.select { |k, _v| k.even? }

# More expensive predicate (substring scan on each value)
string_values = large_hash.select { |_k, v| v.include?("5") }
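
When the wanted keys are known up front, Hash#slice (Ruby 2.5+) looks each key up directly instead of running a block against every pair. A brief sketch using illustrative settings data:

```ruby
settings = { debug_mode: true, log_level: "info", cache_enabled: false, timeout: 30 }
wanted = [:log_level, :timeout]

# select runs the block once per pair in the hash
via_select = settings.select { |key, _value| wanted.include?(key) }

# slice performs one lookup per requested key
via_slice = settings.slice(*wanted)

# Both => { log_level: "info", timeout: 30 }
```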

Regular expression construction affects search performance in loops. A plain regex literal is compiled once when the code is parsed, but Regexp.new or a literal with interpolation builds a new Regexp object on every iteration. Building such patterns outside the loop, and using plain string methods when possible, avoids the repeated work.

# Inefficient - builds a new Regexp on every iteration
texts = ["apple", "banana", "application", "grape"] * 1000
term = "app"
results = texts.select { |text| text.match(Regexp.new(term)) }

# Efficient - builds the regex once
pattern = Regexp.new(term)
results = texts.select { |text| text.match(pattern) }

# Most efficient for simple patterns
results = texts.select { |text| text.include?(term) }
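
When only a yes/no answer is needed, String#match? (Ruby 2.4+) avoids allocating a MatchData object, and grep expresses the same filter more compactly. A minimal sketch:

```ruby
texts = ["apple", "banana", "application", "grape"]
pattern = /app/

# match? returns true or false and does not create MatchData
with_match_p = texts.select { |text| text.match?(pattern) }

# grep applies pattern === text to each element
with_grep = texts.grep(pattern)

# Both => ["apple", "application"]
```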

Common Pitfalls

Truthy and falsy value handling causes frequent confusion in filtering operations. Ruby considers false and nil as falsy, while all other values including zero, empty strings, and empty arrays evaluate as truthy.

# Unexpected truthy behavior
values = [0, "", [], false, nil, "text", 42]

# This keeps 0, "", and [] because they're truthy
truthy_values = values.select { |v| v }
# => [0, "", [], "text", 42]

# Explicit nil checking required for different behavior
non_nil_values = values.select { |v| !v.nil? }
# => [0, "", [], false, "text", 42]
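
Two related idioms, sketched on the same sample values: compact removes only nil, and an explicit empty? check is needed to drop empty strings and arrays.

```ruby
values = [0, "", [], false, nil, "text", 42]

# compact removes nil and nothing else
without_nil = values.compact
# => [0, "", [], false, "text", 42]

# Dropping empty containers requires asking for it explicitly
without_empty = values.reject { |v| v.respond_to?(:empty?) && v.empty? }
# => [0, false, nil, "text", 42]
```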

Mutation during iteration produces surprising results. Deleting elements while iterating shifts the remaining ones, so some are never visited, and appending while iterating can loop forever. Iterate over a copy, or use a method built for in-place removal such as reject!.

# Dangerous - modifying during iteration
numbers = [1, 2, 3, 4, 5]
numbers.each do |num|
  numbers.delete(num) if num.even?  # Each deletion shifts later elements left
end
# numbers => [1, 3, 5], but 3 and 5 were never visited; the result is right only by luck

# Safe approach - iterate over a copy
numbers = [1, 2, 3, 4, 5]
numbers_copy = numbers.dup
numbers_copy.each { |num| numbers.delete(num) if num.even? }
# numbers => [1, 3, 5], and every element was visited
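
When the goal is simply to remove or separate elements, the built-in methods avoid manual iteration altogether. A short sketch:

```ruby
numbers = [1, 2, 3, 4, 5]

# reject! removes matching elements in place without skipping any
numbers.reject!(&:even?)
# numbers => [1, 3, 5]

# partition splits a collection in one pass and leaves the source untouched
source = [1, 2, 3, 4, 5]
evens, odds = source.partition(&:even?)
# evens => [2, 4], odds => [1, 3, 5]
```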

Block variable naming conflicts occur when nested blocks use the same parameter names. This shadowing behavior can cause confusing results in complex filtering operations.

# Block variable shadowing problem
data = [[1, 2], [3, 4], [5, 6]]

# Confusing - 'item' used in both levels
result = data.select do |item|
  item.any? { |item| item > 3 }  # Inner 'item' shadows outer
end

# Clear - different parameter names
result = data.select do |array|
  array.any? { |element| element > 3 }
end
# => [[3, 4], [5, 6]]

Regular expression anchoring mistakes lead to partial matching when full string matching is intended. The match method finds patterns anywhere in strings unless anchors specify boundaries.

# Unanchored regex matches partial strings
emails = ["user@domain.com", "see x@y.z for details", "test@site.org"]
pattern = /\w+@\w+\.\w+/

# Matches partial strings unexpectedly
valid_emails = emails.select { |email| email.match(pattern) }
# => All three match because the pattern only has to match somewhere in the string

# Properly anchored for full string validation
anchored_pattern = /\A\w+@\w+\.\w+\z/
valid_emails = emails.select { |email| email.match(anchored_pattern) }
# => ["user@domain.com", "test@site.org"]

Method chaining with nil values breaks execution chains. When intermediate operations return nil, subsequent method calls raise NoMethodError exceptions.

# Nil handling in method chains
users = [
  { name: "Alice", profile: { age: 30 } },
  { name: "Bob", profile: nil },
  { name: "Carol", profile: { age: 25 } }
]

# This breaks when profile is nil
young_users = users.select do |user|
  user[:profile][:age] < 28  # NoMethodError when profile is nil
end

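# Safe lookup with dig; a missing profile yields nil, which makes the condition falsy
young_users = users.select do |user|
  age = user.dig(:profile, :age)
  age && age < 28
end
# => [{ name: "Carol", profile: { age: 25 } }]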

Reference

Array Filtering Methods

Method	Parameters	Returns	Description
`#select { block }`	Block condition	`Array`	Returns elements where block returns truthy
`#select!`	Block condition	`Array` or `nil`	Modifies array in place, returns nil if unchanged
`#reject { block }`	Block condition	`Array`	Returns elements where block returns falsy
`#reject!`	Block condition	`Array` or `nil`	Modifies array in place, returns nil if unchanged
`#find { block }`	Block condition	`Object` or `nil`	Returns first matching element
`#find_all { block }`	Block condition	`Array`	Alias for select
`#detect { block }`	Block condition	`Object` or `nil`	Alias for find
`#grep(pattern)`	Pattern object	`Array`	Returns elements matching pattern with ===
`#grep_v(pattern)`	Pattern object	`Array`	Returns elements not matching pattern

Hash Filtering Methods

Method	Parameters	Returns	Description
`#select { |k, v| block }`	Block with key, value	`Hash`	Returns key-value pairs where block returns truthy
`#reject { |k, v| block }`	Block with key, value	`Hash`	Returns key-value pairs where block returns falsy
`#filter { |k, v| block }`	Block with key, value	`Hash`	Alias for select
`#compact`	None	`Hash`	Removes nil values
`#slice(*keys)`	Key list	`Hash`	Returns hash with only specified keys
`#except(*keys)`	Key list	`Hash`	Returns hash without specified keys

String Searching Methods

Method	Parameters	Returns	Description
`#include?(substring)`	String	`Boolean`	Tests if string contains the substring (a Regexp argument raises TypeError)
`#match(pattern)`	Regexp, optional position	`MatchData` or `nil`	Returns match data for pattern
`#match?(pattern)`	Regexp, optional position	`Boolean`	Tests if string matches pattern
`#scan(pattern)`	String or Regexp	`Array`	Returns all pattern matches
`#index(pattern)`	String or Regexp	`Integer` or `nil`	Returns position of first match
`#rindex(pattern)`	String or Regexp	`Integer` or `nil`	Returns position of last match

Enumerable Module Methods

Method	Parameters	Returns	Description
`#find_index { block }`	Block condition	`Integer` or `nil`	Returns index of first matching element
`#count { block }`	Optional block	`Integer`	Returns count of matching elements
`#any? { block }`	Optional block	`Boolean`	Tests if any element matches condition
`#all? { block }`	Optional block	`Boolean`	Tests if all elements match condition
`#none? { block }`	Optional block	`Boolean`	Tests if no elements match condition
`#one? { block }`	Optional block	`Boolean`	Tests if exactly one element matches
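
A quick sketch of these predicate methods on illustrative sample scores:

```ruby
scores = [85, 92, 78, 96]

any_top      = scores.any? { |s| s > 95 }        # => true
all_passing  = scores.all? { |s| s >= 80 }       # => false (78 fails)
none_failing = scores.none? { |s| s < 70 }       # => true
one_below_80 = scores.one? { |s| s < 80 }        # => true (only 78)
high_count   = scores.count { |s| s >= 90 }      # => 2
first_low    = scores.find_index { |s| s < 80 }  # => 2
```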

Performance Characteristics

Operation	Time Complexity	Memory Usage	Notes
`Array#select`	O(n)	O(k) where k is result size	Creates new array
`Array#find`	O(n) worst case, O(1) best	O(1)	Stops at first match
`Hash#select`	O(n)	O(k) where k is result size	Creates new hash
`String#include?`	O(n*m) worst case	O(1)	Optimized substring search; usually close to linear in practice
`Regexp#match`	O(n) typical	O(1)	Backtracking-heavy patterns can be much slower
`Enumerable#grep`	O(n)	O(k) where k is result size	Uses === operator

Common Pattern Objects

Pattern Type	Usage	Example
Regular Expression	String pattern matching	`/[a-z]+@[a-z]+\.[a-z]+/`
Range	Numeric or comparable ranges	`18..65`, `"a".."z"`
Class	Type checking	`String`, `Integer`, `Hash`
Proc/Lambda	Custom conditions	`->(x) { x.even? }`
Custom Object	Complex predicates	Objects implementing `===`
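
Two of the less common pattern types from the table, sketched with grep on illustrative sample data:

```ruby
values = [3, 8, "ruby", 12, :sym, 21]

# Proc#=== calls the proc with the element as its argument
even_integer = ->(x) { x.is_a?(Integer) && x.even? }
even_values = values.grep(even_integer)
# => [8, 12]

# Ranges work on any comparable values, including strings
letters = ["a", "m", "z", "c"]
early_letters = letters.grep("a".."f")
# => ["a", "c"]
```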

Error Types

Error	Cause	Solution
`NoMethodError`	Calling methods on nil	Use safe navigation `&.`
`TypeError`	Wrong argument types	Validate input types
`RegexpError`	Invalid regex pattern	Test patterns before use
`ArgumentError`	Wrong parameter count	Check method signatures
`SystemStackError`	Infinite recursion	Add base cases to recursive filters
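
As one illustration of the RegexpError row, a pattern built from untrusted input can be validated before it is used for filtering. `safe_pattern` is a hypothetical helper name used only for this sketch:

```ruby
# Returns a Regexp, or nil when the source string is not a valid pattern
def safe_pattern(source)
  Regexp.new(source)
rescue RegexpError
  nil
end

words = ["apple", "banana", "application"]

valid = safe_pattern("app")
matches = valid ? words.grep(valid) : []
# => ["apple", "application"]

# An unterminated character class is rejected instead of raising
invalid = safe_pattern("[a-z")
# => nil
```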