Overview
Ruby provides several methods for grouping and partitioning collections. The Enumerable module includes group_by for categorizing elements, partition for binary splits, and the chunk family of methods for consecutive grouping. These methods transform arrays and other enumerables into organized data structures.
The group_by method returns a hash whose keys represent categories and whose values contain arrays of matching elements. The partition method splits a collection into two arrays based on a boolean condition. The chunk methods group consecutive elements that share characteristics.
numbers = [1, 2, 3, 4, 5, 6]
numbers.group_by(&:even?)
# => {false=>[1, 3, 5], true=>[2, 4, 6]}
numbers.partition(&:even?)
# => [[2, 4, 6], [1, 3, 5]]
numbers.chunk(&:even?).to_a
# => [[false, [1]], [true, [2]], [false, [3]], [true, [4]], [false, [5]], [true, [6]]]
Ruby also provides specialized counting methods: tally for frequency analysis and count for conditional counting. These methods integrate with Ruby's block syntax and symbol-to-proc conversion.
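Both are quick to demonstrate on a small array (the vote data below is illustrative):

```ruby
votes = ["yes", "no", "yes", "abstain", "yes", "no"]

# tally builds a frequency hash in a single pass
votes.tally
# => {"yes"=>3, "no"=>2, "abstain"=>1}

# count takes a value or a block for conditional counting
votes.count("yes")                  # => 3
votes.count { |v| v != "abstain" }  # => 5
```

Note that tally is available from Ruby 2.7 onward; on earlier versions, group_by(&:itself).transform_values(&:count) produces the same hash.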
Basic Usage
The group_by method accepts a block that determines the grouping criterion. Ruby evaluates the block for each element and uses the return value as a hash key. Elements producing the same key belong to the same group.
words = ["apple", "banana", "cherry", "apricot", "blueberry"]
words.group_by { |word| word[0] }
# => {"a"=>["apple", "apricot"], "b"=>["banana", "blueberry"], "c"=>["cherry"]}
# Using symbol-to-proc for method calls
words.group_by(&:length)
# => {5=>["apple"], 6=>["banana", "cherry"], 7=>["apricot"], 9=>["blueberry"]}
The partition method divides a collection into two arrays. The first array contains elements for which the block returns a truthy value; the second contains those for which it returns a falsy value.
grades = [85, 92, 78, 96, 73, 88, 95]
passing, failing = grades.partition { |grade| grade >= 80 }
# passing => [85, 92, 96, 88, 95]
# failing => [78, 73]
# Multiple assignment works naturally
high_scores, low_scores = grades.partition { |grade| grade >= 90 }
The chunk method groups consecutive elements that produce the same block result. Unlike group_by, chunk preserves sequence order and creates separate groups for non-consecutive identical values.
data = [1, 1, 2, 2, 2, 1, 1, 3, 3]
data.chunk { |n| n }.to_a
# => [[1, [1, 1]], [2, [2, 2, 2]], [1, [1, 1]], [3, [3, 3]]]
# Chunking by even/odd creates alternating groups
[1, 3, 2, 4, 5, 7, 6, 8].chunk(&:even?).to_a
# => [[false, [1, 3]], [true, [2, 4]], [false, [5, 7]], [true, [6, 8]]]
The slice_when method creates chunks at boundaries where the block returns true when comparing consecutive elements. The block receives two parameters: the previous element and the current element.
numbers = [1, 2, 4, 5, 7, 10, 11, 12]
# Split when difference between consecutive numbers > 1
numbers.slice_when { |prev, curr| curr - prev > 1 }.to_a
# => [[1, 2], [4, 5], [7], [10, 11, 12]]
Advanced Usage
Complex grouping operations often require nested data structures or multiple criteria. The group_by method combines with hash manipulation methods for sophisticated organization patterns.
class Student
  attr_reader :name, :grade, :subject, :score

  def initialize(name, grade, subject, score)
    @name, @grade, @subject, @score = name, grade, subject, score
  end
end

students = [
  Student.new("Alice", 9, "Math", 95),
  Student.new("Bob", 9, "Math", 87),
  Student.new("Alice", 9, "Science", 92),
  Student.new("Carol", 10, "Math", 78),
  Student.new("Bob", 9, "Science", 84)
]
# Group by multiple criteria using array keys
by_grade_and_subject = students.group_by { |s| [s.grade, s.subject] }
# => {[9, "Math"]=>[Alice(95), Bob(87)], [9, "Science"]=>[Alice(92), Bob(84)], [10, "Math"]=>[Carol(78)]}
# Transform grouped results
by_grade_and_subject.transform_values do |student_list|
  {
    count: student_list.length,
    average: student_list.sum(&:score) / student_list.length.to_f,
    students: student_list.map(&:name)
  }
end
The chunk_while method groups consecutive elements for as long as a condition holds between adjacent pairs, giving finer control than chunk for sequence-based grouping.
# Group ascending sequences
data = [1, 2, 3, 1, 2, 4, 3, 2, 1]
ascending_sequences = data.chunk_while { |prev, curr| curr > prev }.to_a
# => [[1, 2, 3], [1, 2, 4], [3], [2], [1]]
# Group strings by first letter, maintaining consecutive runs
words = ["apple", "apricot", "banana", "blueberry", "cherry", "coconut"]
words.chunk_while { |prev, curr| prev[0] == curr[0] }.to_a
# => [["apple", "apricot"], ["banana", "blueberry"], ["cherry", "coconut"]]
Combining partitioning with further processing creates powerful data transformation pipelines. The partition method integrates with multiple assignment and method chaining.
mixed_data = ["valid_email@domain.com", "invalid-email", "user@site.org", "bad_format", "admin@company.net"]
valid_emails, invalid_formats = mixed_data.partition { |item| item.include?("@") && item.include?(".") }
# Further process each partition
email_domains = valid_emails.map { |email| email.split("@").last }
error_logs = invalid_formats.map { |item| "Invalid format: #{item}" }
# => email_domains: ["domain.com", "site.org", "company.net"]
# => error_logs: ["Invalid format: invalid-email", "Invalid format: bad_format"]
Custom grouping logic often requires stateful processing. Ruby blocks can maintain state through closure variables for complex grouping scenarios.
# Group elements maintaining running totals
transactions = [100, -50, 75, -25, 200, -100, 50]
balance = 0
grouped_by_balance_range = transactions.group_by do |amount|
  balance += amount
  case balance
  when 0..100 then :low
  when 101..200 then :medium
  else :high
  end
end
# Groups transactions by account balance after each transaction
Performance & Memory
Grouping operations create new data structures, with memory usage proportional to collection size and group count. The group_by method constructs a hash with arrays as values, while partition creates exactly two arrays regardless of input size.
# Memory-efficient counting vs full grouping
large_dataset = (1..1_000_000).to_a
# Efficient: only stores counts
frequency_counts = large_dataset.tally
# => {1=>1, 2=>1, ..., 1000000=>1}
# Memory-intensive: stores all elements
grouped_elements = large_dataset.group_by(&:itself)
# Creates hash with 1,000,000 key-value pairs, each value an array with one element
The chunk family of methods processes elements lazily when possible, but calling to_a forces immediate evaluation. For large datasets, processing chunks one at a time avoids loading the entire result set into memory.
large_file_lines = File.foreach("large_file.txt")
# Memory-efficient: process one chunk at a time
large_file_lines.chunk { |line| line[0] }.each do |first_char, lines|
# Process lines starting with same character
puts "Processing #{lines.count} lines starting with '#{first_char}'"
# Only this chunk's lines are in memory
end
# Memory-intensive: materializes all chunks
all_chunks = large_file_lines.chunk { |line| line[0] }.to_a
# Loads entire file content into memory as grouped arrays
Block complexity significantly impacts performance. Simple operations like symbol-to-proc conversions execute faster than complex block logic with multiple method calls or calculations.
require 'benchmark'
data = (1..100_000).to_a
Benchmark.bm do |x|
  x.report("symbol-to-proc:") { data.group_by(&:even?) }
  x.report("simple block:") { data.group_by { |n| n.even? } }
  x.report("complex block:") { data.group_by { |n| n % 3 == 0 ? :divisible : :remainder } }
end
# Symbol-to-proc and simple blocks typically perform similarly; complex blocks are slowest
Hash operations within grouping methods can create performance bottlenecks. Pre-computing hash keys or using simpler key types improves grouping speed for complex objects.
# Slow: expensive key computation per element
users.group_by { |user| "#{user.department}_#{user.role}_#{user.status}".downcase }
# Faster: pre-compute or cache expensive operations
users.group_by { |user| [user.department, user.role, user.status] }
# Array keys are cheaper than string concatenation
Common Pitfalls
Block evaluation occurs once per element for grouping methods. Side effects in blocks can produce unexpected results, especially when blocks modify external state or perform I/O operations.
counter = 0
data = [1, 2, 3, 4, 5]
# Dangerous: side effect in grouping block
grouped = data.group_by do |n|
  counter += 1
  n.even?
end
# counter is now 5, but this creates fragile coupling
# Better: keep the side effect out of the grouping block
grouped = data.group_by(&:even?)
counter = data.length
Hash key equality determines grouping behavior. Keys land in the same group only when their hash values are equal and they compare equal with eql?, so objects that look identical can group separately, and grouping breaks down when a class overrides one of these methods without the other.
# Surprising behavior with string keys
data = ["hello", "hello".dup, "hello".freeze]
data.group_by(&:itself)
# => {"hello"=>["hello", "hello", "hello"]}
# All strings group together despite being different objects
# Custom objects with a poor hash/equality implementation
class BadKey
  def initialize(value)
    @value = value
  end

  def hash
    0 # Every instance returns the same hash; eql? is left at its default
  end
end

items = [BadKey.new(1), BadKey.new(2), BadKey.new(3)]
items.group_by { |item| item }
# Three separate groups: eql? defaults to object identity, so equal
# hash values alone never merge keys
The chunk method's consecutive grouping often confuses developers who expect global grouping. Non-consecutive identical values create separate groups, unlike with group_by.
sequence = [1, 1, 2, 1, 1, 3, 3, 2, 2]
# chunk creates separate groups for non-consecutive identical values
sequence.chunk(&:itself).to_a
# => [[1, [1, 1]], [2, [2]], [1, [1, 1]], [3, [3, 3]], [2, [2, 2]]]
# group_by combines all identical values
sequence.group_by(&:itself)
# => {1=>[1, 1, 1, 1], 2=>[2, 2, 2], 3=>[3, 3]}
The partition method always returns two arrays, even when one partition is empty. Destructuring assignment can mask an empty partition, leading to logic errors in code that assumes both partitions contain elements.
numbers = [2, 4, 6, 8] # All even
odd_numbers, even_numbers = numbers.partition(&:odd?)
# odd_numbers => []
# even_numbers => [2, 4, 6, 8]
# Dangerous: assumes odd_numbers is not empty
first_odd = odd_numbers.first # Returns nil
puts "First odd number: #{first_odd.to_i}" # Prints 0, which is probably not intended
# Better: check for empty partitions
if odd_numbers.any?
  puts "First odd number: #{odd_numbers.first}"
else
  puts "No odd numbers found"
end
Block parameters in the slice methods can be confusing. The slice_when method passes the previous element as the first parameter and the current element as the second, which reverses some developers' intuition.
data = [1, 3, 2, 4, 5]
# Incorrect parameter order assumption
data.slice_when { |curr, prev| curr > prev }.to_a
# The first block parameter is the earlier element, so this checks previous > current
# => [[1, 3], [2, 4, 5]]
# Correct parameter usage
data.slice_when { |prev, curr| curr > prev }.to_a
# => [[1], [3, 2], [4], [5]]
Production Patterns
Web applications frequently use grouping for dashboard aggregations and report generation. Combining database queries with Ruby grouping methods creates efficient data processing pipelines.
# Rails controller aggregating user activity
class AnalyticsController < ApplicationController
  def user_activity_report
    activities = UserActivity.includes(:user)
                             .where(created_at: 30.days.ago..Time.current)

    # Group by date for time series data
    daily_activity = activities.group_by { |activity| activity.created_at.to_date }
                               .transform_values { |entries| entries.count }

    # Group by user department for organizational insights
    dept_activity = activities.group_by { |activity| activity.user.department }
                              .transform_values do |entries|
      {
        total_activities: entries.count,
        unique_users: entries.map(&:user_id).uniq.count,
        avg_per_user: entries.count / entries.map(&:user_id).uniq.count.to_f
      }
    end

    render json: { daily: daily_activity, by_department: dept_activity }
  end
end
Log processing and monitoring systems leverage the chunking methods for batch processing and anomaly detection. The slice_when method identifies boundaries in time-series data.
class LogProcessor
  def process_error_bursts(log_entries)
    # Group consecutive errors within 5 minutes as potential incidents
    incidents = log_entries.slice_when do |prev_entry, curr_entry|
      time_gap = curr_entry.timestamp - prev_entry.timestamp
      time_gap > 5.minutes || prev_entry.level != :error || curr_entry.level != :error
    end

    error_incidents = incidents.select { |entries| entries.all? { |e| e.level == :error } }
                               .select { |entries| entries.count >= 3 }

    # Alert on significant error bursts
    error_incidents.each do |incident_entries|
      AlertService.trigger_incident(
        start_time: incident_entries.first.timestamp,
        end_time: incident_entries.last.timestamp,
        error_count: incident_entries.count,
        affected_services: incident_entries.map(&:service).uniq
      )
    end
  end
end
Data export and ETL processes use partitioning for parallel processing and resource management. Splitting work into manageable chunks prevents memory exhaustion and enables distributed processing.
class DataExporter
  BATCH_SIZE = 10_000

  def export_user_data(user_ids)
    # Partition users for different export strategies
    active_users, inactive_users = User.where(id: user_ids)
                                       .partition { |user| user.last_login > 90.days.ago }

    # Process active users with full data export
    active_users.each_slice(BATCH_SIZE) do |user_batch|
      ExportWorker.perform_async(user_batch.map(&:id), :full_export)
    end

    # Process inactive users with basic data export
    inactive_users.each_slice(BATCH_SIZE) do |user_batch|
      ExportWorker.perform_async(user_batch.map(&:id), :basic_export)
    end

    # Group by region for compliance requirements
    regional_batches = active_users.group_by(&:region)
                                   .transform_values { |users| users.each_slice(BATCH_SIZE).to_a }

    regional_batches.each do |region, user_batches|
      ComplianceExportWorker.perform_async(region, user_batches)
    end
  end
end
API response formatting often requires nested grouping structures. Multiple grouping operations transform flat data into hierarchical JSON responses.
class ProductAPI
  def category_inventory_summary
    products = Product.includes(:category, :variants).active

    # Multi-level grouping for nested API response
    summary = products.group_by(&:category)
                      .transform_values do |category_products|
      {
        total_products: category_products.count,
        by_availability: category_products.group_by(&:availability_status)
                                          .transform_values(&:count),
        by_price_range: category_products.group_by { |p| price_range(p.price) }
                                         .transform_values do |range_products|
          {
            count: range_products.count,
            avg_price: range_products.sum(&:price) / range_products.count,
            product_ids: range_products.map(&:id)
          }
        end
      }
    end

    render json: summary
  end

  private

  def price_range(price)
    case price
    when 0..25 then :budget
    when 26..100 then :standard
    when 101..500 then :premium
    else :luxury
    end
  end
end
Reference
Core Grouping Methods
Method | Parameters | Returns | Description
---|---|---|---
#group_by(&block) | Block yielding grouping key | Hash | Groups elements by block return values
#partition(&block) | Block yielding boolean | Array[Array, Array] | Splits into two arrays based on block result
#tally | None | Hash | Counts frequency of each unique element
#count | Value or block (optional) | Integer | Counts elements matching criteria
Consecutive Grouping Methods
Method | Parameters | Returns | Description
---|---|---|---
#chunk(&block) | Block yielding grouping key | Enumerator | Groups consecutive elements by block result
#chunk_while(&block) | Block with two parameters | Enumerator | Groups consecutive elements while block returns true
#slice_when(&block) | Block with two parameters | Enumerator | Splits at boundaries where block returns true
#slice_after(pattern) | Pattern or block | Enumerator | Splits after elements matching pattern
#slice_before(pattern) | Pattern or block | Enumerator | Splits before elements matching pattern
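The two pattern-taking methods in the table, slice_before and slice_after, match each element with === (so regexps, classes, and plain values all work). A brief sketch with made-up data:

```ruby
# slice_before starts a new slice at each element matching the pattern
lines = ["== Intro", "text", "more", "== Body", "content"]
lines.slice_before(/^==/).to_a
# => [["== Intro", "text", "more"], ["== Body", "content"]]

# slice_after ends the current slice after each matching element
tokens = ["a", "b", ";", "c", ";", "d"]
tokens.slice_after(";").to_a
# => [["a", "b", ";"], ["c", ";"], ["d"]]
```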
Hash Transformation Methods
Method | Parameters | Returns | Description
---|---|---|---
#transform_values(&block) | Block for value transformation | Hash | Creates new hash with transformed values
#transform_keys(&block) | Block for key transformation | Hash | Creates new hash with transformed keys
#merge(other) | Hash to merge | Hash | Combines two grouping result hashes
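As a small illustration of applying these to grouped data (the sales hash is hypothetical):

```ruby
sales = { "north" => [120, 80], "south" => [95] }

# transform_keys rewrites keys without touching values
sales.transform_keys(&:capitalize)
# => {"North"=>[120, 80], "South"=>[95]}

# merge with a block resolves key collisions between two grouping results
extra = { "north" => [40] }
sales.merge(extra) { |_key, a, b| a + b }
# => {"north"=>[120, 80, 40], "south"=>[95]}
```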
Common Block Patterns
Pattern | Example | Use Case
---|---|---
Symbol to Proc | &:method_name | Simple method calls
Attribute Access | { \|obj\| obj.attribute } | Object property grouping
Calculated Key | { \|n\| n / 10 } | Mathematical grouping
Multiple Criteria | { \|obj\| [obj.a, obj.b] } | Compound grouping keys
Conditional Logic | { \|n\| n > 0 ? :pos : :neg } | Binary classification
Return Value Types
Method Family | Immediate Result | After .to_a
---|---|---
group_by | Hash{key => Array} | N/A (already materialized)
partition | [Array, Array] | N/A (already materialized)
chunk methods | Enumerator | Array[Array[key, Array]]
slice methods | Enumerator | Array[Array]
Performance Characteristics
Operation | Time Complexity | Space Complexity | Notes
---|---|---|---
group_by | O(n) | O(n) | Hash creation overhead
partition | O(n) | O(n) | Always creates two arrays
chunk | O(n) | O(1) lazy, O(n) materialized | Lazy by default
tally | O(n) | O(k) where k = unique elements | Memory-efficient counting
Error Conditions
Scenario | Exception | Prevention
---|---|---
Block returns nil as key | None (nil becomes valid key) | Check for nil in complex blocks
Empty collection | None (returns empty hash/array) | Handle empty results appropriately
Block raises exception | Block's exception propagates | Wrap block logic in error handling
Infinite enumerable with to_a | Memory exhaustion | Use lazy evaluation or limits
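For the infinite-enumerable case, the safe pattern is to pull a bounded number of chunks instead of materializing the enumerator:

```ruby
# Calling to_a on this chunk enumerator would loop forever;
# first(n) consumes only the elements needed to complete n chunks
naturals = (1..Float::INFINITY)
naturals.chunk { |n| (n / 3.0).ceil }.first(2)
# => [[1, [1, 2, 3]], [2, [4, 5, 6]]]
```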