Overview
Ruby provides several methods for removing duplicate elements from arrays. The primary methods are Array#uniq and Array#uniq!, which remove duplicates based on element equality or custom block logic. Ruby determines uniqueness with the eql? and hash methods (which agree with == for most built-in objects), keeping the first occurrence of each unique element and preserving the original array order.
The uniq method returns a new array with duplicates removed, while uniq! modifies the original array in place. Both methods accept an optional block that defines custom uniqueness criteria.
numbers = [1, 2, 2, 3, 3, 3, 4]
numbers.uniq
# => [1, 2, 3, 4]
words = ['apple', 'APPLE', 'banana', 'BANANA']
words.uniq(&:downcase)
# => ['apple', 'banana']
Ruby also provides Array#| (the union operator), which combines arrays while removing duplicates, and set operations through the Set class for more complex uniqueness requirements.
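For example, both approaches below produce the same deduplicated result (the sample values are arbitrary):

a = [1, 2, 2, 3]
b = [3, 4, 4, 5]

# Union removes duplicates within and across both arrays
a | b
# => [1, 2, 3, 4, 5]

# Set keeps only unique elements as they are inserted
require 'set'
Set.new(a + b).to_a
# => [1, 2, 3, 4, 5]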
Basic Usage
The uniq method removes both consecutive and non-consecutive duplicates from an array. Ruby compares elements with eql? and hash, so objects with the same value but different object identities are treated as duplicates.
# Basic deduplication
items = [1, 1, 2, 3, 2, 4, 3]
items.uniq
# => [1, 2, 3, 4]
# String deduplication
names = ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob']
names.uniq
# => ['Alice', 'Bob', 'Charlie']
# Mixed data types
mixed = [1, '1', 1, 2, '2', 2]
mixed.uniq
# => [1, '1', 2, '2']
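Because comparison uses eql? rather than ==, numerically equal values of different classes survive deduplication:

# 1 == 1.0 is true, but 1.eql?(1.0) is false, so both are kept
[1, 1.0, 1].uniq
# => [1, 1.0]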
The uniq! method modifies the original array and returns the array if changes were made, or nil if no duplicates were found.
numbers = [5, 5, 6, 7, 6, 8]
result = numbers.uniq!
# numbers is now [5, 6, 7, 8]
# result is [5, 6, 7, 8]
no_duplicates = [1, 2, 3, 4]
result = no_duplicates.uniq!
# no_duplicates remains [1, 2, 3, 4]
# result is nil
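Because uniq! returns nil when nothing changes, chaining directly on its result is risky; safe navigation is one way to guard (a minimal sketch):

count = [1, 2, 3].uniq!&.size
# => nil - nothing was removed, so uniq! returned nil
count = [1, 1, 2].uniq!&.size
# => 2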
Both methods accept blocks for custom uniqueness logic. The block receives each element and should return a value used for comparison.
people = [
  { name: 'Alice', age: 30 },
  { name: 'Bob', age: 25 },
  { name: 'Alice', age: 35 },
  { name: 'Charlie', age: 30 }
]
# Remove duplicates by name
unique_names = people.uniq { |person| person[:name] }
# => [{name: 'Alice', age: 30}, {name: 'Bob', age: 25}, {name: 'Charlie', age: 30}]
# Remove duplicates by age
unique_ages = people.uniq { |person| person[:age] }
# => [{name: 'Alice', age: 30}, {name: 'Bob', age: 25}, {name: 'Alice', age: 35}]
Advanced Usage
Complex uniqueness scenarios often require sophisticated block logic or preprocessing. Ruby's block-based uniqueness allows for multi-attribute comparison, normalized comparison, and conditional uniqueness rules.
For multi-attribute uniqueness, return an array of values from the block:
products = [
  { name: 'Laptop', brand: 'Dell', price: 999 },
  { name: 'Laptop', brand: 'HP', price: 899 },
  { name: 'Laptop', brand: 'Dell', price: 1099 },
  { name: 'Mouse', brand: 'Dell', price: 25 },
  { name: 'Mouse', brand: 'HP', price: 30 }
]
# Unique by name AND brand combination
unique_products = products.uniq { |p| [p[:name], p[:brand]] }
# => [
#   {name: 'Laptop', brand: 'Dell', price: 999},
#   {name: 'Laptop', brand: 'HP', price: 899},
#   {name: 'Mouse', brand: 'Dell', price: 25},
#   {name: 'Mouse', brand: 'HP', price: 30}
# ]
Normalized uniqueness handles case sensitivity, whitespace, and other formatting differences:
emails = [
  ' alice@example.com ',
  'ALICE@EXAMPLE.COM',
  'bob@test.org',
  ' BOB@TEST.ORG ',
  'charlie@demo.net'
]

# Normalize emails for comparison
unique_emails = emails.uniq { |email| email.strip.downcase }
# => [' alice@example.com ', 'bob@test.org', 'charlie@demo.net']

# Complex normalization with regex
phone_numbers = [
  '(555) 123-4567',
  '555-123-4567',
  '5551234567',
  '(555) 987-6543',
  '555.987.6543'
]

unique_phones = phone_numbers.uniq do |phone|
  phone.gsub(/\D/, '') # Remove all non-digits
end
# => ['(555) 123-4567', '(555) 987-6543']
Conditional uniqueness applies different rules based on element properties:
transactions = [
  { type: 'debit', amount: 100, account: 'checking' },
  { type: 'debit', amount: 100, account: 'savings' },
  { type: 'credit', amount: 100, account: 'checking' },
  { type: 'credit', amount: 100, account: 'checking' },
  { type: 'debit', amount: 200, account: 'checking' }
]

# For credits: unique by amount and account
# For debits: unique by amount only
unique_transactions = transactions.uniq do |txn|
  if txn[:type] == 'credit'
    [txn[:type], txn[:amount], txn[:account]]
  else
    [txn[:type], txn[:amount]]
  end
end
# => [
#   {type: 'debit', amount: 100, account: 'checking'},
#   {type: 'credit', amount: 100, account: 'checking'},
#   {type: 'debit', amount: 200, account: 'checking'}
# ]
Chaining uniqueness operations handles multiple deduplication steps:
log_entries = [
  'ERROR: Database connection failed at 10:30',
  'ERROR: database connection failed at 10:30',
  'INFO: User login successful',
  'ERROR: Database Connection Failed at 10:30',
  'WARN: Low disk space',
  'error: database connection failed at 10:30'
]

# First normalize case, then remove duplicates
normalized_logs = log_entries
  .map(&:downcase)
  .uniq
  .sort
# => ['error: database connection failed at 10:30', 'info: user login successful', 'warn: low disk space']
Performance & Memory
Array uniqueness operations have significant performance implications for large datasets. The uniq method has O(n) time complexity but requires additional memory for the new array, while uniq! modifies in place but still requires temporary storage for duplicate tracking.
Ruby uses a hash internally to track seen elements, making lookup operations efficient. However, memory usage increases with array size and element complexity:
# Memory-efficient approach for large arrays
large_array = (1..1_000_000).to_a.concat((1..500_000).to_a)
# uniq creates new array - doubles memory usage temporarily
unique_copy = large_array.uniq # Uses ~24MB additional memory
# uniq! modifies in place - more memory efficient
large_array.uniq! # Uses ~12MB additional memory for tracking
Custom blocks add computational overhead. Complex block logic multiplies processing time:
require 'benchmark'

data = Array.new(100_000) { |i| { id: i % 50_000, value: rand(1000) } }

Benchmark.bm(20) do |x|
  x.report('simple uniq:') do
    data.uniq { |item| item[:id] }
  end

  x.report('complex block:') do
    data.uniq { |item| [item[:id], item[:value] > 500 ? 'high' : 'low'] }
  end

  x.report('expensive operation:') do
    data.uniq { |item| item.to_s.hash }
  end
end

#                            user     system      total        real
# simple uniq:           0.125000   0.000000   0.125000 (  0.127234)
# complex block:         0.234000   0.000000   0.234000 (  0.235678)
# expensive operation:   0.445000   0.000000   0.445000 (  0.447890)
For large datasets with simple uniqueness requirements, consider preprocessing or alternative data structures:
# Set for O(1) insertion and lookup
require 'set'

large_numbers = Array.new(1_000_000) { rand(100_000) }

# Using Set - faster for repeated uniqueness checks
unique_set = large_numbers.to_set

# Convert back to array if needed
unique_array = unique_set.to_a

# Hash-based manual deduplication for custom logic
# (complex_data and custom_key_function stand in for your own data and key logic)
seen = {}
unique_items = []
complex_data.each do |item|
  key = custom_key_function(item)
  unless seen[key]
    seen[key] = true
    unique_items << item
  end
end
Memory-conscious strategies reduce peak memory usage:
# Process in chunks for very large datasets
def unique_in_chunks(array, chunk_size = 10_000)
  seen = Set.new
  unique_items = []

  array.each_slice(chunk_size) do |chunk|
    chunk.each do |item|
      unless seen.include?(item)
        seen.add(item)
        unique_items << item
      end
    end

    # Optional: periodic cleanup
    GC.start if seen.size % 50_000 == 0
  end

  unique_items
end
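A quick call of the helper above, using synthetic data:

data_with_duplicates = Array.new(200_000) { rand(30_000) }
unique_values = unique_in_chunks(data_with_duplicates, 20_000)
unique_values.size
# => at most 30_000, one entry per distinct value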
Common Pitfalls
Object identity versus equality creates the most frequent uniqueness confusion. Ruby compares values with eql? and hash, not object identity (equal?):
# Same content, different objects - considered equal
str1 = String.new('hello')
str2 = String.new('hello')
[str1, str2].uniq
# => ['hello'] - only one element
# Object identity doesn't affect uniqueness
a = [1, 2]
b = [1, 2]
[a, b].uniq
# => [[1, 2]] - only one element because arrays have same content
Mutable objects can cause unexpected behavior when modified after uniqueness operations:
arrays = [[1, 2], [3, 4], [1, 2]]
unique_arrays = arrays.uniq
# => [[1, 2], [3, 4]]
# Modifying original affects the unique result
arrays[0] << 3
unique_arrays
# => [[1, 2, 3], [3, 4]] - first array is now modified
Frozen objects prevent this issue:
arrays = [[1, 2].freeze, [3, 4].freeze, [1, 2].freeze]
unique_arrays = arrays.uniq
arrays[0] << 3 # Raises FrozenError
Hash and complex object uniqueness depends on proper eql? and hash method implementations:
class Person
  attr_reader :name, :age

  def initialize(name, age)
    @name, @age = name, age
  end

  # No eql? or hash defined yet
end

people = [
  Person.new('Alice', 30),
  Person.new('Alice', 30),
  Person.new('Bob', 25)
]
people.uniq.size
# => 3 - all considered different (object identity)
class Person
  # Add value-based equality
  def ==(other)
    other.is_a?(Person) && name == other.name && age == other.age
  end

  # uniq relies on eql? and hash, so define both
  alias eql? ==

  def hash
    [name, age].hash
  end
end

people.uniq.size
# => 2 - Alice duplicates removed
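As a side note, value-object helpers such as Struct (and Data in Ruby 3.2+) define eql? and hash from their members automatically, so their instances deduplicate without extra work (a small sketch; PersonRecord is a hypothetical name):

PersonRecord = Struct.new(:name, :age)
[PersonRecord.new('Alice', 30), PersonRecord.new('Alice', 30)].uniq.size
# => 1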
Block return values are compared with eql? and hash, never <=>, so returning mixed types does not raise an error; values of different classes simply never match each other:

data = [1, 2, 3, 'a', 'b']

# One element survives per distinct class
data.uniq { |x| x.class }
# => [1, 'a']

# Mixed return types are allowed - integer and string keys never collide,
# so every element is kept
data.uniq { |x| x.is_a?(Integer) ? x : x.upcase }
# => [1, 2, 3, 'a', 'b']
Nil values in blocks require careful handling:
records = [
  { name: 'Alice', email: 'alice@example.com' },
  { name: 'Bob', email: nil },
  { name: 'Charlie', email: nil },
  { name: 'David', email: 'david@example.com' }
]
# Naive approach - nil values are equal
records.uniq { |r| r[:email] }
# => Only one record with nil email
# Better approach - handle nil explicitly
records.uniq { |r| r[:email] || "no_email_#{r[:name]}" }
# => Keeps all records with unique keys
Reference
Core Methods
| Method | Parameters | Returns | Description |
|---|---|---|---|
| `Array#uniq` | `&block` (optional) | `Array` | Returns new array with duplicates removed |
| `Array#uniq!` | `&block` (optional) | `Array` or `nil` | Removes duplicates in place; returns the array if modified, `nil` otherwise |
| `Array#\|` | `other_array` | `Array` | Returns the union of both arrays with duplicates removed |
Set Operations
| Method | Parameters | Returns | Description |
|---|---|---|---|
| `Set.new` | `enumerable` (optional) | `Set` | Creates set with unique elements |
| `Set#to_a` | None | `Array` | Converts set to array |
| `Set#add` | `element` | `Set` | Adds element to set |
| `Set#include?` | `element` | Boolean | Checks if element exists |
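A compact illustration of these methods together:

require 'set'
s = Set.new([1, 2, 2, 3])  # => #<Set: {1, 2, 3}>
s.add(4)
s.include?(2)              # => true
s.to_a                     # => [1, 2, 3, 4]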
Block Usage Patterns
| Pattern | Example | Use Case |
|---|---|---|
| Attribute-based | `array.uniq { \|x\| x.name }` | Remove duplicates by object attribute |
| Multi-attribute | `array.uniq { \|x\| [x.name, x.age] }` | Composite uniqueness key |
| Normalized | `array.uniq { \|x\| x.downcase }` | Case-insensitive uniqueness |
| Conditional | `array.uniq { \|x\| x.type == 'A' ? x.id : x.code }` | Different keys based on element |
Performance Characteristics
| Operation | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| `Array#uniq` | O(n) | O(n) | Creates new array |
| `Array#uniq!` | O(n) | O(n) temporary | Modifies in place |
| `Array#uniq` with block | O(n × block_time) | O(n) | Block evaluation adds overhead |
| `Set.new` | O(n) | O(n) | Fastest for repeated uniqueness checks |
Common Return Values
| Scenario | `uniq` Returns | `uniq!` Returns |
|---|---|---|
| Duplicates found | New array without duplicates | Modified original array |
| No duplicates | New array (copy of original) | `nil` |
| Empty array | Empty array | `nil` |
| Single element | Single-element array | `nil` |
Error Conditions
| Error Type | Cause | Example |
|---|---|---|
| `ArgumentError` | Passing a positional argument instead of a block | `array.uniq(:name)` |
| `NoMethodError` | Block calls a method the element does not respond to | `[1, 'a'].uniq { \|x\| x.upcase }` |
| `FrozenError` | Calling `uniq!` on a frozen array | `[1, 2, 3].freeze.uniq!` |
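A minimal illustration of guarding against the frozen-array case by falling back to the non-destructive form:

frozen = [1, 1, 2].freeze
begin
  frozen.uniq!
rescue FrozenError
  deduped = frozen.uniq  # non-destructive fallback
end
# deduped => [1, 2]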
Memory Usage Guidelines
| Array Size | Recommended Approach | Reasoning |
|---|---|---|
| < 1,000 elements | `Array#uniq` | Memory overhead negligible |
| 1,000 - 100,000 elements | `Array#uniq!` or `Set` | Balance performance and memory |
| > 100,000 elements | Chunked processing or `Set` | Avoid memory pressure |
| Repeated operations | `Set` for storage | O(1) uniqueness checks |