Array Uniqueness

This guide covers array uniqueness operations in Ruby, including deduplication methods, custom comparison logic, and performance considerations for large datasets.

Overview

Ruby provides several methods for removing duplicate elements from arrays. The primary methods are Array#uniq and Array#uniq!, which remove duplicate elements based on element equality or custom block logic. Ruby determines uniqueness using each element's eql? and hash methods (the same comparison Hash keys use), keeping the first occurrence of each unique element and preserving the original array order.

The uniq method returns a new array with duplicates removed, while uniq! modifies the original array in place. Both methods accept an optional block that defines custom uniqueness criteria.

numbers = [1, 2, 2, 3, 3, 3, 4]
numbers.uniq
# => [1, 2, 3, 4]

words = ['apple', 'APPLE', 'banana', 'BANANA']
words.uniq(&:downcase)
# => ['apple', 'banana']

Ruby also provides Array#| (union operator) which combines arrays while removing duplicates, and set operations through the Set class for more complex uniqueness requirements.
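
A quick illustration of both alternatives:

a = [1, 2, 2, 3]
b = [3, 4, 4]

# Union operator combines both arrays and drops duplicates
a | b
# => [1, 2, 3, 4]

require 'set'
Set.new(a + b).to_a
# => [1, 2, 3, 4]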

Basic Usage

The uniq method removes duplicates whether or not they are adjacent. Ruby compares elements using their eql? and hash methods, so objects with the same value but different object identities count as duplicates.

# Basic deduplication
items = [1, 1, 2, 3, 2, 4, 3]
items.uniq
# => [1, 2, 3, 4]

# String deduplication
names = ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob']
names.uniq
# => ['Alice', 'Bob', 'Charlie']

# Mixed data types
mixed = [1, '1', 1, 2, '2', 2]
mixed.uniq
# => [1, '1', 2, '2']
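
Because comparison uses eql? rather than ==, numeric values of different classes stay separate even though == treats them as equal:

1 == 1.0
# => true
1.eql?(1.0)
# => false

[1, 1.0, 1].uniq
# => [1, 1.0]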

The uniq! method modifies the original array and returns the array if changes were made, or nil if no duplicates were found.

numbers = [5, 5, 6, 7, 6, 8]
result = numbers.uniq!
# numbers is now [5, 6, 7, 8]
# result is [5, 6, 7, 8]

no_duplicates = [1, 2, 3, 4]
result = no_duplicates.uniq!
# no_duplicates remains [1, 2, 3, 4]  
# result is nil

Both methods accept blocks for custom uniqueness logic. The block receives each element and should return a value used for comparison.

people = [
  { name: 'Alice', age: 30 },
  { name: 'Bob', age: 25 },
  { name: 'Alice', age: 35 },
  { name: 'Charlie', age: 30 }
]

# Remove duplicates by name
unique_names = people.uniq { |person| person[:name] }
# => [{name: 'Alice', age: 30}, {name: 'Bob', age: 25}, {name: 'Charlie', age: 30}]

# Remove duplicates by age
unique_ages = people.uniq { |person| person[:age] }
# => [{name: 'Alice', age: 30}, {name: 'Bob', age: 25}, {name: 'Alice', age: 35}]

Advanced Usage

Complex uniqueness scenarios often require sophisticated block logic or preprocessing. Ruby's block-based uniqueness allows for multi-attribute comparison, normalized comparison, and conditional uniqueness rules.

For multi-attribute uniqueness, return an array of values from the block:

products = [
  { name: 'Laptop', brand: 'Dell', price: 999 },
  { name: 'Laptop', brand: 'HP', price: 899 },
  { name: 'Laptop', brand: 'Dell', price: 1099 },
  { name: 'Mouse', brand: 'Dell', price: 25 },
  { name: 'Mouse', brand: 'HP', price: 30 }
]

# Unique by name AND brand combination
unique_products = products.uniq { |p| [p[:name], p[:brand]] }
# => [
#   {name: 'Laptop', brand: 'Dell', price: 999},
#   {name: 'Laptop', brand: 'HP', price: 899}, 
#   {name: 'Mouse', brand: 'Dell', price: 25},
#   {name: 'Mouse', brand: 'HP', price: 30}
# ]

Normalized uniqueness handles case sensitivity, whitespace, and other formatting differences:

emails = [
  ' alice@example.com ',
  'ALICE@EXAMPLE.COM',
  'bob@test.org',
  '  BOB@TEST.ORG  ',
  'charlie@demo.net'
]

# Normalize emails for comparison
unique_emails = emails.uniq { |email| email.strip.downcase }
# => [' alice@example.com ', 'bob@test.org', 'charlie@demo.net']

# Complex normalization with regex
phone_numbers = [
  '(555) 123-4567',
  '555-123-4567', 
  '5551234567',
  '(555) 987-6543',
  '555.987.6543'
]

unique_phones = phone_numbers.uniq do |phone|
  phone.gsub(/\D/, '') # Remove all non-digits
end
# => ['(555) 123-4567', '(555) 987-6543']

Conditional uniqueness applies different rules based on element properties:

transactions = [
  { type: 'debit', amount: 100, account: 'checking' },
  { type: 'debit', amount: 100, account: 'savings' },
  { type: 'credit', amount: 100, account: 'checking' },
  { type: 'credit', amount: 100, account: 'checking' },
  { type: 'debit', amount: 200, account: 'checking' }
]

# For credits: unique by amount and account
# For debits: unique by amount only
unique_transactions = transactions.uniq do |txn|
  if txn[:type] == 'credit'
    [txn[:type], txn[:amount], txn[:account]]
  else
    [txn[:type], txn[:amount]]
  end
end
# => [
#   {type: 'debit', amount: 100, account: 'checking'},
#   {type: 'credit', amount: 100, account: 'checking'},
#   {type: 'debit', amount: 200, account: 'checking'}
# ]

Chaining uniqueness operations handles multiple deduplication steps:

log_entries = [
  'ERROR: Database connection failed at 10:30',
  'ERROR: database connection failed at 10:30',
  'INFO: User login successful',
  'ERROR: Database Connection Failed at 10:30', 
  'WARN: Low disk space',
  'error: database connection failed at 10:30'
]

# First normalize case, then remove duplicates
normalized_logs = log_entries
  .map(&:downcase)
  .uniq
  .sort
# => ['error: database connection failed at 10:30', 'info: user login successful', 'warn: low disk space']
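
Mapping before uniq discards the original formatting. Passing the normalization as a block to uniq keeps the first occurrence in its original form instead:

log_entries.uniq { |line| line.downcase }
# => [
#   'ERROR: Database connection failed at 10:30',
#   'INFO: User login successful',
#   'WARN: Low disk space'
# ]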

Performance & Memory

Array uniqueness operations have significant performance implications for large datasets. The uniq method has O(n) time complexity but requires additional memory for the new array, while uniq! modifies in place but still requires temporary storage for duplicate tracking.

Ruby uses a hash internally to track seen elements, making lookup operations efficient. However, memory usage increases with array size and element complexity:

# Memory-efficient approach for large arrays
large_array = (1..1_000_000).to_a.concat((1..500_000).to_a)

# uniq creates new array - doubles memory usage temporarily
unique_copy = large_array.uniq # Uses ~24MB additional memory

# uniq! modifies in place - more memory efficient  
large_array.uniq! # Uses ~12MB additional memory for tracking

Custom blocks add computational overhead. Complex block logic multiplies processing time:

require 'benchmark'

data = Array.new(100_000) { |i| { id: i % 50_000, value: rand(1000) } }

Benchmark.bm(20) do |x|
  x.report('simple uniq:') do
    data.uniq { |item| item[:id] }
  end
  
  x.report('complex block:') do  
    data.uniq { |item| [item[:id], item[:value] > 500 ? 'high' : 'low'] }
  end
  
  x.report('expensive operation:') do
    data.uniq { |item| item.to_s.hash }
  end
end

#                          user     system      total        real
# simple uniq:         0.125000   0.000000   0.125000 (  0.127234)
# complex block:       0.234000   0.000000   0.234000 (  0.235678) 
# expensive operation: 0.445000   0.000000   0.445000 (  0.447890)

For large datasets with simple uniqueness requirements, consider preprocessing or alternative data structures:

# Set for O(1) insertion and lookup
require 'set'

large_numbers = Array.new(1_000_000) { rand(100_000) }

# Using Set - faster for repeated uniqueness checks
unique_set = large_numbers.to_set
# Convert back to array if needed
unique_array = unique_set.to_a

# Hash-based manual deduplication for custom logic
# (complex_data and custom_key_function are placeholders for your
# own collection and key-extraction logic)
seen = {}
unique_items = []
complex_data.each do |item|
  key = custom_key_function(item)
  unless seen[key]
    seen[key] = true        # mark this key as seen
    unique_items << item    # keep only the first item per key
  end
end

Memory-conscious strategies reduce peak memory usage:

# Process in chunks for very large datasets
def unique_in_chunks(array, chunk_size = 10_000)
  seen = Set.new
  unique_items = []
  
  array.each_slice(chunk_size) do |chunk|
    chunk.each do |item|
      unless seen.include?(item)
        seen.add(item)
        unique_items << item
      end
    end
    
    # Optional: periodic cleanup
    GC.start if seen.size % 50_000 == 0
  end
  
  unique_items
end
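
A hypothetical call, assuming an input much larger than the chunk size:

values = Array.new(2_000_000) { rand(250_000) }
unique_values = unique_in_chunks(values)
unique_values.size
# => close to 250_000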

Common Pitfalls

Object identity versus value equality creates the most frequent uniqueness confusion. uniq compares elements using eql? and hash, not object identity (equal?):

# Same content, different objects - considered equal
str1 = String.new('hello')
str2 = String.new('hello')
[str1, str2].uniq
# => ['hello'] - only one element

# Object identity doesn't affect uniqueness
a = [1, 2]
b = [1, 2] 
[a, b].uniq
# => [[1, 2]] - only one element because arrays have same content

Mutable objects can cause unexpected behavior when modified after uniqueness operations:

arrays = [[1, 2], [3, 4], [1, 2]]
unique_arrays = arrays.uniq
# => [[1, 2], [3, 4]]

# Modifying original affects the unique result
arrays[0] << 3
unique_arrays
# => [[1, 2, 3], [3, 4]] - first array is now modified

Frozen objects prevent this issue:

arrays = [[1, 2].freeze, [3, 4].freeze, [1, 2].freeze]
unique_arrays = arrays.uniq
arrays[0] << 3 # Raises FrozenError

Uniqueness for custom objects depends on proper eql? and hash implementations:

class Person
  attr_reader :name, :age
  
  def initialize(name, age)
    @name, @age = name, age
  end
  
  # No eql? or hash overrides yet
end

people = [
  Person.new('Alice', 30),
  Person.new('Alice', 30),
  Person.new('Bob', 25)
]

people.uniq.size
# => 3 - all considered different (object identity)

class Person
  # Value-based equality for general comparison
  def ==(other)
    other.is_a?(Person) && name == other.name && age == other.age
  end

  # uniq compares elements with eql? and hash, so both must be defined
  alias eql? ==

  def hash
    [name, age].hash
  end
end

people.uniq.size 
# => 2 - Alice duplicates removed
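
Struct-based value objects sidestep the boilerplate: Struct defines ==, eql?, and hash from its members, so uniq works without extra code. A minimal sketch with a hypothetical PersonStruct:

PersonStruct = Struct.new(:name, :age)

people = [
  PersonStruct.new('Alice', 30),
  PersonStruct.new('Alice', 30),
  PersonStruct.new('Bob', 25)
]

people.uniq.size
# => 2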

Block return values are compared with eql? and hash as well, so the same caveats apply to them. Mixing types in the block result does not raise an error, but keys that look interchangeable can still count as distinct. A Symbol is never eql? to the equivalent String, and 1 is never eql? to 1.0:

data = [1, 2, 3, 'a', 'b']

# The block returns each element's class as the comparison key
data.uniq { |x| x.class }
# => [1, 'a']

# Symbol and String keys stay separate
statuses = [{ status: :active }, { status: 'active' }, { status: :active }]
statuses.uniq { |s| s[:status] }
# => [{status: :active}, {status: 'active'}]

Nil values in blocks require careful handling:

records = [
  { name: 'Alice', email: 'alice@example.com' },
  { name: 'Bob', email: nil },
  { name: 'Charlie', email: nil },
  { name: 'David', email: 'david@example.com' }
]

# Naive approach - nil values are equal
records.uniq { |r| r[:email] }
# => Only one record with nil email

# Better approach - handle nil explicitly
records.uniq { |r| r[:email] || "no_email_#{r[:name]}" }
# => Keeps all records with unique keys

Reference

Core Methods

Method Parameters Returns Description
Array#uniq &block (optional) Array Returns new array with duplicates removed
Array#uniq! &block (optional) Array or nil Removes duplicates in place, returns array if modified or nil
Array#| other_array Array Returns union of both arrays with duplicates removed

Set Operations

Method Parameters Returns Description
Set.new enumerable (optional) Set Creates set with unique elements
Set#to_a None Array Converts set to array
Set#add element Set Adds element to set
Set#include? element Boolean Checks if element exists
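
The same methods in use:

require 'set'

seen = Set.new
seen.add('alice')
seen.add('alice')
seen.include?('alice')
# => true
seen.to_a
# => ['alice']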

Block Usage Patterns

Pattern Example Use Case
Attribute-based array.uniq { |x| x.name } Remove duplicates by object attribute
Multi-attribute array.uniq { |x| [x.name, x.age] } Composite uniqueness key
Normalized array.uniq { |x| x.downcase } Case-insensitive uniqueness
Conditional array.uniq { |x| x.type == 'A' ? x.id : x.code } Different keys based on element

Performance Characteristics

Operation Time Complexity Space Complexity Notes
Array#uniq O(n) O(n) Creates new array
Array#uniq! O(n) O(n) temporary Modifies in place
Array#uniq with block O(n * block_time) O(n) Block evaluation adds overhead
Set.new O(n) O(n) Fastest for repeated uniqueness checks

Common Return Values

Scenario uniq Returns uniq! Returns
Duplicates found New array without duplicates Modified original array
No duplicates New array (copy of original) nil
Empty array Empty array nil
Single element Single-element array nil
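
The edge-case rows in action:

[].uniq!
# => nil

[42].uniq!
# => nil

[1, 2, 3].uniq!
# => nil (no duplicates, original unchanged)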

Error Conditions

Error Type Cause Example
FrozenError Calling uniq! on frozen array [1, 2, 2].freeze.uniq!

Custom objects with missing or inconsistent eql? and hash implementations do not raise errors; uniq silently treats each instance as unique instead.
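
Demonstrating the frozen-array case:

numbers = [1, 2, 2].freeze

begin
  numbers.uniq!
rescue FrozenError => e
  puts e.message
end
# prints something like: can't modify frozen Array: [1, 2, 2]

# The non-destructive form still works on frozen arrays
numbers.uniq
# => [1, 2]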

Memory Usage Guidelines

Array Size Recommended Approach Reasoning
< 1,000 elements Array#uniq Memory overhead negligible
1,000 - 100,000 elements Array#uniq! or Set Balance performance and memory
> 100,000 elements Chunked processing or Set Avoid memory pressure
Repeated operations Set for storage O(1) uniqueness checks