Overview
Ruby provides two classes for managing collections of unique elements: Set
and SortedSet
. Both classes ensure that each element appears exactly once in the collection, but they differ in their internal organization and performance characteristics.
Set
stores elements in an unordered fashion using a hash-based implementation, providing O(1) average-case performance for insertion, deletion, and membership testing. SortedSet
maintains elements in sorted order using a red-black tree structure, resulting in O(log n) performance for these operations but guaranteeing ordered iteration.
require 'set'
# Basic Set usage
numbers = Set.new([1, 2, 3, 2, 1])
numbers.to_a # => [1, 2, 3]
# Basic SortedSet usage
sorted_numbers = SortedSet.new([3, 1, 2, 1])
sorted_numbers.to_a # => [1, 2, 3]
Both classes inherit from Enumerable
, providing access to methods like map
, select
, and reject
. They implement set theory operations including union, intersection, and difference through operators like +
, &
, and -
.
The primary difference lies in ordering guarantees: Set
makes no promises about element order during iteration, while SortedSet
always iterates in sorted order according to the natural comparison of elements or a custom comparison block.
# Set iteration order is undefined
Set.new([3, 1, 2]).each { |n| print n } # Could output: 132, 213, etc.
# SortedSet iteration order is always sorted
SortedSet.new([3, 1, 2]).each { |n| print n } # Always outputs: 123
Both classes require elements to implement appropriate comparison methods. For Set
, elements must implement hash
and eql?
. For SortedSet
, elements must implement the spaceship operator <=>
for ordering comparisons.
Basic Usage
Creating sets requires loading the set library, as these classes are not part of Ruby's core classes. The most common initialization patterns involve passing an enumerable object to the constructor or using class methods.
require 'set'
# Creating empty sets
empty_set = Set.new
empty_sorted = SortedSet.new
# Creating from arrays
names = Set.new(['Alice', 'Bob', 'Charlie', 'Alice'])
names.size # => 3
# Creating from other enumerables
range_set = Set.new(1..5)
range_set.to_a # => [1, 2, 3, 4, 5]
# Using Set[] class method
colors = Set['red', 'green', 'blue']
Adding and removing elements uses familiar methods that mirror array operations. The key difference is that duplicate additions have no effect, and the methods return the set itself for chaining.
fruits = Set.new
fruits.add('apple')
fruits << 'banana' # Alias for add
fruits.add('apple') # No effect, already present
fruits.size # => 2
# Chaining operations
fruits.add('cherry').add('date').delete('banana')
fruits.to_a # => ['apple', 'cherry', 'date']
# Bulk operations
fruits.merge(['elderberry', 'fig', 'apple'])
fruits.subtract(['date', 'grape']) # No error if 'grape' absent
Set theory operations form the core of set functionality. Ruby implements these through intuitive operators and method names that correspond to mathematical notation.
set1 = Set[1, 2, 3, 4]
set2 = Set[3, 4, 5, 6]
# Union (combines all elements)
union = set1 | set2 # => Set[1, 2, 3, 4, 5, 6]
union = set1.union(set2) # Same result
# Intersection (common elements only)
intersection = set1 & set2 # => Set[3, 4]
intersection = set1.intersection(set2) # Same result
# Difference (elements in first but not second)
difference = set1 - set2 # => Set[1, 2]
difference = set1.difference(set2) # Same result
# Symmetric difference (elements in either but not both)
symmetric = set1 ^ set2 # => Set[1, 2, 5, 6]
Membership testing and subset operations provide boolean results for set relationships. These operations are fundamental for set-based algorithms and filtering operations.
languages = Set['ruby', 'python', 'javascript']
# Membership testing
languages.include?('ruby') # => true
languages.member?('python') # => true (alias)
languages.include?('golang') # => false
# Subset relationships
small_set = Set['ruby', 'python']
languages.superset?(small_set) # => true
small_set.subset?(languages) # => true
# Proper subset (strict subset)
languages.proper_superset?(small_set) # => true
small_set.proper_subset?(languages) # => true
# Disjoint sets (no common elements)
web_langs = Set['javascript', 'typescript']
system_langs = Set['c', 'rust']
web_langs.disjoint?(system_langs) # => true
Performance & Memory
The performance characteristics of Set
and SortedSet
differ significantly due to their underlying implementations. Understanding these differences helps choose the appropriate class for specific use cases and performance requirements.
Set
uses a hash table implementation that provides O(1) average-case performance for insertion, deletion, and lookup operations. This makes it ideal for scenarios requiring fast membership testing or frequent modifications to the collection.
require 'benchmark'
require 'set'
# Benchmark Set operations
large_array = (1..100_000).to_a
set = Set.new
Benchmark.bm(15) do |x|
x.report('Set creation:') do
Set.new(large_array)
end
x.report('Set insertion:') do
large_array.each { |n| set.add(n) }
end
x.report('Set lookup:') do
10_000.times { set.include?(rand(100_000)) }
end
end
SortedSet
maintains elements in sorted order using a red-black tree, resulting in O(log n) performance for basic operations. While slower than Set
for individual operations, it provides guaranteed ordering and efficient range operations.
# Memory usage comparison
array = (1..10_000).to_a.shuffle
set = Set.new(array)
sorted_set = SortedSet.new(array)
# Memory footprint (approximate)
# Array: ~40KB (plus object overhead)
# Set: ~80KB (hash table overhead)
# SortedSet: ~120KB (tree node overhead)
The performance gap becomes more pronounced with larger datasets. For collections with frequent insertions and deletions, Set
significantly outperforms SortedSet
. However, if ordered iteration is required, the cost of sorting a Set
often exceeds the overhead of maintaining a SortedSet
.
# Scenario: Building a leaderboard
scores = []
1000.times { scores << rand(1000) }
# Using Set (requires sorting for display)
score_set = Set.new
scores.each { |score| score_set.add(score) }
sorted_scores = score_set.to_a.sort # O(n log n) sort
# Using SortedSet (always sorted)
score_sorted_set = SortedSet.new
scores.each { |score| score_sorted_set.add(score) }
sorted_scores = score_sorted_set.to_a # Already sorted
Memory usage patterns also differ between the implementations. Set
uses less memory per element but may have hash table overhead and potential bucket collisions. SortedSet
uses more memory per element due to tree node pointers but provides predictable memory access patterns.
For applications processing large datasets, consider the access patterns when choosing between implementations. If the application primarily needs membership testing with occasional ordered output, maintain a Set
and sort when needed. If the application requires frequent ordered iteration or range queries, SortedSet
provides better overall performance.
# Memory-conscious approach for large datasets
class EfficientUniqueProcessor
def initialize(need_ordering: false)
@container = need_ordering ? SortedSet.new : Set.new
@stats = { insertions: 0, lookups: 0 }
end
def add_item(item)
@stats[:insertions] += 1
@container.add(item)
end
def include?(item)
@stats[:lookups] += 1
@container.include?(item)
end
def ordered_items
@container.respond_to?(:to_a) ? @container.to_a : @container.to_a.sort
end
end
Common Pitfalls
Set implementations in Ruby contain several subtle behaviors that can lead to unexpected results. Understanding these pitfalls prevents common bugs and helps write more robust code using sets.
Equality semantics represent the most frequent source of confusion. Sets determine element uniqueness using eql?
and hash
methods, not the ==
operator. This can lead to surprising behavior when working with custom objects or numeric types.
# Numeric type pitfall
mixed_numbers = Set.new
mixed_numbers.add(1) # Integer
mixed_numbers.add(1.0) # Float
mixed_numbers.add(1/1r) # Rational
# All three are present because they have different hash values
mixed_numbers.size # => 3
mixed_numbers.to_a # => [1, 1.0, 1/1]
# But they compare as equal
1 == 1.0 # => true
1 == 1/1r # => true
Custom objects must implement both hash
and eql?
consistently. Objects that are equal according to eql?
must return the same hash value, but objects with the same hash value need not be equal.
class Person
attr_reader :name, :age
def initialize(name, age)
@name = name
@age = age
end
# Incorrect implementation - only implements ==
def ==(other)
other.is_a?(Person) && name == other.name && age == other.age
end
end
people = Set.new
people.add(Person.new('Alice', 30))
people.add(Person.new('Alice', 30)) # Adds duplicate!
people.size # => 2 (unexpected)
# Correct implementation
class PersonFixed
attr_reader :name, :age
def initialize(name, age)
@name = name
@age = age
end
def eql?(other)
other.is_a?(PersonFixed) && name == other.name && age == other.age
end
def hash
[name, age].hash
end
end
Mutability of set elements creates another common trap. Modifying objects after adding them to a set can break the set's internal consistency, leading to elements that cannot be found or removed.
# Dangerous pattern - modifying elements after insertion
users = Set.new
user = { name: 'Bob', role: 'admin' }
users.add(user)
# Modifying the object breaks set consistency
user[:role] = 'user' # Changes hash value
users.include?(user) # => false (can't find it!)
users.delete(user) # => nil (can't remove it!)
# The set is now corrupted
users.size # => 1 (element still there)
users.to_a.first # => { name: 'Bob', role: 'user' }
SortedSet
introduces additional complexity around comparison consistency. Elements must implement <=>
and maintain consistent ordering throughout their lifetime. Changing an object's sort key after insertion corrupts the tree structure.
class Score
attr_accessor :points
def initialize(points)
@points = points
end
def <=>(other)
points <=> other.points
end
def eql?(other)
other.is_a?(Score) && points == other.points
end
def hash
points.hash
end
end
scores = SortedSet.new
score1 = Score.new(100)
score2 = Score.new(200)
scores.add(score1)
scores.add(score2)
scores.to_a.map(&:points) # => [100, 200]
# Dangerous - modifying sort key corrupts the tree
score1.points = 300
scores.to_a.map(&:points) # => [300, 200] (wrong order!)
# Tree structure is now inconsistent
scores.include?(Score.new(300)) # => false (can't find equivalent)
Set operations can produce unexpected results when working with sets of different types or when assuming commutativity. While mathematical set operations are commutative, Ruby's implementation may preserve the type of the receiver.
set1 = Set[1, 2, 3]
set2 = SortedSet[2, 3, 4]
result1 = set1 | set2 # Returns Set
result2 = set2 | set1 # Returns SortedSet
# Both contain the same elements but have different types
result1.class # => Set
result2.class # => SortedSet
# This affects subsequent operations and iteration order
result1.to_a # => [1, 2, 3, 4] (undefined order)
result2.to_a # => [1, 2, 3, 4] (sorted order)
Reference
Set Class Methods
Method | Parameters | Returns | Description |
---|---|---|---|
Set.new(enum = nil) |
enum (Enumerable) |
Set |
Creates new set from enumerable or empty set |
Set[](*elements) |
Variable arguments | Set |
Creates set from argument list |
Set Instance Methods
Method | Parameters | Returns | Description |
---|---|---|---|
#add(element) |
element (Object) |
Set |
Adds element to set, returns self |
#<<(element) |
element (Object) |
Set |
Alias for add |
#delete(element) |
element (Object) |
Set |
Removes element from set, returns self |
#delete?(element) |
element (Object) |
Object or nil |
Removes and returns element, nil if absent |
#include?(element) |
element (Object) |
Boolean |
Tests membership |
#member?(element) |
element (Object) |
Boolean |
Alias for include? |
#size |
None | Integer |
Returns number of elements |
#length |
None | Integer |
Alias for size |
#empty? |
None | Boolean |
Returns true if set contains no elements |
#clear |
None | Set |
Removes all elements, returns self |
Set Theory Operations
Method | Parameters | Returns | Description |
---|---|---|---|
#union(set) |
set (Set) |
Set |
Returns union of sets |
#|(set) |
set (Set) |
Set |
Operator alias for union |
#+(set) |
set (Set) |
Set |
Alias for union |
#intersection(set) |
set (Set) |
Set |
Returns intersection of sets |
#&(set) |
set (Set) |
Set |
Operator alias for intersection |
#difference(set) |
set (Set) |
Set |
Returns difference (elements in receiver not in argument) |
#-(set) |
set (Set) |
Set |
Operator alias for difference |
#^(set) |
set (Set) |
Set |
Returns symmetric difference |
Set Relationship Methods
Method | Parameters | Returns | Description |
---|---|---|---|
#subset?(set) |
set (Set) |
Boolean |
True if receiver is subset of argument |
#superset?(set) |
set (Set) |
Boolean |
True if receiver is superset of argument |
#proper_subset?(set) |
set (Set) |
Boolean |
True if receiver is proper subset of argument |
#proper_superset?(set) |
set (Set) |
Boolean |
True if receiver is proper superset of argument |
#disjoint?(set) |
set (Set) |
Boolean |
True if sets share no common elements |
Set Modification Methods
Method | Parameters | Returns | Description |
---|---|---|---|
#merge(enum) |
enum (Enumerable) |
Set |
Adds all elements from enumerable, returns self |
#subtract(enum) |
enum (Enumerable) |
Set |
Removes all elements in enumerable, returns self |
#replace(enum) |
enum (Enumerable) |
Set |
Replaces contents with elements from enumerable |
#keep_if |
Block | Set |
Removes elements where block returns falsy |
#select! |
Block | Set or nil |
Removes elements where block returns falsy, returns nil if unchanged |
#delete_if |
Block | Set |
Removes elements where block returns truthy |
#reject! |
Block | Set or nil |
Removes elements where block returns truthy, returns nil if unchanged |
SortedSet Class Methods
Method | Parameters | Returns | Description |
---|---|---|---|
SortedSet.new(enum = nil, &block) |
enum (Enumerable), block (Proc) |
SortedSet |
Creates sorted set with optional comparison block |
SortedSet[](*elements) |
Variable arguments | SortedSet |
Creates sorted set from argument list |
SortedSet-Specific Methods
Method | Parameters | Returns | Description |
---|---|---|---|
#min |
None | Object |
Returns minimum element |
#max |
None | Object |
Returns maximum element |
Common Enumerable Methods
Both Set
and SortedSet
include Enumerable
, providing access to these methods:
Method | Parameters | Returns | Description |
---|---|---|---|
#each |
Block | Set |
Iterates over elements |
#map |
Block | Array |
Returns array of transformed elements |
#select |
Block | Array |
Returns array of elements where block returns truthy |
#reject |
Block | Array |
Returns array of elements where block returns falsy |
#find |
Block | Object or nil |
Returns first element where block returns truthy |
#to_a |
None | Array |
Converts set to array |
Element Requirements
For Set:
- Elements must implement
hash
method returning consistent integer - Elements must implement
eql?
method for equality testing - Objects that are
eql?
must return identical hash values
For SortedSet:
- Elements must implement
<=>
method returning -1, 0, or 1 - Elements must implement
eql?
andhash
(inherits Set requirements) - Comparison must be consistent throughout object lifetime
Performance Characteristics
Operation | Set | SortedSet |
---|---|---|
Insertion | O(1) average | O(log n) |
Deletion | O(1) average | O(log n) |
Lookup | O(1) average | O(log n) |
Iteration | O(n) unordered | O(n) sorted |
Union | O(n + m) | O(n + m) |
Intersection | O(min(n, m)) | O(n + m) |