CrackedRuby - Data vs Struct Comparison

Overview

Ruby provides two primary classes for creating simple data structures: Struct and Data. Both classes generate value objects with named attributes, but they serve different purposes and exhibit distinct behaviors. Struct creates mutable objects with optional method definitions, while Data produces immutable value objects focused on data integrity and functional programming patterns.

Struct has been part of Ruby since early versions, designed as a convenient way to create classes with named attributes and accessor methods. The Struct.new method returns a new class with the specified attributes, supporting both positional and keyword arguments for initialization.

Person = Struct.new(:name, :age)
person = Person.new("Alice", 30)
person.name = "Bob"  # Mutable
# => "Bob"

Data was introduced in Ruby 3.2 as an immutable alternative. Data objects cannot be modified after creation, making them suitable for functional programming patterns and situations requiring data integrity guarantees.

Person = Data.define(:name, :age)
person = Person.new(name: "Alice", age: 30)
person.with(name: "Bob")  # Returns new instance
# => #<data Person name="Bob", age=30>

The fundamental difference lies in mutability. Struct instances can be modified after creation, while Data instances are frozen and expose no setters. This affects thread safety, sharing, and programming patterns. Data also has a deliberately smaller API (no setters, no index access, no Enumerable) and stricter initialization: every attribute must be supplied, whereas Struct fills missing ones with nil.

Both classes automatically generate accessor methods, equality comparisons, and hash methods, and both accept a block for defining extra methods. They differ in how much else they provide: Struct adds setters, index access, and Enumerable, while Data keeps a minimal, read-only interface.
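As a minimal sketch of the shared equality and hash behavior described above (the class names StructCoord and DataCoord are hypothetical, chosen only for illustration):

```ruby
StructCoord = Struct.new(:lat, :lng)
DataCoord = Data.define(:lat, :lng)

s1 = StructCoord.new(1.5, 2.5)
s2 = StructCoord.new(1.5, 2.5)
d1 = DataCoord.new(lat: 1.5, lng: 2.5)
d2 = DataCoord.new(lat: 1.5, lng: 2.5)

# Equality is by class and attribute values, not by object identity
s1 == s2        # => true
d1 == d2        # => true
s1.equal?(s2)   # => false

# eql? and hash agree, so equal instances collapse to one Hash key
visits = Hash.new(0)
[d1, d2].each { |coord| visits[coord] += 1 }
visits.size     # => 1

# Instances of different classes never compare equal
s1 == d1        # => false
```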

Basic Usage

Struct creation supports multiple initialization patterns. The most common approach defines attributes as symbols, creating a new class with accessor methods for each attribute.

# Basic struct definition
Point = Struct.new(:x, :y)
point = Point.new(10, 20)
point.x  # => 10
point.y = 30  # Modifies existing instance

# Keyword arguments
Person = Struct.new(:name, :age, keyword_init: true)
person = Person.new(name: "Carol", age: 25)
person.age = 26  # Direct modification

The Data.define method creates an immutable class with the specified attributes. Instances can be built with either keyword or positional arguments, but every attribute must be supplied.

# Basic data definition
Point = Data.define(:x, :y)
point = Point.new(x: 10, y: 20)
point.x  # => 10
# point.y = 30  # Raises NoMethodError: Data defines no setters

# Creating modified copies
new_point = point.with(y: 30)
# => #<data Point x=10, y=30>
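The following sketch, using a hypothetical Coord class, shows the two construction forms side by side and confirms that with leaves the original untouched:

```ruby
Coord = Data.define(:x, :y)   # hypothetical name

a = Coord.new(x: 10, y: 20)   # keyword form
b = Coord.new(10, 20)         # positional form, mapped by member order
a == b                        # => true

moved = a.with(y: 30)         # copy with one attribute replaced
a.y                           # => 20
moved.y                       # => 30

a.frozen?                     # => true
a.respond_to?(:y=)            # => false
```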

Both structures support destructuring and pattern matching. Struct implements deconstruct (array patterns) and deconstruct_keys (hash patterns), and can also be unpacked through to_a. Data has no to_a and is normally matched with keyword patterns such as Point(x:, y:).

# Struct destructuring
Point = Struct.new(:x, :y)
point = Point.new(5, 15)
x, y = point.to_a
# => [5, 15]

# Data pattern matching
Point = Data.define(:x, :y)
point = Point.new(x: 5, y: 15)
case point
in Point(x: 0, y:)
  puts "On Y axis: #{y}"
in Point(x:, y: 0)
  puts "On X axis: #{x}"
in Point(x:, y:)
  puts "Point at #{x}, #{y}"
end
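Struct takes part in pattern matching as well. A short sketch, assuming a hypothetical Pixel struct, of the hash, array, and constant pattern forms:

```ruby
Pixel = Struct.new(:x, :y)  # hypothetical name
pixel = Pixel.new(0, 15)

# Hash pattern, served by Struct#deconstruct_keys
axis =
  case pixel
  in { x: 0, y: } then "On Y axis: #{y}"
  in { x:, y: 0 } then "On X axis: #{x}"
  in { x:, y: }   then "Point at #{x}, #{y}"
  end

# Array pattern, served by Struct#deconstruct
coords =
  case pixel
  in [px, py] then [px, py]
  end

# Constant pattern with keywords, the same form used for Data above
on_y_axis = (pixel in Pixel(x: 0))
```

Here axis is "On Y axis: 15", coords is [0, 15], and on_y_axis is true.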

Neither class has a dedicated syntax for default values. A Struct typically overrides initialize, as below. Data can do the same with keyword defaults, or expose a factory method that fills in a standard set of values.

# Struct with defaults
Config = Struct.new(:host, :port, :timeout) do
  def initialize(host: "localhost", port: 8080, timeout: 30)
    super(host, port, timeout)
  end
end

# Data with defaults
Config = Data.define(:host, :port, :timeout) do
  def self.default
    new(host: "localhost", port: 8080, timeout: 30)
  end
end
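As an alternative to the factory method, Data defaults can be sketched by overriding initialize with keyword defaults. Data converts positional arguments to keywords before calling initialize, so both call styles pick up the defaults. ServerConfig is a hypothetical name used to avoid clashing with Config above.

```ruby
ServerConfig = Data.define(:host, :port, :timeout) do
  # Keyword defaults fill in whatever the caller omits
  def initialize(host: "localhost", port: 8080, timeout: 30)
    super(host: host, port: port, timeout: timeout)
  end
end

defaults   = ServerConfig.new                 # every attribute defaulted
custom     = ServerConfig.new(port: 9000)     # override a single attribute
positional = ServerConfig.new("example.com")  # host given positionally

defaults.port     # => 8080
custom.port       # => 9000
positional.host   # => "example.com"
positional.port   # => 8080
```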

Advanced Usage

Both Struct and Data support method definition, but with different philosophies. Struct encourages adding behavior directly to the generated class, while Data promotes composition and functional patterns.

# Struct with custom methods
class Rectangle < Struct.new(:width, :height)
  def area
    width * height
  end
  
  def resize!(factor)
    self.width *= factor
    self.height *= factor
    self
  end
  
  def perimeter
    2 * (width + height)
  end
end

rect = Rectangle.new(10, 5)
rect.resize!(2)  # Modifies in place
rect.area  # => 200 (20 * 10 after resizing)

Data classes focus on immutable transformations and functional composition. Method definitions typically return new instances rather than modifying existing ones.

Rectangle = Data.define(:width, :height) do
  def area
    width * height
  end
  
  def resize(factor)
    with(width: width * factor, height: height * factor)
  end
  
  def perimeter
    2 * (width + height)
  end
  
  def scale_to_area(target_area)
    factor = Math.sqrt(target_area.to_f / area)
    resize(factor)
  end
end

rect = Rectangle.new(width: 10, height: 5)
bigger_rect = rect.resize(2)  # Returns new instance
scaled_rect = rect.scale_to_area(200)

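Inheritance patterns differ significantly. Classes generated by either Struct or Data can be subclassed, but a subclass cannot add members. A Struct subclass can keep extra state in instance variables, although that state is ignored by ==, to_a, and to_h. A Data instance is frozen during initialization, so extra attributes are better declared as members of a separate Data class.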

# Struct inheritance
Animal = Struct.new(:name, :species)
class Dog < Animal
  def initialize(name, breed)
    super(name, "dog")
    @breed = breed
  end
  
  attr_reader :breed
  
  def bark
    "#{name} says woof!"
  end
end

# Data: subclasses cannot add members, so Dog declares its own member list
Animal = Data.define(:name, :species)
Dog = Data.define(:name, :species, :breed) do
  def initialize(name:, breed:, species: "dog")
    super(name: name, species: species, breed: breed)
  end
  
  def bark
    "#{name} says woof!"
  end
end
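When a subclass needs extra behavior but no extra members, a Data class can be subclassed directly. A minimal sketch with hypothetical Creature and Hound classes:

```ruby
Creature = Data.define(:name, :species)  # hypothetical names

class Hound < Creature
  # Factory that fixes the species for this subclass
  def self.named(name)
    new(name: name, species: "dog")
  end

  def bark
    "#{name} says woof!"
  end
end

rex = Hound.named("Rex")
rex.bark               # => "Rex says woof!"
rex.is_a?(Creature)    # => true
rex.frozen?            # => true

# Equality also compares the class, so a Hound never equals a plain Creature
rex == Creature.new(name: "Rex", species: "dog")  # => false
```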

Complex initialization and validation logic requires different approaches. A Struct can normalize its attributes after calling super, while Data must validate and normalize values before passing them to super, because the instance is frozen once initialization completes.

# Struct with validation
class EmailContact < Struct.new(:email, :name)
  def initialize(email, name = nil)
    raise ArgumentError, "Invalid email" unless email.include?("@")
    super
    normalize_email!
  end
  
  private
  
  def normalize_email!
    self.email = email.downcase.strip
  end
end

# Data with validation
EmailContact = Data.define(:email, :name) do
  def initialize(email:, name: nil)
    raise ArgumentError, "Invalid email" unless email.include?("@")
    super(email: email.downcase.strip, name: name)
  end
  
  def update_email(new_email)
    self.class.new(email: new_email, name: name)
  end
end

Performance & Memory

Struct and Data share much of their underlying implementation, so a single instance of either generally has a similar memory footprint. The practical difference comes from usage: a Struct can be updated in place without allocating, while every change to a Data value allocates a new object through with. In exchange, Data instances can be shared freely without defensive copies.

require 'benchmark/memory'  # provided by the benchmark-memory gem

# Memory comparison for creation (classes defined once, outside the reports)
Point = Struct.new(:x, :y)
DataPoint = Data.define(:x, :y)

Benchmark.memory do |x|
  x.report("Struct creation") do
    1000.times { Point.new(rand(100), rand(100)) }
  end
  
  x.report("Data creation") do
    1000.times { DataPoint.new(x: rand(100), y: rand(100)) }
  end
  
  x.compare!
end
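If the benchmark-memory gem is not available, a rough comparison can be sketched with the standard library alone. The class names and the allocations helper below are hypothetical, and the numbers depend on the Ruby version and platform, so no expected output is shown.

```ruby
require 'objspace'

SPoint = Struct.new(:x, :y)     # hypothetical names
DPoint = Data.define(:x, :y)

# Approximate size of one instance, as reported by the VM
struct_bytes = ObjectSpace.memsize_of(SPoint.new(1, 2))
data_bytes   = ObjectSpace.memsize_of(DPoint.new(x: 1, y: 2))
puts "Struct instance: #{struct_bytes} bytes"
puts "Data instance:   #{data_bytes} bytes"

# Count objects allocated while the block runs
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

struct_allocs = allocations { 1000.times { |i| SPoint.new(i, i) } }
data_allocs   = allocations { 1000.times { |i| DPoint.new(x: i, y: i) } }
puts "Struct allocations: #{struct_allocs}"
puts "Data allocations:   #{data_allocs}"
```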

Performance characteristics vary based on usage patterns. Struct suits code that updates the same object frequently, since mutation avoids allocation. Data fits functional pipelines where values are passed around and shared, at the cost of one allocation per transformation. Results depend on the Ruby version and workload, so measure before optimizing.

require 'benchmark'

# Performance comparison for modifications
Benchmark.bm do |x|
  struct_point = Struct.new(:x, :y).new(0, 0)
  data_point = Data.define(:x, :y).new(x: 0, y: 0)
  
  x.report("Struct mutation") do
    point = struct_point.dup
    1000.times do |i|
      point.x = i
      point.y = i * 2
    end
  end
  
  x.report("Data transformation") do
    point = data_point
    1000.times do |i|
      point = point.with(x: i, y: i * 2)
    end
  end
end

Hash and equality operations work the same way for both classes: hash and == are computed from the current attribute values, and neither class caches the result. The difference is that a Data value cannot change after it has been used as a Hash key, whereas a mutated Struct key no longer matches the hash value it was stored under.

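require 'benchmark'

# Hash performance comparison (define each class once, not once per element)
StructPoint = Struct.new(:x, :y)
DataPoint = Data.define(:x, :y)
struct_points = Array.new(1000) { StructPoint.new(rand(100), rand(100)) }
data_points = Array.new(1000) { DataPoint.new(x: rand(100), y: rand(100)) }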

Benchmark.bm do |x|
  x.report("Struct hash operations") do
    hash = {}
    struct_points.each { |point| hash[point] = true }
  end
  
  x.report("Data hash operations") do
    hash = {}
    data_points.each { |point| hash[point] = true }
  end
end
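The practical risk with Struct keys is drift rather than speed. A small sketch, with hypothetical MutableKey and FixedKey classes, of what happens when a key is mutated after insertion:

```ruby
MutableKey = Struct.new(:x, :y)   # hypothetical names
FixedKey = Data.define(:x, :y)

key = MutableKey.new(1, 2)
labels = { key => "first" }

hash_before = key.hash
key.x = 99
key.hash == hash_before    # => false: the hash follows the attribute values

# The entry is still filed under the old hash value, so lookups are
# unreliable until the Hash is rebuilt.
labels.rehash
labels[key]                # => "first"

# A Data key cannot drift: equal values always produce equal hashes
fixed = { FixedKey.new(x: 1, y: 2) => "stable" }
fixed[FixedKey.new(x: 1, y: 2)]   # => "stable"
```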

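Memory sharing scenarios favor Data objects. Because a Data instance cannot have its attributes reassigned, multiple references to the same instance do not risk unexpected changes to the object itself. The attribute values can still be mutable objects, so freeze them if they are shared too. Struct instances require defensive copying (or an explicit freeze) in shared contexts.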

# Memory sharing example
shared_config = Data.define(:host, :port, :ssl).new(
  host: "api.example.com",
  port: 443,
  ssl: true
)

# Safe to share across threads and contexts
clients = Array.new(10) do |i|
  # Each client can safely reference shared config
  { id: i, config: shared_config }
end

# Struct requires defensive copying
StructConfig = Struct.new(:host, :port, :ssl)
base_config = StructConfig.new("api.example.com", 443, true)

clients = Array.new(10) do |i|
  # Must duplicate to prevent accidental mutations
  { id: i, config: base_config.dup }
end
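Data immutability is shallow: the instance is frozen, but the objects it references are not. The sketch below uses hypothetical Endpoint and SafeEndpoint classes to show the gap and one possible remedy, freezing copies of the members on the way in.

```ruby
Endpoint = Data.define(:host, :port)   # hypothetical names

endpoint = Endpoint.new(host: String.new("api.example.com"), port: 443)
endpoint.frozen?           # => true
endpoint.host.frozen?      # => false
endpoint.host << ".evil"   # succeeds: the String itself is still mutable
endpoint.host              # => "api.example.com.evil"

# One possible remedy: store frozen copies of mutable members
SafeEndpoint = Data.define(:host, :port) do
  def initialize(host:, port:)
    super(host: host.dup.freeze, port: port)
  end
end

safe = SafeEndpoint.new(host: String.new("api.example.com"), port: 443)
safe.host.frozen?          # => true
```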

Common Pitfalls

Mutability assumptions cause frequent errors when switching between Struct and Data. Code expecting mutable behavior fails with Data objects, while functional code may not account for Struct mutations.

# Dangerous assumption with Data
def update_coordinates(point, x, y)
  point.x = x  # NoMethodError with Data objects (no setters are defined)
  point.y = y
  point
end

# Correct approach for both
def update_coordinates(point, x, y)
  if point.respond_to?(:with)
    point.with(x: x, y: y)  # Data
  else
    point.dup.tap { |p| p.x = x; p.y = y }  # Struct
  end
end

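Initialization syntax differences create subtle bugs. Since Ruby 3.2 a Struct defined without keyword_init accepts either all-positional or all-keyword arguments, and silently treats keywords that follow positional arguments as a trailing Hash. Data accepts positional or keyword arguments as well, but raises ArgumentError when an attribute is missing or unknown.

# Struct flexibility can hide bugs
Person = Struct.new(:name, :age)
person1 = Person.new("Alice", 30)      # Positional
person2 = Person.new(age: 25)          # Partial keyword: name is nil
person3 = Person.new("Bob", age: 40)   # Mixed - dangerous! age becomes the Hash {age: 40}

# Data strictness
Person = Data.define(:name, :age)
person1 = Person.new(name: "Alice", age: 30)  # Keywords
person2 = Person.new("Bob", 25)               # Positional also works
# person3 = Person.new(name: "Carol")         # ArgumentError: missing keyword: :age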

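Pattern matching behavior varies between the classes. Both implement deconstruct_keys, so hash patterns work with either. Array patterns rely on deconstruct, which Struct defines. Code that must handle both types should prefer hash patterns or plain accessor calls, and should include an else branch.

# Pattern matching pitfall
def process_point(point)
  case point
  in { x: 0, y: }  # Hash pattern: works with both Struct and Data
    "On Y axis"
  in [0, y]        # Array pattern: works with Struct, not with Data
    "On Y axis"
  end              # No else branch: unmatched input raises NoMatchingPatternError
end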

# Robust pattern matching
def process_point(point)
  case point
  when ->(p) { p.x == 0 }
    "On Y axis: #{point.y}"
  when ->(p) { p.y == 0 }
    "On X axis: #{point.x}"
  else
    "Point at #{point.x}, #{point.y}"
  end
end
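Since both classes implement deconstruct_keys, a single hash-pattern case can also serve either type. A minimal sketch, with hypothetical StructPt and DataPt classes and a hypothetical describe_point helper:

```ruby
StructPt = Struct.new(:x, :y)   # hypothetical names
DataPt = Data.define(:x, :y)

def describe_point(point)
  case point
  in { x: 0, y: 0 } then "Origin"
  in { x: 0, y: }   then "On Y axis: #{y}"
  in { x:, y: 0 }   then "On X axis: #{x}"
  in { x:, y: }     then "Point at #{x}, #{y}"
  else "Not a point"
  end
end

describe_point(StructPt.new(0, 5))        # => "On Y axis: 5"
describe_point(DataPt.new(x: 3, y: 0))    # => "On X axis: 3"
describe_point(DataPt.new(x: 3, y: 4))    # => "Point at 3, 4"
describe_point("nope")                    # => "Not a point"
```

The else branch keeps unexpected input from raising NoMatchingPatternError.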

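Thread safety misconceptions occur frequently. A Data instance cannot change, so reading it from several threads is safe, but reassigning a shared variable that points to a Data value is still a read-modify-write operation and needs synchronization. Struct instances need synchronization for any shared mutation.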

# Thread safety pitfall with Struct
counter = Struct.new(:value).new(0)

threads = 10.times.map do
  Thread.new do
    1000.times { counter.value += 1 }  # Race condition
  end
end
threads.each(&:join)
# counter.value is unpredictable

# Data approach requires different pattern
Counter = Data.define(:value)
counter = Counter.new(value: 0)
mutex = Mutex.new

threads = 10.times.map do
  Thread.new do
    1000.times do
      # The lock protects the shared variable, not the immutable value
      mutex.synchronize do
        counter = counter.with(value: counter.value + 1)
      end
    end
  end
end
threads.each(&:join)
# counter.value == 10000
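The Struct race shown earlier can be closed the same way, by guarding the read-modify-write with a Mutex. A minimal sketch, using a hypothetical Tally struct:

```ruby
Tally = Struct.new(:value)   # hypothetical name
tally = Tally.new(0)
lock = Mutex.new

workers = 10.times.map do
  Thread.new do
    1000.times do
      # += reads and then writes, so both steps must happen under the lock
      lock.synchronize { tally.value += 1 }
    end
  end
end
workers.each(&:join)

tally.value  # => 10000
```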

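Serialization and deserialization behavior differs subtly. Direct support for dumping Struct and Data instances depends on the serializer and its version (recent Psych releases, for example, reject arbitrary classes in YAML.load by default), so the portable approach is to serialize the to_h representation and rebuild the object on load. A restored Data object is immutable again because it goes through the normal constructor.

# Serialization behavior
require 'yaml'

StructPoint = Struct.new(:x, :y)
DataPoint = Data.define(:x, :y)
struct_point = StructPoint.new(10, 20)
data_point = DataPoint.new(x: 10, y: 20)

# Serialize through plain hashes; string keys load without extra permissions
struct_yaml = YAML.dump(struct_point.to_h.transform_keys(&:to_s))
data_yaml = YAML.dump(data_point.to_h.transform_keys(&:to_s))

# Rebuild each object from its hash; mutability follows the class
restored_struct = StructPoint.new(**YAML.safe_load(struct_yaml, symbolize_names: true))
restored_data = DataPoint.new(**YAML.safe_load(data_yaml, symbolize_names: true))

restored_struct.x = 30  # Works
# restored_data.x = 30  # NoMethodError: Data defines no setters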

Production Patterns

Web application contexts often require different approaches for Struct and Data usage. Data objects work well for configuration, request/response objects, and functional pipelines, while Struct fits mutable model attributes and builder patterns.

# API response modeling with Data
APIResponse = Data.define(:status, :data, :errors) do
  def success?
    status == 200 && errors.empty?
  end
  
  def with_error(error)
    with(errors: errors + [error])
  end
  
  def transform_data(&block)
    return self unless success?
    with(data: block.call(data))
  end
end

# Usage in Rails controller
class UsersController < ApplicationController
  def show
    user = User.find(params[:id])
    # Named api_response to avoid shadowing the controller's own response method
    api_response = APIResponse.new(
      status: 200,
      data: user.as_json,
      errors: []
    )
    
    enriched = api_response
      .transform_data { |data| data.merge(preferences: user.preferences) }
      .transform_data { |data| data.merge(avatar_url: avatar_service.url_for(user)) }
    
    render json: enriched.data
  rescue ActiveRecord::RecordNotFound => e
    error_response = APIResponse.new(status: 404, data: nil, errors: [e.message])
    # Render the attribute hash explicitly rather than relying on Data#to_json
    render json: error_response.to_h, status: 404
  end
end

Database integration patterns highlight the differences in approach. Struct objects can represent mutable active record attributes, while Data objects work better for value objects and immutable domain models.

# Struct for mutable database representations
class UserProfile < Struct.new(:user_id, :bio, :website, :location, keyword_init: true)
  def self.from_database(row)
    new(
      user_id: row['user_id'],
      bio: row['bio'],
      website: row['website'],
      location: row['location']
    )
  end
  
  def update_from_params(params)
    self.bio = params[:bio] if params.key?(:bio)
    self.website = params[:website] if params.key?(:website)
    self.location = params[:location] if params.key?(:location)
  end
  
  def to_database_hash
    { user_id: user_id, bio: bio, website: website, location: location }
  end
end

# Data for immutable domain models
Address = Data.define(:street, :city, :state, :zip_code) do
  def self.from_string(address_string)
    parts = address_string.split(', ')
    new(
      street: parts[0],
      city: parts[1],
      state: parts[2]&.split(' ')&.first,
      zip_code: parts[2]&.split(' ')&.last
    )
  end
  
  def formatted
    "#{street}, #{city}, #{state} #{zip_code}"
  end
  
  def in_state?(target_state)
    state.downcase == target_state.downcase
  end
end

Background job processing shows clear distinctions in usage patterns. Data objects excel as immutable job parameters, while Struct objects work well for mutable job state tracking.

# Data for immutable job parameters
EmailJob = Data.define(:recipient, :subject, :template, :variables) do
  def perform
    EmailService.send_email(
      to: recipient,
      subject: subject,
      body: TemplateRenderer.render(template, variables)
    )
  end
  
  def retry_with_delay(delay_seconds)
    with(variables: variables.merge(retry_delay: delay_seconds))
  end
end

# Struct for mutable job tracking
class JobStatus < Struct.new(:job_id, :status, :progress, :started_at, :completed_at, keyword_init: true)
  def start!
    self.status = 'running'
    self.started_at = Time.current
    save_to_redis
  end
  
  def update_progress!(percent)
    self.progress = percent
    save_to_redis
  end
  
  def complete!
    self.status = 'completed'
    self.progress = 100
    self.completed_at = Time.current
    save_to_redis
  end
  
  private
  
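  def save_to_redis
    # Redis.current was removed in redis-rb 5.0; newer versions need an
    # application-level client in its place.
    Redis.current.setex("job:#{job_id}", 3600, to_h.to_json)
  end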
end

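Caching strategies require different approaches. Data objects work well as keys because their values cannot change after creation. For caches shared across processes, derive the key from the attribute values rather than from #hash, which Ruby seeds differently in each process. Struct objects need careful handling to avoid cache invalidation issues.

# Data objects as cache keys
require 'digest'

UserPreferences = Data.define(:theme, :language, :timezone, :notifications) do
  def cache_key
    # Digest the values: #hash is not stable across processes or restarts
    "preferences:#{Digest::SHA256.hexdigest(to_h.to_json)}"
  end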
  
  def self.cached_for_user(user_id)
    cache_key = "user_preferences:#{user_id}"
    Rails.cache.fetch(cache_key, expires_in: 1.hour) do
      # Load from database and return Data object
      row = Database.query("SELECT * FROM user_preferences WHERE user_id = ?", user_id).first
      new(
        theme: row['theme'],
        language: row['language'], 
        timezone: row['timezone'],
        notifications: JSON.parse(row['notifications'])
      )
    end
  end
end

# Struct requires cache invalidation management
class MutableUserPreferences < Struct.new(:user_id, :theme, :language, :timezone, :notifications, keyword_init: true)
  def save!
    Database.query("UPDATE user_preferences SET ... WHERE user_id = ?", user_id)
    invalidate_cache
  end
  
  def update_theme(new_theme)
    self.theme = new_theme
    save!
  end
  
  private
  
  def invalidate_cache
    Rails.cache.delete("user_preferences:#{user_id}")
  end
end

Reference

Class Creation Methods

Method	Parameters	Returns	Description
`Struct.new(*attrs, keyword_init: nil, &block)`	`attrs` (Symbols), `keyword_init` (Boolean or nil)	`Class`	Creates new Struct class with specified attributes
`Data.define(*attrs, &block)`	`attrs` (Array)	`Class`	Creates new Data class with specified attributes

Instance Creation

Method	Parameters	Returns	Description
`StructClass.new(*values)`	`values` (Array)	`StructInstance`	Creates struct instance with positional arguments
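`StructClass.new(**kwargs)`	`kwargs` (Hash)	`StructInstance`	Creates struct instance with keyword arguments (unless `keyword_init: false`)
`DataClass.new(*values)`	`values` (Array)	`DataInstance`	Creates data instance with positional arguments
`DataClass.new(**kwargs)`	`kwargs` (Hash)	`DataInstance`	Creates data instance with keyword arguments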

Instance Methods - Common

Method	Parameters	Returns	Description
`#to_h`	None	`Hash`	Returns hash of attribute name/value pairs
`#members`	None	`Array`	Returns attribute names as symbols
`#==`	`other` (Object)	`Boolean`	Compares objects by class and attribute values
`#eql?`	`other` (Object)	`Boolean`	Like `#==`, but compares attributes with `eql?`
`#hash`	None	`Integer`	Returns hash value computed from attribute values
`#inspect`	None	`String`	Returns string representation
`#deconstruct_keys`	`keys` (Array or nil)	`Hash`	Returns hash for hash-pattern matching

Instance Methods - Struct Only

Method	Parameters	Returns	Description
`#[](name_or_index)`	`name_or_index` (Symbol/Integer)	`Object`	Gets attribute value by name or index
`#[]=(name_or_index, value)`	`name_or_index` (Symbol/Integer), `value` (Object)	`Object`	Sets attribute value by name or index
`#to_a`	None	`Array`	Returns array of attribute values
`#deconstruct`	None	`Array`	Returns array for array-pattern matching
`#each`	`&block`	`Enumerator/self`	Iterates over attribute values
`#each_pair`	`&block`	`Enumerator/self`	Iterates over attribute name/value pairs
`#length`	None	`Integer`	Returns number of attributes
`#size`	None	`Integer`	Alias for length

Instance Methods - Data Only

Method	Parameters	Returns	Description
`#with(**kwargs)`	`kwargs` (Hash)	`DataInstance`	Returns new instance with updated attributes

Mutability Characteristics

Feature	Struct	Data
Attribute Modification	Mutable via `attr=` methods	Immutable, frozen after creation
In-place Updates	Supported	Not supported: no setters (`NoMethodError`); `instance_variable_set` raises `FrozenError`
Thread Safety	Requires synchronization	Safe to read concurrently; attribute values may still be mutable
Memory Sharing	Requires defensive copying	Safe to share references
Functional Patterns	Requires explicit copying	Built-in via `#with` method

Initialization Patterns

Pattern	Struct	Data
Positional Args	`Point.new(x, y)`	`Point.new(x, y)`
Keyword Args	`Point.new(x: 1, y: 2)` (unless `keyword_init: false`)	`Point.new(x: 1, y: 2)`
Mixed Args	Keywords become a trailing positional Hash; avoid	Not supported
Partial Init	Fills missing with `nil`	Requires all attributes (`ArgumentError`)
Default Values	Via custom `initialize`	Via custom `initialize` with keyword defaults, or factory methods

Performance Characteristics

Operation	Struct	Data
Creation	Comparable cost	Comparable cost
Modification	In place, no allocation	New instance per `#with` call
Hash Operations	Computed from attribute values on each call	Computed from attribute values on each call
Equality Checks	Attribute-by-attribute comparison	Attribute-by-attribute comparison
Memory Usage	Similar per instance	Similar per instance; one instance can be shared safely
GC Pressure	Lower for mutations	Higher for transformations

Pattern Matching Support

Feature	Struct	Data
Array Patterns	`in [x, y]` via `#deconstruct`	Not available (no `#deconstruct`)
Hash Patterns	`in {x:, y:}` via `#deconstruct_keys`	`in {x:, y:}` via `#deconstruct_keys`
Deconstruction	`#to_a`, `#deconstruct`, `#deconstruct_keys`	`#to_h`, `#deconstruct_keys`
Variable Binding	Automatic via pattern matching	Automatic via pattern matching
Guard Clauses	Supported (`in {x:} if x > 0`)	Supported (`in {x:} if x > 0`)

Common Error Conditions

Error	Struct	Data
FrozenError	Only if explicitly frozen	Raised by low-level mutation such as `instance_variable_set`
ArgumentError	Too many arguments or unknown keywords	Missing or unknown attributes, too many arguments
NoMethodError	Invalid attribute names	Invalid attribute names, any setter call (`attr=`)
TypeError	Type mismatches in custom logic	Type mismatches in custom logic
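
The error conditions above can be checked with a short script. This is a sketch under the assumption of Ruby 3.2 or later; Dot, Cell, and the error_class helper are hypothetical names introduced for illustration.

```ruby
Dot = Data.define(:x, :y)     # hypothetical names
Cell = Struct.new(:x, :y)

# Returns the class of the error raised by the block, or nil if none
def error_class
  yield
  nil
rescue StandardError => e
  e.class
end

dot = Dot.new(x: 1, y: 2)

error_class { dot.x = 5 }                          # => NoMethodError
error_class { dot.instance_variable_set(:@x, 5) }  # => FrozenError
error_class { Dot.new(x: 1) }                      # => ArgumentError
error_class { Dot.new(x: 1, y: 2, z: 3) }          # => ArgumentError
error_class { Cell.new(1, 2, 3) }                  # => ArgumentError
error_class { Cell.new(1, 2).freeze.x = 5 }        # => FrozenError
error_class { Cell.new(1, 2).z }                   # => NoMethodError
```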
