CrackedRuby CrackedRuby

Relational Database Concepts

Overview

Relational databases organize data into tables (relations) consisting of rows (tuples) and columns (attributes). Each table represents an entity type, and relationships between entities are established through foreign keys that reference primary keys in other tables. The relational model, introduced by Edgar F. Codd in 1970, provides a mathematical foundation for data organization and manipulation.

The relational model separates logical data structure from physical storage, allowing databases to optimize storage and retrieval independently of application code. This abstraction enables multiple applications to access the same data through a standardized query language (SQL) while the database management system (DBMS) handles concurrency, transactions, and data integrity.

Relational databases enforce schemas that define table structures, data types, constraints, and relationships before data insertion. This schema-first approach contrasts with schema-less NoSQL databases. The schema serves as a contract between the database and applications, preventing invalid data states and maintaining referential integrity.

# Conceptual table structure in Ruby/ActiveRecord
class User < ApplicationRecord
  has_many :orders
  validates :email, presence: true, uniqueness: true
end

class Order < ApplicationRecord
  belongs_to :user
  has_many :order_items
end

The relational model provides declarative querying through SQL, where queries specify what data to retrieve rather than how to retrieve it. The DBMS query optimizer determines the most efficient execution plan based on available indexes, table statistics, and join strategies.

Key Principles

Tables and Relations: A table represents a relation, consisting of a heading (column definitions) and a body (rows of data). Each column has a defined data type and constraints. A row must have a value for every column, though that value may be NULL if permitted. Tables must have at least one column that uniquely identifies each row, called a primary key.

# Table definition using ActiveRecord migration
class CreateUsers < ActiveRecord::Migration[7.0]
  def change
    create_table :users do |t|
      t.string :email, null: false
      t.string :encrypted_password, null: false
      t.datetime :last_login_at
      t.timestamps
    end
    
    add_index :users, :email, unique: true
  end
end

Primary Keys: A primary key uniquely identifies each row in a table. Primary keys cannot contain NULL values and must be unique across all rows. Composite primary keys use multiple columns to form uniqueness. The choice between natural keys (business data) and surrogate keys (generated identifiers) affects schema design and query patterns.

Foreign Keys: Foreign keys establish relationships between tables by referencing another table's primary key. The database enforces referential integrity, preventing orphaned records and maintaining consistency across related tables. Foreign key constraints specify behavior for cascading updates and deletes.

# Foreign key relationship with referential integrity
class CreateOrders < ActiveRecord::Migration[7.0]
  def change
    create_table :orders do |t|
      t.references :user, null: false, foreign_key: true
      t.decimal :total, precision: 10, scale: 2
      t.string :status, default: 'pending'
      t.timestamps
    end
  end
end

ACID Properties: Relational databases guarantee ACID properties for transactions. Atomicity ensures all operations within a transaction complete or none do. Consistency maintains database constraints before and after transactions. Isolation prevents concurrent transactions from interfering with each other. Durability guarantees committed transactions persist even after system failures.

Normalization: Database normalization organizes tables to reduce redundancy and dependency issues. First normal form (1NF) requires atomic values in each column. Second normal form (2NF) eliminates partial dependencies on composite keys. Third normal form (3NF) removes transitive dependencies. Boyce-Codd normal form (BCNF) handles specific anomalies in 3NF. Higher normal forms address multi-valued and join dependencies.

Joins: Queries combine data from multiple tables through joins. Inner joins return rows with matching values in both tables. Left outer joins return all rows from the left table plus matching rows from the right. Right outer joins do the opposite. Full outer joins return all rows from both tables. Cross joins produce the Cartesian product of both tables.

Indexes: Indexes accelerate data retrieval by maintaining sorted data structures that point to table rows. B-tree indexes support range queries and sorting. Hash indexes optimize exact-match lookups. Composite indexes cover queries filtering multiple columns. Covering indexes include all columns needed by a query, avoiding table access entirely.

Ruby Implementation

Ruby applications interact with relational databases primarily through Object-Relational Mapping (ORM) libraries that translate between database tables and Ruby objects. ActiveRecord, the ORM included with Rails, provides the most widely-used implementation. Sequel offers an alternative with different design philosophy and features.

ActiveRecord Connections: ActiveRecord manages database connections through connection pools. Each thread acquires a connection from the pool when executing queries and releases it after completion. Configuration specifies database adapter, connection parameters, and pool size.

# Database configuration in database.yml
default: &default
  adapter: postgresql
  encoding: unicode
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  timeout: 5000

development:
  <<: *default
  database: app_development

# Establishing connection in code
ActiveRecord::Base.establish_connection(
  adapter: 'postgresql',
  host: 'localhost',
  database: 'production_db',
  username: 'app_user',
  password: ENV['DB_PASSWORD'],
  pool: 25,
  checkout_timeout: 5
)

Model Definitions: Models inherit from ActiveRecord::Base and map to database tables by convention. Table names default to the pluralized, underscored model name. Models define associations, validations, and business logic encapsulating database operations.

class Article < ApplicationRecord
  belongs_to :author, class_name: 'User'
  has_many :comments, dependent: :destroy
  has_many :tags, through: :article_tags
  
  validates :title, presence: true, length: { minimum: 5 }
  validates :author, presence: true
  
  scope :published, -> { where(published: true) }
  scope :recent, -> { order(created_at: :desc).limit(10) }
  
  def publish!
    update!(published: true, published_at: Time.current)
  end
end

Query Interface: ActiveRecord provides a fluent query interface that generates SQL. Queries return ActiveRecord::Relation objects that compose lazily until enumeration. This allows building complex queries incrementally while the ORM optimizes generated SQL.

# Building queries incrementally
articles = Article.published
articles = articles.where('created_at > ?', 1.week.ago) if recent_only
articles = articles.joins(:author).where(authors: { verified: true })
articles = articles.includes(:comments, :tags)  # Eager loading

# Executes single optimized query on enumeration
articles.each do |article|
  puts "#{article.title}: #{article.comments.size} comments"
end

# Raw SQL for complex queries
Article.find_by_sql([
  "SELECT articles.*, COUNT(comments.id) as comment_count
   FROM articles
   LEFT JOIN comments ON comments.article_id = articles.id
   WHERE articles.published = true
   GROUP BY articles.id
   HAVING COUNT(comments.id) > ?
   ORDER BY comment_count DESC",
  10
])

Transactions: ActiveRecord wraps database transactions, rolling back on exceptions and committing on successful completion. Nested transactions use database savepoints. Transaction isolation levels can be specified per transaction.

# Basic transaction
Article.transaction do
  article = Article.create!(title: 'New Article', author: current_user)
  article.tags << Tag.find_or_create_by!(name: 'ruby')
  NotificationJob.perform_later(article.id)
end

# Handling specific exceptions
begin
  User.transaction do
    user.update!(credits: user.credits - 100)
    Purchase.create!(user: user, amount: 100)
  end
rescue ActiveRecord::RecordInvalid => e
  logger.error("Purchase failed: #{e.message}")
  nil
end

# Nested transactions with savepoints
Article.transaction do
  article = Article.create!(title: 'Parent Article')
  
  Article.transaction(requires_new: true) do
    Comment.create!(article: article, body: 'First comment')
  end  # Savepoint commits here
  
  article.publish!
end  # Full transaction commits

Migrations: Database schema changes are managed through migrations, versioned Ruby files that define transformations. Migrations run in order, tracking which have executed. Each migration defines up (forward) and down (rollback) directions.

class AddIndexesToArticles < ActiveRecord::Migration[7.0]
  def change
    add_index :articles, :author_id
    add_index :articles, [:published, :created_at]
    add_index :articles, :title, where: "published = true"
    
    # Add check constraint
    add_check_constraint :articles,
      "published_at IS NULL OR published = true",
      name: "published_at_requires_published"
  end
end

# Data migration with safety checks
class BackfillUserRoles < ActiveRecord::Migration[7.0]
  disable_ddl_transaction!
  
  def up
    User.where(role: nil).find_each do |user|
      user.update_column(:role, 'member')
    end
  end
  
  def down
    # Intentionally blank - cannot reverse data changes
  end
end

Design Considerations

Schema Design Strategy: Schema design balances normalization for data integrity against denormalization for query performance. Highly normalized schemas minimize redundancy and maintain consistency but require more joins. Denormalized schemas reduce joins by duplicating data but risk inconsistencies and complicate updates.

Domain-driven design influences schema structure. Aggregate roots correspond to primary entities with their own tables. Value objects may embed in parent tables or separate based on cardinality. Entities with complex lifecycles often warrant separate tables even when conceptually embedded.

# Normalized approach - separate address table
class User < ApplicationRecord
  has_one :address, dependent: :destroy
end

class Address < ApplicationRecord
  belongs_to :user
end

# Denormalized approach - embedded in user table
class CreateUsers < ActiveRecord::Migration[7.0]
  def change
    create_table :users do |t|
      t.string :email
      t.string :street_address
      t.string :city
      t.string :postal_code
      t.timestamps
    end
  end
end

Normalization Trade-offs: Third normal form (3NF) provides a practical balance for most applications. Higher normalization levels reduce redundancy further but add join complexity. Denormalization makes sense when read frequency vastly exceeds write frequency and the denormalized data rarely changes.

Materialized views and summary tables implement controlled denormalization while separating source-of-truth data from optimized query structures. Background jobs maintain consistency between normalized source tables and denormalized read models.

Relationship Cardinality: One-to-one relationships typically keep foreign keys in the child table or combine into a single table. One-to-many relationships place foreign keys in the many side. Many-to-many relationships require junction tables storing pairs of foreign keys.

# One-to-many: orders belong to users
class CreateOrders < ActiveRecord::Migration[7.0]
  def change
    create_table :orders do |t|
      t.references :user, null: false, foreign_key: true
      t.decimal :total
      t.timestamps
    end
  end
end

# Many-to-many: articles have many tags through article_tags
class CreateArticleTags < ActiveRecord::Migration[7.0]
  def change
    create_table :article_tags do |t|
      t.references :article, null: false, foreign_key: true
      t.references :tag, null: false, foreign_key: true
      t.timestamps
    end
    
    add_index :article_tags, [:article_id, :tag_id], unique: true
  end
end

Soft Deletes vs Hard Deletes: Soft deletes preserve historical records by marking rows as deleted rather than removing them. This supports audit trails, data recovery, and maintaining foreign key integrity. Hard deletes permanently remove data, simplifying queries and reducing storage but losing history.

Soft deletes complicate queries, requiring consistent filtering of deleted records. Default scopes can hide deleted records automatically but obscure query behavior. Explicit scopes provide clarity at the cost of repetition.

# Soft delete implementation
class Article < ApplicationRecord
  scope :active, -> { where(deleted_at: nil) }
  
  def soft_delete!
    update!(deleted_at: Time.current)
  end
  
  def restore!
    update!(deleted_at: nil)
  end
end

# Querying with soft deletes
Article.active.where(published: true)

Database Selection: PostgreSQL provides advanced features like JSON columns, full-text search, and custom data types. MySQL offers simpler administration and wide hosting availability. SQLite suits development environments and small-scale deployments. Database choice affects available features, query syntax, and operational characteristics.

Common Patterns

Repository Pattern: The repository pattern abstracts data access behind an interface, isolating business logic from database implementation details. Repositories encapsulate query logic, providing methods for common data operations. This pattern facilitates testing by allowing mock repositories and enables switching database implementations.

class ArticleRepository
  def find_published(limit: 10)
    Article.published.includes(:author, :tags).limit(limit)
  end
  
  def find_by_author_with_stats(author_id)
    Article.where(author_id: author_id)
      .select('articles.*, COUNT(comments.id) as comments_count')
      .left_joins(:comments)
      .group('articles.id')
  end
  
  def create_with_tags(attributes, tag_names)
    Article.transaction do
      article = Article.create!(attributes)
      tag_names.each do |name|
        tag = Tag.find_or_create_by!(name: name)
        article.tags << tag
      end
      article
    end
  end
end

Query Objects: Complex queries extract into dedicated query objects that encapsulate filtering, joining, and ordering logic. Query objects compose with ActiveRecord relations, accepting scopes as input and returning refined relations. This pattern organizes complex query logic and enables reuse.

class PopularArticlesQuery
  def initialize(relation = Article.all)
    @relation = relation
  end
  
  def call(since: 1.month.ago, min_views: 1000)
    @relation
      .published
      .where('created_at > ?', since)
      .where('view_count >= ?', min_views)
      .includes(:author)
      .order(view_count: :desc)
  end
end

# Usage
PopularArticlesQuery.new.call(since: 1.week.ago)
PopularArticlesQuery.new(user.articles).call(min_views: 500)

Polymorphic Associations: Polymorphic associations allow a model to belong to multiple other models through a single association. The polymorphic foreign key stores both the ID and type of the parent record. This pattern reduces table proliferation when multiple models share similar relationships.

# Comments can belong to articles or videos
class CreateComments < ActiveRecord::Migration[7.0]
  def change
    create_table :comments do |t|
      t.references :commentable, polymorphic: true, null: false
      t.text :body
      t.timestamps
    end
  end
end

class Comment < ApplicationRecord
  belongs_to :commentable, polymorphic: true
end

class Article < ApplicationRecord
  has_many :comments, as: :commentable
end

class Video < ApplicationRecord
  has_many :comments, as: :commentable
end

Single Table Inheritance: Single table inheritance (STI) stores multiple related models in one table, using a type column to distinguish between them. STI suits scenarios where models share most attributes and behavior but differ in specific methods. All subclass columns must permit NULL values since not all rows use them.

class CreateDocuments < ActiveRecord::Migration[7.0]
  def change
    create_table :documents do |t|
      t.string :type, null: false
      t.string :title
      t.text :content
      t.string :url  # Only for Link
      t.integer :page_count  # Only for PDF
      t.timestamps
    end
    
    add_index :documents, :type
  end
end

class Document < ApplicationRecord
  # Shared behavior
end

class Article < Document
  validates :content, presence: true
end

class Link < Document
  validates :url, presence: true, format: { with: URI.regexp }
end

Auditing Pattern: Audit trails track changes to records, storing who modified what and when. Audit implementations range from simple timestamp columns to complete version history tables. The PaperTrail gem provides comprehensive versioning with association tracking.

# Simple auditing with columns
class AddAuditFieldsToArticles < ActiveRecord::Migration[7.0]
  def change
    add_column :articles, :created_by_id, :bigint
    add_column :articles, :updated_by_id, :bigint
    add_foreign_key :articles, :users, column: :created_by_id
    add_foreign_key :articles, :users, column: :updated_by_id
  end
end

class Article < ApplicationRecord
  belongs_to :created_by, class_name: 'User'
  belongs_to :updated_by, class_name: 'User'
  
  before_create :set_created_by
  before_update :set_updated_by
  
  private
  
  def set_created_by
    self.created_by = Current.user
  end
  
  def set_updated_by
    self.updated_by = Current.user
  end
end

Performance Considerations

Index Strategy: Indexes accelerate queries by maintaining sorted data structures for quick lookups. Single-column indexes optimize queries filtering on that column. Composite indexes optimize queries filtering multiple columns, with column order determining effectiveness. Leading columns in composite indexes must appear in WHERE clauses for index use.

Indexes impose overhead on writes, as each insert, update, or delete must update all relevant indexes. Over-indexing slows writes without proportional read improvements. Index maintenance includes periodic analysis to identify unused indexes.

# Strategic index placement
class OptimizeArticleIndexes < ActiveRecord::Migration[7.0]
  def change
    # Single column index for common queries
    add_index :articles, :published_at
    
    # Composite index for multi-column queries
    # Effective for: WHERE author_id = ? AND published = true
    # Also effective for: WHERE author_id = ? (uses prefix)
    add_index :articles, [:author_id, :published]
    
    # Partial index for subset queries
    add_index :articles, :title, where: "published = true"
    
    # Expression index for computed queries
    add_index :articles, "LOWER(title)"
  end
end

Query Optimization: The N+1 query problem occurs when code loads a collection then accesses associations for each item, generating one query per item. Eager loading with includes preloads associations in fewer queries. Use includes for associations accessed for all records. Use joins when filtering on associations without needing association data.

# N+1 queries - loads each author separately
articles = Article.published
articles.each do |article|
  puts article.author.name  # SELECT * FROM users WHERE id = ?
end

# Solution: eager loading
articles = Article.published.includes(:author)
articles.each do |article|
  puts article.author.name  # No additional query
end

# Join without loading association data
Article.joins(:author)
  .where(authors: { verified: true })
  .select('articles.*')

# Preload multiple associations with different strategies
Article.includes(:author)
  .preload(:tags)
  .eager_load(:comments)

Batch Processing: Loading large datasets into memory causes excessive memory usage. Batch processing loads records in chunks, processing each batch before loading the next. The find_each method loads records in batches of 1000 by default. The in_batches method yields batches as relations for bulk operations.

# Process records in batches
User.find_each(batch_size: 500) do |user|
  UserMailer.newsletter(user).deliver_later
end

# Batch updates
Article.where(published: false).in_batches(of: 1000) do |batch|
  batch.update_all(archived: true)
end

# Custom batch processing
User.where('last_login_at < ?', 1.year.ago).find_in_batches do |batch|
  ids = batch.map(&:id)
  DeleteInactiveUserJob.perform_later(ids)
end

Connection Pooling: Database connections are expensive resources. Connection pools maintain a fixed number of open connections, allocating them to threads on demand. Pool size should match the maximum concurrent database operations. Too small causes thread contention; too large wastes resources and may exceed database connection limits.

# Configure connection pool
ActiveRecord::Base.establish_connection(
  adapter: 'postgresql',
  pool: 25,
  checkout_timeout: 5
)

# Monitor pool usage
pool = ActiveRecord::Base.connection_pool
puts "Size: #{pool.size}"
puts "Active: #{pool.connections.count(&:in_use?)}"
puts "Available: #{pool.available_connection_count}"

Database Caching: Query caching caches query results within a single request, returning identical results for duplicate queries without hitting the database. Fragment caching stores rendered view components, skipping database queries and view rendering. Counter caching maintains aggregate counts in parent tables to avoid expensive COUNT queries.

# Counter cache for association counts
class CreateComments < ActiveRecord::Migration[7.0]
  def change
    add_column :articles, :comments_count, :integer, default: 0, null: false
  end
end

class Comment < ApplicationRecord
  belongs_to :article, counter_cache: true
end

# Access cached count without query
article.comments_count  # No SELECT COUNT(*)

# Fragment caching with cache key
class Article < ApplicationRecord
  def cache_key_with_version
    "#{cache_key}/#{cache_version}"
  end
end

Tools & Ecosystem

ActiveRecord: The default Rails ORM provides convention-over-configuration mapping between models and tables. ActiveRecord includes query interface, associations, validations, callbacks, and migrations. The framework handles common database operations while allowing raw SQL when needed.

Sequel: An alternative ORM emphasizing flexibility and SQL generation. Sequel provides dataset chaining for complex queries, plugin architecture for extensions, and database-specific features. Sequel suits applications requiring advanced SQL or multiple database connections.

# Sequel usage
require 'sequel'

DB = Sequel.connect('postgres://localhost/mydb')

class Article < Sequel::Model
  many_to_one :author, class: :User
  one_to_many :comments
  
  dataset_module do
    def published
      where(published: true)
    end
    
    def with_stats
      select_append {
        Sequel.function(:count, :comments__id).as(:comment_count)
      }
      .left_join(:comments, article_id: :id)
      .group(:articles__id)
    end
  end
end

# Query with dataset methods
Article.published.with_stats.where { view_count > 1000 }.all

Database Adapters: Ruby database adapters implement the interface between ORMs and specific database systems. The pg gem connects to PostgreSQL. The mysql2 gem connects to MySQL. The sqlite3 gem connects to SQLite. Adapters handle connection management, query execution, and result parsing.

Migration Tools: Schema management extends beyond ActiveRecord migrations. The standalone_migrations gem brings Rails migrations to non-Rails projects. The Ridgepole gem manages schema through a declarative Ruby DSL, enabling schema version control and comparison.

Query Analysis Tools: The bullet gem detects N+1 queries and missing eager loading during development. The rack-mini-profiler shows SQL queries and execution times for each request. Database-specific tools like PostgreSQL's EXPLAIN ANALYZE reveal query execution plans.

# Bullet configuration
Bullet.enable = true
Bullet.alert = true
Bullet.bullet_logger = true
Bullet.rails_logger = true

# Query analysis output
# Bullet warning in logs:
# N+1 Query detected
# Article => [:author]
# Add to your query: .includes(:author)

Schema Visualization: The rails-erd gem generates entity-relationship diagrams from ActiveRecord models. SchemaSpy analyzes database schemas and produces detailed HTML documentation. These tools aid understanding complex schemas and identifying design issues.

Database GUIs: Postico provides a native PostgreSQL client for macOS. TablePlus supports multiple databases with a consistent interface. pgAdmin offers comprehensive PostgreSQL management. These tools complement command-line interfaces for schema exploration and query development.

Reference

SQL Data Types

Type Description Ruby Mapping
INTEGER Whole numbers Integer
BIGINT Large whole numbers Integer
DECIMAL Fixed-precision decimals BigDecimal
FLOAT Floating-point numbers Float
VARCHAR Variable-length strings String
TEXT Unlimited text String
BOOLEAN True/false values TrueClass/FalseClass
DATE Calendar dates Date
TIMESTAMP Date with time Time/DateTime
JSON JSON documents Hash/Array
BYTEA Binary data Binary string

Association Types

Association Usage Foreign Key Location
belongs_to Child references parent Child table
has_one Parent references single child Child table
has_many Parent references multiple children Child table
has_many :through Many-to-many via join model Join table
has_and_belongs_to_many Many-to-many direct Join table
has_one :through One-to-one via join model Join table

Common Query Methods

Method Purpose Example
where Filter records where(published: true)
joins Inner join tables joins(:author)
includes Eager load associations includes(:comments)
preload Separate query eager load preload(:tags)
eager_load LEFT JOIN eager load eager_load(:author)
select Specify columns select(:id, :title)
order Sort results order(created_at: :desc)
limit Limit results limit(10)
offset Skip records offset(20)
group Group aggregates group(:author_id)
having Filter groups having('COUNT(*) > 5')
distinct Remove duplicates distinct
find_by Find single record find_by(email: 'user@example.com')
pluck Extract column values pluck(:id, :title)
exists? Check record existence exists?(id: 123)

Transaction Isolation Levels

Level Description Trade-offs
READ UNCOMMITTED Allows dirty reads Maximum concurrency, minimal consistency
READ COMMITTED Prevents dirty reads Default for most databases
REPEATABLE READ Prevents non-repeatable reads Higher consistency, reduced concurrency
SERIALIZABLE Complete isolation Maximum consistency, lowest concurrency

Index Types

Type Best For Limitations
B-tree Range queries, sorting Default choice for most columns
Hash Exact equality matches No range queries or sorting
GiST Full-text, geometric data Database-specific implementation
GIN Array, JSON searches PostgreSQL-specific
Partial Subset of rows Requires WHERE clause match
Unique Enforcing uniqueness Additional constraint validation
Composite Multi-column queries Column order matters

Migration Commands

Command Purpose Example
create_table Create new table create_table :users
drop_table Remove table drop_table :users
add_column Add column add_column :users, :name, :string
remove_column Delete column remove_column :users, :age
rename_column Rename column rename_column :users, :name, :full_name
change_column Modify column change_column :users, :age, :bigint
add_index Create index add_index :users, :email, unique: true
remove_index Delete index remove_index :users, :email
add_foreign_key Add FK constraint add_foreign_key :orders, :users
add_reference Add FK column and index add_reference :orders, :user

ACID Transaction Properties

Property Guarantee Implementation
Atomicity All or nothing execution Transaction rollback on failure
Consistency Valid state transitions Constraint enforcement
Isolation Concurrent transaction separation Locking mechanisms
Durability Persistent changes Write-ahead logging

Normalization Forms

Form Requirement Eliminates
1NF Atomic values, no repeating groups Duplicate columns
2NF 1NF plus no partial dependencies Partial key dependencies
3NF 2NF plus no transitive dependencies Transitive dependencies
BCNF 3NF plus determinant is superkey Certain 3NF anomalies
4NF BCNF plus no multi-valued dependencies Independent multi-valued facts

Common Validation Constraints

Constraint Purpose Example
NOT NULL Require value null: false
UNIQUE Prevent duplicates unique: true
CHECK Custom conditions check_constraint 'age > 0'
FOREIGN KEY Referential integrity foreign_key: true
PRIMARY KEY Unique identifier primary_key: true
DEFAULT Automatic value default: 0