Overview
Relational databases organize data into tables (relations) consisting of rows (tuples) and columns (attributes). Each table represents an entity type, and relationships between entities are established through foreign keys that reference primary keys in other tables. The relational model, introduced by Edgar F. Codd in 1970, provides a mathematical foundation for data organization and manipulation.
The relational model separates logical data structure from physical storage, allowing databases to optimize storage and retrieval independently of application code. This abstraction enables multiple applications to access the same data through a standardized query language (SQL) while the database management system (DBMS) handles concurrency, transactions, and data integrity.
Relational databases enforce schemas that define table structures, data types, constraints, and relationships before data insertion. This schema-first approach contrasts with schema-less NoSQL databases. The schema serves as a contract between the database and applications, preventing invalid data states and maintaining referential integrity.
# Conceptual table structure in Ruby/ActiveRecord
class User < ApplicationRecord
has_many :orders
validates :email, presence: true, uniqueness: true
end
class Order < ApplicationRecord
belongs_to :user
has_many :order_items
end
The relational model provides declarative querying through SQL, where queries specify what data to retrieve rather than how to retrieve it. The DBMS query optimizer determines the most efficient execution plan based on available indexes, table statistics, and join strategies.
Key Principles
Tables and Relations: A table represents a relation, consisting of a heading (column definitions) and a body (rows of data). Each column has a defined data type and constraints. A row must have a value for every column, though that value may be NULL if permitted. Tables must have at least one column that uniquely identifies each row, called a primary key.
# Table definition using ActiveRecord migration
class CreateUsers < ActiveRecord::Migration[7.0]
def change
create_table :users do |t|
t.string :email, null: false
t.string :encrypted_password, null: false
t.datetime :last_login_at
t.timestamps
end
add_index :users, :email, unique: true
end
end
Primary Keys: A primary key uniquely identifies each row in a table. Primary keys cannot contain NULL values and must be unique across all rows. Composite primary keys use multiple columns to form uniqueness. The choice between natural keys (business data) and surrogate keys (generated identifiers) affects schema design and query patterns.
Foreign Keys: Foreign keys establish relationships between tables by referencing another table's primary key. The database enforces referential integrity, preventing orphaned records and maintaining consistency across related tables. Foreign key constraints specify behavior for cascading updates and deletes.
# Foreign key relationship with referential integrity
class CreateOrders < ActiveRecord::Migration[7.0]
def change
create_table :orders do |t|
t.references :user, null: false, foreign_key: true
t.decimal :total, precision: 10, scale: 2
t.string :status, default: 'pending'
t.timestamps
end
end
end
ACID Properties: Relational databases guarantee ACID properties for transactions. Atomicity ensures all operations within a transaction complete or none do. Consistency maintains database constraints before and after transactions. Isolation prevents concurrent transactions from interfering with each other. Durability guarantees committed transactions persist even after system failures.
Normalization: Database normalization organizes tables to reduce redundancy and dependency issues. First normal form (1NF) requires atomic values in each column. Second normal form (2NF) eliminates partial dependencies on composite keys. Third normal form (3NF) removes transitive dependencies. Boyce-Codd normal form (BCNF) handles specific anomalies in 3NF. Higher normal forms address multi-valued and join dependencies.
Joins: Queries combine data from multiple tables through joins. Inner joins return rows with matching values in both tables. Left outer joins return all rows from the left table plus matching rows from the right. Right outer joins do the opposite. Full outer joins return all rows from both tables. Cross joins produce the Cartesian product of both tables.
Indexes: Indexes accelerate data retrieval by maintaining sorted data structures that point to table rows. B-tree indexes support range queries and sorting. Hash indexes optimize exact-match lookups. Composite indexes cover queries filtering multiple columns. Covering indexes include all columns needed by a query, avoiding table access entirely.
Ruby Implementation
Ruby applications interact with relational databases primarily through Object-Relational Mapping (ORM) libraries that translate between database tables and Ruby objects. ActiveRecord, the ORM included with Rails, provides the most widely-used implementation. Sequel offers an alternative with different design philosophy and features.
ActiveRecord Connections: ActiveRecord manages database connections through connection pools. Each thread acquires a connection from the pool when executing queries and releases it after completion. Configuration specifies database adapter, connection parameters, and pool size.
# Database configuration in database.yml
default: &default
adapter: postgresql
encoding: unicode
pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
timeout: 5000
development:
<<: *default
database: app_development
# Establishing connection in code
ActiveRecord::Base.establish_connection(
adapter: 'postgresql',
host: 'localhost',
database: 'production_db',
username: 'app_user',
password: ENV['DB_PASSWORD'],
pool: 25,
checkout_timeout: 5
)
Model Definitions: Models inherit from ActiveRecord::Base and map to database tables by convention. Table names default to the pluralized, underscored model name. Models define associations, validations, and business logic encapsulating database operations.
class Article < ApplicationRecord
belongs_to :author, class_name: 'User'
has_many :comments, dependent: :destroy
has_many :tags, through: :article_tags
validates :title, presence: true, length: { minimum: 5 }
validates :author, presence: true
scope :published, -> { where(published: true) }
scope :recent, -> { order(created_at: :desc).limit(10) }
def publish!
update!(published: true, published_at: Time.current)
end
end
Query Interface: ActiveRecord provides a fluent query interface that generates SQL. Queries return ActiveRecord::Relation objects that compose lazily until enumeration. This allows building complex queries incrementally while the ORM optimizes generated SQL.
# Building queries incrementally
articles = Article.published
articles = articles.where('created_at > ?', 1.week.ago) if recent_only
articles = articles.joins(:author).where(authors: { verified: true })
articles = articles.includes(:comments, :tags) # Eager loading
# Executes single optimized query on enumeration
articles.each do |article|
puts "#{article.title}: #{article.comments.size} comments"
end
# Raw SQL for complex queries
Article.find_by_sql([
"SELECT articles.*, COUNT(comments.id) as comment_count
FROM articles
LEFT JOIN comments ON comments.article_id = articles.id
WHERE articles.published = true
GROUP BY articles.id
HAVING COUNT(comments.id) > ?
ORDER BY comment_count DESC",
10
])
Transactions: ActiveRecord wraps database transactions, rolling back on exceptions and committing on successful completion. Nested transactions use database savepoints. Transaction isolation levels can be specified per transaction.
# Basic transaction
Article.transaction do
article = Article.create!(title: 'New Article', author: current_user)
article.tags << Tag.find_or_create_by!(name: 'ruby')
NotificationJob.perform_later(article.id)
end
# Handling specific exceptions
begin
User.transaction do
user.update!(credits: user.credits - 100)
Purchase.create!(user: user, amount: 100)
end
rescue ActiveRecord::RecordInvalid => e
logger.error("Purchase failed: #{e.message}")
nil
end
# Nested transactions with savepoints
Article.transaction do
article = Article.create!(title: 'Parent Article')
Article.transaction(requires_new: true) do
Comment.create!(article: article, body: 'First comment')
end # Savepoint commits here
article.publish!
end # Full transaction commits
Migrations: Database schema changes are managed through migrations, versioned Ruby files that define transformations. Migrations run in order, tracking which have executed. Each migration defines up (forward) and down (rollback) directions.
class AddIndexesToArticles < ActiveRecord::Migration[7.0]
def change
add_index :articles, :author_id
add_index :articles, [:published, :created_at]
add_index :articles, :title, where: "published = true"
# Add check constraint
add_check_constraint :articles,
"published_at IS NULL OR published = true",
name: "published_at_requires_published"
end
end
# Data migration with safety checks
class BackfillUserRoles < ActiveRecord::Migration[7.0]
disable_ddl_transaction!
def up
User.where(role: nil).find_each do |user|
user.update_column(:role, 'member')
end
end
def down
# Intentionally blank - cannot reverse data changes
end
end
Design Considerations
Schema Design Strategy: Schema design balances normalization for data integrity against denormalization for query performance. Highly normalized schemas minimize redundancy and maintain consistency but require more joins. Denormalized schemas reduce joins by duplicating data but risk inconsistencies and complicate updates.
Domain-driven design influences schema structure. Aggregate roots correspond to primary entities with their own tables. Value objects may embed in parent tables or separate based on cardinality. Entities with complex lifecycles often warrant separate tables even when conceptually embedded.
# Normalized approach - separate address table
class User < ApplicationRecord
has_one :address, dependent: :destroy
end
class Address < ApplicationRecord
belongs_to :user
end
# Denormalized approach - embedded in user table
class CreateUsers < ActiveRecord::Migration[7.0]
def change
create_table :users do |t|
t.string :email
t.string :street_address
t.string :city
t.string :postal_code
t.timestamps
end
end
end
Normalization Trade-offs: Third normal form (3NF) provides a practical balance for most applications. Higher normalization levels reduce redundancy further but add join complexity. Denormalization makes sense when read frequency vastly exceeds write frequency and the denormalized data rarely changes.
Materialized views and summary tables implement controlled denormalization while separating source-of-truth data from optimized query structures. Background jobs maintain consistency between normalized source tables and denormalized read models.
Relationship Cardinality: One-to-one relationships typically keep foreign keys in the child table or combine into a single table. One-to-many relationships place foreign keys in the many side. Many-to-many relationships require junction tables storing pairs of foreign keys.
# One-to-many: orders belong to users
class CreateOrders < ActiveRecord::Migration[7.0]
def change
create_table :orders do |t|
t.references :user, null: false, foreign_key: true
t.decimal :total
t.timestamps
end
end
end
# Many-to-many: articles have many tags through article_tags
class CreateArticleTags < ActiveRecord::Migration[7.0]
def change
create_table :article_tags do |t|
t.references :article, null: false, foreign_key: true
t.references :tag, null: false, foreign_key: true
t.timestamps
end
add_index :article_tags, [:article_id, :tag_id], unique: true
end
end
Soft Deletes vs Hard Deletes: Soft deletes preserve historical records by marking rows as deleted rather than removing them. This supports audit trails, data recovery, and maintaining foreign key integrity. Hard deletes permanently remove data, simplifying queries and reducing storage but losing history.
Soft deletes complicate queries, requiring consistent filtering of deleted records. Default scopes can hide deleted records automatically but obscure query behavior. Explicit scopes provide clarity at the cost of repetition.
# Soft delete implementation
class Article < ApplicationRecord
scope :active, -> { where(deleted_at: nil) }
def soft_delete!
update!(deleted_at: Time.current)
end
def restore!
update!(deleted_at: nil)
end
end
# Querying with soft deletes
Article.active.where(published: true)
Database Selection: PostgreSQL provides advanced features like JSON columns, full-text search, and custom data types. MySQL offers simpler administration and wide hosting availability. SQLite suits development environments and small-scale deployments. Database choice affects available features, query syntax, and operational characteristics.
Common Patterns
Repository Pattern: The repository pattern abstracts data access behind an interface, isolating business logic from database implementation details. Repositories encapsulate query logic, providing methods for common data operations. This pattern facilitates testing by allowing mock repositories and enables switching database implementations.
class ArticleRepository
def find_published(limit: 10)
Article.published.includes(:author, :tags).limit(limit)
end
def find_by_author_with_stats(author_id)
Article.where(author_id: author_id)
.select('articles.*, COUNT(comments.id) as comments_count')
.left_joins(:comments)
.group('articles.id')
end
def create_with_tags(attributes, tag_names)
Article.transaction do
article = Article.create!(attributes)
tag_names.each do |name|
tag = Tag.find_or_create_by!(name: name)
article.tags << tag
end
article
end
end
end
Query Objects: Complex queries extract into dedicated query objects that encapsulate filtering, joining, and ordering logic. Query objects compose with ActiveRecord relations, accepting scopes as input and returning refined relations. This pattern organizes complex query logic and enables reuse.
class PopularArticlesQuery
def initialize(relation = Article.all)
@relation = relation
end
def call(since: 1.month.ago, min_views: 1000)
@relation
.published
.where('created_at > ?', since)
.where('view_count >= ?', min_views)
.includes(:author)
.order(view_count: :desc)
end
end
# Usage
PopularArticlesQuery.new.call(since: 1.week.ago)
PopularArticlesQuery.new(user.articles).call(min_views: 500)
Polymorphic Associations: Polymorphic associations allow a model to belong to multiple other models through a single association. The polymorphic foreign key stores both the ID and type of the parent record. This pattern reduces table proliferation when multiple models share similar relationships.
# Comments can belong to articles or videos
class CreateComments < ActiveRecord::Migration[7.0]
def change
create_table :comments do |t|
t.references :commentable, polymorphic: true, null: false
t.text :body
t.timestamps
end
end
end
class Comment < ApplicationRecord
belongs_to :commentable, polymorphic: true
end
class Article < ApplicationRecord
has_many :comments, as: :commentable
end
class Video < ApplicationRecord
has_many :comments, as: :commentable
end
Single Table Inheritance: Single table inheritance (STI) stores multiple related models in one table, using a type column to distinguish between them. STI suits scenarios where models share most attributes and behavior but differ in specific methods. All subclass columns must permit NULL values since not all rows use them.
class CreateDocuments < ActiveRecord::Migration[7.0]
def change
create_table :documents do |t|
t.string :type, null: false
t.string :title
t.text :content
t.string :url # Only for Link
t.integer :page_count # Only for PDF
t.timestamps
end
add_index :documents, :type
end
end
class Document < ApplicationRecord
# Shared behavior
end
class Article < Document
validates :content, presence: true
end
class Link < Document
validates :url, presence: true, format: { with: URI.regexp }
end
Auditing Pattern: Audit trails track changes to records, storing who modified what and when. Audit implementations range from simple timestamp columns to complete version history tables. The PaperTrail gem provides comprehensive versioning with association tracking.
# Simple auditing with columns
class AddAuditFieldsToArticles < ActiveRecord::Migration[7.0]
def change
add_column :articles, :created_by_id, :bigint
add_column :articles, :updated_by_id, :bigint
add_foreign_key :articles, :users, column: :created_by_id
add_foreign_key :articles, :users, column: :updated_by_id
end
end
class Article < ApplicationRecord
belongs_to :created_by, class_name: 'User'
belongs_to :updated_by, class_name: 'User'
before_create :set_created_by
before_update :set_updated_by
private
def set_created_by
self.created_by = Current.user
end
def set_updated_by
self.updated_by = Current.user
end
end
Performance Considerations
Index Strategy: Indexes accelerate queries by maintaining sorted data structures for quick lookups. Single-column indexes optimize queries filtering on that column. Composite indexes optimize queries filtering multiple columns, with column order determining effectiveness. Leading columns in composite indexes must appear in WHERE clauses for index use.
Indexes impose overhead on writes, as each insert, update, or delete must update all relevant indexes. Over-indexing slows writes without proportional read improvements. Index maintenance includes periodic analysis to identify unused indexes.
# Strategic index placement
class OptimizeArticleIndexes < ActiveRecord::Migration[7.0]
def change
# Single column index for common queries
add_index :articles, :published_at
# Composite index for multi-column queries
# Effective for: WHERE author_id = ? AND published = true
# Also effective for: WHERE author_id = ? (uses prefix)
add_index :articles, [:author_id, :published]
# Partial index for subset queries
add_index :articles, :title, where: "published = true"
# Expression index for computed queries
add_index :articles, "LOWER(title)"
end
end
Query Optimization: The N+1 query problem occurs when code loads a collection then accesses associations for each item, generating one query per item. Eager loading with includes preloads associations in fewer queries. Use includes for associations accessed for all records. Use joins when filtering on associations without needing association data.
# N+1 queries - loads each author separately
articles = Article.published
articles.each do |article|
puts article.author.name # SELECT * FROM users WHERE id = ?
end
# Solution: eager loading
articles = Article.published.includes(:author)
articles.each do |article|
puts article.author.name # No additional query
end
# Join without loading association data
Article.joins(:author)
.where(authors: { verified: true })
.select('articles.*')
# Preload multiple associations with different strategies
Article.includes(:author)
.preload(:tags)
.eager_load(:comments)
Batch Processing: Loading large datasets into memory causes excessive memory usage. Batch processing loads records in chunks, processing each batch before loading the next. The find_each method loads records in batches of 1000 by default. The in_batches method yields batches as relations for bulk operations.
# Process records in batches
User.find_each(batch_size: 500) do |user|
UserMailer.newsletter(user).deliver_later
end
# Batch updates
Article.where(published: false).in_batches(of: 1000) do |batch|
batch.update_all(archived: true)
end
# Custom batch processing
User.where('last_login_at < ?', 1.year.ago).find_in_batches do |batch|
ids = batch.map(&:id)
DeleteInactiveUserJob.perform_later(ids)
end
Connection Pooling: Database connections are expensive resources. Connection pools maintain a fixed number of open connections, allocating them to threads on demand. Pool size should match the maximum concurrent database operations. Too small causes thread contention; too large wastes resources and may exceed database connection limits.
# Configure connection pool
ActiveRecord::Base.establish_connection(
adapter: 'postgresql',
pool: 25,
checkout_timeout: 5
)
# Monitor pool usage
pool = ActiveRecord::Base.connection_pool
puts "Size: #{pool.size}"
puts "Active: #{pool.connections.count(&:in_use?)}"
puts "Available: #{pool.available_connection_count}"
Database Caching: Query caching caches query results within a single request, returning identical results for duplicate queries without hitting the database. Fragment caching stores rendered view components, skipping database queries and view rendering. Counter caching maintains aggregate counts in parent tables to avoid expensive COUNT queries.
# Counter cache for association counts
class CreateComments < ActiveRecord::Migration[7.0]
def change
add_column :articles, :comments_count, :integer, default: 0, null: false
end
end
class Comment < ApplicationRecord
belongs_to :article, counter_cache: true
end
# Access cached count without query
article.comments_count # No SELECT COUNT(*)
# Fragment caching with cache key
class Article < ApplicationRecord
def cache_key_with_version
"#{cache_key}/#{cache_version}"
end
end
Tools & Ecosystem
ActiveRecord: The default Rails ORM provides convention-over-configuration mapping between models and tables. ActiveRecord includes query interface, associations, validations, callbacks, and migrations. The framework handles common database operations while allowing raw SQL when needed.
Sequel: An alternative ORM emphasizing flexibility and SQL generation. Sequel provides dataset chaining for complex queries, plugin architecture for extensions, and database-specific features. Sequel suits applications requiring advanced SQL or multiple database connections.
# Sequel usage
require 'sequel'
DB = Sequel.connect('postgres://localhost/mydb')
class Article < Sequel::Model
many_to_one :author, class: :User
one_to_many :comments
dataset_module do
def published
where(published: true)
end
def with_stats
select_append {
Sequel.function(:count, :comments__id).as(:comment_count)
}
.left_join(:comments, article_id: :id)
.group(:articles__id)
end
end
end
# Query with dataset methods
Article.published.with_stats.where { view_count > 1000 }.all
Database Adapters: Ruby database adapters implement the interface between ORMs and specific database systems. The pg gem connects to PostgreSQL. The mysql2 gem connects to MySQL. The sqlite3 gem connects to SQLite. Adapters handle connection management, query execution, and result parsing.
Migration Tools: Schema management extends beyond ActiveRecord migrations. The standalone_migrations gem brings Rails migrations to non-Rails projects. The Ridgepole gem manages schema through a declarative Ruby DSL, enabling schema version control and comparison.
Query Analysis Tools: The bullet gem detects N+1 queries and missing eager loading during development. The rack-mini-profiler shows SQL queries and execution times for each request. Database-specific tools like PostgreSQL's EXPLAIN ANALYZE reveal query execution plans.
# Bullet configuration
Bullet.enable = true
Bullet.alert = true
Bullet.bullet_logger = true
Bullet.rails_logger = true
# Query analysis output
# Bullet warning in logs:
# N+1 Query detected
# Article => [:author]
# Add to your query: .includes(:author)
Schema Visualization: The rails-erd gem generates entity-relationship diagrams from ActiveRecord models. SchemaSpy analyzes database schemas and produces detailed HTML documentation. These tools aid understanding complex schemas and identifying design issues.
Database GUIs: Postico provides a native PostgreSQL client for macOS. TablePlus supports multiple databases with a consistent interface. pgAdmin offers comprehensive PostgreSQL management. These tools complement command-line interfaces for schema exploration and query development.
Reference
SQL Data Types
| Type | Description | Ruby Mapping |
|---|---|---|
| INTEGER | Whole numbers | Integer |
| BIGINT | Large whole numbers | Integer |
| DECIMAL | Fixed-precision decimals | BigDecimal |
| FLOAT | Floating-point numbers | Float |
| VARCHAR | Variable-length strings | String |
| TEXT | Unlimited text | String |
| BOOLEAN | True/false values | TrueClass/FalseClass |
| DATE | Calendar dates | Date |
| TIMESTAMP | Date with time | Time/DateTime |
| JSON | JSON documents | Hash/Array |
| BYTEA | Binary data | Binary string |
Association Types
| Association | Usage | Foreign Key Location |
|---|---|---|
| belongs_to | Child references parent | Child table |
| has_one | Parent references single child | Child table |
| has_many | Parent references multiple children | Child table |
| has_many :through | Many-to-many via join model | Join table |
| has_and_belongs_to_many | Many-to-many direct | Join table |
| has_one :through | One-to-one via join model | Join table |
Common Query Methods
| Method | Purpose | Example |
|---|---|---|
| where | Filter records | where(published: true) |
| joins | Inner join tables | joins(:author) |
| includes | Eager load associations | includes(:comments) |
| preload | Separate query eager load | preload(:tags) |
| eager_load | LEFT JOIN eager load | eager_load(:author) |
| select | Specify columns | select(:id, :title) |
| order | Sort results | order(created_at: :desc) |
| limit | Limit results | limit(10) |
| offset | Skip records | offset(20) |
| group | Group aggregates | group(:author_id) |
| having | Filter groups | having('COUNT(*) > 5') |
| distinct | Remove duplicates | distinct |
| find_by | Find single record | find_by(email: 'user@example.com') |
| pluck | Extract column values | pluck(:id, :title) |
| exists? | Check record existence | exists?(id: 123) |
Transaction Isolation Levels
| Level | Description | Trade-offs |
|---|---|---|
| READ UNCOMMITTED | Allows dirty reads | Maximum concurrency, minimal consistency |
| READ COMMITTED | Prevents dirty reads | Default for most databases |
| REPEATABLE READ | Prevents non-repeatable reads | Higher consistency, reduced concurrency |
| SERIALIZABLE | Complete isolation | Maximum consistency, lowest concurrency |
Index Types
| Type | Best For | Limitations |
|---|---|---|
| B-tree | Range queries, sorting | Default choice for most columns |
| Hash | Exact equality matches | No range queries or sorting |
| GiST | Full-text, geometric data | Database-specific implementation |
| GIN | Array, JSON searches | PostgreSQL-specific |
| Partial | Subset of rows | Requires WHERE clause match |
| Unique | Enforcing uniqueness | Additional constraint validation |
| Composite | Multi-column queries | Column order matters |
Migration Commands
| Command | Purpose | Example |
|---|---|---|
| create_table | Create new table | create_table :users |
| drop_table | Remove table | drop_table :users |
| add_column | Add column | add_column :users, :name, :string |
| remove_column | Delete column | remove_column :users, :age |
| rename_column | Rename column | rename_column :users, :name, :full_name |
| change_column | Modify column | change_column :users, :age, :bigint |
| add_index | Create index | add_index :users, :email, unique: true |
| remove_index | Delete index | remove_index :users, :email |
| add_foreign_key | Add FK constraint | add_foreign_key :orders, :users |
| add_reference | Add FK column and index | add_reference :orders, :user |
ACID Transaction Properties
| Property | Guarantee | Implementation |
|---|---|---|
| Atomicity | All or nothing execution | Transaction rollback on failure |
| Consistency | Valid state transitions | Constraint enforcement |
| Isolation | Concurrent transaction separation | Locking mechanisms |
| Durability | Persistent changes | Write-ahead logging |
Normalization Forms
| Form | Requirement | Eliminates |
|---|---|---|
| 1NF | Atomic values, no repeating groups | Duplicate columns |
| 2NF | 1NF plus no partial dependencies | Partial key dependencies |
| 3NF | 2NF plus no transitive dependencies | Transitive dependencies |
| BCNF | 3NF plus determinant is superkey | Certain 3NF anomalies |
| 4NF | BCNF plus no multi-valued dependencies | Independent multi-valued facts |
Common Validation Constraints
| Constraint | Purpose | Example |
|---|---|---|
| NOT NULL | Require value | null: false |
| UNIQUE | Prevent duplicates | unique: true |
| CHECK | Custom conditions | check_constraint 'age > 0' |
| FOREIGN KEY | Referential integrity | foreign_key: true |
| PRIMARY KEY | Unique identifier | primary_key: true |
| DEFAULT | Automatic value | default: 0 |