Overview
Document databases store data as semi-structured documents, typically in JSON, BSON, or XML formats. Each document contains key-value pairs where values can include strings, numbers, arrays, nested objects, or other documents. Unlike relational databases that require predefined schemas and normalize data across multiple tables, document databases embed related data within a single document structure.
The document model maps closely to objects in programming languages, which reduces much of the object-relational impedance mismatch. A document representing a blog post might contain the post content, author information, comments, and tags as nested structures within one document. This contrasts with relational databases where the same data spans multiple tables connected through foreign keys.
Document databases originated from the need to handle semi-structured data at scale. CouchDB emerged in 2005 and MongoDB in 2009, and these systems addressed limitations in relational databases when dealing with rapidly changing schemas and hierarchical data. The CAP theorem influences document database design: when a network partition occurs, a distributed system must choose between consistency and availability, and different document databases make that choice differently.
# Document representation of a blog post
{
_id: "post_123",
title: "Understanding Document Databases",
author: {
name: "Jane Developer",
email: "jane@example.com",
profile: { bio: "Software engineer", location: "San Francisco" }
},
content: "Document databases provide...",
tags: ["databases", "nosql", "architecture"],
comments: [
{ user: "john", text: "Great article", timestamp: "2024-01-15T10:30:00Z" },
{ user: "alice", text: "Very informative", timestamp: "2024-01-15T11:45:00Z" }
],
created_at: "2024-01-15T09:00:00Z",
updated_at: "2024-01-15T12:00:00Z"
}
Document databases fit applications with variable structure, hierarchical relationships, or frequent schema changes. E-commerce catalogs where products have different attributes, content management systems with diverse content types, and user profile systems with varying fields all benefit from document storage. New fields can be introduced without a schema migration, as each document can have different fields, although application code must still cope with older document shapes.
Key Principles
Document databases organize data around documents rather than rows. A document serves as the atomic unit, containing all information for a single entity. The database stores documents in collections (similar to tables), but collections do not enforce schema constraints by default. Two documents in the same collection can have completely different structures.
Each document requires a unique identifier, typically stored in an _id field. The database generates this identifier automatically if not provided. The identifier ensures document retrieval operates efficiently, as the database indexes all _id fields by default. Documents can reference other documents through identifiers, creating relationships similar to foreign keys, though the database does not enforce referential integrity.
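The identifier behaviour described above can be pictured with a small in-memory model. This is a plain-Ruby sketch rather than driver code: `TinyCollection` is a hypothetical class, and `SecureRandom.hex(12)` merely stands in for a real identifier generator.

```ruby
require 'securerandom'

# Hypothetical in-memory stand-in for a collection (not a real driver class).
class TinyCollection
  def initialize
    @by_id = {} # plays the role of the default _id index
  end

  # Stores a document, generating an _id when the caller did not supply one.
  def insert_one(doc)
    doc = doc.dup
    doc[:_id] ||= SecureRandom.hex(12) # 24 hexadecimal characters
    raise ArgumentError, "duplicate _id #{doc[:_id]}" if @by_id.key?(doc[:_id])

    @by_id[doc[:_id]] = doc
    doc[:_id]
  end

  # Lookup by _id is a single hash access rather than a scan.
  def find_by_id(id)
    @by_id[id]
  end
end

collection = TinyCollection.new
generated_id = collection.insert_one({ title: 'No id supplied' })
collection.insert_one({ _id: 'post_123', title: 'Explicit id' })

puts generated_id.length                       # 24
puts collection.find_by_id('post_123')[:title] # Explicit id
```

The hash keyed by `_id` is what makes the lookup cheap here; a real database achieves the same effect with its default `_id` index.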
Embedded documents represent one-to-one and one-to-many relationships within parent documents. An order document embeds line items, customer information, and shipping details. This denormalization reduces join operations and improves read performance, as the database retrieves all related data in a single query. The trade-off involves data duplication and increased storage requirements.
# Embedded relationship - order contains line items
{
_id: "order_456",
customer_id: "cust_789",
order_date: "2024-01-15",
line_items: [
{ product: "Widget A", quantity: 2, price: 29.99 },
{ product: "Widget B", quantity: 1, price: 49.99 }
],
shipping: {
address: "123 Main St",
city: "Portland",
state: "OR",
zip: "97201"
},
total: 109.97
}
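Because the line items are embedded, the stored total can be checked from a single document. The sketch below restates the order as a Ruby hash; storing prices as integer cents (`price_cents`, `total_cents`) is an assumption made here to avoid floating-point rounding and is not part of the original document.

```ruby
# The order from above as a Ruby hash, with prices converted to integer cents (assumption).
ORDER = {
  _id: 'order_456',
  line_items: [
    { product: 'Widget A', quantity: 2, price_cents: 2999 },
    { product: 'Widget B', quantity: 1, price_cents: 4999 }
  ],
  total_cents: 10_997
}.freeze

# Sums quantity * price over the embedded line items.
def line_item_total_cents(order)
  order[:line_items].sum { |item| item[:quantity] * item[:price_cents] }
end

# Formats a cent amount as a decimal string.
def format_cents(cents)
  format('%d.%02d', cents / 100, cents % 100)
end

puts format_cents(line_item_total_cents(ORDER))          # 109.97
puts line_item_total_cents(ORDER) == ORDER[:total_cents] # true
```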
Document references represent many-to-many relationships or situations where embedding causes excessive duplication. A blog post references author documents rather than embedding full author details in every post. The application performs manual joins by querying referenced documents separately.
# Referenced relationship - post references author
{
_id: "post_789",
title: "Database Design Patterns",
author_id: "author_123", # Reference to author document
content: "When designing schemas...",
category_ids: ["cat_1", "cat_2"] # References to category documents
}
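The manual join mentioned above amounts to a few extra lookups in application code. In this plain-Ruby sketch the frozen hashes stand in for collections and the hash lookups stand in for separate queries; the author and category names are sample data invented for illustration.

```ruby
# Hypothetical in-memory "collections", keyed by _id.
AUTHORS = {
  'author_123' => { _id: 'author_123', name: 'Jane Developer' }
}.freeze

CATEGORIES = {
  'cat_1' => { _id: 'cat_1', name: 'Databases' },
  'cat_2' => { _id: 'cat_2', name: 'Architecture' }
}.freeze

POST = {
  _id: 'post_789',
  title: 'Database Design Patterns',
  author_id: 'author_123',
  category_ids: %w[cat_1 cat_2]
}.freeze

# Resolves the references with separate lookups, as an application would
# with separate queries, and returns a combined view of the post.
def resolve_references(post, authors, categories)
  post.merge(
    author: authors[post[:author_id]],
    categories: categories.values_at(*post[:category_ids]).compact
  )
end

resolved = resolve_references(POST, AUTHORS, CATEGORIES)
puts resolved[:author][:name]                               # Jane Developer
puts resolved[:categories].map { |c| c[:name] }.join(', ') # Databases, Architecture
```

Dangling references are silently dropped by `compact`, which mirrors the fact that the database does not enforce referential integrity.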
Indexes optimize query performance by creating data structures that map field values to document locations. The database supports single-field indexes on individual fields, compound indexes spanning multiple fields, and specialized indexes for arrays, geospatial data, and text search. Index creation blocks database operations in some systems, requiring careful planning for large collections.
Schema validation allows enforcing constraints on document structure despite the flexible schema model. Validation rules specify required fields, data types, value ranges, and custom validation logic. These rules prevent invalid data entry while maintaining flexibility for legitimate schema variations.
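To make the idea of validation rules concrete, here is an application-side sketch in plain Ruby. It is not the database's own validation feature: `POST_RULES` and `validation_errors` are hypothetical names, and the rules only cover required fields and types.

```ruby
# Hypothetical rule set: required fields with their expected Ruby classes.
POST_RULES = {
  title: String,
  content: String,
  tags: Array
}.freeze

# Returns a list of problems; an empty list means the document passes.
# Fields not mentioned in the rules are allowed, which keeps the schema flexible.
def validation_errors(doc, rules)
  rules.each_with_object([]) do |(field, type), errors|
    if !doc.key?(field)
      errors << "#{field} is required"
    elsif !doc[field].is_a?(type)
      errors << "#{field} must be a #{type}"
    end
  end
end

valid = { title: 'Hello', content: 'Body', tags: [], extra_field: 42 }
invalid = { title: 123, tags: [] }

p validation_errors(valid, POST_RULES)   # []
p validation_errors(invalid, POST_RULES) # ["title must be a String", "content is required"]
```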
Atomic operations apply at the document level. Updates to a single document execute atomically, ensuring either all changes succeed or all fail. Multi-document transactions, when supported, provide ACID guarantees across multiple documents or collections, though with performance implications. Not all document databases support multi-document transactions, because coordinating a transaction across distributed nodes costs latency and availability, and some systems choose not to pay that cost.
Ruby Implementation
Ruby applications interact with document databases through database-specific drivers and Object-Document Mappers (ODMs). The MongoDB driver provides low-level access to database operations, while Mongoid and MongoMapper offer ActiveRecord-style interfaces for working with documents as Ruby objects.
The MongoDB Ruby driver handles connection management, query execution, and result parsing. Applications configure connections with database URLs, specifying hosts, ports, authentication credentials, and options:
require 'mongo'
# Configure MongoDB client
client = Mongo::Client.new(
['localhost:27017'],
database: 'myapp_development',
server_selection_timeout: 5
)
# Access a collection
posts = client[:posts]
# Insert a document
result = posts.insert_one({
title: 'First Post',
content: 'This is my first blog post.',
author: { name: 'Alice', email: 'alice@example.com' },
tags: ['intro', 'welcome'],
created_at: Time.now
})
# Inserted document ID
puts result.inserted_id
# => BSON::ObjectId('65a2f3b8c9d0e12345678901')
Mongoid provides an ODM that maps Ruby classes to MongoDB collections. Class definitions specify fields, types, relationships, validations, and callbacks. Mongoid handles serialization, type conversion, and query generation:
require 'mongoid'
# Configure Mongoid
Mongoid.configure do |config|
config.clients.default = {
hosts: ['localhost:27017'],
database: 'myapp_development'
}
end
# Define document models
class Post
include Mongoid::Document
include Mongoid::Timestamps
field :title, type: String
field :content, type: String
field :tags, type: Array, default: []
field :view_count, type: Integer, default: 0
embeds_one :metadata
embeds_many :comments
belongs_to :author, class_name: 'User'
validates :title, presence: true, length: { minimum: 5 }
validates :content, presence: true
index({ title: 'text', content: 'text' })
index({ created_at: -1 })
end
class Comment
include Mongoid::Document
field :author_name, type: String
field :text, type: String
field :created_at, type: Time, default: -> { Time.now }
embedded_in :post
validates :author_name, :text, presence: true
end
class Metadata
include Mongoid::Document
field :seo_title, type: String
field :seo_description, type: String
field :featured, type: Boolean, default: false
embedded_in :post
end
Creating and querying documents follows ActiveRecord conventions:
# Create a post with embedded documents
post = Post.create!(
title: 'Understanding NoSQL',
content: 'NoSQL databases offer flexible schemas...',
tags: ['databases', 'nosql'],
author: current_user,
metadata: Metadata.new(
seo_title: 'NoSQL Database Guide',
featured: true
)
)
# Add comments
post.comments.create!(
author_name: 'Bob',
text: 'Great explanation!'
)
# Query posts
recent_posts = Post.where(:created_at.gte => 1.week.ago)
.desc(:created_at)
.limit(10)
# Text search
search_results = Post.text_search('schema design')
.order_by(score: { '$meta': 'textScore' })
# Aggregation
Post.collection.aggregate([
{ '$match': { 'tags': 'databases' } },
{ '$group': { _id: '$author_id', count: { '$sum': 1 } } },
{ '$sort': { count: -1 } },
{ '$limit': 5 }
])
Embedded associations store related documents within parent documents. Mongoid provides methods to access, create, and update embedded documents:
# Access embedded documents
post.comments.each do |comment|
puts "#{comment.author_name}: #{comment.text}"
end
# Query embedded documents
post.comments.where(author_name: 'Bob')
# Update embedded document
comment = post.comments.first
comment.update(text: 'Updated comment text')
# Remove embedded document
post.comments.find('comment_id').destroy
Referenced associations require manual querying or eager loading to avoid N+1 queries:
# Define referenced association
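class User
  include Mongoid::Document
  field :name, type: String
  field :email, type: String
  # inverse_of links this association to Post#author, whose name differs from the class name
  has_many :posts, inverse_of: :author
end
# Reopens the Post model defined earlier
class Post
  belongs_to :author, class_name: 'User', foreign_key: 'author_id', inverse_of: :posts
end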
# Eager load associations
posts = Post.includes(:author).limit(20)
posts.each do |post|
puts "#{post.title} by #{post.author.name}" # No additional queries
end
Atomic operations ensure data consistency during concurrent updates:
# Atomic increment
post.inc(view_count: 1)
# Atomic push to array
post.add_to_set(tags: 'featured')
# Atomic remove from array
post.pull(tags: 'draft')
# Atomic update with conditions
Post.where(id: post.id, view_count: 100).update(status: 'popular')
Design Considerations
Selecting between document databases and relational databases depends on data structure, access patterns, consistency requirements, and scalability needs. Document databases excel when data exhibits hierarchical structure, variable schemas, or aggregate-oriented access patterns. Relational databases perform better for highly normalized data with complex relationships and strong consistency requirements.
Data with natural hierarchical structure fits document storage. A product catalog where each product has different attributes based on category benefits from flexible schemas. Electronics products have specifications like screen size and processor type, while clothing products have size, color, and material. Storing these as documents avoids sparse columns or entity-attribute-value patterns:
# Electronics product
{
_id: "prod_1",
name: "Laptop Model X",
category: "electronics",
specifications: {
screen_size: "15.6 inches",
processor: "Intel Core i7",
ram: "16GB",
storage: "512GB SSD"
},
price: 1299.99
}
# Clothing product
{
_id: "prod_2",
name: "Cotton T-Shirt",
category: "clothing",
specifications: {
size: "M",
color: "Blue",
material: "100% Cotton",
care_instructions: "Machine wash cold"
},
price: 24.99
}
Applications that read entire entity aggregates benefit from embedded documents. A shopping cart retrieves all items, customer details, and pricing in one query. Relational databases require joins across multiple tables, increasing query complexity and latency. Document databases return complete aggregates in single queries.
Frequent schema evolution favors document databases. Startups iterating on product features add and remove fields without schema migrations. Each document version coexists in the same collection, with application code handling different document structures. Relational databases typically require ALTER TABLE statements, which on large tables can lock the table and disrupt operations.
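Application code that handles several document versions often normalizes them at read time. The following plain-Ruby sketch assumes a hypothetical `schema_version` field and two invented shapes of a user document; neither comes from a specific library.

```ruby
# Two hypothetical generations of the same user document:
# older documents store a single name, newer ones split it and carry a version marker.
OLD_DOC = { _id: 'u1', name: 'Jane Smith' }.freeze
NEW_DOC = { _id: 'u2', schema_version: 2, first_name: 'John', last_name: 'Doe' }.freeze

# Normalizes either shape into the newer one at read time.
# Documents without a version marker are treated as version 1.
def normalize_user(doc)
  return doc if doc.fetch(:schema_version, 1) >= 2

  first, last = doc[:name].to_s.split(' ', 2)
  { _id: doc[:_id], schema_version: 2, first_name: first, last_name: last }
end

p normalize_user(OLD_DOC)
p normalize_user(NEW_DOC)
```

Keeping the upgrade logic in one function means the rest of the application only ever sees the newest shape, while old documents can be rewritten lazily or left as they are.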
Document databases have traditionally offered weaker transactional guarantees across multiple documents or collections than relational databases. Banking applications requiring atomic transfers between accounts need multi-document transactions or a relational database. Document databases that support multi-document transactions impose performance penalties, as distributed coordination overhead negates some benefits of denormalization.
Relationship cardinality influences embedding versus referencing decisions. One-to-one and one-to-many relationships with few related items embed efficiently. One-to-many relationships with unbounded growth require references to prevent document size limits. Many-to-many relationships require references, as embedding causes duplication and update anomalies:
# Embed: One-to-few (blog post with comments, assuming limited comments)
{
_id: "post_1",
title: "Database Patterns",
comments: [
{ user: "alice", text: "Helpful article" },
{ user: "bob", text: "Thanks for sharing" }
]
}
# Reference: One-to-many unbounded (author with posts)
# Author document
{ _id: "author_1", name: "Jane Smith" }
# Post documents reference author
{ _id: "post_1", title: "First Post", author_id: "author_1" }
{ _id: "post_2", title: "Second Post", author_id: "author_1" }
# Reference: Many-to-many (products and categories)
# Product document
{ _id: "prod_1", name: "Widget", category_ids: ["cat_1", "cat_2"] }
# Category documents
{ _id: "cat_1", name: "Electronics" }
{ _id: "cat_2", name: "Gadgets" }
Read and write patterns affect schema design. Applications with read-heavy workloads denormalize data for query performance, accepting duplication and potential inconsistency. Write-heavy workloads minimize duplication to reduce update costs. Analytical workloads that scan large datasets benefit from document databases with columnar storage or relational databases optimized for analytics.
Document size impacts performance and storage. MongoDB limits a single BSON document to 16 MB and rejects writes that exceed that limit. Large documents cause memory pressure during queries that load entire documents. Applications split large content across multiple documents or store large binary data in GridFS:
# Split large content across documents
{
_id: "article_1",
title: "Complete Guide to Databases",
metadata: { author: "Jane", published: "2024-01-15" },
sections: [
{ order: 1, heading: "Introduction", content_ref: "section_1" },
{ order: 2, heading: "Concepts", content_ref: "section_2" }
]
}
# Content stored in separate documents
{ _id: "section_1", article_id: "article_1", content: "..." }
{ _id: "section_2", article_id: "article_1", content: "..." }
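The split shown above can be automated. This plain-Ruby sketch estimates size from the JSON encoding, which only approximates the BSON size the database actually measures, and the `_id` scheme for section documents is made up for illustration.

```ruby
require 'json'

MAX_DOCUMENT_BYTES = 16 * 1024 * 1024 # the 16 MB per-document limit

# Rough size estimate: JSON byte length is only an approximation of BSON size.
def approximate_size(doc)
  JSON.generate(doc).bytesize
end

# Splits an article into a small parent document plus one document per section.
# The "<article id>_section_<n>" identifiers are invented for this sketch.
def split_article(article)
  section_docs = []
  refs = []
  article[:sections].each_with_index do |section, i|
    id = "#{article[:_id]}_section_#{i + 1}"
    section_docs << { _id: id, article_id: article[:_id], content: section[:content] }
    refs << { order: i + 1, heading: section[:heading], content_ref: id }
  end
  [article.merge(sections: refs), section_docs]
end

article = {
  _id: 'article_1',
  title: 'Complete Guide to Databases',
  sections: [
    { heading: 'Introduction', content: 'x' * 10_000 },
    { heading: 'Concepts', content: 'y' * 20_000 }
  ]
}

parent, section_docs = split_article(article)
puts approximate_size(article) > approximate_size(parent) # true
puts approximate_size(parent) < MAX_DOCUMENT_BYTES        # true
p section_docs.map { |d| d[:_id] } # ["article_1_section_1", "article_1_section_2"]
```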
Tools & Ecosystem
MongoDB dominates the document database market, offering comprehensive features, mature tooling, and extensive language support. MongoDB Atlas provides managed cloud hosting with automated backups, monitoring, and scaling. The MongoDB Ruby driver and Mongoid ODM enable Ruby applications to interact with MongoDB clusters.
CouchDB takes a different approach, emphasizing HTTP APIs, multi-master replication with conflict detection, and offline-first applications. CouchDB stores documents as JSON and exposes all operations through RESTful HTTP. Map-reduce views provide indexing and querying. The CouchRest Ruby gem wraps CouchDB's HTTP API:
require 'couchrest'
# Connect to CouchDB
db = CouchRest.database!("http://localhost:5984/myapp")
# Create a document
response = db.save_doc({
type: 'post',
title: 'CouchDB Example',
content: 'Document content here',
created_at: Time.now.to_s
})
# Retrieve document
doc = db.get(response['id'])
# Update document
doc['content'] = 'Updated content'
db.save_doc(doc)
# Define a view
db.save_doc({
_id: '_design/posts',
views: {
by_date: {
map: "function(doc) { if(doc.type == 'post') emit(doc.created_at, doc); }"
}
}
})
# Query view
results = db.view('posts/by_date')
Amazon DocumentDB provides MongoDB-compatible database service on AWS. DocumentDB implements the MongoDB wire protocol, allowing existing MongoDB applications and tools to connect with minimal changes. DocumentDB separates storage and compute, automatically scaling storage and supporting read replicas for high availability.
PostgreSQL added JSON and JSONB data types, enabling document storage within a relational database. JSONB stores documents in binary format, supporting indexing and efficient querying. Applications use PostgreSQL for both relational and document data:
require 'pg'
conn = PG.connect(dbname: 'myapp')
# Create table with JSONB column
conn.exec("CREATE TABLE posts (
id SERIAL PRIMARY KEY,
data JSONB NOT NULL
)")
# Insert document
conn.exec_params(
"INSERT INTO posts (data) VALUES ($1)",
[{ title: 'First Post', tags: ['ruby', 'databases'] }.to_json]
)
# Query JSONB field
result = conn.exec("SELECT data->>'title' AS title FROM posts
WHERE data @> '{\"tags\": [\"ruby\"]}'")
result.each { |row| puts row['title'] }
# Create a GIN index on the whole JSONB column so the containment (@>) query above can use it
conn.exec("CREATE INDEX idx_posts_data ON posts USING GIN (data)")
Mongoid provides the primary ODM for Ruby MongoDB applications. Mongoid 8.x provides ActiveRecord-like APIs, association management, validation, callbacks, and a query DSL. The MongoDB server versions it supports depend on the exact Mongoid release, so check the compatibility table in the Mongoid documentation. Mongoid integrates with Rails, supporting generators, rake tasks, and configuration conventions:
# Gemfile
gem 'mongoid', '~> 8.0'
# config/mongoid.yml
development:
  clients:
    default:
      database: myapp_development
      hosts:
        - localhost:27017
      options:
        server_selection_timeout: 5
# app/models/post.rb
class Post
include Mongoid::Document
include Mongoid::Timestamps
field :title, type: String
field :content, type: String
validates :title, presence: true
scope :recent, -> { where(:created_at.gte => 1.week.ago) }
end
MongoMapper offers an alternative ODM with similar features. MongoMapper provides a lighter-weight implementation and different API choices. Applications choose based on team preference and specific feature requirements.
Studio 3T and MongoDB Compass provide GUI tools for database management. These tools enable visual query building, index management, schema analysis, and data import/export. Compass includes aggregation pipeline builders and query performance visualization.
Performance Considerations
Index strategy determines query performance in document databases. Queries that scan entire collections without indexes execute slowly, examining every document. Indexes create data structures that map field values to document locations, enabling direct lookups. The database consults indexes to locate matching documents, then retrieves those documents from storage.
Single-field indexes optimize queries filtering on one field. Creating an index on the email field enables efficient user lookups by email address. Compound indexes spanning multiple fields support queries filtering on field combinations:
# Single-field index
Post.index({ created_at: -1 }) # Descending order for recent posts
# Compound index
Post.index({ author_id: 1, created_at: -1 }) # Author posts by date
# Query uses compound index
Post.where(author_id: user.id).desc(:created_at).limit(10)
Index prefix matching allows compound indexes to support queries on leading fields. An index on {category: 1, price: 1, rating: 1} supports queries on category alone, category and price, or all three fields. Queries filtering only on price or rating cannot use this index, requiring separate indexes.
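The prefix rule can be expressed in a few lines of plain Ruby. This sketch is a simplification: `usable_index_prefix` is a hypothetical helper that only considers equality filters and ignores sort order and range predicates, which a real query planner also weighs.

```ruby
# Returns the leading fields of a compound index that a set of equality
# filters can use. Matching stops at the first index field the query omits.
def usable_index_prefix(index_fields, query_fields)
  index_fields.take_while { |field| query_fields.include?(field) }
end

INDEX = %i[category price rating].freeze

p usable_index_prefix(INDEX, %i[category])        # [:category]
p usable_index_prefix(INDEX, %i[price category])  # [:category, :price]
p usable_index_prefix(INDEX, %i[category rating]) # [:category]
p usable_index_prefix(INDEX, %i[price rating])    # []
```

An empty result corresponds to the case in the text where a query on price or rating alone cannot use the index.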
Covered queries retrieve all data from indexes without accessing documents. Queries projecting only indexed fields execute faster, as the database returns values directly from index entries:
# Query requires document access
posts = Post.where(author_id: author.id).only(:title, :content, :created_at)
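# Covered query (assuming a compound index on author_id and title).
# _id is returned by default and is not part of that index, so the driver projection excludes it.
posts = Post.collection.find({ author_id: author.id }, projection: { _id: 0, title: 1 })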
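Whether a query can be covered comes down to a subset check. The plain-Ruby sketch below uses a hypothetical `covered?` helper and deliberately ignores other restrictions that real databases apply (for example around array fields). Note that `_id` is returned by default, so it counts as a projected field unless the projection excludes it.

```ruby
# A query can be covered only when every filtered field and every returned
# field is stored in the index.
def covered?(index_fields, filter_fields, projected_fields)
  (filter_fields + projected_fields).all? { |field| index_fields.include?(field) }
end

AUTHOR_TITLE_INDEX = %i[author_id title].freeze

p covered?(AUTHOR_TITLE_INDEX, %i[author_id], %i[title])         # true
p covered?(AUTHOR_TITLE_INDEX, %i[author_id], %i[_id title])     # false
p covered?(AUTHOR_TITLE_INDEX, %i[author_id], %i[title content]) # false
```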
Array indexes create index entries for each array element. An index on the tags field creates multiple index entries per document. Queries checking array membership use array indexes efficiently:
Post.index({ tags: 1 })
# Query finds posts with tag 'databases'
Post.where(tags: 'databases')
# Query finds posts with all specified tags
Post.where(tags: { '$all': ['databases', 'nosql'] })
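The "one entry per array element" behaviour resembles an inverted index. This plain-Ruby sketch builds one over sample posts invented for illustration; `build_tag_index` is a hypothetical helper, not a database API.

```ruby
POSTS = [
  { _id: 'post_1', tags: %w[databases nosql] },
  { _id: 'post_2', tags: %w[databases architecture] },
  { _id: 'post_3', tags: %w[ruby] }
].freeze

# Builds one index entry per array element: tag => ids of documents containing it.
def build_tag_index(posts)
  posts.each_with_object(Hash.new { |hash, tag| hash[tag] = [] }) do |post, index|
    post[:tags].each { |tag| index[tag] << post[:_id] }
  end
end

TAG_INDEX = build_tag_index(POSTS)

# Membership query: a single lookup.
p TAG_INDEX['databases']                      # ["post_1", "post_2"]
# "All of these tags": intersect the entries for each tag.
p TAG_INDEX['databases'] & TAG_INDEX['nosql'] # ["post_1"]
```

A document with many tags produces many entries, which is why indexes over large arrays grow quickly.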
Text indexes enable full-text search across string fields. Text indexes tokenize strings, remove stop words, and stem words to base forms. Text search queries match tokenized and stemmed terms:
Post.index({ title: 'text', content: 'text' })
# Text search
results = Post.text_search('document database schema')
.order_by(score: { '$meta': 'textScore' })
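The tokenization step can be illustrated with a heavily simplified sketch in plain Ruby. The stop-word list is made up, stemming is omitted entirely, and `tokenize` and `matches?` are hypothetical helpers rather than anything the database exposes.

```ruby
# A tiny, made-up stop-word list; real text indexes use language-specific lists.
STOP_WORDS = %w[a an and the of to in is].freeze

# Lowercases, splits on non-letters, and drops stop words. Stemming is omitted.
def tokenize(text)
  text.downcase.scan(/[a-z]+/).reject { |word| STOP_WORDS.include?(word) }
end

# A document matches when it shares at least one term with the search string.
def matches?(document_text, search)
  (tokenize(document_text) & tokenize(search)).any?
end

p tokenize('The Design of a Document Schema') # ["design", "document", "schema"]
p matches?('Understanding Document Databases', 'document database schema') # true
p matches?('Intro to Ruby', 'the schema')                                   # false
```

Without stemming, "databases" and "database" stay distinct terms here; a real text index would reduce both to a common base form.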
Geospatial indexes support location-based queries. 2dsphere indexes handle coordinates on sphere geometry, enabling queries for documents near a point or within regions:
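class Store
  include Mongoid::Document
  field :name, type: String
  field :location, type: Array # [longitude, latitude]
  index({ location: '2dsphere' })
end
# Find stores within 5 km. This uses $nearSphere because the geoNear command
# was removed in MongoDB 4.2; with a GeoJSON point, $maxDistance is in meters.
Store.where(
  location: {
    '$nearSphere' => {
      '$geometry' => { type: 'Point', coordinates: [longitude, latitude] },
      '$maxDistance' => 5000
    }
  }
)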
Query selectivity impacts index effectiveness. Queries filtering to small result sets benefit most from indexes. Queries matching most documents gain less from indexing, as the database still loads many documents. An index on a boolean field with 50/50 distribution provides less benefit than an index on email with unique values.
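The comparison can be made numeric. In this small sketch, selectivity is taken to mean the fraction of documents a filter matches, and the collection size of 1,000 users is an arbitrary sample figure.

```ruby
# Selectivity as the fraction of documents a filter matches.
# Smaller values mean an index narrows the search more.
def selectivity(matching, total)
  matching.to_f / total
end

TOTAL_USERS = 1_000

puts selectivity(500, TOTAL_USERS) # 0.5   (boolean field, 50/50 split)
puts selectivity(1, TOTAL_USERS)   # 0.001 (unique email address)
```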
Indexes impose storage overhead and slow write operations. Each index requires storage space and updates during document insertions, updates, and deletions. Applications balance query performance against write performance and storage costs. Unnecessary indexes waste resources and degrade write performance.
Document size affects memory usage and I/O performance. Large documents consume more cache memory, reducing the number of documents fitting in RAM. Queries loading large documents spend more time on I/O. Applications minimize document size by splitting large fields, using references instead of embedding, or storing binary data separately.
Connection pooling reduces connection overhead. Creating database connections involves authentication and setup costs. Connection pools maintain open connections that multiple threads or requests reuse. Mongoid configures connection pools through the max_pool_size option:
# config/mongoid.yml
production:
  clients:
    default:
      database: myapp_production
      hosts:
        - mongo1.example.com:27017
        - mongo2.example.com:27017
        - mongo3.example.com:27017
      options:
        max_pool_size: 50
        min_pool_size: 5
        wait_queue_timeout: 5
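What a pool does with those settings can be sketched with Ruby's thread-safe `Queue`. This is a conceptual illustration, not how the driver implements pooling: `TinyPool` and `FakeConnection` are hypothetical, the pool has a fixed size, and there is no wait timeout.

```ruby
# Placeholder for a real connection; a real pool would hold driver sockets.
FakeConnection = Struct.new(:id)

# Minimal fixed-size pool built on Ruby's thread-safe Queue.
class TinyPool
  def initialize(size)
    @available = Queue.new
    size.times { |i| @available << FakeConnection.new(i) }
  end

  # Checks a connection out, yields it, and always returns it to the pool.
  # Queue#pop blocks while every connection is in use.
  def with_connection
    connection = @available.pop
    yield connection
  ensure
    @available << connection if connection
  end

  # Number of connections currently idle in the pool.
  def available
    @available.size
  end
end

POOL = TinyPool.new(2)

# Five threads share two connections.
threads = 5.times.map do
  Thread.new { POOL.with_connection { |conn| conn.id } }
end
used_ids = threads.map(&:value).uniq.sort

puts used_ids.all? { |id| [0, 1].include?(id) } # true
puts POOL.available                              # 2
```

The `ensure` clause is the important design choice: a connection goes back to the pool even when the block raises, so errors do not leak connections.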
Aggregation pipelines perform complex data transformations and analytics. Pipelines process documents through stages that filter, group, sort, and transform data. The database optimizes pipelines, sometimes combining stages or reordering operations for efficiency:
# Calculate average post length by author
Post.collection.aggregate([
{
'$project': {
author_id: 1,
content_length: { '$strLenCP': '$content' }
}
},
{
'$group': {
_id: '$author_id',
avg_length: { '$avg': '$content_length' },
post_count: { '$sum': 1 }
}
},
{
'$sort': { avg_length: -1 }
},
{
'$limit': 10
}
])
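For readers more familiar with Ruby collections, the same stages can be mirrored with `Enumerable` over an in-memory array. This is only an analogy for how data flows through the pipeline: the sample posts are invented, and a real pipeline runs inside the database rather than in application memory.

```ruby
SAMPLE_POSTS = [
  { author_id: 'a1', content: 'short' },         # 5 characters
  { author_id: 'a1', content: 'a longer post' }, # 13 characters
  { author_id: 'a2', content: 'medium one' }     # 10 characters
].freeze

# Mirrors $project -> $group -> $sort -> $limit with Enumerable methods.
def average_length_by_author(posts, limit: 10)
  projected = posts.map do |post|
    { author_id: post[:author_id], content_length: post[:content].length }
  end
  grouped = projected.group_by { |row| row[:author_id] }.map do |author_id, rows|
    {
      _id: author_id,
      avg_length: rows.sum { |row| row[:content_length] }.to_f / rows.size,
      post_count: rows.size
    }
  end
  grouped.sort_by { |row| -row[:avg_length] }.first(limit)
end

average_length_by_author(SAMPLE_POSTS).each do |row|
  puts "#{row[:_id]}: #{row[:avg_length]} (#{row[:post_count]} posts)"
end
# a2: 10.0 (1 posts)
# a1: 9.0 (2 posts)
```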
Integration & Interoperability
Ruby web applications integrate document databases through configuration, connection management, and ODM integration. Rails applications using Mongoid replace ActiveRecord with Mongoid, configuring database connections through YAML files and initializers.
Mongoid configuration specifies client settings, connection options, and model settings. Applications define multiple database clients for separating read and write workloads or connecting to different clusters:
# config/mongoid.yml
production:
  clients:
    default:
      database: myapp_production
      hosts:
        - replica1.example.com:27017
        - replica2.example.com:27017
        - replica3.example.com:27017
      options:
        read:
          mode: :secondary_preferred
        write:
          w: majority
          wtimeout: 5000
        max_pool_size: 100
    analytics:
      database: myapp_analytics
      hosts:
        - analytics.example.com:27017
      options:
        read:
          mode: :secondary
Models select a client with store_in, which routes their reads and writes to that client's database:
class AnalyticsEvent
include Mongoid::Document
store_in client: :analytics, collection: 'events'
field :event_type, type: String
field :user_id, type: String
field :timestamp, type: Time
field :properties, type: Hash
end
Migrating data between relational and document databases requires transforming normalized data into denormalized documents. Export scripts query relational data, combine related records, and insert documents:
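# Export relational rows to MongoDB. Comment is assumed to be the ActiveRecord model
# and Post the Mongoid model; a real application needs distinct class names for the two.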
ActiveRecord::Base.connection.execute("SELECT * FROM posts").each do |row|
comments = Comment.where(post_id: row['id']).map do |c|
{
author_name: c.author_name,
text: c.text,
created_at: c.created_at
}
end
Post.create!(
title: row['title'],
content: row['content'],
author_id: row['author_id'],
comments: comments,
created_at: row['created_at']
)
end
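The transformation step itself can be tried without either database. This plain-Ruby sketch uses invented sample rows and a hypothetical `denormalize` helper; it groups the comments once up front instead of querying per post, which avoids issuing one comment query for every post row.

```ruby
# Rows as they might come back from two relational tables (sample data).
POST_ROWS = [
  { 'id' => 1, 'title' => 'First Post', 'author_id' => 7 },
  { 'id' => 2, 'title' => 'Second Post', 'author_id' => 7 }
].freeze

COMMENT_ROWS = [
  { 'post_id' => 1, 'author_name' => 'Bob', 'text' => 'Nice' },
  { 'post_id' => 1, 'author_name' => 'Alice', 'text' => 'Thanks' }
].freeze

# Groups comments by post once, then embeds each group in its parent post.
def denormalize(post_rows, comment_rows)
  comments_by_post = comment_rows.group_by { |row| row['post_id'] }
  post_rows.map do |post|
    embedded = comments_by_post.fetch(post['id'], []).map do |comment|
      { author_name: comment['author_name'], text: comment['text'] }
    end
    { title: post['title'], author_id: post['author_id'], comments: embedded }
  end
end

documents = denormalize(POST_ROWS, COMMENT_ROWS)
puts documents.first[:comments].size # 2
puts documents.last[:comments].size  # 0
```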
Polyglot persistence combines relational and document databases in single applications. User authentication and financial data reside in PostgreSQL for transactional guarantees, while content and session data use MongoDB for flexible schemas. Applications maintain connections to both databases:
# User authentication in PostgreSQL
class User < ActiveRecord::Base
has_many :orders
validates :email, presence: true, uniqueness: true
end
# Content in MongoDB
class Article
include Mongoid::Document
field :title, type: String
field :content, type: String
field :author_id, type: String
def author
User.find(author_id)
end
end
# Controller coordinates both systems
def show
@article = Article.find(params[:id])
@author = User.find(@article.author_id)
end
Message queues integrate document databases with event-driven architectures. Applications publish document changes to message queues, enabling downstream services to react to data changes. Change streams in MongoDB provide real-time notifications of document modifications:
# Watch for changes in posts collection
posts = Post.collection
change_stream = posts.watch
Thread.new do
change_stream.each do |change|
case change['operationType']
when 'insert'
document = change['fullDocument']
# Process new post
NotificationService.notify_new_post(document)
when 'update'
document_id = change['documentKey']['_id']
# Process update
CacheService.invalidate_post(document_id)
end
end
end
REST APIs expose document data to external clients. Controllers retrieve documents, serialize to JSON, and return responses. API versioning handles schema evolution across clients:
class Api::V1::PostsController < ApplicationController
def index
posts = Post.desc(:created_at).limit(20)
render json: posts.map { |p| serialize_post(p) }
end
def show
post = Post.find(params[:id])
render json: serialize_post(post)
end
private
def serialize_post(post)
{
id: post.id.to_s,
title: post.title,
content: post.content,
author: {
id: post.author_id.to_s,
name: post.author.name
},
tags: post.tags,
comment_count: post.comments.count,
created_at: post.created_at.iso8601
}
end
end
GraphQL interfaces provide flexible querying over document data. GraphQL resolvers fetch documents and related data based on query selections:
class Types::PostType < Types::BaseObject
field :id, ID, null: false
field :title, String, null: false
field :content, String, null: false
field :author, Types::UserType, null: false
field :comments, [Types::CommentType], null: false
def author
User.find(object.author_id)
end
end
class Types::QueryType < Types::BaseObject
field :post, Types::PostType, null: true do
argument :id, ID, required: true
end
def post(id:)
Post.find(id)
end
end
Reference
Document Database Comparison
| Database | Model | Query Language | Transactions | Replication | Key Features |
|---|---|---|---|---|---|
| MongoDB | BSON documents | MongoDB Query Language | Multi-document ACID | Replica sets | Rich query, aggregation, sharding |
| CouchDB | JSON documents | Map-reduce views | Single-document | Multi-master | HTTP API, offline sync, conflict resolution |
| DocumentDB | BSON documents | MongoDB-compatible | ACID | Automated replication | AWS managed, compatible with MongoDB |
| RavenDB | JSON documents | RQL | ACID | Master-master | ACID transactions, full-text search |
MongoDB Index Types
| Index Type | Syntax | Use Case | Limitations |
|---|---|---|---|
| Single field | {field: 1} | Queries on single field | One field only |
| Compound | {field1: 1, field2: -1} | Queries on multiple fields | Order matters for prefix matching |
| Text | {field: 'text'} | Full-text search | One text index per collection |
| Geospatial | {location: '2dsphere'} | Location queries | Field must hold GeoJSON objects or legacy coordinate pairs |
| Hashed | {field: 'hashed'} | Even shard distribution | Cannot support range queries |
| TTL | {date: 1} with option expireAfterSeconds: N | Automatic document expiration | Single field, Date type only |
| Partial | {field: 1} with option partialFilterExpression | Index subset of documents | Query must include filter |
Mongoid Field Types
| Type | Ruby Class | Storage Format | Usage |
|---|---|---|---|
| String | String | UTF-8 string | Text data |
| Integer | Integer | 32-bit or 64-bit | Whole numbers |
| Float | Float | 64-bit float | Decimal numbers |
| Boolean | TrueClass, FalseClass | Boolean | True/false values |
| Date | Date | UTC datetime | Date without time |
| Time | Time | UTC datetime | Date and time |
| DateTime | DateTime | UTC datetime | Date and time |
| Array | Array | BSON array | Lists |
| Hash | Hash | BSON document | Key-value pairs |
| Range | Range | Hash with min and max | Numeric or date ranges |
| Regexp | Regexp | BSON regex | Regular expressions |
Query Operators
| Operator | Purpose | Example |
|---|---|---|
| eq | Equals | Post.where(status: 'published') |
| ne | Not equals | Post.where(:status.ne => 'draft') |
| gt | Greater than | Post.where(:views.gt => 1000) |
| gte | Greater than or equal | Post.where(:created_at.gte => 1.week.ago) |
| lt | Less than | Post.where(:score.lt => 50) |
| lte | Less than or equal | Post.where(:price.lte => 100) |
| in | In array | Post.where(:status.in => ['published', 'featured']) |
| nin | Not in array | Post.where(:status.nin => ['draft', 'archived']) |
| all | Contains all | Post.where(:tags.all => ['ruby', 'database']) |
| exists | Field exists | Post.where(:featured.exists => true) |
| regex | Regular expression | Post.where(title: /pattern/i) |
Aggregation Pipeline Stages
| Stage | Purpose | Example Usage |
|---|---|---|
| $match | Filter documents | Filter by criteria before processing |
| $group | Group by field | Calculate aggregates per group |
| $project | Transform documents | Select or compute fields |
| $sort | Order results | Sort by one or more fields |
| $limit | Limit result count | Return top N results |
| $skip | Skip documents | Implement pagination |
| $lookup | Join collections | Left outer join with another collection |
| $unwind | Flatten arrays | Convert array field to separate documents |
| $sample | Random sample | Get random documents |
Mongoid Association Types
| Association | Defines | Storage | Query Method |
|---|---|---|---|
| embeds_one | One embedded document | Within parent | parent.embedded_doc |
| embeds_many | Many embedded documents | Within parent as array | parent.embedded_docs |
| has_one | One referenced document | Separate collection | parent.referenced_doc |
| has_many | Many referenced documents | Separate collection | parent.referenced_docs |
| belongs_to | Parent reference | Foreign key in child | child.parent |
| has_and_belongs_to_many | Many-to-many | Foreign key arrays in both | model.related_models |
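Connection Options
Default values differ between driver versions. Treat the defaults below as indicative and confirm them against the documentation for the installed driver.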
| Option | Type | Default | Purpose |
|---|---|---|---|
| max_pool_size | Integer | 5 | Maximum connections in pool |
| min_pool_size | Integer | 1 | Minimum connections maintained |
| wait_queue_timeout | Integer | 1 | Seconds to wait for connection |
| connect_timeout | Integer | 10 | Seconds to wait for connection establishment |
| socket_timeout | Integer | No timeout | Seconds to wait for socket operations |
| server_selection_timeout | Integer | 30 | Seconds to wait for server selection |
| heartbeat_frequency | Integer | 10 | Seconds between server health checks |
Write Concern Levels
| Level | Description | Durability | Performance |
|---|---|---|---|
| w: 0 | Unacknowledged | Lowest | Fastest |
| w: 1 | Acknowledge primary | Medium | Fast |
| w: majority | Majority of replica set | High | Slower |
| w: N | N replicas acknowledge | Configurable | Varies |
| j: true | Journaled to disk | Highest | Slowest |
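For `w: majority`, the number of acknowledgements follows from simple integer arithmetic. The sketch assumes every replica set member is a voting, data-bearing member; `majority` is a hypothetical helper.

```ruby
# Number of acknowledgements that make up a majority of voting members.
def majority(voting_members)
  voting_members / 2 + 1
end

puts majority(3) # 2
puts majority(4) # 3
puts majority(5) # 3
```

A four-member set needs as many acknowledgements as a five-member set, which is one reason odd member counts are common.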
Read Preference Modes
| Mode | Behavior | Use Case |
|---|---|---|
| primary | Read from primary only | Default, strongest consistency |
| primaryPreferred | Primary, fallback to secondary | High availability |
| secondary | Read from secondary only | Reduce primary load |
| secondaryPreferred | Secondary, fallback to primary | Balance load |
| nearest | Lowest network latency | Geographically distributed reads |