CrackedRuby - Document Databases Concepts

Overview

Document databases store data as semi-structured documents, typically in JSON, BSON, or XML formats. Each document contains key-value pairs where values can include strings, numbers, arrays, nested objects, or other documents. Unlike relational databases that require predefined schemas and normalize data across multiple tables, document databases embed related data within a single document structure.

The document model maps closely to objects in programming languages, reducing the object-relational impedance mismatch. A document representing a blog post might contain the post content, author information, comments, and tags as nested structures within one document. This contrasts with relational databases where the same data spans multiple tables connected through foreign keys.

Document databases originated from the need to handle semi-structured data at scale. CouchDB first appeared in 2005 and MongoDB in 2009; both addressed limitations of relational databases when dealing with rapidly changing schemas and hierarchical data. The CAP theorem also shapes their design: when a network partition occurs, a distributed system must choose between consistency and availability, and different document databases make different choices.

# Document representation of a blog post
{
  _id: "post_123",
  title: "Understanding Document Databases",
  author: {
    name: "Jane Developer",
    email: "jane@example.com",
    profile: { bio: "Software engineer", location: "San Francisco" }
  },
  content: "Document databases provide...",
  tags: ["databases", "nosql", "architecture"],
  comments: [
    { user: "john", text: "Great article", timestamp: "2024-01-15T10:30:00Z" },
    { user: "alice", text: "Very informative", timestamp: "2024-01-15T11:45:00Z" }
  ],
  created_at: "2024-01-15T09:00:00Z",
  updated_at: "2024-01-15T12:00:00Z"
}
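Because a document is nested key-value data, it maps onto a plain Ruby Hash. The sketch below round-trips a trimmed copy of the post above through JSON using only the standard json library, showing that the nested structure survives serialization. The POST constant and round_trip helper are illustrative; real MongoDB drivers serialize to BSON (via the bson gem), not JSON.

```ruby
require 'json'

# A trimmed copy of the blog post document above, as a plain Ruby Hash.
POST = {
  _id: 'post_123',
  title: 'Understanding Document Databases',
  author: { name: 'Jane Developer', profile: { location: 'San Francisco' } },
  tags: %w[databases nosql architecture],
  comments: [{ user: 'john', text: 'Great article' }]
}.freeze

# Serialize to JSON text and parse it back, restoring symbol keys at every level.
def round_trip(doc)
  JSON.parse(JSON.generate(doc), symbolize_names: true)
end

copy = round_trip(POST)
puts copy[:author][:profile][:location]  # prints San Francisco
puts copy == POST                        # prints true
```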

Document databases fit applications with variable structure, hierarchical relationships, or frequent schema changes. E-commerce catalogs where products have different attributes, content management systems with diverse content types, and user profile systems with varying fields all benefit from document storage. Because each document can have different fields, schema changes need no database-level migration, although application code must still handle older document shapes.

Key Principles

Document databases organize data around documents rather than rows. A document serves as the atomic unit, containing all information for a single entity. The database stores documents in collections (similar to tables), but collections do not enforce schema constraints unless validation rules are configured. Two documents in the same collection can have completely different structures.

Each document requires a unique identifier, typically stored in an _id field. The database generates this identifier automatically if none is provided. Because systems such as MongoDB index the _id field by default, lookups by identifier are efficient. Documents can reference other documents through identifiers, creating relationships similar to foreign keys, though the database does not enforce referential integrity.

Embedded documents represent one-to-one and one-to-many relationships within parent documents. An order document embeds line items, customer information, and shipping details. This denormalization reduces join operations and improves read performance, as the database retrieves all related data in a single query. The trade-off involves data duplication and increased storage requirements.

# Embedded relationship - order contains line items
{
  _id: "order_456",
  customer_id: "cust_789",
  order_date: "2024-01-15",
  line_items: [
    { product: "Widget A", quantity: 2, price: 29.99 },
    { product: "Widget B", quantity: 1, price: 49.99 }
  ],
  shipping: {
    address: "123 Main St",
    city: "Portland",
    state: "OR",
    zip: "97201"
  },
  total: 109.97
}
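Because the line items travel inside the order, a single fetch returns everything needed to compute the total. A minimal sketch in plain Ruby, with the order held as a Hash shaped like the example above (no database involved); the order_total helper is hypothetical.

```ruby
ORDER = {
  _id: 'order_456',
  customer_id: 'cust_789',
  line_items: [
    { product: 'Widget A', quantity: 2, price: 29.99 },
    { product: 'Widget B', quantity: 1, price: 49.99 }
  ],
  total: 109.97
}.freeze

# Recompute the order total from its embedded line items.
# Rounding to cents hides binary floating-point noise; production code
# would usually store money as integer cents or BigDecimal instead.
def order_total(order)
  order[:line_items].sum { |item| item[:quantity] * item[:price] }.round(2)
end

puts order_total(ORDER)  # prints 109.97
```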

Document references represent many-to-many relationships or situations where embedding causes excessive duplication. A blog post references author documents rather than embedding full author details in every post. The application performs manual joins by querying referenced documents separately.

# Referenced relationship - post references author
{
  _id: "post_789",
  title: "Database Design Patterns",
  author_id: "author_123",  # Reference to author document
  content: "When designing schemas...",
  category_ids: ["cat_1", "cat_2"]  # References to category documents
}
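Resolving references is the application's job. The sketch below stands in for lookups against hypothetical authors and categories collections, using in-memory Hashes keyed by _id (the category names are made up). With a real driver each lookup would be a find call, ideally batched with $in for the category ids.

```ruby
# In-memory stand-ins for the authors and categories collections.
AUTHORS = { 'author_123' => { _id: 'author_123', name: 'Jane Developer' } }.freeze
CATEGORIES = {
  'cat_1' => { _id: 'cat_1', name: 'Databases' },
  'cat_2' => { _id: 'cat_2', name: 'Design' }
}.freeze

POST_DOC = {
  _id: 'post_789',
  title: 'Database Design Patterns',
  author_id: 'author_123',
  category_ids: %w[cat_1 cat_2]
}.freeze

# "Manual join": follow each reference and attach the referenced documents.
# A dangling reference resolves to nil because nothing enforces integrity.
def resolve_references(post)
  post.merge(
    author: AUTHORS[post[:author_id]],
    categories: post[:category_ids].map { |id| CATEGORIES[id] }
  )
end

joined = resolve_references(POST_DOC)
puts joined[:author][:name]                               # prints Jane Developer
puts joined[:categories].map { |c| c[:name] }.join(', ')  # prints Databases, Design
```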

Indexes optimize query performance by creating data structures that map field values to document locations. The database supports single-field indexes on individual fields, compound indexes spanning multiple fields, and specialized indexes for arrays, geospatial data, and text search. Index creation blocks database operations in some systems, requiring careful planning for large collections.
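Conceptually, a single-field index is a lookup structure from field value to document ids. The toy below uses a Ruby Hash in place of the B-tree a real database would use (a simplification), and contrasts an index lookup with a collection scan that inspects every document. All names and sample data are hypothetical.

```ruby
DOCS = [
  { _id: 1, email: 'alice@example.com', plan: 'free' },
  { _id: 2, email: 'bob@example.com',   plan: 'pro' },
  { _id: 3, email: 'carol@example.com', plan: 'free' }
].freeze

# Collection scan: examine every document.
def scan(docs, field, value)
  docs.select { |doc| doc[field] == value }
end

# Build a toy index: field value => array of _ids.
def build_index(docs, field)
  docs.each_with_object(Hash.new { |h, k| h[k] = [] }) do |doc, index|
    index[doc[field]] << doc[:_id]
  end
end

BY_ID = DOCS.to_h { |doc| [doc[:_id], doc] }
EMAIL_INDEX = build_index(DOCS, :email)

# Index lookup: one hash probe, then fetch only the matching documents.
def indexed_find(index, value)
  index.fetch(value, []).map { |id| BY_ID[id] }
end

puts indexed_find(EMAIL_INDEX, 'bob@example.com').map { |d| d[:_id] }.inspect  # prints [2]
```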

Schema validation allows enforcing constraints on document structure despite the flexible schema model. Validation rules specify required fields, data types, value ranges, and custom validation logic. These rules prevent invalid data entry while maintaining flexibility for legitimate schema variations.
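MongoDB expresses such rules with a $jsonSchema validator attached to a collection. The plain-Ruby sketch below only mimics the idea with a hypothetical rule format (required fields, per-field types, numeric ranges); it is not $jsonSchema syntax. Fields the rules do not mention are accepted, which preserves the flexibility described above.

```ruby
# Hypothetical rule set for a products collection.
PRODUCT_RULES = {
  required: %i[name price],
  types: { name: String, price: Numeric, tags: Array },
  ranges: { price: 0..100_000 }
}.freeze

# Returns a list of error messages; an empty list means the document is valid.
def validate(doc, rules)
  errors = rules[:required].reject { |f| doc.key?(f) }.map { |f| "#{f} is required" }
  rules[:types].each do |field, type|
    next unless doc.key?(field)
    errors << "#{field} must be a #{type}" unless doc[field].is_a?(type)
  end
  rules[:ranges].each do |field, range|
    value = doc[field]
    errors << "#{field} out of range" if value.is_a?(Numeric) && !range.cover?(value)
  end
  errors
end

p validate({ name: 'Widget', price: 9.99, color: 'red' }, PRODUCT_RULES)  # prints []
p validate({ price: -1 }, PRODUCT_RULES)  # prints ["name is required", "price out of range"]
```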

Atomic operations apply at the document level. Updates to a single document execute atomically, ensuring either all changes succeed or all fail. Multi-document transactions, when supported, provide ACID guarantees across multiple documents or collections, at a performance cost. Not all document databases support multi-document transactions, because coordinating commits across distributed nodes adds latency and complexity; MongoDB, for instance, added them in version 4.0.

Ruby Implementation

Ruby applications interact with document databases through database-specific drivers and Object-Document Mappers (ODMs). The MongoDB driver provides low-level access to database operations, while Mongoid and MongoMapper offer ActiveRecord-style interfaces for working with documents as Ruby objects.

The MongoDB Ruby driver handles connection management, query execution, and result parsing. Applications configure a client with a list of host addresses (or a mongodb:// connection string) plus options such as the database name, authentication credentials, and timeouts:

require 'mongo'

# Configure MongoDB client
client = Mongo::Client.new(
  ['localhost:27017'],
  database: 'myapp_development',
  server_selection_timeout: 5
)

# Access a collection
posts = client[:posts]

# Insert a document
result = posts.insert_one({
  title: 'First Post',
  content: 'This is my first blog post.',
  author: { name: 'Alice', email: 'alice@example.com' },
  tags: ['intro', 'welcome'],
  created_at: Time.now
})

# ID of the inserted document, a BSON::ObjectId
puts result.inserted_id
# prints the hex string, e.g. 65a2f3b8c9d0e12345678901

Mongoid provides an ODM that maps Ruby classes to MongoDB collections. Class definitions specify fields, types, relationships, validations, and callbacks. Mongoid handles serialization, type conversion, and query generation:

require 'mongoid'

# Configure Mongoid
Mongoid.configure do |config|
  config.clients.default = {
    hosts: ['localhost:27017'],
    database: 'myapp_development'
  }
end

# Define document models
class Post
  include Mongoid::Document
  include Mongoid::Timestamps
  
  field :title, type: String
  field :content, type: String
  field :tags, type: Array, default: []
  field :view_count, type: Integer, default: 0
  
  embeds_one :metadata, class_name: 'Metadata'  # explicit class avoids inflection surprises
  embeds_many :comments
  belongs_to :author, class_name: 'User'
  
  validates :title, presence: true, length: { minimum: 5 }
  validates :content, presence: true
  
  index({ title: 'text', content: 'text' })
  index({ created_at: -1 })
end

class Comment
  include Mongoid::Document
  
  field :author_name, type: String
  field :text, type: String
  field :created_at, type: Time, default: -> { Time.now }
  
  embedded_in :post
  
  validates :author_name, :text, presence: true
end

class Metadata
  include Mongoid::Document
  
  field :seo_title, type: String
  field :seo_description, type: String
  field :featured, type: Boolean, default: false
  
  embedded_in :post
end

Creating and querying documents follows ActiveRecord conventions:

# Create a post with embedded documents
post = Post.create!(
  title: 'Understanding NoSQL',
  content: 'NoSQL databases offer flexible schemas...',
  tags: ['databases', 'nosql'],
  author: current_user,
  metadata: Metadata.new(
    seo_title: 'NoSQL Database Guide',
    featured: true
  )
)

# Add comments
post.comments.create!(
  author_name: 'Bob',
  text: 'Great explanation!'
)

# Query posts
recent_posts = Post.where(:created_at.gte => 1.week.ago)
                   .desc(:created_at)
                   .limit(10)

# Text search
search_results = Post.text_search('schema design')
                     .order_by(score: { '$meta': 'textScore' })

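# Aggregation: top five authors by number of posts tagged 'databases'
# (aggregate returns a lazy view; to_a runs the pipeline)
top_authors = Post.collection.aggregate([
  { '$match': { 'tags': 'databases' } },
  { '$group': { _id: '$author_id', count: { '$sum': 1 } } },
  { '$sort': { count: -1 } },
  { '$limit': 5 }
]).to_a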

Embedded associations store related documents within parent documents. Mongoid provides methods to access, create, and update embedded documents:

# Access embedded documents
post.comments.each do |comment|
  puts "#{comment.author_name}: #{comment.text}"
end

# Query embedded documents
post.comments.where(author_name: 'Bob')

# Update embedded document
comment = post.comments.first
comment.update(text: 'Updated comment text')

# Remove embedded document (looked up by its id)
post.comments.find(comment.id).destroy

Referenced associations issue a separate query when accessed, so iterating over many documents triggers N+1 queries unless the association is eager loaded:

# Define referenced association
class User
  include Mongoid::Document
  
  field :name, type: String
  field :email, type: String
  
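  has_many :posts, foreign_key: :author_id, inverse_of: :author
end

class Post  # reopens the Post model defined earlier
  belongs_to :author, class_name: 'User', foreign_key: 'author_id', inverse_of: :posts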
end

# Eager load associations
posts = Post.includes(:author).limit(20)
posts.each do |post|
  puts "#{post.title} by #{post.author.name}"  # authors already loaded; no query per post
end
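The difference between lazy access and includes is the number of round trips. The sketch below counts queries against a hypothetical in-memory FakeUsers store: loading authors one post at a time costs one query per post, while collecting the distinct author_ids first and fetching them in one batch (similar in spirit to an $in query) costs a single query.

```ruby
# Hypothetical in-memory users "collection" that counts how often it is queried.
class FakeUsers
  attr_reader :query_count

  def initialize(users)
    @users = users.to_h { |u| [u[:_id], u] }
    @query_count = 0
  end

  # One query for a single id (like User.find).
  def find(id)
    @query_count += 1
    @users[id]
  end

  # One query for many ids (like a batched $in lookup).
  def find_many(ids)
    @query_count += 1
    @users.values_at(*ids).compact
  end
end

BLOG_POSTS = [
  { title: 'A', author_id: 1 }, { title: 'B', author_id: 2 },
  { title: 'C', author_id: 1 }, { title: 'D', author_id: 2 }
].freeze
BLOG_USERS = [{ _id: 1, name: 'Jane' }, { _id: 2, name: 'Raj' }].freeze

# N+1 pattern: one lookup per post.
def lazy_bylines(posts, store)
  posts.map { |post| "#{post[:title]} by #{store.find(post[:author_id])[:name]}" }
end

# Eager pattern: one batched lookup, then matching in memory.
def eager_bylines(posts, store)
  ids = posts.map { |p| p[:author_id] }.uniq
  authors = store.find_many(ids).to_h { |u| [u[:_id], u] }
  posts.map { |post| "#{post[:title]} by #{authors[post[:author_id]][:name]}" }
end
```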

Atomic operations ensure data consistency during concurrent updates:

# Atomic increment
post.inc(view_count: 1)

# Atomic add to array (skipped if the value is already present)
post.add_to_set(tags: 'featured')

# Atomic remove from array
post.pull(tags: 'draft')

# Conditional atomic update: applies only if view_count is still 100 when the server evaluates the filter
Post.where(id: post.id, view_count: 100).update(status: 'popular')
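Operators such as inc are applied by the server inside a single document update, so concurrent callers cannot interleave a read and a write. The sketch below imitates that guarantee with a Mutex around a hypothetical in-memory document store; it illustrates the idea only and says nothing about how MongoDB implements it.

```ruby
# Hypothetical in-memory store whose $inc-style update is atomic per document.
class AtomicStore
  def initialize
    @docs = {}
    @lock = Mutex.new
  end

  def insert(id, doc)
    @lock.synchronize { @docs[id] = doc.dup }
  end

  # Read-modify-write happens entirely under the lock, so no update is lost.
  def inc(id, field, amount)
    @lock.synchronize do
      doc = @docs.fetch(id)
      doc[field] = doc.fetch(field, 0) + amount
    end
  end

  def find(id)
    @lock.synchronize { @docs.fetch(id).dup }
  end
end

# Several threads record views concurrently; returns the final count.
def simulate_views(thread_count, per_thread)
  store = AtomicStore.new
  store.insert('post_1', { view_count: 0 })
  workers = Array.new(thread_count) do
    Thread.new { per_thread.times { store.inc('post_1', :view_count, 1) } }
  end
  workers.each(&:join)
  store.find('post_1')[:view_count]
end

puts simulate_views(10, 1_000)  # prints 10000
```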

Design Considerations

Selecting between document databases and relational databases depends on data structure, access patterns, consistency requirements, and scalability needs. Document databases excel when data exhibits hierarchical structure, variable schemas, or aggregate-oriented access patterns. Relational databases perform better for highly normalized data with complex relationships and strong consistency requirements.

Data with natural hierarchical structure fits document storage. A product catalog where each product has different attributes based on category benefits from flexible schemas. Electronics products have specifications like screen size and processor type, while clothing products have size, color, and material. Storing these as documents avoids sparse columns or entity-attribute-value patterns:

# Electronics product
{
  _id: "prod_1",
  name: "Laptop Model X",
  category: "electronics",
  specifications: {
    screen_size: "15.6 inches",
    processor: "Intel Core i7",
    ram: "16GB",
    storage: "512GB SSD"
  },
  price: 1299.99
}

# Clothing product
{
  _id: "prod_2",
  name: "Cotton T-Shirt",
  category: "clothing",
  specifications: {
    size: "M",
    color: "Blue",
    material: "100% Cotton",
    care_instructions: "Machine wash cold"
  },
  price: 24.99
}

Applications that read entire entity aggregates benefit from embedded documents. A shopping cart retrieves all items, customer details, and pricing in one query. Relational databases require joins across multiple tables, increasing query complexity and latency. Document databases return complete aggregates in single queries.

Frequent schema evolution favors document databases. Startups iterating on product features add and remove fields without database-level schema migrations. Documents of different versions coexist in the same collection, with application code handling each structure. In relational databases, the equivalent ALTER TABLE statements can take locks or rewrite large tables, depending on the engine and the change, which may disrupt operations.
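One common way to let old and new document shapes coexist is to upgrade documents as they are read. The sketch assumes a hypothetical schema_version field and an example change in which a single name string was split into first_name and last_name; neither comes from any library.

```ruby
# Upgrade functions keyed by the version they upgrade from.
UPGRADES = {
  # v1 -> v2: split the single name field into first_name and last_name.
  1 => lambda { |doc|
    first, last = doc[:name].to_s.split(' ', 2)
    doc.reject { |key, _| key == :name }
       .merge(first_name: first, last_name: last, schema_version: 2)
  }
}.freeze

CURRENT_VERSION = 2

# Apply upgrades in order until the document reaches the current version.
# Documents written before versioning existed are treated as version 1.
def upgrade(doc)
  while (version = doc.fetch(:schema_version, 1)) < CURRENT_VERSION
    doc = UPGRADES.fetch(version).call(doc)
  end
  doc
end

p upgrade({ _id: 'u1', name: 'Jane Smith' })[:first_name]  # prints "Jane"
```

An application following this pattern can write the upgraded document back when it next saves, so the collection migrates gradually instead of in one batch.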

Many document databases offer weaker or costlier transactional guarantees across multiple documents or collections. Banking applications requiring atomic transfers between accounts need multi-document transactions or relational databases. Document databases that do support multi-document transactions impose performance penalties, as distributed coordination overhead negates some benefits of denormalization.

Relationship cardinality influences embedding versus referencing decisions. One-to-one relationships and one-to-many relationships with few related items embed efficiently. One-to-many relationships with unbounded growth require references so that parent documents do not grow past size limits. Many-to-many relationships require references, as embedding causes duplication and update anomalies:

# Embed: One-to-few (blog post with comments, assuming limited comments)
{
  _id: "post_1",
  title: "Database Patterns",
  comments: [
    { user: "alice", text: "Helpful article" },
    { user: "bob", text: "Thanks for sharing" }
  ]
}

# Reference: One-to-many unbounded (author with posts)
# Author document
{ _id: "author_1", name: "Jane Smith" }

# Post documents reference author
{ _id: "post_1", title: "First Post", author_id: "author_1" }
{ _id: "post_2", title: "Second Post", author_id: "author_1" }

# Reference: Many-to-many (products and categories)
# Product document
{ _id: "prod_1", name: "Widget", category_ids: ["cat_1", "cat_2"] }

# Category documents
{ _id: "cat_1", name: "Electronics" }
{ _id: "cat_2", name: "Gadgets" }

Access patterns affect schema design. Applications with read-heavy workloads denormalize data for query performance, accepting duplication and potential inconsistency. Write-heavy workloads minimize duplication to reduce update costs, since every duplicated copy must be updated. Analytical workloads that scan large datasets are often better served by columnar stores or relational databases optimized for analytics.

Document size impacts performance and storage. MongoDB rejects documents larger than 16MB, its maximum BSON document size. Large documents also cause memory pressure during queries that load entire documents. Applications split large content across multiple documents or store large binary data in GridFS:

# Split large content across documents
{
  _id: "article_1",
  title: "Complete Guide to Databases",
  metadata: { author: "Jane", published: "2024-01-15" },
  sections: [
    { order: 1, heading: "Introduction", content_ref: "section_1" },
    { order: 2, heading: "Concepts", content_ref: "section_2" }
  ]
}

# Content stored in separate documents
{ _id: "section_1", article_id: "article_1", content: "..." }
{ _id: "section_2", article_id: "article_1", content: "..." }
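A rough pre-insert guard can flag documents approaching the limit before the server rejects them. MongoDB measures the BSON encoding, which the standard library cannot produce, so this sketch uses JSON byte size as a stand-in; the two encodings differ in size, so treat the result as an estimate and keep a safety margin (the 0.8 threshold is an arbitrary assumption).

```ruby
require 'json'

# MongoDB's maximum BSON document size: 16 MiB.
MAX_DOCUMENT_BYTES = 16 * 1024 * 1024

# Estimate a document's size from its JSON encoding (an approximation of BSON).
def estimated_size(doc)
  JSON.generate(doc).bytesize
end

# Flag documents above a fraction of the limit so they can be split early.
def too_large?(doc, threshold: 0.8)
  estimated_size(doc) > MAX_DOCUMENT_BYTES * threshold
end

puts too_large?({ _id: 'article_1', title: 'Complete Guide to Databases' })  # prints false
puts too_large?({ _id: 'article_2', body: 'x' * (14 * 1024 * 1024) })       # prints true
```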

Tools & Ecosystem

MongoDB dominates the document database market, offering comprehensive features, mature tooling, and extensive language support. MongoDB Atlas provides managed cloud hosting with automated backups, monitoring, and scaling. The MongoDB Ruby driver and Mongoid ODM enable Ruby applications to interact with MongoDB clusters.

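CouchDB takes a different approach, emphasizing HTTP APIs, multi-master replication, and offline-first applications. Rather than preventing conflicting writes across replicas, CouchDB detects them and keeps the conflicting revisions for the application to resolve. CouchDB stores documents as JSON and exposes all operations through a RESTful HTTP API. Map-reduce views provide indexing and querying, and CouchDB 2.0 added Mango, a JSON-based query language. The CouchRest Ruby gem wraps CouchDB's HTTP API: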

require 'couchrest'

# Connect to CouchDB
db = CouchRest.database!("http://localhost:5984/myapp")

# Create a document
response = db.save_doc({
  type: 'post',
  title: 'CouchDB Example',
  content: 'Document content here',
  created_at: Time.now.to_s
})

# Retrieve document
doc = db.get(response['id'])

# Update document
doc['content'] = 'Updated content'
db.save_doc(doc)

# Define a view
db.save_doc({
  _id: '_design/posts',
  views: {
    by_date: {
      map: "function(doc) { if(doc.type == 'post') emit(doc.created_at, doc); }"
    }
  }
})

# Query view
results = db.view('posts/by_date')
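A view's map function runs once per document and emits key/value rows, which CouchDB keeps sorted by key. The sketch below mirrors the by_date view in plain Ruby so the mechanics are visible without a server. The map_view helper is hypothetical, it emits the title rather than the whole document for brevity, and plain string ordering is a simplification of CouchDB's collation rules; the real view runs as JavaScript inside CouchDB and is updated incrementally.

```ruby
COUCH_DOCS = [
  { '_id' => 'a', 'type' => 'post', 'title' => 'Second', 'created_at' => '2024-01-16' },
  { '_id' => 'b', 'type' => 'comment', 'text' => 'Nice', 'created_at' => '2024-01-15' },
  { '_id' => 'c', 'type' => 'post', 'title' => 'First', 'created_at' => '2024-01-14' }
].freeze

# Run a map function over every document, collecting emitted rows,
# then sort the rows by key as a view index would.
def map_view(docs)
  rows = []
  docs.each do |doc|
    emit = ->(key, value) { rows << { 'id' => doc['_id'], 'key' => key, 'value' => value } }
    yield doc, emit
  end
  rows.sort_by { |row| row['key'] }
end

by_date = map_view(COUCH_DOCS) do |doc, emit|
  emit.call(doc['created_at'], doc['title']) if doc['type'] == 'post'
end
by_date.each { |row| puts "#{row['key']} #{row['value']}" }
```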

Amazon DocumentDB provides a MongoDB-compatible database service on AWS. DocumentDB implements the MongoDB wire protocol and much of the MongoDB API, so many existing MongoDB applications and tools connect with minimal changes, although not every MongoDB feature is available. DocumentDB separates storage and compute, automatically scaling storage and supporting read replicas for high availability.

PostgreSQL added JSON and JSONB data types, enabling document storage within a relational database. JSONB stores documents in binary format, supporting indexing and efficient querying. Applications use PostgreSQL for both relational and document data:

require 'pg'
require 'json'  # for Hash#to_json below

conn = PG.connect(dbname: 'myapp')

# Create table with JSONB column
conn.exec("CREATE TABLE posts (
  id SERIAL PRIMARY KEY,
  data JSONB NOT NULL
)")

# Insert document
conn.exec_params(
  "INSERT INTO posts (data) VALUES ($1)",
  [{ title: 'First Post', tags: ['ruby', 'databases'] }.to_json]
)

# Query JSONB field
result = conn.exec("SELECT data->>'title' AS title FROM posts 
                    WHERE data @> '{\"tags\": [\"ruby\"]}'")
result.each { |row| puts row['title'] }

# GIN index on the whole JSONB column; it supports the data @> ... containment query above
conn.exec("CREATE INDEX idx_posts_data ON posts USING GIN (data)")
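The @> operator tests containment: the right-hand JSON must appear within the left-hand value. The recursive Ruby sketch below approximates those rules for objects, arrays, and scalars, which makes the tags query above easier to reason about. It is a simplification of PostgreSQL's documented semantics; for example, it omits the special case that lets a top-level array contain a bare scalar.

```ruby
require 'json'

# Simplified model of PostgreSQL's jsonb containment (left @> right).
def contains?(left, right)
  case right
  when Hash
    left.is_a?(Hash) && right.all? { |k, v| left.key?(k) && contains?(left[k], v) }
  when Array
    # Every element on the right must be contained in some element on the left;
    # order and duplicates do not matter.
    left.is_a?(Array) && right.all? { |r| left.any? { |l| contains?(l, r) } }
  else
    left == right
  end
end

row = JSON.parse('{"title": "First Post", "tags": ["ruby", "databases"]}')
puts contains?(row, JSON.parse('{"tags": ["ruby"]}'))    # prints true
puts contains?(row, JSON.parse('{"tags": ["python"]}'))  # prints false
```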

Mongoid is the primary ODM for Ruby MongoDB applications, providing ActiveRecord-like APIs, association management, validation, callbacks, and a query DSL. Each Mongoid release line documents which MongoDB server and Ruby versions it supports. Mongoid integrates with Rails, supporting generators, rake tasks, and configuration conventions:

# Gemfile
gem 'mongoid', '~> 8.0'

# config/mongoid.yml
development:
  clients:
    default:
      database: myapp_development
      hosts:
        - localhost:27017
      options:
        server_selection_timeout: 5

# app/models/post.rb
class Post
  include Mongoid::Document
  include Mongoid::Timestamps
  
  field :title, type: String
  field :content, type: String
  
  validates :title, presence: true
  
  scope :recent, -> { where(:created_at.gte => 1.week.ago) }
end

MongoMapper offers an alternative ODM with similar features. MongoMapper provides a lighter-weight implementation and different API choices. Applications choose based on team preference and specific feature requirements.

Studio 3T and MongoDB Compass provide GUI tools for database management. These tools enable visual query building, index management, schema analysis, and data import/export. Compass includes aggregation pipeline builders and query performance visualization.

Performance Considerations

Index strategy determines query performance in document databases. Queries that scan entire collections without indexes execute slowly, examining every document. Indexes create data structures that map field values to document locations, enabling direct lookups. The database consults indexes to locate matching documents, then retrieves those documents from storage.

Single-field indexes optimize queries filtering on one field. Creating an index on the email field enables efficient user lookups by email address. Compound indexes spanning multiple fields support queries filtering on field combinations:

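# Single-field index (Post.index only declares it; run Post.create_indexes or
# rails db:mongoid:create_indexes to build it on the server)
Post.index({ created_at: -1 })  # Descending order for recent posts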

# Compound index
Post.index({ author_id: 1, created_at: -1 })  # Author posts by date

# Query uses compound index
Post.where(author_id: user.id).desc(:created_at).limit(10)

Index prefix matching allows compound indexes to support queries on leading fields. An index on {category: 1, price: 1, rating: 1} supports queries on category alone, category and price, or all three fields. Queries filtering only on price or rating cannot use this index, requiring separate indexes.
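The prefix rule can be stated as a small predicate: count how many leading index fields the query constrains. Zero means the index cannot narrow the search; for [:category, :rating] only category narrows it, which matches MongoDB's documented behavior that just the prefix portion supports such a query. The prefix_length helper is hypothetical and deliberately ignores sort, range, and other cases the query planner also weighs.

```ruby
# Compound index key from the example: {category: 1, price: 1, rating: 1}.
INDEX_KEY = %i[category price rating].freeze

# Count how many leading index fields appear among the queried fields.
def prefix_length(index_key, query_fields)
  index_key.take_while { |field| query_fields.include?(field) }.length
end

[
  %i[category], %i[category price], %i[category price rating],
  %i[price], %i[price rating], %i[category rating]
].each do |fields|
  puts "#{fields.inspect}: #{prefix_length(INDEX_KEY, fields)} leading index field(s) usable"
end
```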

Covered queries retrieve all data from indexes without accessing documents. When every field in the filter and projection belongs to one index, and _id is excluded unless it is part of that index, the database returns values directly from index entries:

# Query requires document access
posts = Post.where(author_id: author.id).only(:title, :content, :created_at)

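# Covered query (assuming an index on { author_id: 1, title: 1 }): the filter and
# projection touch only indexed fields, and _id is explicitly excluded
posts = Post.collection.find({ author_id: author.id }).projection(_id: 0, title: 1)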

Array indexes create index entries for each array element. An index on the tags field creates multiple index entries per document. Queries checking array membership use array indexes efficiently:

Post.index({ tags: 1 })

# Query finds posts with tag 'databases'
Post.where(tags: 'databases')

# Query finds posts with all specified tags
Post.where(tags: { '$all': ['databases', 'nosql'] })
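An index on an array field stores one entry per element, so a single document appears under several keys. The toy below (a Hash from tag to id sets, not a real B-tree) shows how membership and $all queries resolve against such an index, with $all modelled as an intersection of the per-tag id sets. Names and data are hypothetical.

```ruby
require 'set'

TAGGED_POSTS = [
  { _id: 1, tags: %w[databases nosql] },
  { _id: 2, tags: %w[databases sql] },
  { _id: 3, tags: %w[ruby nosql databases] }
].freeze

# One index entry per array element (a "multikey" index).
def build_multikey_index(docs, field)
  docs.each_with_object(Hash.new { |h, k| h[k] = Set.new }) do |doc, index|
    Array(doc[field]).uniq.each { |value| index[value] << doc[:_id] }
  end
end

TAG_INDEX = build_multikey_index(TAGGED_POSTS, :tags)

# where(tags: 'databases'): a single index probe.
def with_tag(tag)
  TAG_INDEX.fetch(tag, Set.new).to_a.sort
end

# where(tags: { '$all' => [...] }): intersect the id sets for every tag.
def with_all_tags(tags)
  tags.map { |t| TAG_INDEX.fetch(t, Set.new) }.reduce(:&).to_a.sort
end

p with_tag('databases')               # prints [1, 2, 3]
p with_all_tags(%w[databases nosql])  # prints [1, 3]
```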

Text indexes enable full-text search across string fields. Text indexes tokenize strings, remove stop words, and stem words to base forms. Text search queries match tokenized and stemmed terms:

Post.index({ title: 'text', content: 'text' })

# Text search
results = Post.text_search('document database schema')
              .order_by(score: { '$meta': 'textScore' })
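The tokenizing steps a text index performs can be approximated in a few lines. The sketch below uses a tiny hypothetical stop-word list and a crude suffix-stripping stemmer, whereas MongoDB applies language-specific stop words and a real stemmer; treat it only as an illustration of why a query for "database" can match "databases".

```ruby
# Hypothetical, tiny stop-word list; real text indexes use per-language lists.
STOP_WORDS = %w[a an and the of to in for is].freeze

# Crude stemmer: strips a trailing "ing" or "s" from longer words (illustrative only).
def stem(word)
  word.length > 4 ? word.sub(/(?:ing|s)\z/, '') : word
end

# Lowercase, split on non-letters, drop stop words, stem what remains.
def index_terms(text)
  text.downcase.scan(/[a-z]+/)
      .reject { |w| STOP_WORDS.include?(w) }
      .map { |w| stem(w) }
      .uniq
end

# A document matches if it shares at least one stemmed term with the query.
def text_match?(doc_text, query)
  (index_terms(doc_text) & index_terms(query)).any?
end

p index_terms('The Databases of Design')  # prints ["database", "design"]
```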

Geospatial indexes support location-based queries. 2dsphere indexes handle coordinates on sphere geometry, enabling queries for documents near a point or within regions:

class Store
  include Mongoid::Document
  
  field :name, type: String
  field :location, type: Array  # [longitude, latitude]
  
  index({ location: '2dsphere' })
end

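# Find stores within 5 km, nearest first; with a GeoJSON point,
# $maxDistance is measured in meters
Store.where(location: {
  '$nearSphere' => {
    '$geometry' => { type: 'Point', coordinates: [longitude, latitude] },
    '$maxDistance' => 5000
  }
})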
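For intuition about what a 5 km radius means, the great-circle distance between two [longitude, latitude] points can be computed with the haversine formula. The sketch assumes a spherical Earth with a mean radius of 6,371 km, so results may differ slightly from the server's own calculations; haversine_meters and within_radius? are hypothetical helpers, handy for sanity-checking geospatial results in tests.

```ruby
EARTH_RADIUS_M = 6_371_000.0 # mean Earth radius (assumed spherical model)

# Great-circle distance in meters between two [longitude, latitude] pairs,
# matching the coordinate order used in the Store#location field.
def haversine_meters((lon1, lat1), (lon2, lat2))
  to_rad = Math::PI / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a))
end

def within_radius?(center, point, meters)
  haversine_meters(center, point) <= meters
end

# One degree of latitude is roughly 111 km on this model.
puts haversine_meters([0, 0], [0, 1]).round
```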

Query selectivity impacts index effectiveness. Queries filtering to small result sets benefit most from indexes. Queries matching most documents gain less from indexing, as the database still loads many documents. An index on a boolean field with 50/50 distribution provides less benefit than an index on email with unique values.

Indexes impose storage overhead and slow write operations. Each index requires storage space and updates during document insertions, updates, and deletions. Applications balance query performance against write performance and storage costs. Unnecessary indexes waste resources and degrade write performance.

Document size affects memory usage and I/O performance. Large documents consume more cache memory, reducing the number of documents fitting in RAM. Queries loading large documents spend more time on I/O. Applications minimize document size by splitting large fields, using references instead of embedding, or storing binary data separately.

Connection pooling reduces connection overhead. Creating database connections involves authentication and setup costs. Connection pools maintain open connections that multiple threads or requests reuse. Mongoid configures connection pools through the max_pool_size option:

# config/mongoid.yml
production:
  clients:
    default:
      database: myapp_production
      hosts:
        - mongo1.example.com:27017
        - mongo2.example.com:27017
        - mongo3.example.com:27017
      options:
        max_pool_size: 50
        min_pool_size: 5
        wait_queue_timeout: 5
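The behavior that max_pool_size and wait_queue_timeout control can be seen in a small pool of your own. The TinyPool class below is a hypothetical illustration built on Mutex and ConditionVariable, not the driver's pool: it opens connections lazily up to a maximum, hands idle ones back out, and raises if a caller waits longer than the timeout.

```ruby
# Hypothetical minimal connection pool (the MongoDB driver manages its own).
class TinyPool
  class TimeoutError < StandardError; end

  attr_reader :created

  def initialize(max_size:, wait_timeout:, &factory)
    @max_size = max_size
    @wait_timeout = wait_timeout
    @factory = factory
    @idle = []
    @created = 0
    @mutex = Mutex.new
    @available = ConditionVariable.new
  end

  # Borrow a connection for the duration of the block, then return it.
  def with
    conn = checkout
    yield conn
  ensure
    checkin(conn) if conn
  end

  private

  def checkout
    deadline = now + @wait_timeout
    @mutex.synchronize do
      loop do
        return @idle.pop unless @idle.empty?

        # Open a new connection only while under the size cap.
        if @created < @max_size
          @created += 1
          return @factory.call
        end

        remaining = deadline - now
        raise TimeoutError, 'no connection available' if remaining <= 0

        @available.wait(@mutex, remaining)
      end
    end
  end

  def checkin(conn)
    @mutex.synchronize do
      @idle.push(conn)
      @available.signal
    end
  end

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end
```

Usage: eight threads sharing TinyPool.new(max_size: 2, wait_timeout: 1) { Object.new } never open more than two connections; a caller that cannot get one within the timeout receives TinyPool::TimeoutError instead of blocking forever.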

Aggregation pipelines perform complex data transformations and analytics. Pipelines process documents through stages that filter, group, sort, and transform data. The database optimizes pipelines, sometimes combining stages or reordering operations for efficiency:

# Calculate average post length by author
average_lengths = Post.collection.aggregate([
  {
    '$project': {
      author_id: 1,
      content_length: { '$strLenCP': '$content' }
    }
  },
  {
    '$group': {
      _id: '$author_id',
      avg_length: { '$avg': '$content_length' },
      post_count: { '$sum': 1 }
    }
  },
  {
    '$sort': { avg_length: -1 }
  },
  {
    '$limit': 10
  }
]).to_a  # aggregate returns a lazy view; to_a runs the pipeline
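Each stage consumes the previous stage's output. The plain-Ruby sketch below reproduces the same four stages over an in-memory array so the data flow is visible; the sample posts are hypothetical, and String#length counts characters much as $strLenCP counts code points. The server executes real pipelines far more efficiently.

```ruby
AGG_POSTS = [
  { author_id: 'a1', content: 'Short post' },
  { author_id: 'a2', content: 'A considerably longer post body' },
  { author_id: 'a1', content: 'Another short one' }
].freeze

def average_length_by_author(posts, limit: 10)
  # $project: keep author_id and compute content_length
  projected = posts.map { |p| { author_id: p[:author_id], content_length: p[:content].length } }

  # $group: average length and post count per author
  grouped = projected.group_by { |p| p[:author_id] }.map do |author_id, rows|
    lengths = rows.map { |r| r[:content_length] }
    { _id: author_id, avg_length: lengths.sum.fdiv(lengths.size), post_count: rows.size }
  end

  # $sort by avg_length descending, then $limit
  grouped.sort_by { |g| -g[:avg_length] }.first(limit)
end

average_length_by_author(AGG_POSTS).each do |row|
  puts "#{row[:_id]}: #{row[:avg_length]} (#{row[:post_count]} posts)"
end
```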

Integration & Interoperability

Ruby web applications integrate document databases through configuration, connection management, and ODM integration. Rails applications using Mongoid typically replace ActiveRecord with Mongoid (or run both side by side), configuring database connections through YAML files and initializers.

Mongoid configuration specifies client settings, connection options, and model settings. Applications define multiple database clients for separating read and write workloads or connecting to different clusters:

# config/mongoid.yml
production:
  clients:
    default:
      database: myapp_production
      hosts:
        - replica1.example.com:27017
        - replica2.example.com:27017
        - replica3.example.com:27017
      options:
        read:
          mode: :secondary_preferred
        write:
          w: majority
          wtimeout: 5000
        max_pool_size: 100
        
    analytics:
      database: myapp_analytics
      hosts:
        - analytics.example.com:27017
      options:
        read:
          mode: :secondary

The store_in macro assigns a model to a specific client and collection, so that model's reads and writes go to that database:

class AnalyticsEvent
  include Mongoid::Document
  
  store_in client: :analytics, collection: 'events'
  
  field :event_type, type: String
  field :user_id, type: String
  field :timestamp, type: Time
  field :properties, type: Hash
end

Migrating data between relational and document databases requires transforming normalized data into denormalized documents. Export scripts query relational data, combine related records, and insert documents:

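# Export relational posts and their comments into MongoDB documents.
# Raw SQL avoids clashing with the Mongoid Post and Comment classes.
sql = ActiveRecord::Base.connection
sql.select_all('SELECT * FROM posts').each do |row|
  comments = sql.select_all(
    "SELECT author_name, text, created_at FROM comments WHERE post_id = #{sql.quote(row['id'])}"
  ).map do |c|
    {
      author_name: c['author_name'],
      text: c['text'],
      created_at: c['created_at']
    }
  end

  # author_id still holds the relational key; map it to the MongoDB User id
  # if users were migrated as well
  Post.create!(
    title: row['title'],
    content: row['content'],
    author_id: row['author_id'],
    comments: comments,
    created_at: row['created_at']
  )
end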

Polyglot persistence combines relational and document databases in single applications. User authentication and financial data reside in PostgreSQL for transactional guarantees, while content and session data use MongoDB for flexible schemas. Applications maintain connections to both databases:

# User authentication in PostgreSQL
class User < ActiveRecord::Base
  has_many :orders
  validates :email, presence: true, uniqueness: true
end

# Content in MongoDB
class Article
  include Mongoid::Document
  
  field :title, type: String
  field :content, type: String
  field :author_id, type: String
  
  def author
    User.find(author_id)
  end
end

# Controller coordinates both systems
def show
  @article = Article.find(params[:id])  # MongoDB
  @author = @article.author             # PostgreSQL, via Article#author
end

Message queues integrate document databases with event-driven architectures. Applications publish document changes to message queues, enabling downstream services to react to data changes. MongoDB change streams, available on replica sets and sharded clusters, provide real-time notifications of document modifications:

# Watch for changes in posts collection
posts = Post.collection
change_stream = posts.watch

Thread.new do
  change_stream.each do |change|
    case change['operationType']
    when 'insert'
      document = change['fullDocument']
      # Process new post
      NotificationService.notify_new_post(document)
    when 'update'
      document_id = change['documentKey']['_id']
      # Process update
      CacheService.invalidate_post(document_id)
    end
  end
end

REST APIs expose document data to external clients. Controllers retrieve documents, serialize to JSON, and return responses. API versioning handles schema evolution across clients:

class Api::V1::PostsController < ApplicationController
  def index
    posts = Post.includes(:author).desc(:created_at).limit(20)  # preload authors for serialize_post
    render json: posts.map { |p| serialize_post(p) }
  end
  
  def show
    post = Post.find(params[:id])
    render json: serialize_post(post)
  end
  
  private
  
  def serialize_post(post)
    {
      id: post.id.to_s,
      title: post.title,
      content: post.content,
      author: {
        id: post.author_id.to_s,
        name: post.author.name
      },
      tags: post.tags,
      comment_count: post.comments.count,
      created_at: post.created_at.iso8601
    }
  end
end

GraphQL interfaces provide flexible querying over document data. GraphQL resolvers fetch documents and related data based on query selections:

class Types::PostType < Types::BaseObject
  field :id, ID, null: false
  field :title, String, null: false
  field :content, String, null: false
  field :author, Types::UserType, null: false
  field :comments, [Types::CommentType], null: false
  
  def author
    User.find(object.author_id)
  end
end

class Types::QueryType < Types::BaseObject
  field :post, Types::PostType, null: true do
    argument :id, ID, required: true
  end
  
  def post(id:)
    Post.find(id)
  end
end

Reference

Document Database Comparison

Database	Model	Query Language	Transactions	Replication	Key Features
MongoDB	BSON documents	MongoDB Query Language	Multi-document ACID	Replica sets	Rich query, aggregation, sharding
CouchDB	JSON documents	Map-reduce views, Mango queries	Single-document	Multi-master	HTTP API, offline sync, conflict resolution
DocumentDB	BSON documents	MongoDB-compatible	ACID	Automated replication	AWS managed, compatible with MongoDB
RavenDB	JSON documents	RQL	ACID	Master-master	ACID transactions, full-text search

MongoDB Index Types

Index Type	Syntax	Use Case	Limitations
Single field	{field: 1}	Queries on single field	One field only
Compound	{field1: 1, field2: -1}	Queries on multiple fields	Order matters for prefix matching
Text	{field: "text"}	Full-text search	One text index per collection
Geospatial	{location: "2dsphere"}	Location queries	GeoJSON objects or legacy [lng, lat] pairs
Hashed	{field: "hashed"}	Even shard distribution	Cannot support range queries
TTL	{date: 1} with option expireAfterSeconds: N	Automatic document expiration	Single field holding a date or array of dates
Partial	{field: 1} with option partialFilterExpression	Index subset of documents	Query predicate must match the filter expression

Mongoid Field Types

Type	Ruby Class	Storage Format	Usage
String	String	UTF-8 string	Text data
Integer	Integer	32-bit or 64-bit	Whole numbers
Float	Float	64-bit float	Floating-point numbers
Boolean	TrueClass, FalseClass	Boolean	True/false values
Date	Date	UTC datetime	Date without time
Time	Time	UTC datetime	Date and time
DateTime	DateTime	UTC datetime	Date and time
Array	Array	BSON array	Lists
Hash	Hash	BSON document	Key-value pairs
Range	Range	Hash with min and max	Numeric or date ranges
Regexp	Regexp	BSON regex	Regular expressions

Query Operators

Operator	Purpose	Example
eq	Equals	Post.where(status: 'published')
ne	Not equals	Post.where(:status.ne => 'draft')
gt	Greater than	Post.where(:views.gt => 1000)
gte	Greater than or equal	Post.where(:created_at.gte => 1.week.ago)
lt	Less than	Post.where(:score.lt => 50)
lte	Less than or equal	Post.where(:price.lte => 100)
in	In array	Post.where(:status.in => ['published', 'featured'])
nin	Not in array	Post.where(:status.nin => ['draft', 'archived'])
all	Contains all	Post.where(:tags.all => ['ruby', 'database'])
exists	Field exists	Post.where(:featured.exists => true)
regex	Regular expression	Post.where(title: /pattern/i)

Aggregation Pipeline Stages

Stage	Purpose	Example Usage
match	Filter documents	Filter by criteria before processing
group	Group by field	Calculate aggregates per group
project	Transform documents	Select or compute fields
sort	Order results	Sort by one or more fields
limit	Limit result count	Return top N results
skip	Skip documents	Implement pagination
lookup	Join collections	Left outer join with another collection
unwind	Flatten arrays	Convert array field to separate documents
sample	Random sample	Get random documents

Mongoid Association Types

Association	Defines	Storage	Query Method
embeds_one	One embedded document	Within parent	parent.embedded_doc
embeds_many	Many embedded documents	Within parent as array	parent.embedded_docs
has_one	One referenced document	Separate collection	parent.referenced_doc
has_many	Many referenced documents	Separate collection	parent.referenced_docs
belongs_to	Parent reference	Foreign key in child	child.parent
has_and_belongs_to_many	Many-to-many	Foreign key arrays in both	model.related_models

Connection Options

Option	Type	Default	Purpose
max_pool_size	Integer	5	Maximum connections in pool
min_pool_size	Integer	1	Minimum connections maintained
wait_queue_timeout	Integer	1	Seconds to wait for connection
connect_timeout	Integer	10	Seconds to wait for connection establishment
socket_timeout	Integer	No timeout	Seconds to wait for socket operations
server_selection_timeout	Integer	30	Seconds to wait for server selection
heartbeat_frequency	Integer	10	Seconds between server health checks

Write Concern Levels

Level	Description	Durability	Performance
w: 0	Unacknowledged	Lowest	Fastest
w: 1	Acknowledge primary	Medium	Fast
w: majority	Majority of replica set	High	Slower
w: N	N replicas acknowledge	Configurable	Varies
j: true	Journaled to disk	Highest	Slowest

Read Preference Modes

Mode	Behavior	Use Case
primary	Read from primary only	Default, strongest consistency
primaryPreferred	Primary, fallback to secondary	High availability
secondary	Read from secondary only	Reduce primary load
secondaryPreferred	Secondary, fallback to primary	Balance load
nearest	Lowest network latency	Geographically distributed reads
