Overview
NoSQL databases represent a category of database management systems that store and retrieve data using models other than the tabular relations found in relational databases. The term encompasses document stores, key-value stores, column-family stores, and graph databases. Each NoSQL database type addresses specific data storage and retrieval patterns that traditional relational databases handle inefficiently or cannot support at scale.
NoSQL databases emerged from the need to handle web-scale data volumes, high-velocity data streams, and flexible schema requirements that relational databases struggled to accommodate. Organizations like Google, Amazon, and Facebook developed internal NoSQL systems to handle billions of users and petabytes of data, leading to open-source implementations that became available to the broader development community.
The primary distinction between SQL and NoSQL databases lies in data modeling and consistency guarantees. SQL databases enforce strict schemas and ACID transactions across all operations. NoSQL databases trade some consistency guarantees for availability and partition tolerance, following the CAP theorem. This trade-off makes NoSQL databases suitable for distributed systems requiring horizontal scaling.
NoSQL databases excel in scenarios involving unstructured or semi-structured data, high write throughput, geographical distribution, and schema evolution. Applications handling user-generated content, real-time analytics, session management, and social network graphs often benefit from NoSQL solutions. The database choice depends on data access patterns, consistency requirements, and scaling characteristics rather than universal superiority of one approach over another.
Key Principles
NoSQL databases organize around four fundamental data models, each optimized for specific access patterns and data relationships. Understanding these models guides appropriate database selection for particular use cases.
Document stores organize data as self-contained documents, typically JSON or BSON structures. Each document contains fields and values, with nested structures representing complex relationships. Document databases support queries against any field within documents, making them suitable for content management, user profiles, and product catalogs. MongoDB and CouchDB exemplify this category.
Key-value stores provide the simplest NoSQL model, mapping unique keys to values. The database treats values as opaque blobs, performing no operations on value contents. This model delivers exceptional read and write performance for cache layers, session stores, and user preferences. Redis and DynamoDB operate primarily as key-value stores, though they offer additional data structures.
Column-family stores group data into column families rather than rows, optimizing for read and write performance on specific column sets. This structure suits time-series data, analytics workloads, and scenarios requiring high write throughput across many columns. Cassandra and HBase implement column-family models, handling massive datasets across distributed clusters.
Graph databases model data as nodes and edges, representing entities and their relationships explicitly. Query languages traverse these relationships efficiently, making graph databases optimal for social networks, recommendation engines, and fraud detection. Neo4j and ArangoDB focus on graph operations, though some multi-model databases support graph queries alongside other paradigms.
NoSQL databases typically prioritize availability and partition tolerance over consistency, as described by the CAP theorem. Most NoSQL systems offer tunable consistency, allowing applications to choose between strong consistency, eventual consistency, or intermediate levels based on requirements. Write operations may return before replication completes, accepting temporary inconsistency for improved performance.
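The tunable-consistency trade-off is often expressed with replica counts: with N replicas, a write acknowledged by W nodes and a read that consults R nodes overlap on at least one replica whenever R + W > N. A minimal sketch of that arithmetic (plain Ruby, illustrative only; real systems layer hinted handoff and read repair on top):

```ruby
# Illustrative only: the quorum rule used by Dynamo-style systems.
# With N replicas, a read of R nodes is guaranteed to overlap a write
# acknowledged by W nodes whenever R + W > N, so at least one replica
# returns the latest value.
def strongly_consistent?(n:, w:, r:)
  r + w > n
end

strongly_consistent?(n: 3, w: 2, r: 2)  # => true  (classic QUORUM reads and writes)
strongly_consistent?(n: 3, w: 1, r: 1)  # => false (fast, but only eventually consistent)
```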
Schema flexibility distinguishes NoSQL databases from relational systems. Applications can add fields to documents, modify data structures, or store heterogeneous records within the same collection without schema migrations. This flexibility accelerates development when requirements evolve but requires application-level schema management and validation.
Horizontal scaling forms a core principle of NoSQL architecture. Systems distribute data across multiple nodes using sharding or partitioning strategies, adding capacity by including additional servers rather than upgrading existing hardware. Consistent hashing, range partitioning, and hash partitioning determine data distribution across nodes.
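The partitioning idea can be sketched in a few lines of plain Ruby. This shows only the simplest modulo-based hash partitioning; production systems use consistent hashing with virtual nodes so that adding a server moves only a fraction of the keys:

```ruby
require 'digest'

# Illustrative only: map a key to one of N nodes by hashing it to a
# stable integer and taking it modulo the node count. Every client
# that hashes the same key reaches the same node.
NODES = ['node-a', 'node-b', 'node-c'].freeze

def node_for(key, nodes = NODES)
  digest = Digest::MD5.hexdigest(key.to_s).to_i(16)
  nodes[digest % nodes.size]
end

node_for('user:42')  # always the same node for the same key
node_for('user:43')  # possibly a different node
```

The weakness of plain modulo placement is visible here: changing `nodes.size` remaps almost every key, which is exactly what consistent hashing avoids.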
Design Considerations
Selecting between SQL and NoSQL databases requires analyzing data structure, access patterns, consistency requirements, and scaling needs. No single database type suits all scenarios, and many applications benefit from polyglot persistence, using multiple database types for different components.
Data structure analysis determines whether data fits naturally into tables with fixed schemas or requires flexible document structures. Relational databases excel when data exhibits clear relationships, referential integrity matters, and queries require complex joins across multiple tables. NoSQL databases suit scenarios with evolving schemas, nested data structures, or sparse attributes where many fields remain unpopulated for most records.
Applications requiring strict transactional guarantees across multiple records favor SQL databases. Banking systems, inventory management, and financial applications demand ACID properties ensuring data integrity. NoSQL databases typically support transactions within single documents or rows but provide limited multi-document transaction support. Some modern NoSQL databases add transaction capabilities, but this reduces the performance advantages that motivated NoSQL adoption.
Read and write patterns significantly influence database selection. Applications with read-heavy workloads, complex queries, and ad-hoc reporting needs benefit from relational databases and their mature query optimization. Write-heavy applications, particularly those ingesting time-series data, logs, or sensor readings, benefit from NoSQL databases optimized for sequential writes. Column-family stores like Cassandra handle millions of writes per second across distributed clusters.
Query complexity determines which database model handles application needs efficiently. Relational databases execute complex joins, aggregations, and subqueries through SQL. Graph databases traverse relationships efficiently but perform poorly on tabular operations. Document databases query nested structures naturally but require application-level joins for cross-document relationships. Key-value stores offer no query capabilities beyond key lookups, requiring secondary indexes or denormalization for other access patterns.
Scaling requirements affect database choice fundamentally. Vertical scaling, adding resources to a single server, suits many SQL database deployments. Horizontal scaling, distributing data across multiple servers, aligns with NoSQL architecture. Applications anticipating growth beyond single-server capacity should consider NoSQL databases designed for distributed operation. Geographic distribution requirements favor NoSQL databases supporting multi-datacenter replication with configurable consistency.
Consistency requirements determine whether eventual consistency suffices or strong consistency proves necessary. Social media feeds, product catalogs, and content management tolerate eventual consistency, where different users might see slightly stale data temporarily. Financial transactions, inventory counts, and seat reservations require strong consistency to prevent conflicts and maintain data integrity.
Development velocity considerations include the learning curve, available tooling, and operational complexity. SQL databases offer familiar concepts, standardized query languages, and mature ecosystems. NoSQL databases require understanding new data models and often provide database-specific query languages or APIs. Teams experienced with relational modeling may encounter challenges adapting to document or graph thinking.
Data migration and schema evolution present different challenges across database types. Relational databases require careful migration planning, potentially locking tables during schema changes. NoSQL databases allow gradual schema evolution but require application code handling multiple document versions simultaneously. The absence of enforced schemas transfers validation responsibility to application logic.
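Handling several document versions at read time might look like the following sketch (plain Ruby, hypothetical field names: assume older records stored a single `name` string while newer ones split it):

```ruby
# Illustrative only: normalize documents written under different schema
# versions into one shape at read time, instead of migrating them all.
def normalize_user(doc)
  case doc.fetch('schema_version', 1)
  when 1
    # Version 1 stored a single 'name' string
    first, last = doc['name'].to_s.split(' ', 2)
    { 'first_name' => first, 'last_name' => last }
  else
    # Version 2+ already stores split fields
    { 'first_name' => doc['first_name'], 'last_name' => doc['last_name'] }
  end
end

normalize_user('name' => 'Ada Lovelace')
# => {"first_name"=>"Ada", "last_name"=>"Lovelace"}
normalize_user('schema_version' => 2, 'first_name' => 'Ada', 'last_name' => 'Lovelace')
# => {"first_name"=>"Ada", "last_name"=>"Lovelace"}
```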
Implementation Approaches
Implementing NoSQL solutions requires considering data modeling strategies, access pattern optimization, and deployment architecture. Different approaches suit various application characteristics and organizational capabilities.
A single-database approach uses one NoSQL database type for the entire application, simplifying operations and reducing infrastructure complexity. This approach works when application requirements align well with a specific NoSQL model. Document databases often serve as general-purpose stores, handling various data types within flexible JSON structures. MongoDB deployments typically follow this pattern, storing user profiles, application state, and business entities in different collections within a single cluster.
Polyglot persistence combines multiple database types, selecting optimal storage for each data domain. A typical e-commerce application might use PostgreSQL for transactions and inventory, Redis for session storage and caching, Elasticsearch for product search, and Neo4j for recommendations. This approach maximizes performance and scalability but increases operational complexity and data synchronization challenges.
Caching strategies integrate NoSQL databases with existing relational systems, using key-value stores to reduce database load. Applications store frequently accessed data in Redis or Memcached, falling back to SQL databases for cache misses. This pattern improves read performance without replacing primary data stores. Cache invalidation strategies determine consistency guarantees, with time-based expiration, write-through caching, or explicit invalidation on updates.
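The cache-aside flow can be sketched with Hash-backed stand-ins for Redis and the SQL store, so the example runs anywhere (illustrative only; a real implementation would also set a TTL and invalidate on writes):

```ruby
# Illustrative only: the cache-aside pattern. In production, CACHE
# would be a Redis client and DB a relational database query.
CACHE = {}                              # stand-in for Redis
DB    = { 1 => { name: 'Alice' } }      # stand-in for the SQL store

def fetch_user(id)
  key = "user:#{id}"
  return CACHE[key] if CACHE.key?(key)  # cache hit: skip the database
  record = DB[id]                       # cache miss: fall back to SQL
  CACHE[key] = record unless record.nil?  # populate for later reads
  record
end

fetch_user(1)  # miss: reads the database and fills the cache
fetch_user(1)  # hit: served from the cache
```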
Event sourcing and CQRS patterns separate write and read models, using NoSQL databases for event storage or read-optimized views. Event stores capture all state changes as immutable events, often using document or column-family databases. Read models materialize from event streams into structures optimized for queries, frequently using different database types for command and query sides. This separation enables independent scaling and optimization of the write and read paths.
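The core of the pattern can be sketched in plain Ruby, with an in-memory array standing in for the event store (illustrative only; the event names are hypothetical):

```ruby
# Illustrative only: append-only event log plus a read model rebuilt
# by replaying events -- the essence of event sourcing.
EVENTS = []

def append_event(type, data)
  EVENTS << { type: type, data: data, at: Time.now }  # events are immutable facts
end

# Read model: current balances, materialized from the event stream.
def balances
  EVENTS.each_with_object(Hash.new(0)) do |event, acc|
    case event[:type]
    when :deposited then acc[event[:data][:account]] += event[:data][:amount]
    when :withdrawn then acc[event[:data][:account]] -= event[:data][:amount]
    end
  end
end

append_event(:deposited, account: 'acct-1', amount: 100)
append_event(:withdrawn, account: 'acct-1', amount: 30)
balances  # => {"acct-1"=>70}
```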
Data modeling for NoSQL databases differs fundamentally from relational normalization. Document databases favor denormalization, embedding related data within documents to avoid joins. This approach trades storage efficiency for query performance and data locality. Applications duplicate data across documents when multiple access patterns require the same information, accepting update complexity for read optimization.
Reference patterns in document databases store related document identifiers rather than embedding full documents. This approach resembles foreign keys but requires application-level joins. Applications must decide between embedding and references based on data size, update frequency, and query patterns. Embedded documents work well for one-to-few relationships with infrequently updated data. References suit one-to-many or many-to-many relationships where referenced data updates frequently or appears in multiple contexts.
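The embedding-versus-referencing choice can be illustrated with two shapes for the same order (plain Ruby hashes standing in for documents; field names are hypothetical):

```ruby
# Illustrative only: the same order modeled two ways in a document store.

# Embedding: line items live inside the order document. One read
# returns everything, but item data is duplicated across orders.
embedded_order = {
  _id: 'order-1',
  customer: 'alice',
  items: [
    { sku: 'LAPTOP-001', name: 'Professional Laptop', price: 1299.99 }
  ]
}

# Referencing: the order stores product identifiers, and the
# application performs a second lookup (an application-level join).
products = { 'LAPTOP-001' => { name: 'Professional Laptop', price: 1299.99 } }
referenced_order = { _id: 'order-1', customer: 'alice', item_skus: ['LAPTOP-001'] }

resolved_items = referenced_order[:item_skus].map { |sku| products[sku] }
resolved_items.first[:name]  # => "Professional Laptop"
```

Updating a product's price touches one document under referencing but every embedding of it under denormalization, which is the trade-off described above.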
Time-series data modeling in column-family or document databases optimizes for sequential writes and time-range queries. Partitioning data by time period, typically day or hour, enables efficient queries and data expiration. Row keys incorporate timestamps, distributing writes across nodes while maintaining query locality. Applications collecting metrics, logs, or sensor data benefit from time-series optimizations in Cassandra; TimescaleDB offers similar capabilities, though as a PostgreSQL extension rather than a NoSQL system.
Graph data modeling identifies entities as nodes and relationships as edges, often adding properties to both. Graph databases perform relationship traversals efficiently, finding patterns across multiple hops. Social networks model users as nodes with friendship edges, enabling friend-of-friend queries and community detection. Fraud detection systems model transactions and accounts as graphs, identifying suspicious patterns through relationship analysis.
Ruby Implementation
Ruby applications integrate NoSQL databases through client libraries providing idiomatic interfaces. Each database type offers Ruby gems wrapping native protocols or HTTP APIs. Understanding connection management, query construction, and error handling enables effective NoSQL integration.
MongoDB integration uses the official mongodb gem or higher-level Mongoid ODM. The mongodb driver provides low-level collection operations, while Mongoid adds ActiveRecord-like models and associations. Connection configuration specifies cluster hosts, authentication credentials, and connection pooling parameters.
```ruby
require 'mongo'

client = Mongo::Client.new(
  ['localhost:27017'],
  database: 'application_db',
  max_pool_size: 10,
  connect_timeout: 5
)

users = client[:users]

# Insert document
result = users.insert_one({
  username: 'alice',
  email: 'alice@example.com',
  profile: {
    age: 28,
    interests: ['ruby', 'databases']
  },
  created_at: Time.now
})
result.inserted_id
# => BSON::ObjectId

# Query with projection and sort
user = users.find(
  { username: 'alice' },
  projection: { password: 0 }
).sort(created_at: -1).first
# => {"_id"=>BSON::ObjectId, "username"=>"alice", ...}
```
Mongoid adds object mapping with embedded documents and associations:
```ruby
class User
  include Mongoid::Document
  include Mongoid::Timestamps

  field :username, type: String
  field :email, type: String

  embeds_one :profile
  has_many :posts

  validates :username, uniqueness: true, presence: true
  validates :email, format: { with: URI::MailTo::EMAIL_REGEXP }
end

class Profile
  include Mongoid::Document

  field :age, type: Integer
  field :interests, type: Array

  embedded_in :user
end

user = User.create!(
  username: 'bob',
  email: 'bob@example.com',
  profile: Profile.new(age: 30, interests: ['programming'])
)
# => #<User _id: ..., username: "bob", ...>
```
Redis integration uses the redis gem for key-value operations, supporting strings, hashes, lists, sets, and sorted sets. Connection pooling through connection_pool gem prevents connection exhaustion in multi-threaded applications.
```ruby
require 'redis'
require 'connection_pool'
require 'json'

redis_pool = ConnectionPool.new(size: 5, timeout: 3) do
  Redis.new(
    host: 'localhost',
    port: 6379,
    db: 0,
    timeout: 2
  )
end

redis_pool.with do |redis|
  # Set with expiration
  redis.setex('session:12345', 3600, {
    user_id: 42,
    ip: '192.168.1.1'
  }.to_json)

  # Hash operations
  redis.hset('user:42', 'name', 'Charlie')
  redis.hset('user:42', 'email', 'charlie@example.com')
  redis.hgetall('user:42')
  # => {"name"=>"Charlie", "email"=>"charlie@example.com"}

  # Sorted set for leaderboard
  redis.zadd('scores', 100, 'player1')
  redis.zadd('scores', 150, 'player2')
  redis.zrevrange('scores', 0, 9, with_scores: true)
  # => [["player2", 150.0], ["player1", 100.0]]
end
```
Cassandra integration through the cassandra-driver gem handles CQL queries and connection management. The driver manages node discovery, load balancing, and retry policies automatically.
```ruby
require 'cassandra'

cluster = Cassandra.cluster(
  hosts: ['127.0.0.1'],
  consistency: :quorum,
  timeout: 10
)
session = cluster.connect('analytics')

# Create table
session.execute(<<~CQL)
  CREATE TABLE IF NOT EXISTS events (
    user_id uuid,
    event_time timestamp,
    event_type text,
    data map<text, text>,
    PRIMARY KEY (user_id, event_time)
  ) WITH CLUSTERING ORDER BY (event_time DESC)
CQL

# Prepared statement for inserts
insert_stmt = session.prepare(<<~CQL)
  INSERT INTO events (user_id, event_time, event_type, data)
  VALUES (?, ?, ?, ?)
CQL

user_uuid = Cassandra::Uuid.new('123e4567-e89b-12d3-a456-426614174000')

# Batch insert
batch = session.batch do |b|
  b.add(insert_stmt, arguments: [
    user_uuid,
    Time.now,
    'page_view',
    { 'page' => '/home', 'duration' => '5s' }
  ])
end
session.execute(batch)

# Query with time range (Time.now - 3600 rather than ActiveSupport's
# 1.hour.ago, which is unavailable in plain Ruby)
rows = session.execute(
  'SELECT * FROM events WHERE user_id = ? AND event_time >= ?',
  arguments: [user_uuid, Time.now - 3600]
)
rows.each do |row|
  puts "#{row['event_type']}: #{row['data']}"
end
```
DynamoDB integration uses the aws-sdk-dynamodb gem, requiring AWS credentials configuration. The SDK handles request signing, retries, and pagination automatically.
```ruby
require 'aws-sdk-dynamodb'
require 'time'

# Credentials read from the environment; omitting the :credentials
# option falls back to the SDK's standard credential chain
dynamodb = Aws::DynamoDB::Client.new(
  region: 'us-east-1',
  credentials: Aws::Credentials.new(
    ENV['AWS_ACCESS_KEY_ID'],
    ENV['AWS_SECRET_ACCESS_KEY']
  )
)

# Put item
dynamodb.put_item(
  table_name: 'Users',
  item: {
    'UserId' => '12345',
    'Username' => 'diana',
    'Email' => 'diana@example.com',
    'LoginCount' => 42,
    'LastLogin' => Time.now.iso8601
  }
)

# Query with key condition (plain Time arithmetic instead of
# ActiveSupport's 30.days.ago)
result = dynamodb.query(
  table_name: 'Orders',
  key_condition_expression: 'CustomerId = :id AND OrderDate > :date',
  expression_attribute_values: {
    ':id' => 'CUST123',
    ':date' => (Time.now - 30 * 86_400).iso8601
  }
)
result.items.each do |item|
  puts "Order #{item['OrderId']}: $#{item['Total']}"
end

# Update with conditional expression
dynamodb.update_item(
  table_name: 'Products',
  key: { 'ProductId' => 'PROD456' },
  update_expression: 'SET Stock = Stock - :qty',
  condition_expression: 'Stock >= :qty',
  expression_attribute_values: { ':qty' => 5 }
)
```
Neo4j integration through the neo4j-ruby-driver gem executes Cypher queries against graph databases. The driver manages session pooling and transaction lifecycle.
```ruby
require 'neo4j_ruby_driver'

driver = Neo4j::Driver::GraphDatabase.driver(
  'bolt://localhost:7687',
  Neo4j::Driver::AuthTokens.basic('neo4j', 'password')
)
session = driver.session(database: 'social')

# Create nodes and relationships
session.write_transaction do |tx|
  tx.run(<<~CYPHER, name: 'Eve', age: 25)
    CREATE (u:User {name: $name, age: $age})
    RETURN u
  CYPHER

  tx.run(<<~CYPHER)
    MATCH (a:User {name: 'Eve'})
    MATCH (b:User {name: 'Frank'})
    CREATE (a)-[:FOLLOWS]->(b)
  CYPHER
end

# Query relationships. Cypher does not allow a parameter in the
# variable-length bound, so the depth is written into the query, and
# the result is materialized with to_a before the transaction closes.
records = session.read_transaction do |tx|
  tx.run(<<~CYPHER, name: 'Eve').to_a
    MATCH (u:User {name: $name})-[:FOLLOWS*1..2]->(friend)
    RETURN DISTINCT friend.name AS name, friend.age AS age
  CYPHER
end
records.each do |record|
  puts "#{record['name']}, age #{record['age']}"
end

session.close
driver.close
```
Practical Examples
Real-world NoSQL implementations demonstrate how database selection and data modeling address specific application requirements. These examples show complete scenarios from schema design through query implementation.
Session storage with Redis handles user session data requiring fast reads, automatic expiration, and high availability. Web applications store session state in Redis rather than relational databases to reduce latency and avoid locking issues. The implementation uses hash data structures for session attributes and key expiration for automatic cleanup.
```ruby
require 'securerandom'
require 'json'

class SessionStore
  SESSION_TTL = 86_400 # 24 hours

  def initialize(redis_pool)
    @redis_pool = redis_pool
  end

  def create_session(user_id, session_data)
    session_id = SecureRandom.uuid
    session_key = "session:#{session_id}"

    @redis_pool.with do |redis|
      redis.multi do |transaction|
        transaction.hset(session_key, 'user_id', user_id)
        transaction.hset(session_key, 'created_at', Time.now.to_i)
        session_data.each do |key, value|
          transaction.hset(session_key, key.to_s, value.to_json)
        end
        transaction.expire(session_key, SESSION_TTL)
        # Add to user's active sessions
        transaction.sadd("user:#{user_id}:sessions", session_id)
      end
    end

    session_id
  end

  def get_session(session_id)
    session_key = "session:#{session_id}"

    @redis_pool.with do |redis|
      data = redis.hgetall(session_key)
      return nil if data.empty?

      redis.expire(session_key, SESSION_TTL) # Refresh expiration
      data.transform_values do |value|
        JSON.parse(value) rescue value
      end
    end
  end

  def invalidate_user_sessions(user_id)
    @redis_pool.with do |redis|
      session_ids = redis.smembers("user:#{user_id}:sessions")
      return if session_ids.empty?

      redis.pipelined do |pipeline|
        session_ids.each do |session_id|
          pipeline.del("session:#{session_id}")
        end
        pipeline.del("user:#{user_id}:sessions")
      end
    end
  end
end
```
Product catalog with MongoDB stores product information with varying attributes across categories. Electronics have specifications, clothing has sizes and colors, and books have ISBNs and authors. Document databases handle heterogeneous product schemas without requiring sparse columns or entity-attribute-value patterns.
```ruby
class Product
  include Mongoid::Document
  include Mongoid::Timestamps

  field :sku, type: String
  field :name, type: String
  field :category, type: String
  field :price, type: Float
  field :inventory, type: Integer
  # Mongoid reserves the name `attributes` (it is already a Document
  # method), so category-specific attributes live in a details hash
  field :details, type: Hash
  field :tags, type: Array

  index({ category: 1, price: 1 })
  index({ tags: 1 })
  index({ sku: 1 }, { unique: true })

  validates :sku, uniqueness: true, presence: true
  validates :price, numericality: { greater_than: 0 }

  scope :in_stock, -> { where(:inventory.gt => 0) }
  scope :by_category, ->(cat) { where(category: cat) }
  scope :price_range, ->(min, max) { where(:price.gte => min, :price.lte => max) }
end

# Create products with different category-specific details
laptop = Product.create!(
  sku: 'LAPTOP-001',
  name: 'Professional Laptop',
  category: 'electronics',
  price: 1299.99,
  inventory: 15,
  details: {
    processor: 'Intel i7',
    ram: '16GB',
    storage: '512GB SSD',
    screen: '15.6 inch'
  },
  tags: ['laptop', 'portable', 'professional']
)

book = Product.create!(
  sku: 'BOOK-042',
  name: 'Ruby Programming Guide',
  category: 'books',
  price: 49.99,
  inventory: 100,
  details: {
    author: 'John Developer',
    isbn: '978-1234567890',
    pages: 450,
    publisher: 'Tech Press'
  },
  tags: ['programming', 'ruby', 'reference']
)

# Query with complex criteria
results = Product.in_stock
                 .by_category('electronics')
                 .price_range(500, 2000)
                 .where(tags: 'laptop')
                 .only(:name, :price, :inventory)

results.each do |product|
  puts "#{product.name}: $#{product.price} (#{product.inventory} available)"
end
```
Time-series analytics with Cassandra ingests and queries sensor data from IoT devices. The schema partitions data by device and time period, supporting efficient writes and range queries. Data modeling uses clustering columns for time-based ordering within partitions.
```ruby
require 'bigdecimal'
require 'date'

class SensorDataStore
  def initialize(session)
    @session = session
    setup_schema
  end

  def setup_schema
    @session.execute(<<~CQL)
      CREATE TABLE IF NOT EXISTS sensor_readings (
        device_id uuid,
        date date,
        reading_time timestamp,
        temperature decimal,
        humidity decimal,
        battery_level int,
        PRIMARY KEY ((device_id, date), reading_time)
      ) WITH CLUSTERING ORDER BY (reading_time DESC)
    CQL

    @insert_stmt = @session.prepare(<<~CQL)
      INSERT INTO sensor_readings (
        device_id, date, reading_time,
        temperature, humidity, battery_level
      ) VALUES (?, ?, ?, ?, ?, ?)
      USING TTL 7776000
    CQL
  end

  def record_reading(device_id, temperature, humidity, battery)
    timestamp = Time.now
    @session.execute(@insert_stmt, arguments: [
      device_id,
      timestamp.to_date,
      timestamp,
      BigDecimal(temperature.to_s),
      BigDecimal(humidity.to_s),
      battery
    ])
  end

  def batch_insert(readings)
    batch = @session.batch do |b|
      readings.each do |reading|
        timestamp = reading[:timestamp]
        b.add(@insert_stmt, arguments: [
          reading[:device_id],
          timestamp.to_date,
          timestamp,
          BigDecimal(reading[:temperature].to_s),
          BigDecimal(reading[:humidity].to_s),
          reading[:battery_level]
        ])
      end
    end
    @session.execute(batch)
  end

  def query_device_range(device_id, start_time, end_time)
    dates = (start_time.to_date..end_time.to_date).to_a
    results = []

    dates.each do |date|
      rows = @session.execute(
        <<~CQL,
          SELECT * FROM sensor_readings
          WHERE device_id = ?
            AND date = ?
            AND reading_time >= ?
            AND reading_time <= ?
        CQL
        arguments: [device_id, date, start_time, end_time]
      )
      results.concat(rows.to_a)
    end

    results.sort_by { |r| r['reading_time'] }
  end

  def calculate_averages(device_id, date)
    readings = @session.execute(
      'SELECT temperature, humidity FROM sensor_readings WHERE device_id = ? AND date = ?',
      arguments: [device_id, date]
    ).to_a
    temps = readings.map { |r| r['temperature'].to_f }
    humids = readings.map { |r| r['humidity'].to_f }
    return nil if temps.empty? # avoid dividing by zero on empty days

    {
      avg_temperature: temps.sum / temps.size,
      avg_humidity: humids.sum / humids.size,
      reading_count: temps.size
    }
  end
end
```
Social graph with Neo4j models user connections and recommendation algorithms. Graph databases traverse relationships efficiently, finding friends-of-friends, mutual connections, and content recommendations based on network patterns.
```ruby
require 'json'

class SocialGraph
  def initialize(driver)
    @driver = driver
  end

  def add_user(username, profile_data)
    session = @driver.session
    # Neo4j node properties cannot hold nested maps, so the profile is
    # stored as a JSON string; single consumes the result before the
    # transaction closes
    record = session.write_transaction do |tx|
      tx.run(<<~CYPHER, username: username, profile: profile_data.to_json).single
        CREATE (u:User {
          username: $username,
          joined_at: datetime(),
          profile: $profile
        })
        RETURN u
      CYPHER
    end
    record['u']
  ensure
    session.close
  end

  def create_friendship(user1, user2)
    session = @driver.session
    session.write_transaction do |tx|
      tx.run(<<~CYPHER, u1: user1, u2: user2)
        MATCH (a:User {username: $u1})
        MATCH (b:User {username: $u2})
        CREATE (a)-[:FRIENDS_WITH {since: datetime()}]->(b)
        CREATE (b)-[:FRIENDS_WITH {since: datetime()}]->(a)
      CYPHER
    end
  ensure
    session.close
  end

  def find_mutual_friends(user1, user2)
    session = @driver.session
    # Materialize with to_a inside the transaction; the lazy result
    # stream is invalid once the transaction closes
    records = session.read_transaction do |tx|
      tx.run(<<~CYPHER, u1: user1, u2: user2).to_a
        MATCH (a:User {username: $u1})-[:FRIENDS_WITH]->(mutual)<-[:FRIENDS_WITH]-(b:User {username: $u2})
        RETURN mutual.username AS username, mutual.profile AS profile
      CYPHER
    end
    records.map do |record|
      { username: record['username'], profile: JSON.parse(record['profile']) }
    end
  ensure
    session.close
  end

  def suggest_friends(username, limit = 5)
    session = @driver.session
    records = session.read_transaction do |tx|
      tx.run(<<~CYPHER, username: username, limit: limit).to_a
        MATCH (user:User {username: $username})-[:FRIENDS_WITH]->()-[:FRIENDS_WITH]->(suggestion)
        WHERE NOT (user)-[:FRIENDS_WITH]->(suggestion) AND user <> suggestion
        WITH suggestion, COUNT(*) AS mutual_friends
        ORDER BY mutual_friends DESC
        LIMIT $limit
        RETURN suggestion.username AS username, mutual_friends
      CYPHER
    end
    records.map do |record|
      { username: record['username'], mutual_count: record['mutual_friends'] }
    end
  ensure
    session.close
  end

  def shortest_path(from_user, to_user)
    session = @driver.session
    records = session.read_transaction do |tx|
      tx.run(<<~CYPHER, from: from_user, to: to_user).to_a
        MATCH path = shortestPath(
          (a:User {username: $from})-[:FRIENDS_WITH*]-(b:User {username: $to})
        )
        RETURN [node IN nodes(path) | node.username] AS path_usernames,
               length(path) AS degrees_of_separation
      CYPHER
    end
    return nil if records.empty?

    record = records.first
    { path: record['path_usernames'], distance: record['degrees_of_separation'] }
  ensure
    session.close
  end
end
```
Tools & Ecosystem
NoSQL databases provide diverse implementations, each optimized for specific use cases. Ruby developers integrate these databases through client libraries and frameworks providing idiomatic interfaces.
Document Databases include MongoDB, CouchDB, and Couchbase. MongoDB dominates Ruby adoption with the official mongodb gem and Mongoid ODM providing ActiveRecord-style models. CouchDB uses HTTP APIs, integrating through the couchrest gem. Couchbase combines document storage with caching capabilities through the couchbase gem. Amazon DocumentDB provides a MongoDB-compatible managed service.
Key-Value Stores include Redis, Memcached, and DynamoDB. Redis supports complex data structures beyond simple strings, making it suitable for caching, queues, and real-time leaderboards. The redis gem handles connection pooling and pipelining. Memcached focuses purely on caching through the dalli gem. DynamoDB provides managed key-value and document storage with the aws-sdk-dynamodb gem.
Column-Family Stores include Cassandra, HBase, and ScyllaDB. Cassandra handles time-series data and write-heavy workloads with the cassandra-driver gem. HBase integrates with Hadoop ecosystems through the hbase gem, though Ruby usage remains less common than Java's. ScyllaDB offers a Cassandra-compatible API with improved performance.
Graph Databases include Neo4j, ArangoDB, and Amazon Neptune. Neo4j leads graph database adoption with the neo4j-ruby-driver gem and activegraph ODM. ArangoDB supports multi-model storage including documents, key-values, and graphs through the arangodb gem. Amazon Neptune provides a managed graph database compatible with the Gremlin and SPARQL query languages.
Multi-Model Databases combine multiple data models in single systems. ArangoDB supports documents, graphs, and key-values with unified query language. OrientDB provides document and graph capabilities. Couchbase combines document storage with key-value access patterns.
Ruby ORMs and ODMs abstract database operations behind ActiveRecord-like interfaces. Mongoid provides MongoDB integration matching Rails conventions, handling associations, validations, and callbacks. ROM (Ruby Object Mapper) supports multiple databases including SQL and NoSQL through adapters, favoring explicit data access patterns over ActiveRecord magic.
Caching gems integrate NoSQL databases with Rails and Sinatra applications. The Rails cache API supports Redis, Memcached, and file-based backends through configured adapters. The redis-rails gem integrates Redis for session storage, caching, and ActionCable subscriptions. The readthis gem provides an alternative Redis caching implementation with better serialization performance.
Background job processing often relies on Redis for queue storage. Sidekiq uses Redis for job persistence and scheduling, handling millions of jobs efficiently. Resque provides an alternative Redis-backed queue implementation. Both gems integrate naturally with Rails applications through the ActiveJob adapter interface.
Database administration tools include Studio 3T for MongoDB, RedisInsight for Redis, and Neo4j Browser for graph exploration. Cassandra uses the cqlsh command-line interface and the DataStax DevCenter GUI. Many databases provide web-based admin interfaces accessible through a browser.
Monitoring and observability tools track NoSQL performance and health. MongoDB Atlas provides cloud-based monitoring and alerts. Redis monitoring through RedisInsight or Prometheus exporters tracks memory usage and command latency. Cassandra monitoring uses DataStax OpsCenter or Prometheus metrics.
Reference
NoSQL Database Type Selection
| Database Type | Primary Use Cases | Ruby Gems | Scaling Model |
|---|---|---|---|
| Document | Content management, catalogs, user profiles | mongodb, mongoid, couchrest | Horizontal with sharding |
| Key-Value | Caching, sessions, real-time data | redis, connection_pool, dalli | Horizontal with consistent hashing |
| Column-Family | Time-series, analytics, write-heavy loads | cassandra-driver | Horizontal with partitioning |
| Graph | Social networks, recommendations, fraud detection | neo4j-ruby-driver, activegraph | Vertical primarily |
Consistency Levels
| Level | Description | Use Case |
|---|---|---|
| Strong | All reads return most recent write | Financial transactions, inventory |
| Eventual | Reads may return stale data temporarily | Social feeds, content delivery |
| Causal | Related operations maintain ordering | Messaging, comment threads |
| Session | Consistency within client session | User dashboards, personalization |
Data Modeling Patterns
| Pattern | Description | Database Types |
|---|---|---|
| Embedding | Nest related data within documents | Document stores |
| Referencing | Store identifiers to related data | Document, key-value |
| Denormalization | Duplicate data for query performance | All types |
| Bucketing | Group time-series data into periods | Column-family |
| Composite Keys | Combine fields for partitioning | Column-family |
| Adjacency List | Store direct relationships | Graph, document |
MongoDB Operations
| Operation | Method | Description |
|---|---|---|
| Insert | insert_one, insert_many | Add documents to collection |
| Find | find, find_one | Query documents with filters |
| Update | update_one, update_many | Modify existing documents |
| Replace | replace_one | Replace entire document |
| Delete | delete_one, delete_many | Remove documents |
| Aggregate | aggregate | Process data pipeline operations |
Redis Data Structures
| Structure | Commands | Use Case |
|---|---|---|
| String | GET, SET, INCR | Simple values, counters |
| Hash | HGET, HSET, HGETALL | Objects, user profiles |
| List | LPUSH, RPUSH, LRANGE | Queues, activity feeds |
| Set | SADD, SMEMBERS, SINTER | Unique items, tags |
| Sorted Set | ZADD, ZRANGE, ZRANK | Leaderboards, rankings |
| Stream | XADD, XREAD, XGROUP | Event streams, logs |
Cassandra Query Patterns
| Pattern | CQL Example | Performance |
|---|---|---|
| Partition Key Lookup | WHERE partition_key = ? | Excellent |
| Partition Range Scan | WHERE partition_key = ? AND clustering_col > ? | Good |
| Multi-Partition Query | WHERE partition_key IN (?, ?) | Moderate |
| Secondary Index | WHERE indexed_column = ? | Poor |
| Full Table Scan | SELECT without WHERE | Very Poor |
CAP Theorem Trade-offs
| Database | Consistency | Availability | Partition Tolerance |
|---|---|---|---|
| MongoDB | Tunable | High | High |
| Redis | Strong | High | Limited |
| Cassandra | Tunable | Very High | Very High |
| Neo4j | Strong | Moderate | Limited |
| DynamoDB | Tunable | Very High | Very High |
Common Query Patterns
| Pattern | SQL Alternative | NoSQL Implementation |
|---|---|---|
| Key Lookup | SELECT WHERE id = ? | get, find_one with _id |
| Range Query | SELECT WHERE date BETWEEN ? AND ? | find with range operators |
| Text Search | SELECT WHERE text LIKE ? | text indexes, Elasticsearch |
| Aggregation | GROUP BY, SUM, AVG | aggregation pipeline |
| Join | SELECT FROM table1 JOIN table2 | embed, reference, application join |
| Transaction | BEGIN TRANSACTION | single-document atomicity |