Overview
Graph databases store and query data based on relationships between entities. Unlike relational databases that use tables with foreign keys to represent connections, graph databases treat relationships as first-class citizens with direct pointers between nodes. This structure eliminates join operations and enables efficient traversal of connected data.
The graph data model consists of nodes (entities), edges (relationships), and properties (attributes). A social network represents users as nodes with "follows" edges connecting them. An e-commerce system models products, categories, and customers as nodes with "belongs_to", "purchased", and "recommends" relationships. The database stores these connections as direct references rather than requiring index lookups to reconstruct relationships.
Graph databases excel at queries that traverse multiple relationship levels. Finding friends-of-friends in a social network requires a simple two-hop traversal rather than multiple self-joins. Recommendation systems query "users who bought this also bought" patterns through direct relationship following. Fraud detection identifies suspicious transaction patterns by analyzing networks of accounts, transfers, and merchants.
The approach differs fundamentally from relational modeling. Relational databases normalize data into separate tables and reconstruct relationships through foreign key joins. Each join operation requires index scanning and matching operations that become expensive with multiple relationship levels. Graph databases navigate from one node to connected nodes through direct memory addresses or pointers, maintaining constant-time relationship traversal regardless of database size.
```ruby
# Relational approach - multiple joins required
User.joins(:friendships)
    .joins("JOIN friendships f2 ON friendships.friend_id = f2.user_id")
    .joins("JOIN users u2 ON f2.friend_id = u2.id")
    .where(users: { id: current_user.id })

# Graph approach - direct traversal
user.friends.flat_map(&:friends)
```
Key Principles
Graph databases organize data around three fundamental components: nodes, edges, and properties. Nodes represent entities such as people, products, or locations. Edges encode relationships between nodes with directional connections. Properties attach attributes to both nodes and edges as key-value pairs.
Nodes function as vertices in the graph structure. Each node contains properties describing the entity and maintains references to connected edges. Node types or labels categorize entities into classes like User, Product, or Order. The database assigns each node a unique identifier for direct access and relationship targeting.
Edges establish directed connections between nodes. A "FOLLOWS" edge points from one user to another. A "PURCHASED" edge connects a customer to a product with a timestamp property. Edge directionality matters for queries - traversing incoming versus outgoing edges produces different result sets. Edges themselves carry properties storing relationship metadata such as weight, date, or status.
Properties store attributes as name-value pairs on nodes and edges. A User node contains properties for name, email, and registration_date. A PURCHASED edge includes properties for quantity, price, and order_date. The schema remains flexible - different nodes of the same type may have different property sets.
The property graph model combines these elements into a labeled, directed, attributed multigraph. Nodes and edges carry labels for type classification. Multiple edges can connect the same node pair with different relationship types. Properties provide rich metadata throughout the structure.
Index-free adjacency defines a core architectural principle. Each node maintains direct references to adjacent nodes and connecting edges. Traversing from one node to connected nodes requires no index lookup - the database follows pointers directly. This design enables constant-time relationship traversal. The time to find connected nodes remains identical whether the database contains thousands or billions of nodes.
Graph traversal algorithms navigate the structure through depth-first or breadth-first patterns. Queries specify starting nodes and traversal patterns to find paths, subgraphs, or aggregated results. Pattern matching identifies specific graph shapes such as triangles, chains, or hub-and-spoke configurations.
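These mechanics can be sketched in plain Ruby: a hash from node id to neighbor ids plays the role of index-free adjacency (neighbor lookup is a direct reference, not an index scan), and a depth-limited breadth-first walk implements the traversal. A toy in-memory model, not how a real graph engine stores data:

```ruby
# Toy index-free adjacency: each node id maps directly to its neighbor ids.
ADJACENCY = {
  'alice' => ['bob', 'carol'],
  'bob'   => ['dave'],
  'carol' => ['dave', 'erin'],
  'dave'  => [],
  'erin'  => []
}.freeze

# Breadth-first traversal from start, bounded by max_depth hops.
def reachable_within(start, max_depth)
  visited = { start => 0 }
  queue = [start]
  until queue.empty?
    node = queue.shift
    depth = visited[node]
    next if depth == max_depth
    ADJACENCY.fetch(node, []).each do |neighbor|
      next if visited.key?(neighbor)
      visited[neighbor] = depth + 1
      queue << neighbor
    end
  end
  visited.keys - [start]
end

reachable_within('alice', 2) # friends plus friends-of-friends
```

Each hop costs one hash lookup per node regardless of total graph size, which is the intuition behind the constant-time traversal claim.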
```ruby
# Node with properties
user = {
  id: 'user:123',
  labels: ['User', 'Customer'],
  properties: {
    name: 'Alice',
    email: 'alice@example.com',
    created_at: Time.now
  }
}

# Edge with properties
friendship = {
  id: 'rel:456',
  type: 'FOLLOWS',
  from: 'user:123',
  to: 'user:789',
  properties: {
    since: Date.new(2020, 1, 15),
    strength: 0.85
  }
}
```
Cypher query language provides declarative pattern matching for graph queries. The syntax uses ASCII art to represent graph patterns: parentheses denote nodes, arrows with square brackets represent relationships, and curly braces contain properties. A query matches patterns against the stored graph and returns matching subgraphs.
```cypher
MATCH (user:User)-[:FOLLOWS]->(friend)-[:FOLLOWS]->(fof)
WHERE user.name = 'Alice'
RETURN fof.name
```
Graph databases support ACID transactions for consistency guarantees. Read and write operations execute within transaction boundaries. The database maintains referential integrity for relationships - deleting a node requires handling connected edges.
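Edge handling on delete is explicit in Cypher: DETACH DELETE removes a node together with all of its relationships, whereas a bare DELETE fails while edges remain. A sketch using the session-based driver style shown later in this document; the User label and id property are illustrative:

```ruby
# DETACH DELETE removes the node and every connected edge in one statement;
# a plain DELETE would raise an error if any relationships still exist.
DELETE_USER = <<~CYPHER
  MATCH (u:User {id: $user_id})
  DETACH DELETE u
CYPHER

def delete_user(session, user_id)
  session.run(DELETE_USER, user_id: user_id)
end
```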
Design Considerations
Graph databases solve problems where relationships form the core query pattern. Choosing between graph and relational databases depends on data connectivity, query patterns, and schema flexibility requirements.
Relationship-heavy workloads benefit from graph structure. Social networks query friend connections, followers, and influence paths. Recommendation engines analyze purchase patterns and product similarities. Knowledge graphs link concepts through semantic relationships. These scenarios involve multi-hop traversals that become inefficient with relational joins.
Relational databases perform well for entity-centric queries. Retrieving user profiles by ID, filtering products by category, or aggregating sales by region execute efficiently with indexed lookups. When queries primarily access individual records or perform set-based filtering without relationship traversal, relational models remain optimal.
Query complexity guides database selection. Graph databases excel at variable-length path queries: finding all nodes reachable within N hops, computing shortest paths between entities, or detecting circular relationship patterns requires only simple traversals. The equivalent relational queries involve recursive CTEs or multiple self-joins whose cost grows sharply with path length.
Relational databases handle fixed-relationship queries efficiently. Joining three tables with known foreign keys performs well with proper indexing. When relationship depth remains constant and known in advance, relational optimization techniques apply effectively.
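The recursive-CTE alternative mentioned above can be made concrete. A hedged sketch of the SQL for "all users reachable within three hops", assuming a friendships(user_id, friend_id) table; each recursion level is effectively another self-join:

```ruby
# Recursive CTE: the UNION branch joins friendships against itself once per
# level, up to the depth bound. Table and column names are illustrative.
REACHABLE_SQL = <<~SQL
  WITH RECURSIVE reachable(user_id, depth) AS (
    SELECT friend_id, 1 FROM friendships WHERE user_id = $1
    UNION
    SELECT f.friend_id, r.depth + 1
    FROM friendships f
    JOIN reachable r ON f.user_id = r.user_id
    WHERE r.depth < 3
  )
  SELECT DISTINCT user_id FROM reachable
SQL
```

The graph equivalent is a single `-[:FRIEND*1..3]->` pattern with no joins at all.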
Schema evolution differs between approaches. Graph databases support flexible schemas where nodes of the same type carry different property sets. Adding new relationship types or node properties requires no schema migration. This flexibility accommodates evolving data models and semi-structured data.
Relational schemas enforce structure through table definitions and constraints. Schema changes require ALTER TABLE operations that lock tables during migration. Strict typing and normalization provide data consistency guarantees but reduce flexibility.
Write patterns influence database choice. Graph databases handle frequent relationship modifications efficiently. Adding followers, creating product associations, or linking related content requires simple edge creation. The database updates adjacency lists without reindexing large tables.
Relational databases optimize for bulk inserts and transactional consistency. Loading batches of records with foreign keys performs well. Frequent relationship changes require updating join tables and maintaining referential integrity across tables.
Data volume characteristics affect performance differently. Graph databases maintain constant-time traversal regardless of database size due to index-free adjacency. A query traversing three relationship levels takes the same time whether the database contains millions or billions of nodes.
Relational database join performance degrades with table size. Even with indexes, joining large tables requires scanning more entries. Deep relationship queries compound this effect across multiple join operations.
Analytical queries distinguish database types. Relational databases excel at aggregations, grouping, and set operations over large datasets. SQL engines optimize for full table scans, parallel processing, and complex aggregate functions.
Graph databases optimize for traversal-based analytics. Computing centrality measures, finding communities, or analyzing network properties leverage graph algorithms. However, calculating global aggregates or performing fact table roll-ups remains less efficient than specialized analytical databases.
```ruby
# Graph traversal - constant time per hop
def mutual_friends(user_id, friend_id)
  graph.query(
    "MATCH (u:User {id: $user_id})-[:FRIEND]-(mutual)-[:FRIEND]-(f:User {id: $friend_id}) " \
    "RETURN mutual",
    user_id: user_id, friend_id: friend_id
  )
end

# Relational - joins grow with data size
def mutual_friends(user_id, friend_id)
  Friendship.where(user_id: user_id)
            .joins("INNER JOIN friendships f2 ON friendships.friend_id = f2.user_id")
            .where("f2.friend_id = ?", friend_id)
end
```
Hybrid architectures combine database types. Store core entity data in relational tables with proper normalization. Replicate relationship structures to a graph database for traversal queries. This pattern maintains transactional consistency for business data while enabling efficient graph analytics.
Polyglot persistence acknowledges that different data models serve different query patterns. User profiles and transactions reside in relational storage. Social connections and recommendations use graph structure. Product catalogs leverage document stores. Applications query the appropriate database for each use case.
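A minimal sketch of that routing decision in Ruby; the class, query-pattern names, and store assignments are illustrative, not from any particular framework:

```ruby
# Hypothetical router: each query pattern maps to the store best suited to it.
class DataRouter
  STORE_FOR = {
    profile_lookup:   :relational, # entity-centric, indexed by ID
    order_history:    :relational, # transactional records
    friend_traversal: :graph,      # multi-hop relationship queries
    recommendations:  :graph,      # pattern-based traversal
    catalog_search:   :document    # semi-structured product data
  }.freeze

  def self.store_for(query_pattern)
    STORE_FOR.fetch(query_pattern) { :relational } # default to system of record
  end
end
```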
Implementation Approaches
Implementing graph databases requires decisions about data modeling, query optimization, and integration patterns. The approach depends on whether using a dedicated graph database or adding graph capabilities to existing infrastructure.
Native graph storage uses specialized data structures optimized for graph operations. The database stores nodes and edges with direct adjacency pointers. Each node contains a list of incoming and outgoing edge identifiers. Following relationships requires dereferencing these pointers without index lookups.
This approach provides optimal traversal performance. The database allocates memory regions for node and edge storage with predictable layout. Accessing adjacent nodes reads from adjacent memory locations, improving cache utilization. Write operations append to adjacency lists without restructuring indexes.
Non-native graph storage implements graph abstractions over relational or key-value stores. A relational implementation creates tables for nodes and edges with foreign keys. A key-value approach stores adjacency lists as serialized structures. These implementations trade traversal performance for operational simplicity.
Non-native implementations integrate with existing database infrastructure. Organizations leverage familiar backup, replication, and monitoring tools. However, traversal operations require multiple database queries or index scans rather than direct pointer following.
Data modeling strategies transform domain concepts into graph structures. The process identifies entities that become nodes and relationships that become edges. Properties map to attributes on nodes and edges.
Modeling users and friendships creates User nodes with FRIEND edges. Product catalogs model products, categories, and tags as nodes with BELONGS_TO and TAGGED_WITH relationships. Organizational hierarchies represent employees and departments with WORKS_IN and MANAGES edges.
```ruby
# Domain model
class User
  has_many :friendships
  has_many :friends, through: :friendships
end

# Graph model
{
  nodes: [
    { id: 'u1', label: 'User', properties: { name: 'Alice' } },
    { id: 'u2', label: 'User', properties: { name: 'Bob' } }
  ],
  edges: [
    { from: 'u1', to: 'u2', type: 'FRIEND', properties: { since: '2020-01-01' } }
  ]
}
```
Denormalization decisions balance query performance against storage overhead. Relational databases normalize data to eliminate redundancy. Graph databases often denormalize properties onto edges for query efficiency.
Storing user names on FRIEND edges avoids node lookups when displaying friend lists. Duplicating product prices on PURCHASED edges provides historical accuracy without joining to product nodes. This pattern increases storage but reduces query complexity.
Relationship directionality impacts query patterns. Bidirectional relationships require two edges - one in each direction. Queries traverse outgoing edges from the starting node. Creating reverse edges doubles storage but simplifies traversal logic.
Alternatively, store single directed edges and query in both directions. The database supports traversing edges against their direction. This approach reduces storage but requires the query engine to check both incoming and outgoing edges.
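In Cypher the two strategies differ by a single arrowhead: with it, only outgoing edges match; without it, the edge matches from either endpoint, so one stored edge per friendship suffices. A sketch with assumed label and property names:

```ruby
# Directed: only follows FRIEND edges that point away from u.
OUTGOING = 'MATCH (u:User {id: $id})-[:FRIEND]->(f) RETURN f'

# Undirected: matches the edge regardless of which endpoint it was stored
# from, so a single edge per friendship serves both sides.
EITHER_DIRECTION = 'MATCH (u:User {id: $id})-[:FRIEND]-(f) RETURN f'
```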
Schema design patterns organize graph structure. The star pattern creates a central hub node with spokes to related entities. A user node connects to posts, comments, and likes. Queries start at the hub and traverse outward.
The hierarchy pattern represents tree structures with parent-child relationships. Department nodes connect through REPORTS_TO edges. Traversing up finds management chain, traversing down finds all subordinates.
The timeline pattern orders events with NEXT relationships. Event nodes chain together chronologically. Queries navigate forward or backward through time.
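The timeline pattern reads naturally in Cypher: a variable-length NEXT path walks forward from a known event. A sketch with illustrative labels and an assumed upper bound on chain length:

```ruby
# Walk forward through a chain of events from a given starting point.
# The *0..10 bound keeps the traversal from running off an unbounded chain.
TIMELINE_FORWARD = <<~CYPHER
  MATCH (start:Event {id: $event_id})-[:NEXT*0..10]->(e:Event)
  RETURN e.id, e.occurred_at
  ORDER BY e.occurred_at
CYPHER
```

Reversing the arrow (`<-[:NEXT*0..10]-`) walks the same chain backward through time.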
Query optimization techniques improve traversal performance. Starting traversals from specific nodes rather than scanning all nodes reduces search space. Using relationship types to filter edges during traversal avoids examining irrelevant connections.
Depth-limited traversals prevent runaway queries that explore the entire graph. Setting maximum hop counts bounds query execution time. Queries specify depth limits for operations like friend-of-friend lookups.
```ruby
# Optimized query - start from specific node, limit depth
def recommendations(user_id, max_depth: 3)
  # Path-length bounds cannot be parameterized in Cypher, so the depth is
  # interpolated; it comes from application code, never from user input.
  graph.query(
    "MATCH (u:User {id: $user_id})-[:PURCHASED*1..#{max_depth}]->(p:Product) " \
    "RETURN DISTINCT p",
    user_id: user_id
  )
end
```
Indexing strategies accelerate node lookups. The database creates indexes on frequently queried properties like user IDs or email addresses. Index-free adjacency handles relationship traversal, but finding starting nodes requires indexes.
Full-text indexes enable searching node properties for keywords. A product graph indexes product names and descriptions for search functionality. Spatial indexes support geographic queries on location nodes.
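Index creation is a handful of one-time statements run at deployment rather than per query. A sketch using the legacy CREATE INDEX syntax that matches the constraint syntax used elsewhere in this document (newer Neo4j versions use named indexes):

```ruby
# One-time schema statements run at deployment, not per query.
INDEX_STATEMENTS = [
  'CREATE INDEX ON :User(id)',    # fast starting-node lookup for traversals
  'CREATE INDEX ON :User(email)', # login and uniqueness checks
  'CREATE INDEX ON :Product(name)' # catalog lookups
].freeze

def ensure_indexes(session)
  INDEX_STATEMENTS.each { |stmt| session.run(stmt) }
end
```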
Batch operations handle bulk data loading. Initial graph population loads nodes first, then creates edges. This two-phase approach prevents dangling edge references. The database buffers writes and commits in batches for efficiency.
Batch updates query for matching nodes, modify properties, and create new relationships. Transactional boundaries ensure consistency. Large graph modifications may require splitting into smaller transactions to avoid lock contention.
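The two-phase, batched load can be sketched with Cypher's UNWIND, which expands a list parameter into one row per element, while each_slice bounds transaction size. Batch size, labels, and row shape are illustrative:

```ruby
CREATE_USERS = <<~CYPHER
  UNWIND $rows AS row
  CREATE (u:User {id: row.id, name: row.name})
CYPHER

CREATE_FRIENDSHIPS = <<~CYPHER
  UNWIND $rows AS row
  MATCH (a:User {id: row.from}), (b:User {id: row.to})
  CREATE (a)-[:FRIEND]->(b)
CYPHER

# Phase 1 loads nodes, phase 2 loads edges, so no edge references a
# missing node; each_slice keeps any one transaction reasonably small.
def bulk_load(session, users, friendships, batch_size: 1_000)
  users.each_slice(batch_size) do |batch|
    session.run(CREATE_USERS, rows: batch)
  end
  friendships.each_slice(batch_size) do |batch|
    session.run(CREATE_FRIENDSHIPS, rows: batch)
  end
end
```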
Ruby Implementation
Ruby applications interact with graph databases through driver gems that provide query execution and result mapping. The implementation pattern establishes connections, constructs queries, and processes returned graphs.
The neo4j-ruby-driver gem provides official Neo4j database connectivity. It implements the Bolt protocol for efficient binary communication. The driver supports connection pooling, transaction management, and query parameterization.
```ruby
require 'neo4j-ruby-driver'

driver = Neo4j::Driver::GraphDatabase.driver(
  'bolt://localhost:7687',
  Neo4j::Driver::AuthTokens.basic('neo4j', 'password')
)

session = driver.session
result = session.run(
  'MATCH (u:User {name: $name})-[:FRIEND]->(f) RETURN f.name',
  name: 'Alice'
)

result.each do |record|
  puts record['f.name']
end

session.close
driver.close
```
Connection management handles database connectivity. The driver creates a connection pool managing multiple concurrent sessions. Applications acquire sessions for query execution and release them when complete. Proper session handling prevents connection leaks.
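A small helper makes acquire-and-release hard to get wrong: ensure closes the session even when the block raises. The helper name is ours, not part of the driver:

```ruby
# Yields a session and guarantees it is closed, even when the block raises.
def with_session(driver)
  session = driver.session
  yield session
ensure
  session&.close
end

# Usage:
# with_session(driver) do |session|
#   session.run('MATCH (u:User) RETURN count(u)')
# end
```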
Transaction management wraps queries in ACID boundaries. Explicit transactions group multiple operations for atomic execution. Read transactions enable concurrent execution. Write transactions serialize for consistency.
```ruby
session.write_transaction do |tx|
  tx.run(
    'CREATE (u:User {name: $name, email: $email})',
    name: 'Charlie',
    email: 'charlie@example.com'
  )
  tx.run(
    'MATCH (u1:User {name: $from}), (u2:User {name: $to}) ' \
    'CREATE (u1)-[:FRIEND {since: $since}]->(u2)',
    from: 'Alice',
    to: 'Charlie',
    since: Date.today.to_s
  )
end
```
ActiveGraph (the successor to the Neo4j.rb neo4j and neo4j-core gems) provides ActiveRecord-style abstractions over Neo4j. It defines models as Ruby classes with property declarations and relationship definitions. The library generates Cypher queries from Ruby method calls.
```ruby
require 'active_graph'

ActiveGraph::Base.driver = Neo4j::Driver::GraphDatabase.driver(
  'bolt://localhost:7687',
  Neo4j::Driver::AuthTokens.basic('neo4j', 'password')
)

class User
  include ActiveGraph::Node

  property :name, type: String
  property :email, type: String
  property :created_at, type: DateTime

  has_many :out, :friends, type: :FRIEND, model_class: :User
  has_many :in, :followers, type: :FRIEND, model_class: :User
end

class Friendship
  include ActiveGraph::Relationship

  from_class :User
  to_class :User
  type :FRIEND

  property :since, type: Date
  property :strength, type: Float
end

# Create nodes and relationships
alice = User.create(name: 'Alice', email: 'alice@example.com')
bob = User.create(name: 'Bob', email: 'bob@example.com')
alice.friends << bob

# Query relationships
alice.friends.each do |friend|
  puts friend.name
end

# Complex traversals
alice.friends.friends.where_not(uuid: alice.uuid)
```
Query construction builds Cypher statements programmatically. The driver accepts parameterized queries preventing injection attacks. Parameters replace values without string interpolation.
```ruby
def find_mutual_friends(user1_name, user2_name)
  query = <<~CYPHER
    MATCH (u1:User {name: $user1})-[:FRIEND]-(mutual)-[:FRIEND]-(u2:User {name: $user2})
    RETURN mutual.name AS name, mutual.email AS email
  CYPHER
  session.run(query, user1: user1_name, user2: user2_name)
end
```
Result processing extracts data from query responses. Results contain records with named fields. Each record provides hash-like access to returned values. Values require type conversion for Ruby objects.
```ruby
require 'time' # needed for Time.parse

result = session.run(
  'MATCH (u:User) RETURN u.name AS name, u.created_at AS created'
)

users = result.map do |record|
  {
    name: record['name'],
    created_at: Time.parse(record['created'].to_s)
  }
end
```
Relationship modeling defines graph structure in Ruby classes. Models declare outgoing and incoming relationships with direction specification. Relationship classes define edge properties and methods.
```ruby
class Product
  include ActiveGraph::Node

  property :name, type: String
  property :price, type: Float

  has_many :in, :purchasers, type: :PURCHASED, model_class: :User
  has_many :both, :similar_products, type: :SIMILAR, model_class: :Product
end

class Purchase
  include ActiveGraph::Relationship

  from_class :User
  to_class :Product
  type :PURCHASED

  property :quantity, type: Integer
  property :purchased_at, type: DateTime
  property :price, type: Float
end

# Create purchase relationship
user = User.find_by(name: 'Alice')
product = Product.find_by(name: 'Widget')
Purchase.create(from_node: user, to_node: product, quantity: 2, price: 29.99)
```
Pattern matching queries express complex traversals. Ruby methods construct Cypher patterns for recommendation queries, path finding, and subgraph matching.
```ruby
def product_recommendations(user_id, limit: 10)
  query = <<~CYPHER
    MATCH (u:User {id: $user_id})-[:PURCHASED]->(p1:Product)
    MATCH (p1)<-[:PURCHASED]-(other:User)-[:PURCHASED]->(p2:Product)
    WHERE NOT (u)-[:PURCHASED]->(p2)
    RETURN p2.name, COUNT(DISTINCT other) AS score
    ORDER BY score DESC
    LIMIT $limit
  CYPHER
  session.run(query, user_id: user_id, limit: limit)
end
```
Error handling manages connection failures and query errors. Network interruptions raise connection exceptions. Invalid Cypher syntax produces query errors. Transaction failures require retry logic.
```ruby
def safe_query(cypher, params = {})
  retries = 3
  begin
    session.run(cypher, params)
  rescue Neo4j::Driver::Exceptions::ServiceUnavailableException
    retries -= 1
    retry if retries > 0
    raise
  rescue Neo4j::Driver::Exceptions::ClientException => e
    Rails.logger.error("Invalid query: #{e.message}")
    raise
  end
end
```
Tools & Ecosystem
Graph database implementations span native graph databases, multi-model databases with graph capabilities, and managed cloud services. Selection depends on deployment requirements, feature needs, and operational preferences.
Neo4j leads as a native graph database with ACID transactions, clustering, and rich query capabilities. The community edition provides core functionality for development and small deployments. Enterprise edition adds multi-datacenter replication, role-based access control, and advanced monitoring.
Neo4j uses Cypher query language for pattern matching. The database stores graphs with native adjacency structures for efficient traversal. Built-in algorithms compute centrality, community detection, and pathfinding. Browser-based interfaces visualize graph structure and query results.
Amazon Neptune offers managed graph database service with support for property graphs and RDF triples. The service handles infrastructure provisioning, backup, and replication. Neptune supports both Gremlin and SPARQL query languages.
Integration with AWS services provides monitoring through CloudWatch, security through IAM, and backup through automated snapshots. The service scales read replicas across availability zones. Neptune optimized instances provide high-throughput graph queries.
ArangoDB implements multi-model database supporting graphs, documents, and key-value pairs in a unified system. AQL query language handles graph traversals alongside document queries. The database stores graphs as edge collections referencing document collections.
This approach enables applications to combine graph relationships with document properties in single queries. Sharding distributes graph data across cluster nodes. The database provides JavaScript-based stored procedures and transaction support.
JanusGraph provides distributed graph database built on storage backends like Apache Cassandra or HBase. The architecture separates graph logic from storage implementation. Gremlin queries express traversals and pattern matching.
Elasticsearch integration enables full-text search on graph properties. The system scales horizontally by partitioning graphs across storage nodes. Graph algorithms execute through Spark integration for large-scale analytics.
Ruby gems provide database connectivity and abstraction layers:
The neo4j-ruby-driver implements Neo4j's Bolt protocol for direct database communication. It handles connection pooling, query execution, and result streaming. The driver supports transaction management and query parameterization.
ActiveGraph adds ActiveRecord-style models and relationships over Neo4j. Classes define nodes and edges with property declarations. The library generates Cypher from Ruby method chains. Validation and callback support integrate with Rails conventions.
gremlin_client connects to Gremlin-compatible databases including JanusGraph and Amazon Neptune. The gem constructs Gremlin queries and processes graph results. Traversal methods chain operations for complex graph navigation.
Query language options differ across databases:
Cypher uses ASCII art syntax for pattern matching: parentheses represent nodes, arrows with square brackets indicate relationships, and curly braces contain properties. The declarative style focuses on what patterns to match rather than how to traverse.
```cypher
MATCH (u:User)-[:FRIEND*2]-(fof:User)
WHERE u.name = 'Alice' AND fof.age > 25
RETURN DISTINCT fof.name
```
Gremlin provides imperative traversal language with method chaining. Traversal steps navigate from vertices to edges to properties. The style resembles functional programming with map, filter, and reduce operations.
```groovy
g.V().has('name', 'Alice')
  .out('friend').out('friend')
  .has('age', gt(25))
  .values('name')
  .dedup()
```
Monitoring and observability track database performance and health. Neo4j provides metrics for query execution time, cache hit rates, and transaction throughput. The database exposes JMX endpoints for monitoring tools.
Query logging captures slow queries and execution plans. Profiling explains query performance identifying bottlenecks. The database tracks lock contention and transaction conflicts.
Backup and recovery strategies protect graph data. Full backups capture complete database state. Incremental backups record changes since last full backup. Point-in-time recovery restores to specific timestamps.
Neo4j enterprise supports online backup without downtime. Cloud services provide automated backup scheduling and retention management. Export tools dump graphs to GraphML or JSON formats for archival.
Migration tools handle schema evolution and data transformation. Neo4j migration framework versions graph schemas similar to Rails migrations. Scripts create constraints, indexes, and relationship types.
```ruby
# Neo4j migration example
class AddEmailConstraint
  def up
    'CREATE CONSTRAINT ON (u:User) ASSERT u.email IS UNIQUE'
  end

  def down
    'DROP CONSTRAINT ON (u:User) ASSERT u.email IS UNIQUE'
  end
end
```
Development tools assist with graph modeling and testing. Graph visualization tools display database structure and query results. The Neo4j browser provides interactive query execution and result exploration.
Testing libraries create temporary graph fixtures for unit tests. Graph generators produce synthetic data for performance testing. Schema validators ensure consistent node and relationship structures.
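A fixture helper can be sketched as plain data plus a loader, mirroring the node/edge hashes shown earlier; the method names and query shapes are illustrative, not from a specific testing library:

```ruby
# Minimal fixture data in the same shape as the node/edge hashes above.
def graph_fixture
  {
    nodes: [
      { id: 'u1', properties: { name: 'Alice' } },
      { id: 'u2', properties: { name: 'Bob' } }
    ],
    edges: [
      { from: 'u1', to: 'u2', type: 'FRIEND' }
    ]
  }
end

# Load the fixture through a session: nodes first, then edges.
def load_fixture(session, fixture)
  fixture[:nodes].each do |node|
    session.run('CREATE (n:User {id: $id, name: $name})',
                id: node[:id], name: node[:properties][:name])
  end
  fixture[:edges].each do |edge|
    session.run('MATCH (a:User {id: $from}), (b:User {id: $to}) CREATE (a)-[:FRIEND]->(b)',
                from: edge[:from], to: edge[:to])
  end
end
```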
Practical Examples
Real-world applications demonstrate graph database capabilities across domains. These examples show complete implementations from data modeling through query execution.
Social network friend recommendations identify potential connections based on mutual friends and shared interests. The system analyzes the friend graph to find users with many mutual connections who aren't currently friends.
```ruby
class RecommendationEngine
  def initialize(driver)
    @driver = driver
  end

  def friend_suggestions(user_id, limit: 10)
    session = @driver.session
    query = <<~CYPHER
      MATCH (u:User {id: $user_id})-[:FRIEND]->(friend)-[:FRIEND]->(suggestion)
      WHERE NOT (u)-[:FRIEND]->(suggestion)
        AND u <> suggestion
      WITH u, suggestion, COUNT(DISTINCT friend) AS mutual_friends
      OPTIONAL MATCH (suggestion)-[:INTERESTED_IN]->(interest)<-[:INTERESTED_IN]-(u)
      WITH suggestion, mutual_friends, COUNT(DISTINCT interest) AS shared_interests
      RETURN suggestion.id, suggestion.name, mutual_friends, shared_interests
      ORDER BY mutual_friends DESC, shared_interests DESC
      LIMIT $limit
    CYPHER
    result = session.run(query, user_id: user_id, limit: limit)
    result.map do |record|
      {
        id: record['suggestion.id'],
        name: record['suggestion.name'],
        mutual_friends: record['mutual_friends'],
        shared_interests: record['shared_interests']
      }
    end
  ensure
    session&.close
  end
end
```
The query traverses two hops from the starting user to find friends-of-friends. Filtering excludes existing friends and the user themselves. Aggregation counts mutual connections. A secondary pattern matches shared interests between users. Results rank by connection strength.
Product recommendation system analyzes purchase patterns to suggest relevant products. The engine identifies products frequently purchased together and ranks by purchase frequency.
```ruby
class ProductRecommender
  def initialize(driver)
    @driver = driver
  end

  def collaborative_filtering(user_id, limit: 5)
    session = @driver.session
    query = <<~CYPHER
      MATCH (u:User {id: $user_id})-[:PURCHASED]->(p1:Product)
      MATCH (p1)<-[:PURCHASED]-(other:User)-[r:PURCHASED]->(p2:Product)
      WHERE NOT (u)-[:PURCHASED]->(p2)
      WITH p2, COUNT(DISTINCT other) AS popularity, AVG(r.rating) AS avg_rating
      MATCH (p2)-[:IN_CATEGORY]->(cat:Category)
      RETURN p2.id, p2.name, p2.price, cat.name AS category,
             popularity, avg_rating
      ORDER BY popularity DESC, avg_rating DESC
      LIMIT $limit
    CYPHER
    result = session.run(query, user_id: user_id, limit: limit)
    result.map do |record|
      {
        id: record['p2.id'],
        name: record['p2.name'],
        price: record['p2.price'],
        category: record['category'],
        popularity_score: record['popularity'],
        avg_rating: record['avg_rating']
      }
    end
  ensure
    session&.close
  end
end
```
This collaborative filtering approach finds products purchased by users with similar purchase history. The query excludes already purchased items. Aggregations compute popularity and average rating. Category information enriches recommendations.
Knowledge graph query system represents entities and semantic relationships for information retrieval. The graph stores concepts, definitions, and connections for question answering.
```ruby
class KnowledgeGraph
  def initialize(driver)
    @driver = driver
  end

  def find_related_concepts(concept_name, relationship_types: nil, max_depth: 2)
    session = @driver.session
    # Relationship filters and path bounds cannot be parameterized in Cypher,
    # so they are interpolated from application-controlled values.
    rel_filter = relationship_types ? ':' + relationship_types.join('|:') : ''
    query = <<~CYPHER
      MATCH path = (start:Concept {name: $concept})-[#{rel_filter}*1..#{max_depth}]-(related:Concept)
      WITH related, path, relationships(path) AS rels
      RETURN DISTINCT related.name AS name,
             related.definition AS definition,
             [r IN rels | type(r)] AS relationship_path,
             length(path) AS distance
      ORDER BY distance, related.name
    CYPHER
    result = session.run(query, concept: concept_name)
    result.map do |record|
      {
        name: record['name'],
        definition: record['definition'],
        relationships: record['relationship_path'],
        distance: record['distance']
      }
    end
  ensure
    session&.close
  end

  def explain_connection(concept1, concept2)
    session = @driver.session
    query = <<~CYPHER
      MATCH path = shortestPath(
        (c1:Concept {name: $concept1})-[*]-(c2:Concept {name: $concept2})
      )
      WITH path, nodes(path) AS concepts, relationships(path) AS rels
      RETURN [c IN concepts | c.name] AS concept_path,
             [r IN rels | type(r)] AS relationship_types,
             length(path) AS path_length
    CYPHER
    record = session.run(query, concept1: concept1, concept2: concept2).first
    return nil unless record

    {
      path: record['concept_path'],
      relationships: record['relationship_types'],
      length: record['path_length']
    }
  ensure
    session&.close
  end
end
```
The system finds concepts related through multiple relationship types and depths. Variable-length path matching discovers connections. Shortest path algorithms identify direct conceptual links. Relationship type filtering constrains traversal to specific semantic connections.
Access control and permissions model organizational hierarchies and role-based access. The graph represents users, roles, resources, and permissions with inheritance.
```ruby
class PermissionChecker
  def initialize(driver)
    @driver = driver
  end

  def can_access?(user_id, resource_id)
    session = @driver.session
    query = <<~CYPHER
      MATCH (u:User {id: $user_id})-[:HAS_ROLE]->(role:Role)
      MATCH (role)-[:CAN_ACCESS*]->(permission:Permission)
      MATCH (permission)-[:GRANTS_ACCESS_TO]->(resource:Resource {id: $resource_id})
      RETURN COUNT(*) > 0 AS has_access
    CYPHER
    record = session.run(query, user_id: user_id, resource_id: resource_id).first
    record['has_access']
  ensure
    session&.close
  end

  def user_permissions(user_id)
    session = @driver.session
    query = <<~CYPHER
      MATCH (u:User {id: $user_id})-[:HAS_ROLE]->(role:Role)
      OPTIONAL MATCH (role)-[:CAN_ACCESS*]->(perm:Permission)
      OPTIONAL MATCH (perm)-[:GRANTS_ACCESS_TO]->(resource:Resource)
      RETURN role.name AS role,
             COLLECT(DISTINCT perm.action) AS permissions,
             COLLECT(DISTINCT resource.name) AS resources
    CYPHER
    result = session.run(query, user_id: user_id)
    result.map do |record|
      {
        role: record['role'],
        permissions: record['permissions'],
        resources: record['resources']
      }
    end
  ensure
    session&.close
  end
end
```
Role-based access control queries traverse from users through roles to permissions. Variable-length paths handle role inheritance where roles grant other roles. Permission checking becomes a graph reachability query.
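The reachability idea can be shown without a database. This is a minimal in-memory sketch, assuming a hypothetical role graph in which roles grant other roles; the role and permission names are invented for illustration:

```ruby
# Roles form a directed graph: a role inherits everything reachable through
# its grants. Permission checking reduces to reachability plus a lookup.
ROLE_GRANTS      = { admin: [:editor], editor: [:viewer], viewer: [] }.freeze
ROLE_PERMISSIONS = { admin: [:delete], editor: [:write], viewer: [:read] }.freeze

def effective_permissions(role)
  seen = []
  stack = [role]
  until stack.empty?
    current = stack.pop
    next if seen.include?(current)
    seen << current
    stack.concat(ROLE_GRANTS.fetch(current, []))   # follow role inheritance
  end
  seen.flat_map { |r| ROLE_PERMISSIONS.fetch(r, []) }.uniq
end

def can_access?(role, permission)
  effective_permissions(role).include?(permission)
end

can_access?(:admin, :read)   # => true  (inherited via editor -> viewer)
can_access?(:viewer, :write) # => false
```

The unbounded traversal mirrors the `[:CAN_ACCESS*]` variable-length pattern: however deep the role hierarchy grows, the same query answers the question.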
Reference
Graph Database Components
| Component | Description | Properties |
|---|---|---|
| Node | Entity in the graph | ID, labels, key-value properties |
| Edge | Directed relationship between nodes | Type, start node, end node, properties |
| Property | Key-value attribute on node or edge | Key name, value, data type |
| Label | Type classification for nodes | Name string, applied to nodes |
| Path | Sequence of connected nodes and edges | Length, nodes, relationships |
| Index | Lookup structure for properties | Property keys, node labels |
Common Query Patterns
| Pattern | Cypher Example | Use Case |
|---|---|---|
| Node lookup | MATCH (n:Label {property: value}) RETURN n | Find specific entities |
| Relationship traversal | MATCH (a)-[:TYPE]->(b) RETURN b | Navigate connections |
| Variable-length path | MATCH (a)-[:TYPE*1..3]->(b) RETURN b | Multi-hop traversal |
| Shortest path | MATCH p=shortestPath((a)-[*]-(b)) RETURN p | Find minimal connection |
| Pattern matching | MATCH (a)-[:TYPE1]->(b)-[:TYPE2]->(c) | Identify graph shapes |
| Aggregation | MATCH (a)-[:TYPE]->(b) RETURN COUNT(b) | Summarize relationships |
| Filtering | WHERE n.property > value | Constrain results |
| Optional matching | OPTIONAL MATCH (a)-[:TYPE]->(b) | Handle missing relationships |
Neo4j Ruby Driver Methods
| Method | Description | Example |
|---|---|---|
| GraphDatabase.driver | Create driver instance | driver = GraphDatabase.driver(uri, auth) |
| driver.session | Open database session | session = driver.session |
| session.run | Execute query | result = session.run(query, params) |
| session.write_transaction | Execute write transaction | session.write_transaction { \|tx\| tx.run(query) } |
| session.read_transaction | Execute read transaction | session.read_transaction { \|tx\| tx.run(query) } |
| result.each | Iterate result records | result.each { \|record\| ... } |
| record[] | Access field by name | value = record['field_name'] |
| session.close | Close session | session.close |
| driver.close | Close driver connection | driver.close |
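The examples in this section close sessions manually, which leaks the session if a query raises. A `begin`/`ensure` wrapper guarantees cleanup; the sketch below uses hypothetical `FakeDriver`/`FakeSession` stand-ins so the pattern can be shown without a running database:

```ruby
# Stand-ins for the real driver objects, recording whether close was called.
class FakeSession
  attr_reader :closed

  def run(_query)
    'result'
  end

  def close
    @closed = true
  end
end

class FakeDriver
  attr_reader :last

  def session
    @last = FakeSession.new
  end
end

# The wrapper: open a session, yield it, and close it even on error.
# The block's return value is passed through unchanged.
def with_session(driver)
  session = driver.session
  yield session
ensure
  session&.close
end

driver = FakeDriver.new
with_session(driver) { |s| s.run('MATCH (n) RETURN n LIMIT 1') }
driver.last.closed # => true
```

The same wrapper works with the real driver, since it only relies on `driver.session` and `session.close` from the table above.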
ActiveGraph Model Methods
| Method | Description | Example |
|---|---|---|
| property | Define node property | property :name, type: String |
| has_many :out | Define outgoing relationship | has_many :out, :friends, type: :FRIEND |
| has_many :in | Define incoming relationship | has_many :in, :followers, type: :FOLLOWS |
| has_many :both | Define bidirectional relationship | has_many :both, :connections, type: :CONNECTED |
| create | Create node instance | User.create(name: 'Alice') |
| find_by | Query nodes by property | User.find_by(email: 'alice@example.com') |
| where | Filter nodes | User.where(active: true) |
| all | Retrieve all nodes | User.all |
Relationship Types by Domain
| Domain | Relationship Type | Direction | Properties |
|---|---|---|---|
| Social Network | FOLLOWS | unidirectional | since, strength |
| Social Network | FRIEND | bidirectional | since, status |
| Social Network | LIKES | unidirectional | timestamp |
| E-commerce | PURCHASED | unidirectional | quantity, price, date |
| E-commerce | VIEWED | unidirectional | timestamp, duration |
| E-commerce | IN_CATEGORY | unidirectional | none |
| Knowledge | IS_A | unidirectional | none |
| Knowledge | RELATED_TO | bidirectional | relationship_type, strength |
| Organization | WORKS_IN | unidirectional | role, start_date |
| Organization | REPORTS_TO | unidirectional | none |
| Organization | MANAGES | unidirectional | since |
Graph Algorithms
| Algorithm | Purpose | Time Complexity | Use Case |
|---|---|---|---|
| Shortest Path | Find minimal path between nodes | O(E + V log V) | Navigation, routing |
| PageRank | Compute node importance | O(iterations * E) | Influence scoring |
| Community Detection | Identify clusters | O(E log V) | Group analysis |
| Degree Centrality | Measure direct connections | O(V) | Hub identification |
| Betweenness Centrality | Find bridge nodes | O(V * E) | Critical point detection |
| Triangle Count | Detect triangular patterns | O(E^1.5) | Clustering coefficient |
| Connected Components | Find isolated subgraphs | O(V + E) | Network segmentation |
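Connected components is the simplest algorithm in the table to implement directly. This is an illustrative breadth-first version of the O(V + E) approach, run over a small hypothetical undirected adjacency list:

```ruby
# Find connected components via BFS. Each unvisited node seeds a new
# component; the queue absorbs everything reachable from it.
def connected_components(adjacency)
  visited = {}
  components = []
  adjacency.each_key do |node|
    next if visited[node]
    component = []
    queue = [node]
    until queue.empty?
      current = queue.shift
      next if visited[current]
      visited[current] = true
      component << current
      queue.concat(adjacency.fetch(current, []))
    end
    components << component.sort
  end
  components
end

graph = {
  'a' => ['b'], 'b' => ['a', 'c'], 'c' => ['b'],  # one component
  'd' => ['e'], 'e' => ['d'],                     # a second
  'f' => []                                       # an isolated node
}
connected_components(graph)
# => [["a", "b", "c"], ["d", "e"], ["f"]]
```

In production this work would run inside the database (for example, via a graph algorithms library) rather than client-side, but the logic is the same: every node is visited once and every edge is followed once.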
Query Optimization Techniques
| Technique | Description | Impact |
|---|---|---|
| Start from indexed nodes | Begin traversal at specific nodes | Reduces search space |
| Limit path depth | Set maximum traversal hops | Prevents exponential expansion |
| Use relationship type filters | Specify edge types in patterns | Eliminates irrelevant paths |
| Add property constraints early | Filter on properties before traversal | Reduces intermediate results |
| Create appropriate indexes | Index frequently queried properties | Speeds node lookup |
| Use LIMIT clause | Restrict result count | Reduces processing and transfer |
| Profile queries | Analyze execution plans | Identifies bottlenecks |
| Avoid Cartesian products | Match patterns carefully | Prevents result explosion |
Transaction Isolation Levels
| Level | Behavior | Neo4j Support |
|---|---|---|
| Read Uncommitted | See uncommitted changes | No |
| Read Committed | See committed changes only | Yes (default) |
| Repeatable Read | Consistent reads within transaction | Via explicit locks |
| Serializable | Full isolation | Via explicit locks |
Connection Configuration Options
| Option | Description | Default |
|---|---|---|
| uri | Database connection string | bolt://localhost:7687 |
| auth | Authentication credentials | none |
| max_connection_lifetime | Maximum connection age | 1 hour |
| max_connection_pool_size | Maximum pooled connections | 100 |
| connection_acquisition_timeout | Wait time for connection | 60 seconds |
| encrypted | Enable TLS encryption | false |
| trust_strategy | Certificate verification | TRUST_ALL_CERTIFICATES |
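Putting the options together, driver construction looks roughly like this. This is an illustrative fragment, not copy-paste configuration: the exact keyword spellings and defaults vary across driver versions, so check the driver documentation for the version in use.

```ruby
# Assumes the Neo4j driver classes (GraphDatabase, AuthTokens) are loaded,
# as in the driver-methods table above. Values mirror the defaults listed
# in the configuration table.
driver = GraphDatabase.driver(
  'bolt://localhost:7687',
  AuthTokens.basic('neo4j', 'password'),
  max_connection_lifetime: 3_600,      # seconds (table default: 1 hour)
  max_connection_pool_size: 100,
  connection_acquisition_timeout: 60,  # seconds
  encrypted: false
)

# ... run queries ...
driver.close
```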