Overview
Graph databases store and query data based on relationships between entities. Unlike relational databases that use tables with foreign keys to represent connections, graph databases treat relationships as first-class citizens with direct pointers between nodes. This structure eliminates join operations and enables efficient traversal of connected data.
The graph data model consists of nodes (entities), edges (relationships), and properties (attributes). A social network represents users as nodes with "follows" edges connecting them. An e-commerce system models products, categories, and customers as nodes with "belongs_to", "purchased", and "recommends" relationships. The database stores these connections as direct references rather than requiring index lookups to reconstruct relationships.
Graph databases excel at queries that traverse multiple relationship levels. Finding friends-of-friends in a social network requires a simple two-hop traversal rather than multiple self-joins. Recommendation systems query "users who bought this also bought" patterns through direct relationship following. Fraud detection identifies suspicious transaction patterns by analyzing networks of accounts, transfers, and merchants.
The approach differs fundamentally from relational modeling. Relational databases normalize data into separate tables and reconstruct relationships through foreign key joins. Each join operation requires index scanning and matching operations that become expensive with multiple relationship levels. Graph databases navigate from one node to connected nodes through direct memory addresses or pointers, maintaining constant-time relationship traversal regardless of database size.
```ruby
# Relational approach - multiple joins required
User.joins(:friendships)
    .joins("JOIN friendships f2 ON friendships.friend_id = f2.user_id")
    .joins("JOIN users u2 ON f2.friend_id = u2.id")
    .where(users: { id: current_user.id })

# Graph approach - direct traversal
user.friends.flat_map(&:friends)
```
Key Principles
Graph databases organize data around three fundamental components: nodes, edges, and properties. Nodes represent entities such as people, products, or locations. Edges encode relationships between nodes with directional connections. Properties attach attributes to both nodes and edges as key-value pairs.
Nodes function as vertices in the graph structure. Each node contains properties describing the entity and maintains references to connected edges. Node types or labels categorize entities into classes like User, Product, or Order. The database assigns each node a unique identifier for direct access and relationship targeting.
Edges establish directed connections between nodes. A "FOLLOWS" edge points from one user to another. A "PURCHASED" edge connects a customer to a product with a timestamp property. Edge directionality matters for queries - traversing incoming versus outgoing edges produces different result sets. Edges themselves carry properties storing relationship metadata such as weight, date, or status.
Properties store attributes as name-value pairs on nodes and edges. A User node contains properties for name, email, and registration_date. A PURCHASED edge includes properties for quantity, price, and order_date. The schema remains flexible - different nodes of the same type may have different property sets.
The property graph model combines these elements into a labeled, directed, attributed multigraph. Nodes and edges carry labels for type classification. Multiple edges can connect the same node pair with different relationship types. Properties provide rich metadata throughout the structure.
Index-free adjacency defines a core architectural principle. Each node maintains direct references to adjacent nodes and connecting edges. Traversing from one node to connected nodes requires no index lookup - the database follows pointers directly. This design enables constant-time relationship traversal. The time to find connected nodes remains identical whether the database contains thousands or billions of nodes.
Graph traversal algorithms navigate the structure through depth-first or breadth-first patterns. Queries specify starting nodes and traversal patterns to find paths, subgraphs, or aggregated results. Pattern matching identifies specific graph shapes such as triangles, chains, or hub-and-spoke configurations.
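These mechanics can be sketched in plain Ruby: a hash from node id to neighbor ids plays the role of index-free adjacency (neighbor lookup is a direct reference, not an index scan), and a depth-limited breadth-first walk implements the traversal. A toy in-memory model, not how a real graph engine stores data:

```ruby
# Toy index-free adjacency: each node id maps directly to its neighbor ids.
ADJACENCY = {
  'alice' => ['bob', 'carol'],
  'bob'   => ['dave'],
  'carol' => ['dave', 'erin'],
  'dave'  => [],
  'erin'  => []
}.freeze

# Breadth-first traversal from start, bounded by max_depth hops.
def reachable_within(start, max_depth)
  visited = { start => 0 }
  queue = [start]
  until queue.empty?
    node = queue.shift
    depth = visited[node]
    next if depth == max_depth
    ADJACENCY.fetch(node, []).each do |neighbor|
      next if visited.key?(neighbor)
      visited[neighbor] = depth + 1
      queue << neighbor
    end
  end
  visited.keys - [start]
end

reachable_within('alice', 2) # friends plus friends-of-friends
```

Each hop costs one hash lookup per node regardless of total graph size, which is the intuition behind the constant-time traversal claim.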
```ruby
# Node with properties
user = {
  id: 'user:123',
  labels: ['User', 'Customer'],
  properties: {
    name: 'Alice',
    email: 'alice@example.com',
    created_at: Time.now
  }
}

# Edge with properties
friendship = {
  id: 'rel:456',
  type: 'FOLLOWS',
  from: 'user:123',
  to: 'user:789',
  properties: {
    since: Date.new(2020, 1, 15),
    strength: 0.85
  }
}
```
Cypher query language provides declarative pattern matching for graph queries. The syntax uses ASCII art to represent graph patterns: parentheses denote nodes, arrows with square brackets represent relationships, and curly braces contain properties. A query matches patterns against the stored graph and returns matching subgraphs.
```cypher
MATCH (user:User)-[:FOLLOWS]->(friend)-[:FOLLOWS]->(fof)
WHERE user.name = 'Alice'
RETURN fof.name
```
Graph databases support ACID transactions for consistency guarantees. Read and write operations execute within transaction boundaries. The database maintains referential integrity for relationships - deleting a node requires handling connected edges.
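Edge handling on delete is explicit in Cypher: DETACH DELETE removes a node together with all of its relationships, whereas a bare DELETE fails while edges remain. A sketch using the session-based driver style shown later in this document; the User label and id property are illustrative:

```ruby
# DETACH DELETE removes the node and every connected edge in one statement;
# a plain DELETE would raise an error if any relationships still exist.
DELETE_USER = <<~CYPHER
  MATCH (u:User {id: $user_id})
  DETACH DELETE u
CYPHER

def delete_user(session, user_id)
  session.run(DELETE_USER, user_id: user_id)
end
```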
Design Considerations
Graph databases solve problems where relationships form the core query pattern. Choosing between graph and relational databases depends on data connectivity, query patterns, and schema flexibility requirements.
Relationship-heavy workloads benefit from graph structure. Social networks query friend connections, followers, and influence paths. Recommendation engines analyze purchase patterns and product similarities. Knowledge graphs link concepts through semantic relationships. These scenarios involve multi-hop traversals that become inefficient with relational joins.
Relational databases perform well for entity-centric queries. Retrieving user profiles by ID, filtering products by category, or aggregating sales by region execute efficiently with indexed lookups. When queries primarily access individual records or perform set-based filtering without relationship traversal, relational models remain optimal.
Query complexity guides database selection. Graph databases excel at variable-length path queries: finding all nodes reachable within N hops, computing shortest paths between entities, or detecting circular relationship patterns requires only simple traversals. The equivalent relational queries involve recursive CTEs or multiple self-joins whose cost grows sharply with path length.
Relational databases handle fixed-relationship queries efficiently. Joining three tables with known foreign keys performs well with proper indexing. When relationship depth remains constant and known in advance, relational optimization techniques apply effectively.
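The recursive-CTE alternative mentioned above can be made concrete. A hedged sketch of the SQL for "all users reachable within three hops", assuming a friendships(user_id, friend_id) table; each recursion level is effectively another self-join:

```ruby
# Recursive CTE: the UNION branch joins friendships against itself once per
# level, up to the depth bound. Table and column names are illustrative.
REACHABLE_SQL = <<~SQL
  WITH RECURSIVE reachable(user_id, depth) AS (
    SELECT friend_id, 1 FROM friendships WHERE user_id = $1
    UNION
    SELECT f.friend_id, r.depth + 1
    FROM friendships f
    JOIN reachable r ON f.user_id = r.user_id
    WHERE r.depth < 3
  )
  SELECT DISTINCT user_id FROM reachable
SQL
```

The graph equivalent is a single `-[:FRIEND*1..3]->` pattern with no joins at all.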
Schema evolution differs between approaches. Graph databases support flexible schemas where nodes of the same type carry different property sets. Adding new relationship types or node properties requires no schema migration. This flexibility accommodates evolving data models and semi-structured data.
Relational schemas enforce structure through table definitions and constraints. Schema changes require ALTER TABLE operations that lock tables during migration. Strict typing and normalization provide data consistency guarantees but reduce flexibility.
Write patterns influence database choice. Graph databases handle frequent relationship modifications efficiently. Adding followers, creating product associations, or linking related content requires simple edge creation. The database updates adjacency lists without reindexing large tables.
Relational databases optimize for bulk inserts and transactional consistency. Loading batches of records with foreign keys performs well. Frequent relationship changes require updating join tables and maintaining referential integrity across tables.
Data volume characteristics affect performance differently. Graph databases maintain constant-time traversal regardless of database size due to index-free adjacency. A query traversing three relationship levels takes the same time whether the database contains millions or billions of nodes.
Relational database join performance degrades with table size. Even with indexes, joining large tables requires scanning more entries. Deep relationship queries compound this effect across multiple join operations.
Analytical queries distinguish database types. Relational databases excel at aggregations, grouping, and set operations over large datasets. SQL engines optimize for full table scans, parallel processing, and complex aggregate functions.
Graph databases optimize for traversal-based analytics. Computing centrality measures, finding communities, or analyzing network properties leverage graph algorithms. However, calculating global aggregates or performing fact table roll-ups remains less efficient than specialized analytical databases.
```ruby
# Graph traversal - constant time per hop
def mutual_friends(user_id, friend_id)
  graph.query(
    "MATCH (u:User {id: $user_id})-[:FRIEND]-(mutual)-[:FRIEND]-(f:User {id: $friend_id}) " \
    "RETURN mutual",
    user_id: user_id, friend_id: friend_id
  )
end

# Relational - joins grow with data size
def mutual_friends(user_id, friend_id)
  Friendship.where(user_id: user_id)
            .joins("INNER JOIN friendships f2 ON friendships.friend_id = f2.user_id")
            .where("f2.friend_id = ?", friend_id)
end
```
Hybrid architectures combine database types. Store core entity data in relational tables with proper normalization. Replicate relationship structures to a graph database for traversal queries. This pattern maintains transactional consistency for business data while enabling efficient graph analytics.
Polyglot persistence acknowledges that different data models serve different query patterns. User profiles and transactions reside in relational storage. Social connections and recommendations use graph structure. Product catalogs leverage document stores. Applications query the appropriate database for each use case.
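A minimal sketch of that routing decision in Ruby; the class, query-pattern names, and store assignments are illustrative, not from any particular framework:

```ruby
# Hypothetical router: each query pattern maps to the store best suited to it.
class DataRouter
  STORE_FOR = {
    profile_lookup:   :relational, # entity-centric, indexed by ID
    order_history:    :relational, # transactional records
    friend_traversal: :graph,      # multi-hop relationship queries
    recommendations:  :graph,      # pattern-based traversal
    catalog_search:   :document    # semi-structured product data
  }.freeze

  def self.store_for(query_pattern)
    STORE_FOR.fetch(query_pattern) { :relational } # default to system of record
  end
end
```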
Implementation Approaches
Implementing graph databases requires decisions about data modeling, query optimization, and integration patterns. The approach depends on whether using a dedicated graph database or adding graph capabilities to existing infrastructure.
Native graph storage uses specialized data structures optimized for graph operations. The database stores nodes and edges with direct adjacency pointers. Each node contains a list of incoming and outgoing edge identifiers. Following relationships requires dereferencing these pointers without index lookups.
This approach provides optimal traversal performance. The database allocates memory regions for node and edge storage with predictable layout. Accessing adjacent nodes reads from adjacent memory locations, improving cache utilization. Write operations append to adjacency lists without restructuring indexes.
Non-native graph storage implements graph abstractions over relational or key-value stores. A relational implementation creates tables for nodes and edges with foreign keys. A key-value approach stores adjacency lists as serialized structures. These implementations trade traversal performance for operational simplicity.
Non-native implementations integrate with existing database infrastructure. Organizations leverage familiar backup, replication, and monitoring tools. However, traversal operations require multiple database queries or index scans rather than direct pointer following.
Data modeling strategies transform domain concepts into graph structures. The process identifies entities that become nodes and relationships that become edges. Properties map to attributes on nodes and edges.
Modeling users and friendships creates User nodes with FRIEND edges. Product catalogs model products, categories, and tags as nodes with BELONGS_TO and TAGGED_WITH relationships. Organizational hierarchies represent employees and departments with WORKS_IN and MANAGES edges.
```ruby
# Domain model
class User
  has_many :friendships
  has_many :friends, through: :friendships
end

# Graph model
{
  nodes: [
    { id: 'u1', label: 'User', properties: { name: 'Alice' } },
    { id: 'u2', label: 'User', properties: { name: 'Bob' } }
  ],
  edges: [
    { from: 'u1', to: 'u2', type: 'FRIEND', properties: { since: '2020-01-01' } }
  ]
}
```
Denormalization decisions balance query performance against storage overhead. Relational databases normalize data to eliminate redundancy. Graph databases often denormalize properties onto edges for query efficiency.
Storing user names on FRIEND edges avoids node lookups when displaying friend lists. Duplicating product prices on PURCHASED edges provides historical accuracy without joining to product nodes. This pattern increases storage but reduces query complexity.
Relationship directionality impacts query patterns. Bidirectional relationships require two edges - one in each direction. Queries traverse outgoing edges from the starting node. Creating reverse edges doubles storage but simplifies traversal logic.
Alternatively, store single directed edges and query in both directions. The database supports traversing edges against their direction. This approach reduces storage but requires the query engine to check both incoming and outgoing edges.
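In Cypher the two strategies differ by a single arrowhead: with it, only outgoing edges match; without it, the edge matches from either endpoint, so one stored edge per friendship suffices. A sketch with assumed label and property names:

```ruby
# Directed: only follows FRIEND edges that point away from u.
OUTGOING = 'MATCH (u:User {id: $id})-[:FRIEND]->(f) RETURN f'

# Undirected: matches the edge regardless of which endpoint it was stored
# from, so a single edge per friendship serves both sides.
EITHER_DIRECTION = 'MATCH (u:User {id: $id})-[:FRIEND]-(f) RETURN f'
```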
Schema design patterns organize graph structure. The star pattern creates a central hub node with spokes to related entities. A user node connects to posts, comments, and likes. Queries start at the hub and traverse outward.
The hierarchy pattern represents tree structures with parent-child relationships. Department nodes connect through REPORTS_TO edges. Traversing up finds management chain, traversing down finds all subordinates.
The timeline pattern orders events with NEXT relationships. Event nodes chain together chronologically. Queries navigate forward or backward through time.
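The timeline pattern reads naturally in Cypher: a variable-length NEXT path walks forward from a known event. A sketch with illustrative labels and an assumed upper bound on chain length:

```ruby
# Walk forward through a chain of events from a given starting point.
# The *0..10 bound keeps the traversal from running off an unbounded chain.
TIMELINE_FORWARD = <<~CYPHER
  MATCH (start:Event {id: $event_id})-[:NEXT*0..10]->(e:Event)
  RETURN e.id, e.occurred_at
  ORDER BY e.occurred_at
CYPHER
```

Reversing the arrow (`<-[:NEXT*0..10]-`) walks the same chain backward through time.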
Query optimization techniques improve traversal performance. Starting traversals from specific nodes rather than scanning all nodes reduces search space. Using relationship types to filter edges during traversal avoids examining irrelevant connections.
Depth-limited traversals prevent runaway queries that explore the entire graph. Setting maximum hop counts bounds query execution time. Queries specify depth limits for operations like friend-of-friend lookups.
```ruby
# Optimized query - start from specific node, limit depth
def recommendations(user_id, max_depth: 3)
  # Path-length bounds cannot be parameterized in Cypher, so the depth is
  # interpolated; it comes from application code, never from user input.
  graph.query(
    "MATCH (u:User {id: $user_id})-[:PURCHASED*1..#{max_depth}]->(p:Product) " \
    "RETURN DISTINCT p",
    user_id: user_id
  )
end
```
Indexing strategies accelerate node lookups. The database creates indexes on frequently queried properties like user IDs or email addresses. Index-free adjacency handles relationship traversal, but finding starting nodes requires indexes.
Full-text indexes enable searching node properties for keywords. A product graph indexes product names and descriptions for search functionality. Spatial indexes support geographic queries on location nodes.
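Index creation is a handful of one-time statements run at deployment rather than per query. A sketch using the legacy CREATE INDEX syntax that matches the constraint syntax used elsewhere in this document (newer Neo4j versions use named indexes):

```ruby
# One-time schema statements run at deployment, not per query.
INDEX_STATEMENTS = [
  'CREATE INDEX ON :User(id)',    # fast starting-node lookup for traversals
  'CREATE INDEX ON :User(email)', # login and uniqueness checks
  'CREATE INDEX ON :Product(name)' # catalog lookups
].freeze

def ensure_indexes(session)
  INDEX_STATEMENTS.each { |stmt| session.run(stmt) }
end
```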
Batch operations handle bulk data loading. Initial graph population loads nodes first, then creates edges. This two-phase approach prevents dangling edge references. The database buffers writes and commits in batches for efficiency.
Batch updates query for matching nodes, modify properties, and create new relationships. Transactional boundaries ensure consistency. Large graph modifications may require splitting into smaller transactions to avoid lock contention.
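The two-phase, batched load can be sketched with Cypher's UNWIND, which expands a list parameter into one row per element, while each_slice bounds transaction size. Batch size, labels, and row shape are illustrative:

```ruby
CREATE_USERS = <<~CYPHER
  UNWIND $rows AS row
  CREATE (u:User {id: row.id, name: row.name})
CYPHER

CREATE_FRIENDSHIPS = <<~CYPHER
  UNWIND $rows AS row
  MATCH (a:User {id: row.from}), (b:User {id: row.to})
  CREATE (a)-[:FRIEND]->(b)
CYPHER

# Phase 1 loads nodes, phase 2 loads edges, so no edge references a
# missing node; each_slice keeps any one transaction reasonably small.
def bulk_load(session, users, friendships, batch_size: 1_000)
  users.each_slice(batch_size) do |batch|
    session.run(CREATE_USERS, rows: batch)
  end
  friendships.each_slice(batch_size) do |batch|
    session.run(CREATE_FRIENDSHIPS, rows: batch)
  end
end
```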
Ruby Implementation
Ruby applications interact with graph databases through driver gems that provide query execution and result mapping. The implementation pattern establishes connections, constructs queries, and processes returned graphs.
The neo4j-ruby-driver gem provides official Neo4j database connectivity. It implements the Bolt protocol for efficient binary communication. The driver supports connection pooling, transaction management, and query parameterization.
```ruby
require 'neo4j-ruby-driver'

driver = Neo4j::Driver::GraphDatabase.driver(
  'bolt://localhost:7687',
  Neo4j::Driver::AuthTokens.basic('neo4j', 'password')
)

session = driver.session
result = session.run(
  'MATCH (u:User {name: $name})-[:FRIEND]->(f) RETURN f.name',
  name: 'Alice'
)

result.each do |record|
  puts record['f.name']
end

session.close
driver.close
```
Connection management handles database connectivity. The driver creates a connection pool managing multiple concurrent sessions. Applications acquire sessions for query execution and release them when complete. Proper session handling prevents connection leaks.
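A small helper makes acquire-and-release hard to get wrong: ensure closes the session even when the block raises. The helper name is ours, not part of the driver:

```ruby
# Yields a session and guarantees it is closed, even when the block raises.
def with_session(driver)
  session = driver.session
  yield session
ensure
  session&.close
end

# Usage:
# with_session(driver) do |session|
#   session.run('MATCH (u:User) RETURN count(u)')
# end
```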
Transaction management wraps queries in ACID boundaries. Explicit transactions group multiple operations for atomic execution. Read transactions enable concurrent execution. Write transactions serialize for consistency.
```ruby
session.write_transaction do |tx|
  tx.run(
    'CREATE (u:User {name: $name, email: $email})',
    name: 'Charlie',
    email: 'charlie@example.com'
  )
  tx.run(
    'MATCH (u1:User {name: $from}), (u2:User {name: $to}) ' \
    'CREATE (u1)-[:FRIEND {since: $since}]->(u2)',
    from: 'Alice',
    to: 'Charlie',
    since: Date.today.to_s
  )
end
```
ActiveGraph (the successor to the Neo4j.rb neo4j and neo4j-core gems) provides ActiveRecord-style abstractions over Neo4j. It defines models as Ruby classes with property declarations and relationship definitions. The library generates Cypher queries from Ruby method calls.
```ruby
require 'active_graph'

ActiveGraph::Base.driver = Neo4j::Driver::GraphDatabase.driver(
  'bolt://localhost:7687',
  Neo4j::Driver::AuthTokens.basic('neo4j', 'password')
)

class User
  include ActiveGraph::Node

  property :name, type: String
  property :email, type: String
  property :created_at, type: DateTime

  has_many :out, :friends, type: :FRIEND, model_class: :User
  has_many :in, :followers, type: :FRIEND, model_class: :User
end

class Friendship
  include ActiveGraph::Relationship

  from_class :User
  to_class :User
  type :FRIEND

  property :since, type: Date
  property :strength, type: Float
end

# Create nodes and relationships
alice = User.create(name: 'Alice', email: 'alice@example.com')
bob = User.create(name: 'Bob', email: 'bob@example.com')
alice.friends << bob

# Query relationships
alice.friends.each do |friend|
  puts friend.name
end

# Complex traversals
alice.friends.friends.where_not(uuid: alice.uuid)
```
Query construction builds Cypher statements programmatically. The driver accepts parameterized queries preventing injection attacks. Parameters replace values without string interpolation.
```ruby
def find_mutual_friends(user1_name, user2_name)
  query = <<~CYPHER
    MATCH (u1:User {name: $user1})-[:FRIEND]-(mutual)-[:FRIEND]-(u2:User {name: $user2})
    RETURN mutual.name AS name, mutual.email AS email
  CYPHER
  session.run(query, user1: user1_name, user2: user2_name)
end
```
Result processing extracts data from query responses. Results contain records with named fields. Each record provides hash-like access to returned values. Values require type conversion for Ruby objects.
```ruby
require 'time' # needed for Time.parse

result = session.run(
  'MATCH (u:User) RETURN u.name AS name, u.created_at AS created'
)

users = result.map do |record|
  {
    name: record['name'],
    created_at: Time.parse(record['created'].to_s)
  }
end
```
Relationship modeling defines graph structure in Ruby classes. Models declare outgoing and incoming relationships with direction specification. Relationship classes define edge properties and methods.
```ruby
class Product
  include ActiveGraph::Node

  property :name, type: String
  property :price, type: Float

  has_many :in, :purchasers, type: :PURCHASED, model_class: :User
  has_many :both, :similar_products, type: :SIMILAR, model_class: :Product
end

class Purchase
  include ActiveGraph::Relationship

  from_class :User
  to_class :Product
  type :PURCHASED

  property :quantity, type: Integer
  property :purchased_at, type: DateTime
  property :price, type: Float
end

# Create purchase relationship
user = User.find_by(name: 'Alice')
product = Product.find_by(name: 'Widget')
Purchase.create(from_node: user, to_node: product, quantity: 2, price: 29.99)
```
Pattern matching queries express complex traversals. Ruby methods construct Cypher patterns for recommendation queries, path finding, and subgraph matching.
```ruby
def product_recommendations(user_id, limit: 10)
  query = <<~CYPHER
    MATCH (u:User {id: $user_id})-[:PURCHASED]->(p1:Product)
    MATCH (p1)<-[:PURCHASED]-(other:User)-[:PURCHASED]->(p2:Product)
    WHERE NOT (u)-[:PURCHASED]->(p2)
    RETURN p2.name, COUNT(DISTINCT other) AS score
    ORDER BY score DESC
    LIMIT $limit
  CYPHER
  session.run(query, user_id: user_id, limit: limit)
end
```
Error handling manages connection failures and query errors. Network interruptions raise connection exceptions. Invalid Cypher syntax produces query errors. Transaction failures require retry logic.
```ruby
def safe_query(cypher, params = {})
  retries = 3
  begin
    session.run(cypher, params)
  rescue Neo4j::Driver::Exceptions::ServiceUnavailableException
    retries -= 1
    retry if retries > 0
    raise
  rescue Neo4j::Driver::Exceptions::ClientException => e
    Rails.logger.error("Invalid query: #{e.message}")
    raise
  end
end
```
Tools & Ecosystem
Graph database implementations span native graph databases, multi-model databases with graph capabilities, and managed cloud services. Selection depends on deployment requirements, feature needs, and operational preferences.
Neo4j leads as a native graph database with ACID transactions, clustering, and rich query capabilities. The community edition provides core functionality for development and small deployments. Enterprise edition adds multi-datacenter replication, role-based access control, and advanced monitoring.
Neo4j uses Cypher query language for pattern matching. The database stores graphs with native adjacency structures for efficient traversal. Built-in algorithms compute centrality, community detection, and pathfinding. Browser-based interfaces visualize graph structure and query results.
Amazon Neptune offers managed graph database service with support for property graphs and RDF triples. The service handles infrastructure provisioning, backup, and replication. Neptune supports both Gremlin and SPARQL query languages.
Integration with AWS services provides monitoring through CloudWatch, security through IAM, and backup through automated snapshots. The service scales read replicas across availability zones. Neptune optimized instances provide high-throughput graph queries.
ArangoDB implements multi-model database supporting graphs, documents, and key-value pairs in a unified system. AQL query language handles graph traversals alongside document queries. The database stores graphs as edge collections referencing document collections.
This approach enables applications to combine graph relationships with document properties in single queries. Sharding distributes graph data across cluster nodes. The database provides JavaScript-based stored procedures and transaction support.
JanusGraph provides distributed graph database built on storage backends like Apache Cassandra or HBase. The architecture separates graph logic from storage implementation. Gremlin queries express traversals and pattern matching.
Elasticsearch integration enables full-text search on graph properties. The system scales horizontally by partitioning graphs across storage nodes. Graph algorithms execute through Spark integration for large-scale analytics.
Ruby gems provide database connectivity and abstraction layers:
The neo4j-ruby-driver implements Neo4j's Bolt protocol for direct database communication. It handles connection pooling, query execution, and result streaming. The driver supports transaction management and query parameterization.
ActiveGraph adds ActiveRecord-style models and relationships over Neo4j. Classes define nodes and edges with property declarations. The library generates Cypher from Ruby method chains. Validation and callback support integrate with Rails conventions.
gremlin_client connects to Gremlin-compatible databases including JanusGraph and Amazon Neptune. The gem constructs Gremlin queries and processes graph results. Traversal methods chain operations for complex graph navigation.
Query language options differ across databases:
Cypher uses ASCII art syntax for pattern matching: parentheses represent nodes, arrows with square brackets indicate relationships, and curly braces contain properties. The declarative style focuses on what patterns to match rather than how to traverse.
```cypher
MATCH (u:User)-[:FRIEND*2]-(fof:User)
WHERE u.name = 'Alice' AND fof.age > 25
RETURN DISTINCT fof.name
```
Gremlin provides imperative traversal language with method chaining. Traversal steps navigate from vertices to edges to properties. The style resembles functional programming with map, filter, and reduce operations.
```groovy
g.V().has('name', 'Alice')
  .out('friend').out('friend')
  .has('age', gt(25))
  .values('name')
  .dedup()
```
Monitoring and observability track database performance and health. Neo4j provides metrics for query execution time, cache hit rates, and transaction throughput. The database exposes JMX endpoints for monitoring tools.
Query logging captures slow queries and execution plans. Profiling explains query performance identifying bottlenecks. The database tracks lock contention and transaction conflicts.
Backup and recovery strategies protect graph data. Full backups capture complete database state. Incremental backups record changes since last full backup. Point-in-time recovery restores to specific timestamps.
Neo4j enterprise supports online backup without downtime. Cloud services provide automated backup scheduling and retention management. Export tools dump graphs to GraphML or JSON formats for archival.
Migration tools handle schema evolution and data transformation. Neo4j migration framework versions graph schemas similar to Rails migrations. Scripts create constraints, indexes, and relationship types.
```ruby
# Neo4j migration example
class AddEmailConstraint
  def up
    'CREATE CONSTRAINT ON (u:User) ASSERT u.email IS UNIQUE'
  end

  def down
    'DROP CONSTRAINT ON (u:User) ASSERT u.email IS UNIQUE'
  end
end
```
Development tools assist with graph modeling and testing. Graph visualization tools display database structure and query results. The Neo4j browser provides interactive query execution and result exploration.
Testing libraries create temporary graph fixtures for unit tests. Graph generators produce synthetic data for performance testing. Schema validators ensure consistent node and relationship structures.
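A fixture helper can be sketched as plain data plus a loader, mirroring the node/edge hashes shown earlier; the method names and query shapes are illustrative, not from a specific testing library:

```ruby
# Minimal fixture data in the same shape as the node/edge hashes above.
def graph_fixture
  {
    nodes: [
      { id: 'u1', properties: { name: 'Alice' } },
      { id: 'u2', properties: { name: 'Bob' } }
    ],
    edges: [
      { from: 'u1', to: 'u2', type: 'FRIEND' }
    ]
  }
end

# Load the fixture through a session: nodes first, then edges.
def load_fixture(session, fixture)
  fixture[:nodes].each do |node|
    session.run('CREATE (n:User {id: $id, name: $name})',
                id: node[:id], name: node[:properties][:name])
  end
  fixture[:edges].each do |edge|
    session.run('MATCH (a:User {id: $from}), (b:User {id: $to}) CREATE (a)-[:FRIEND]->(b)',
                from: edge[:from], to: edge[:to])
  end
end
```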
Practical Examples
Real-world applications demonstrate graph database capabilities across domains. These examples show complete implementations from data modeling through query execution.
Social network friend recommendations identify potential connections based on mutual friends and shared interests. The system analyzes the friend graph to find users with many mutual connections who aren't currently friends.
```ruby
class RecommendationEngine
  def initialize(driver)
    @driver = driver
  end

  def friend_suggestions(user_id, limit: 10)
    session = @driver.session
    query = <<~CYPHER
      MATCH (u:User {id: $user_id})-[:FRIEND]->(friend)-[:FRIEND]->(suggestion)
      WHERE NOT (u)-[:FRIEND]->(suggestion)
        AND u <> suggestion
      WITH u, suggestion, COUNT(DISTINCT friend) AS mutual_friends
      OPTIONAL MATCH (suggestion)-[:INTERESTED_IN]->(interest)<-[:INTERESTED_IN]-(u)
      WITH suggestion, mutual_friends, COUNT(DISTINCT interest) AS shared_interests
      RETURN suggestion.id, suggestion.name, mutual_friends, shared_interests
      ORDER BY mutual_friends DESC, shared_interests DESC
      LIMIT $limit
    CYPHER
    result = session.run(query, user_id: user_id, limit: limit)
    result.map do |record|
      {
        id: record['suggestion.id'],
        name: record['suggestion.name'],
        mutual_friends: record['mutual_friends'],
        shared_interests: record['shared_interests']
      }
    end
  ensure
    session&.close
  end
end
```
The query traverses two hops from the starting user to find friends-of-friends. Filtering excludes existing friends and the user themselves. Aggregation counts mutual connections. A secondary pattern matches shared interests between users. Results rank by connection strength.
Product recommendation system analyzes purchase patterns to suggest relevant products. The engine identifies products frequently purchased together and ranks by purchase frequency.
```ruby
class ProductRecommender
  def initialize(driver)
    @driver = driver
  end

  def collaborative_filtering(user_id, limit: 5)
    session = @driver.session
    query = <<~CYPHER
      MATCH (u:User {id: $user_id})-[:PURCHASED]->(p1:Product)
      MATCH (p1)<-[:PURCHASED]-(other:User)-[r:PURCHASED]->(p2:Product)
      WHERE NOT (u)-[:PURCHASED]->(p2)
      WITH p2, COUNT(DISTINCT other) AS popularity, AVG(r.rating) AS avg_rating
      MATCH (p2)-[:IN_CATEGORY]->(cat:Category)
      RETURN p2.id, p2.name, p2.price, cat.name AS category,
             popularity, avg_rating
      ORDER BY popularity DESC, avg_rating DESC
      LIMIT $limit
    CYPHER
    result = session.run(query, user_id: user_id, limit: limit)
    result.map do |record|
      {
        id: record['p2.id'],
        name: record['p2.name'],
        price: record['p2.price'],
        category: record['category'],
        popularity_score: record['popularity'],
        avg_rating: record['avg_rating']
      }
    end
  ensure
    session&.close
  end
end
```
This collaborative filtering approach finds products purchased by users with similar purchase history. The query excludes already purchased items. Aggregations compute popularity and average rating. Category information enriches recommendations.
Knowledge graph query system represents entities and semantic relationships for information retrieval. The graph stores concepts, definitions, and connections for question answering.
```ruby
class KnowledgeGraph
  def initialize(driver)
    @driver = driver
  end

  def find_related_concepts(concept_name, relationship_types: nil, max_depth: 2)
    session = @driver.session
    # Relationship filters and path bounds cannot be parameterized in Cypher,
    # so they are interpolated from application-controlled values.
    rel_filter = relationship_types ? ':' + relationship_types.join('|:') : ''
    query = <<~CYPHER
      MATCH path = (start:Concept {name: $concept})-[#{rel_filter}*1..#{max_depth}]-(related:Concept)
      WITH related, path, relationships(path) AS rels
      RETURN DISTINCT related.name AS name,
             related.definition AS definition,
             [r IN rels | type(r)] AS relationship_path,
             length(path) AS distance
      ORDER BY distance, related.name
    CYPHER
    result = session.run(query, concept: concept_name)
    result.map do |record|
      {
        name: record['name'],
        definition: record['definition'],
        relationships: record['relationship_path'],
        distance: record['distance']
      }
    end
  ensure
    session&.close
  end

  def explain_connection(concept1, concept2)
    session = @driver.session
    query = <<~CYPHER
      MATCH path = shortestPath(
        (c1:Concept {name: $concept1})-[*]-(c2:Concept {name: $concept2})
      )
      WITH path, nodes(path) AS concepts, relationships(path) AS rels
      RETURN [c IN concepts | c.name] AS concept_path,
             [r IN rels | type(r)] AS relationship_types,
             length(path) AS path_length
    CYPHER
    record = session.run(query, concept1: concept1, concept2: concept2).first
    return nil unless record

    {
      path: record['concept_path'],
      relationships: record['relationship_types'],
      length: record['path_length']
    }
  ensure
    session&.close
  end
end
```
The system finds concepts related through multiple relationship types and depths. Variable-length path matching discovers connections. Shortest path algorithms identify direct conceptual links. Relationship type filtering constrains traversal to specific semantic connections.
Access control and permissions model organizational hierarchies and role-based access. The graph represents users, roles, resources, and permissions with inheritance.
```ruby
class PermissionChecker
  def initialize(driver)
    @driver = driver
  end

  def can_access?(user_id, resource_id)
    session = @driver.session
    query = <<~CYPHER
      MATCH (u:User {id: $user_id})-[:HAS_ROLE]->(role:Role)
      MATCH (role)-[:CAN_ACCESS*]->(permission:Permission)
      MATCH (permission)-[:GRANTS_ACCESS_TO]->(resource:Resource {id: $resource_id})
      RETURN COUNT(*) > 0 AS has_access
    CYPHER
    record = session.run(query, user_id: user_id, resource_id: resource_id).first
    record['has_access']
  ensure
    session&.close
  end

  def user_permissions(user_id)
    session = @driver.session
    query = <<~CYPHER
      MATCH (u:User {id: $user_id})-[:HAS_ROLE]->(role:Role)
      OPTIONAL MATCH (role)-[:CAN_ACCESS*]->(perm:Permission)
      OPTIONAL MATCH (perm)-[:GRANTS_ACCESS_TO]->(resource:Resource)
      RETURN role.name AS role,
             COLLECT(DISTINCT perm.action) AS permissions,
             COLLECT(DISTINCT resource.name) AS resources
    CYPHER
    result = session.run(query, user_id: user_id)
    result.map do |record|
      {
        role: record['role'],
        permissions: record['permissions'],
        resources: record['resources']
      }
    end
  ensure
    session&.close
  end
end
```
Role-based access control queries traverse from users through roles to permissions. Variable-length paths handle role inheritance where roles grant other roles. Permission checking becomes a graph reachability query.
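The reachability idea can be shown without a database. This is a minimal in-memory sketch, assuming a hypothetical role graph in which roles grant other roles; the role and permission names are invented for illustration:

```ruby
# Roles form a directed graph: a role inherits everything reachable through
# its grants. Permission checking reduces to reachability plus a lookup.
ROLE_GRANTS      = { admin: [:editor], editor: [:viewer], viewer: [] }.freeze
ROLE_PERMISSIONS = { admin: [:delete], editor: [:write], viewer: [:read] }.freeze

def effective_permissions(role)
  seen = []
  stack = [role]
  until stack.empty?
    current = stack.pop
    next if seen.include?(current)
    seen << current
    stack.concat(ROLE_GRANTS.fetch(current, []))   # follow role inheritance
  end
  seen.flat_map { |r| ROLE_PERMISSIONS.fetch(r, []) }.uniq
end

def can_access?(role, permission)
  effective_permissions(role).include?(permission)
end

can_access?(:admin, :read)   # => true  (inherited via editor -> viewer)
can_access?(:viewer, :write) # => false
```

The unbounded traversal mirrors the `[:CAN_ACCESS*]` variable-length pattern: however deep the role hierarchy grows, the same query answers the question.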
Reference
Graph Database Components
| Component | Description | Properties |
|---|---|---|
| Node | Entity in the graph | ID, labels, key-value properties |
| Edge | Directed relationship between nodes | Type, start node, end node, properties |
| Property | Key-value attribute on node or edge | Key name, value, data type |
| Label | Type classification for nodes | Name string, applied to nodes |
| Path | Sequence of connected nodes and edges | Length, nodes, relationships |
| Index | Lookup structure for properties | Property keys, node labels |
Common Query Patterns
| Pattern | Cypher Example | Use Case |
|---|---|---|
| Node lookup | MATCH (n:Label {property: value}) RETURN n | Find specific entities |
| Relationship traversal | MATCH (a)-[:TYPE]->(b) RETURN b | Navigate connections |
| Variable-length path | MATCH (a)-[:TYPE*1..3]->(b) RETURN b | Multi-hop traversal |
| Shortest path | MATCH p=shortestPath((a)-[*]-(b)) RETURN p | Find minimal connection |
| Pattern matching | MATCH (a)-[:TYPE1]->(b)-[:TYPE2]->(c) | Identify graph shapes |
| Aggregation | MATCH (a)-[:TYPE]->(b) RETURN COUNT(b) | Summarize relationships |
| Filtering | WHERE n.property > value | Constrain results |
| Optional matching | OPTIONAL MATCH (a)-[:TYPE]->(b) | Handle missing relationships |
Neo4j Ruby Driver Methods
| Method | Description | Example |
|---|---|---|
| GraphDatabase.driver | Create driver instance | driver = GraphDatabase.driver(uri, auth) |
| driver.session | Open database session | session = driver.session |
| session.run | Execute query | result = session.run(query, params) |
| session.write_transaction | Execute write transaction | session.write_transaction { \|tx\| tx.run(query) } |
| session.read_transaction | Execute read transaction | session.read_transaction { \|tx\| tx.run(query) } |
| result.each | Iterate result records | result.each { \|record\| ... } |
| record[] | Access field by name | value = record['field_name'] |
| session.close | Close session | session.close |
| driver.close | Close driver connection | driver.close |
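The examples in this section close sessions manually, which leaks the session if a query raises. A `begin`/`ensure` wrapper guarantees cleanup; the sketch below uses hypothetical `FakeDriver`/`FakeSession` stand-ins so the pattern can be shown without a running database:

```ruby
# Stand-ins for the real driver objects, recording whether close was called.
class FakeSession
  attr_reader :closed

  def run(_query)
    'result'
  end

  def close
    @closed = true
  end
end

class FakeDriver
  attr_reader :last

  def session
    @last = FakeSession.new
  end
end

# The wrapper: open a session, yield it, and close it even on error.
# The block's return value is passed through unchanged.
def with_session(driver)
  session = driver.session
  yield session
ensure
  session&.close
end

driver = FakeDriver.new
with_session(driver) { |s| s.run('MATCH (n) RETURN n LIMIT 1') }
driver.last.closed # => true
```

The same wrapper works with the real driver, since it only relies on `driver.session` and `session.close` from the table above.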
ActiveGraph Model Methods
| Method | Description | Example |
|---|---|---|
| property | Define node property | property :name, type: String |
| has_many :out | Define outgoing relationship | has_many :out, :friends, type: :FRIEND |
| has_many :in | Define incoming relationship | has_many :in, :followers, type: :FOLLOWS |
| has_many :both | Define bidirectional relationship | has_many :both, :connections, type: :CONNECTED |
| create | Create node instance | User.create(name: 'Alice') |
| find_by | Query nodes by property | User.find_by(email: 'alice@example.com') |
| where | Filter nodes | User.where(active: true) |
| all | Retrieve all nodes | User.all |
Relationship Types by Domain
| Domain | Relationship Type | Direction | Properties |
|---|---|---|---|
| Social Network | FOLLOWS | unidirectional | since, strength |
| Social Network | FRIEND | bidirectional | since, status |
| Social Network | LIKES | unidirectional | timestamp |
| E-commerce | PURCHASED | unidirectional | quantity, price, date |
| E-commerce | VIEWED | unidirectional | timestamp, duration |
| E-commerce | IN_CATEGORY | unidirectional | none |
| Knowledge | IS_A | unidirectional | none |
| Knowledge | RELATED_TO | bidirectional | relationship_type, strength |
| Organization | WORKS_IN | unidirectional | role, start_date |
| Organization | REPORTS_TO | unidirectional | none |
| Organization | MANAGES | unidirectional | since |
Graph Algorithms
| Algorithm | Purpose | Time Complexity | Use Case |
|---|---|---|---|
| Shortest Path | Find minimal path between nodes | O(E + V log V) | Navigation, routing |
| PageRank | Compute node importance | O(iterations * E) | Influence scoring |
| Community Detection | Identify clusters | O(E log V) | Group analysis |
| Degree Centrality | Measure direct connections | O(V) | Hub identification |
| Betweenness Centrality | Find bridge nodes | O(V * E) | Critical point detection |
| Triangle Count | Detect triangular patterns | O(E^1.5) | Clustering coefficient |
| Connected Components | Find isolated subgraphs | O(V + E) | Network segmentation |
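Connected components is the simplest algorithm in the table to implement directly. This is an illustrative breadth-first version of the O(V + E) approach, run over a small hypothetical undirected adjacency list:

```ruby
# Find connected components via BFS. Each unvisited node seeds a new
# component; the queue absorbs everything reachable from it.
def connected_components(adjacency)
  visited = {}
  components = []
  adjacency.each_key do |node|
    next if visited[node]
    component = []
    queue = [node]
    until queue.empty?
      current = queue.shift
      next if visited[current]
      visited[current] = true
      component << current
      queue.concat(adjacency.fetch(current, []))
    end
    components << component.sort
  end
  components
end

graph = {
  'a' => ['b'], 'b' => ['a', 'c'], 'c' => ['b'],  # one component
  'd' => ['e'], 'e' => ['d'],                     # a second
  'f' => []                                       # an isolated node
}
connected_components(graph)
# => [["a", "b", "c"], ["d", "e"], ["f"]]
```

In production this work would run inside the database (for example, via a graph algorithms library) rather than client-side, but the logic is the same: every node is visited once and every edge is followed once.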
Query Optimization Techniques
| Technique | Description | Impact |
|---|---|---|
| Start from indexed nodes | Begin traversal at specific nodes | Reduces search space |
| Limit path depth | Set maximum traversal hops | Prevents exponential expansion |
| Use relationship type filters | Specify edge types in patterns | Eliminates irrelevant paths |
| Add property constraints early | Filter on properties before traversal | Reduces intermediate results |
| Create appropriate indexes | Index frequently queried properties | Speeds node lookup |
| Use LIMIT clause | Restrict result count | Reduces processing and transfer |
| Profile queries | Analyze execution plans | Identifies bottlenecks |
| Avoid Cartesian products | Match patterns carefully | Prevents result explosion |
Transaction Isolation Levels
| Level | Behavior | Neo4j Support |
|---|---|---|
| Read Uncommitted | See uncommitted changes | No |
| Read Committed | See committed changes only | Yes (default) |
| Repeatable Read | Consistent reads within transaction | Via explicit locks |
| Serializable | Full isolation | Via explicit locks |
Connection Configuration Options
| Option | Description | Default |
|---|---|---|
| uri | Database connection string | bolt://localhost:7687 |
| auth | Authentication credentials | none |
| max_connection_lifetime | Maximum connection age | 1 hour |
| max_connection_pool_size | Maximum pooled connections | 100 |
| connection_acquisition_timeout | Wait time for connection | 60 seconds |
| encrypted | Enable TLS encryption | false |
| trust_strategy | Certificate verification | TRUST_ALL_CERTIFICATES |
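Putting the options together, driver construction looks roughly like this. This is an illustrative fragment, not copy-paste configuration: the exact keyword spellings and defaults vary across driver versions, so check the driver documentation for the version in use.

```ruby
# Assumes the Neo4j driver classes (GraphDatabase, AuthTokens) are loaded,
# as in the driver-methods table above. Values mirror the defaults listed
# in the configuration table.
driver = GraphDatabase.driver(
  'bolt://localhost:7687',
  AuthTokens.basic('neo4j', 'password'),
  max_connection_lifetime: 3_600,      # seconds (table default: 1 hour)
  max_connection_pool_size: 100,
  connection_acquisition_timeout: 60,  # seconds
  encrypted: false
)

# ... run queries ...
driver.close
```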