Overview
Conceptual data models represent the high-level structure of data and relationships in a domain without concern for how the data will be stored or accessed. These models focus on business entities, their attributes, and relationships between entities, providing a shared understanding between stakeholders, developers, and database designers.
A conceptual data model serves as the foundation for logical and physical data models. While logical models introduce implementation concepts like normalization and physical models specify storage details, the conceptual model remains technology-agnostic and describes what data exists rather than how it exists.
The primary notation for conceptual data models is Entity-Relationship (ER) modeling, introduced by Peter Chen in 1976. ER models use entities (things of significance), attributes (properties of entities), and relationships (associations between entities) to describe data structure. Alternative approaches include UML class diagrams and object-role modeling, but ER diagrams remain the dominant standard.
[Customer] ----< places >---- [Order] ----< contains >---- [Product]
| | |
attributes: attributes: attributes:
- customer_id - order_id - product_id
- name - order_date - name
- email - total_amount - price
Conceptual models differ from implementation code in abstraction level. A conceptual model identifies that customers place orders, while implementation code specifies foreign keys, join tables, validation rules, and API endpoints. This separation allows business logic discussions without technical constraints.
Key Principles
Entity identification forms the core of conceptual modeling. An entity represents a distinct object or concept in the business domain that requires data storage. Entities must have independent existence meaning, clear boundaries, and multiple instances. A customer is an entity because multiple customers exist independently. A customer's address might be an attribute or a separate entity depending on whether addresses have independent significance.
Attribute definition specifies properties that describe entities. Attributes represent atomic facts about an entity that the system needs to track. Each attribute has a name, data type (at conceptual level: text, number, date, boolean), and optionality. The attribute customer_name describes a customer entity, while order_date describes an order. Attributes should not contain derivable information unless business rules require storing the derived value.
Relationship cardinality defines how many instances of one entity relate to instances of another entity. Cardinality types include:
- One-to-one: Each instance of Entity A relates to exactly one instance of Entity B (Employee to Desk)
- One-to-many: Each instance of Entity A relates to multiple instances of Entity B (Customer to Orders)
- Many-to-many: Multiple instances of Entity A relate to multiple instances of Entity B (Students to Courses)
Cardinality also includes participation constraints. Mandatory participation means every entity instance must participate in the relationship. Optional participation allows entity instances without the relationship.
Customer ||--o{ Order
(one customer, zero or more orders)
Order }o--|| Customer
(each order belongs to exactly one customer)
Student }o--o{ Course
(students enroll in courses, courses contain students)
Relationship attributes describe facts about relationships rather than entities. In a many-to-many relationship between Student and Course, enrollment_date and grade are relationship attributes because they describe the enrollment itself, not the student or course independently. Relationship attributes often indicate the need for an associative entity.
Associative entities resolve many-to-many relationships by creating an entity that represents the relationship itself. The Student-Course relationship becomes an Enrollment entity with its own attributes and relationships. This transformation clarifies the model and simplifies later implementation.
Identifier specification defines how to uniquely distinguish entity instances. Natural identifiers use business-meaningful attributes (social_security_number for Person). Surrogate identifiers introduce artificial keys (customer_id). Composite identifiers combine multiple attributes (country_code + phone_number). At the conceptual level, identifiers establish entity identity without implementation details like auto-increment or UUID generation.
Normalization principles apply at the conceptual level to reduce redundancy. While full normalization occurs in logical modeling, conceptual models benefit from basic principles:
- Group related attributes with the appropriate entity
- Avoid repeating groups of attributes
- Store facts once in the natural entity
- Represent multi-valued properties as relationships or separate entities
Abstraction and generalization organize entities hierarchically. Generalization combines similar entities into a supertype (Person generalizes Customer and Employee). Specialization divides a general entity into subtypes (Vehicle specializes into Car, Truck, Motorcycle). Subtype relationships inherit attributes and relationships from supertypes while adding specific properties.
Design Considerations
Scope boundaries determine what entities and relationships the model includes. Overly broad scope creates unwieldy models that mix concerns. Narrow scope misses critical relationships and forces awkward workarounds later. Define the business process or domain area first, then identify entities directly involved in that scope. Adjacent domains require interface definitions rather than full modeling.
For an e-commerce system, core scope includes customers, products, orders, and payments. Inventory management might be separate scope with defined touchpoints. Including warehouse operations, shipping logistics, and supplier management in the same conceptual model obscures the primary purchase flow.
Entity versus attribute decisions significantly impact model clarity. An address might be attributes on a Customer entity (street, city, postal_code) or a separate Address entity with a relationship to Customer. Choose entity representation when:
- The concept has multiple instances per parent entity (Customer has multiple addresses)
- The concept relates to multiple entity types (both Customer and Warehouse have addresses)
- The concept has its own complex attributes or relationships (Address relates to TaxJurisdiction)
- The concept requires independent management (address validation services)
Choose attribute representation for simple, single-valued properties tightly bound to their parent entity without independent relationships.
Relationship directionality affects model comprehension. While relationships are bidirectional at the conceptual level, naming and diagram placement communicate primary navigation. "Customer places Order" reads more naturally than "Order is placed by Customer," though both directions exist. Choose names and placement that match business terminology and primary access patterns.
Weak entity identification applies when entities depend on other entities for identity. An OrderLineItem cannot exist without its parent Order. The line item number identifies items within an order, but globally requires the combination of order_id and line_item_number. Weak entities simplify models by acknowledging dependency relationships but complicate implementation if the dependency chain grows deep.
Temporal modeling choices address how the model represents time. Current-state models store only present values, discarding history. Historical models retain past states, requiring effective dates or versioning. The choice depends on business requirements:
- Transaction history requires temporal modeling (order status changes over time)
- Reference data might not need history (current product price is sufficient)
- Regulatory compliance often mandates temporal tracking (audit trails)
- Analytical reporting benefits from historical snapshots (trending analysis)
Temporal modeling at the conceptual level identifies which entities need time-based tracking without specifying implementation strategies like slowly changing dimensions or event sourcing.
Denormalization trade-offs sometimes appear at the conceptual level. Strictly normalized models eliminate redundancy but create complex relationship chains. Strategic denormalization stores calculated or frequently accessed values directly. At the conceptual stage, note these decisions as design annotations rather than altering the fundamental model structure. Implementation phases apply denormalization based on performance requirements.
Inheritance versus composition organizes entity hierarchies. Inheritance (is-a relationships) creates subtype hierarchies where subtypes inherit supertype characteristics. Composition (has-a relationships) assembles complex entities from component entities.
A Vehicle supertype with Car and Truck subtypes uses inheritance. A Car entity with references to Engine, Transmission, and Body entities uses composition. Inheritance clarifies taxonomies but can create rigid hierarchies. Composition provides flexibility but increases relationship complexity. Business domain understanding guides this choice.
Implementation Approaches
Entity-Relationship modeling uses three primary notations: Chen notation, Crow's Foot notation, and IDEF1X notation. Chen notation emphasizes relationship diamonds and clear relationship naming. Crow's Foot notation provides visual cardinality indicators that resemble crow's feet at relationship ends. IDEF1X adds more rigorous syntax for key specification.
Chen notation example:
[Customer] ---(places)--< [Order]
Crow's Foot notation example:
Customer ||--o{ Order
Product }o--|| Category
Crow's Foot notation dominates current practice due to visual clarity and tool support. The notation positions cardinality markers at relationship ends: single lines indicate "one," crow's feet indicate "many," circles indicate "optional," and perpendicular lines indicate "mandatory."
Top-down modeling begins with high-level entities and progressively adds detail. Start by identifying major entities in the domain, then define relationships between them, then add attributes, then refine with subtypes and constraints. This approach works well when business requirements are clear and domain understanding is strong.
Process:
- List candidate entities from business process descriptions
- Eliminate duplicates and synonyms
- Define primary relationships between major entities
- Add attributes to entities
- Specify cardinality and participation constraints
- Identify subtypes and supertypes
- Validate with stakeholders
- Iterate to add detail
Bottom-up modeling starts with specific data elements and groups them into entities. Analyze existing forms, reports, and data sources to identify attributes, then cluster attributes that describe the same concept into entities. This approach fits situations with extensive existing data but unclear conceptual organization.
Process:
- Collect all data elements from sources
- Normalize attribute names and eliminate duplicates
- Group attributes by subject area
- Identify candidate entities from groups
- Define relationships based on data dependencies
- Validate entity cohesion
- Refine and abstract to appropriate level
Inside-out modeling identifies core entities first, then expands outward by exploring relationships. Begin with the most critical entity in the domain, define its attributes and immediate relationships, then follow relationship paths to discover connected entities. This approach focuses effort on the most important areas first.
View integration modeling develops separate models for different user groups or processes, then merges them into a comprehensive model. Each stakeholder group describes their view of the data, creating focused submodels. Integration resolves naming conflicts, combines overlapping entities, and ensures consistency.
Integration challenges include:
- Naming conflicts: different names for the same entity (Client vs Customer)
- Entity conflicts: different granularity or scope for similar entities
- Relationship conflicts: different cardinality or participation constraints
- Attribute conflicts: different attributes for similar entities
Resolution requires negotiation, standard naming conventions, and often compromise between ideal representations.
Iterative refinement cycles through modeling, validation, and revision. Initial models capture basic structure, validation sessions with stakeholders identify gaps and errors, and revisions improve accuracy. Multiple iterations converge on a stable model that satisfies requirements.
Validation techniques:
- Walkthrough scenarios: describe business processes using the model
- Data population: attempt to populate entities with sample instances
- Query testing: verify the model supports required information retrieval
- Exception handling: check how the model handles unusual cases
Pattern application reuses standard data modeling patterns for common situations. Party patterns model people and organizations. Product hierarchy patterns structure product catalogs. Transaction patterns capture business events. Applying proven patterns accelerates modeling and reduces errors, but requires adapting patterns to specific business contexts.
Practical Examples
E-commerce order management illustrates fundamental conceptual modeling. The domain includes customers who place orders for products. Orders contain multiple products with specified quantities. Products belong to categories and have pricing information.
Core entities:
- Customer: individuals who purchase products
- Order: records of customer purchases
- Product: items available for sale
- Category: product groupings
- OrderItem: line items within orders
Relationships:
- Customer places Order (one-to-many)
- Order contains OrderItem (one-to-many)
- OrderItem references Product (many-to-one)
- Product belongs to Category (many-to-one)
Conceptual model:
Customer
- customer_id (identifier)
- email
- name
- registration_date
Order
- order_id (identifier)
- customer_id (foreign concept)
- order_date
- status
- total_amount
OrderItem
- order_id (part of composite identifier)
- line_number (part of composite identifier)
- product_id (foreign concept)
- quantity
- unit_price
- line_total
Product
- product_id (identifier)
- category_id (foreign concept)
- name
- description
- base_price
- stock_quantity
Category
- category_id (identifier)
- name
- parent_category_id (for hierarchy)
The OrderItem entity as an associative entity resolves the many-to-many relationship between Order and Product. The line_number combined with order_id forms the identifier, showing weak entity dependency. The unit_price attribute on OrderItem rather than solely on Product captures historical pricing.
Healthcare patient records demonstrates complex relationships and temporal requirements. Patients receive treatments from providers, have diagnoses, and receive prescriptions. The model must track historical information for legal and medical continuity reasons.
Core entities:
- Patient: individuals receiving care
- Provider: healthcare professionals
- Appointment: scheduled interactions
- Diagnosis: identified conditions
- Prescription: medication orders
- Medication: drug inventory
Relationships:
- Patient schedules Appointment (one-to-many)
- Provider conducts Appointment (one-to-many)
- Appointment yields Diagnosis (one-to-many)
- Diagnosis leads to Prescription (one-to-many)
- Prescription specifies Medication (many-to-one)
Conceptual model with temporal attributes:
Patient
- patient_id (identifier)
- medical_record_number
- name
- date_of_birth
- blood_type
- allergies
Provider
- provider_id (identifier)
- license_number
- name
- specialty
- credential
Appointment
- appointment_id (identifier)
- patient_id
- provider_id
- scheduled_time
- actual_start_time
- actual_end_time
- appointment_type
- status
Diagnosis
- diagnosis_id (identifier)
- appointment_id
- diagnosis_code
- description
- diagnosed_date
- severity
Prescription
- prescription_id (identifier)
- diagnosis_id
- medication_id
- prescribed_date
- dosage
- frequency
- duration
- refills_allowed
- status
Medication
- medication_id (identifier)
- generic_name
- brand_names
- medication_class
- controlled_substance_schedule
The model separates scheduled_time from actual times in Appointment to handle real-world scheduling variations. Prescription references both Diagnosis (why prescribed) and Medication (what prescribed), creating a dependent relationship chain. Status attributes track temporal states without full history tables at the conceptual level.
Content management system shows inheritance hierarchies and polymorphic relationships. The system manages various content types (articles, videos, images) with common properties and type-specific attributes.
Core entities:
- User: content creators and consumers
- Content (supertype): base content properties
- Article (subtype): text-based content
- Video (subtype): video content
- Image (subtype): image content
- Tag: classification labels
- Comment: user feedback
Relationships:
- User creates Content (one-to-many)
- Content tagged with Tag (many-to-many)
- User writes Comment on Content (many-to-many associative)
- Content references related Content (many-to-many self-referential)
Conceptual model with inheritance:
User
- user_id (identifier)
- username
- email
- role
- registration_date
Content (supertype)
- content_id (identifier)
- creator_id
- title
- created_date
- modified_date
- published_date
- status
- view_count
Article (subtype of Content)
- body_text
- word_count
- reading_time
- language
Video (subtype of Content)
- video_file_reference
- duration
- resolution
- format
Image (subtype of Content)
- image_file_reference
- width
- height
- format
- alt_text
Tag
- tag_id (identifier)
- tag_name
- category
ContentTag (associative entity)
- content_id
- tag_id
- assigned_date
Comment
- comment_id (identifier)
- content_id
- user_id
- comment_text
- posted_date
- parent_comment_id (for threading)
The Content supertype captures common attributes inherited by all subtypes. Subtype-specific attributes remain with their respective entities. The self-referential relationship on Comment creates threaded discussions. The many-to-many relationship between Content and Tag uses an associative entity that could later capture additional relationship attributes like assigned_by or confidence_score.
Financial transaction ledger demonstrates temporal tracking and audit requirements. The system records financial transactions, account balances, and transaction history with immutability constraints.
Core entities:
- Account: financial accounts
- Transaction: recorded financial events
- TransactionLine: double-entry line items
- AccountBalance: point-in-time balance snapshots
- AuditLog: change tracking
Relationships:
- Account contains TransactionLine (one-to-many)
- Transaction composed of TransactionLine (one-to-many)
- Account has AccountBalance history (one-to-many)
Conceptual model:
Account
- account_id (identifier)
- account_number
- account_type
- account_name
- currency
- opened_date
- status
Transaction
- transaction_id (identifier)
- transaction_date
- transaction_time
- transaction_type
- description
- reference_number
- created_by
- created_timestamp
TransactionLine
- transaction_line_id (identifier)
- transaction_id
- account_id
- line_type (debit/credit)
- amount
- currency
- exchange_rate
AccountBalance
- balance_id (identifier)
- account_id
- balance_date
- opening_balance
- closing_balance
- calculated_timestamp
AuditLog
- audit_id (identifier)
- entity_type
- entity_id
- action_type
- old_values
- new_values
- changed_by
- changed_timestamp
TransactionLine implements double-entry bookkeeping where each Transaction has multiple lines that must balance. AccountBalance provides snapshot history without recalculating from all transactions. AuditLog tracks all changes for compliance and debugging. The model enforces temporal integrity through timestamp attributes and immutable transaction records.
Ruby Implementation
Ruby applications reflect conceptual data models through class definitions, associations, and business logic. While conceptual models remain implementation-agnostic, Ruby code embodies the conceptual structure using object-oriented principles.
ActiveRecord ORM mapping translates conceptual entities to Ruby classes. Each entity becomes a class inheriting from ActiveRecord::Base. Attributes map to database columns accessed as Ruby attributes. Relationships use association macros.
# Customer entity
class Customer < ApplicationRecord
has_many :orders
validates :email, presence: true, uniqueness: true
validates :name, presence: true
end
# Order entity
class Order < ApplicationRecord
belongs_to :customer
has_many :order_items
has_many :products, through: :order_items
validates :order_date, presence: true
validates :total_amount, numericality: { greater_than_or_equal_to: 0 }
end
# Product entity
class Product < ApplicationRecord
belongs_to :category
has_many :order_items
validates :name, presence: true
validates :base_price, numericality: { greater_than: 0 }
end
# OrderItem associative entity
class OrderItem < ApplicationRecord
belongs_to :order
belongs_to :product
validates :quantity, numericality: { greater_than: 0 }
validates :unit_price, numericality: { greater_than_or_equal_to: 0 }
before_save :calculate_line_total
private
def calculate_line_total
self.line_total = quantity * unit_price
end
end
The has_many and belongs_to macros declare relationships matching conceptual cardinality. The through option implements many-to-many relationships via associative entities. Validations enforce conceptual constraints like mandatory attributes and value ranges.
Plain Ruby object modeling represents conceptual models without database coupling. Each entity becomes a Ruby class with attributes and methods. Relationships use instance variables holding references to related objects.
class Customer
attr_accessor :customer_id, :email, :name, :registration_date
attr_reader :orders
def initialize(attributes = {})
@customer_id = attributes[:customer_id]
@email = attributes[:email]
@name = attributes[:name]
@registration_date = attributes[:registration_date] || Date.today
@orders = []
end
def place_order(order)
@orders << order
order.customer = self
end
def total_order_value
@orders.sum(&:total_amount)
end
end
class Order
attr_accessor :order_id, :customer, :order_date, :status, :total_amount
attr_reader :order_items
def initialize(attributes = {})
@order_id = attributes[:order_id]
@order_date = attributes[:order_date] || Date.today
@status = attributes[:status] || 'pending'
@total_amount = attributes[:total_amount] || 0
@order_items = []
end
def add_item(product, quantity, unit_price)
item = OrderItem.new(
order: self,
product: product,
quantity: quantity,
unit_price: unit_price
)
@order_items << item
calculate_total
item
end
private
def calculate_total
@total_amount = @order_items.sum(&:line_total)
end
end
class OrderItem
attr_accessor :order, :product, :quantity, :unit_price
def initialize(attributes = {})
@order = attributes[:order]
@product = attributes[:product]
@quantity = attributes[:quantity]
@unit_price = attributes[:unit_price]
end
def line_total
quantity * unit_price
end
end
This approach maintains conceptual relationships through object references without database dependencies. Methods implement business logic derived from the conceptual model. The place_order and add_item methods maintain bidirectional relationships.
Inheritance mapping implements conceptual subtypes using Ruby class inheritance. The supertype becomes a base class with shared attributes and behavior. Subtypes inherit from the supertype and add specific attributes.
# Content supertype
class Content
attr_accessor :content_id, :creator_id, :title, :created_date,
:modified_date, :published_date, :status, :view_count
def initialize(attributes = {})
@content_id = attributes[:content_id]
@creator_id = attributes[:creator_id]
@title = attributes[:title]
@created_date = attributes[:created_date] || Time.now
@modified_date = attributes[:modified_date]
@published_date = attributes[:published_date]
@status = attributes[:status] || 'draft'
@view_count = attributes[:view_count] || 0
end
def publish
@published_date = Time.now
@status = 'published'
end
def increment_views
@view_count += 1
end
end
# Article subtype
class Article < Content
attr_accessor :body_text, :word_count, :reading_time, :language
def initialize(attributes = {})
super
@body_text = attributes[:body_text]
@word_count = attributes[:word_count]
@reading_time = attributes[:reading_time]
@language = attributes[:language] || 'en'
end
def calculate_reading_time
@reading_time = (word_count / 200.0).ceil
end
end
# Video subtype
class Video < Content
attr_accessor :video_file_reference, :duration, :resolution, :format
def initialize(attributes = {})
super
@video_file_reference = attributes[:video_file_reference]
@duration = attributes[:duration]
@resolution = attributes[:resolution]
@format = attributes[:format]
end
def formatted_duration
minutes = duration / 60
seconds = duration % 60
"%d:%02d" % [minutes, seconds]
end
end
Ruby's single inheritance limits direct implementation of multiple inheritance from conceptual models. Modules provide mixins for shared behavior across unrelated classes:
module Taggable
def add_tag(tag)
@tags ||= []
@tags << tag unless @tags.include?(tag)
end
def remove_tag(tag)
@tags ||= []
@tags.delete(tag)
end
def tags
@tags ||= []
end
end
class Article < Content
include Taggable
end
class Video < Content
include Taggable
end
Temporal modeling in Ruby requires explicit attribute tracking. Historical records use timestamp attributes and immutability patterns.
class Account
attr_reader :account_id, :account_number, :balances
def initialize(attributes = {})
@account_id = attributes[:account_id]
@account_number = attributes[:account_number]
@balances = []
end
def current_balance
@balances.last&.closing_balance || 0
end
def balance_at(date)
@balances
.select { |b| b.balance_date <= date }
.max_by(&:balance_date)
&.closing_balance || 0
end
def record_balance(date, closing_balance)
balance = AccountBalance.new(
account: self,
balance_date: date,
closing_balance: closing_balance
)
@balances << balance
@balances.sort_by!(&:balance_date)
balance
end
end
class AccountBalance
attr_reader :account, :balance_date, :closing_balance, :calculated_timestamp
def initialize(attributes = {})
@account = attributes[:account]
@balance_date = attributes[:balance_date]
@closing_balance = attributes[:closing_balance]
@calculated_timestamp = Time.now
end
end
The immutable AccountBalance instances preserve historical states. The balance_at method queries history by date. This pattern translates conceptual temporal requirements into Ruby implementation.
Validation and constraints enforce conceptual rules through Ruby code. Validations check attribute values, relationship constraints ensure referential integrity, and business rules implement domain logic.
class Order
attr_accessor :order_id, :customer, :order_date, :status
attr_reader :order_items
def initialize(attributes = {})
@order_items = []
validate_and_assign(attributes)
end
def add_item(product, quantity, unit_price)
raise ArgumentError, "Product required" unless product
raise ArgumentError, "Quantity must be positive" unless quantity > 0
raise ArgumentError, "Unit price cannot be negative" unless unit_price >= 0
item = OrderItem.new(
order: self,
product: product,
quantity: quantity,
unit_price: unit_price
)
@order_items << item
item
end
def finalize
raise "Cannot finalize empty order" if @order_items.empty?
raise "Order already finalized" if @status == 'finalized'
@status = 'finalized'
@finalized_date = Time.now
end
private
def validate_and_assign(attributes)
@customer = attributes[:customer]
raise ArgumentError, "Customer required" unless @customer
@order_date = attributes[:order_date] || Date.today
@status = attributes[:status] || 'draft'
end
end
These validations translate conceptual participation constraints (mandatory customer) and business rules (positive quantities) into executable code.
Tools & Ecosystem
Diagramming tools create visual representations of conceptual data models. Lucidchart, Draw.io, and Microsoft Visio support ER diagram notation with drag-drop entity creation, relationship drawing, and cardinality specification. These tools export diagrams as images for documentation but lack code generation capabilities.
Specialized data modeling tools provide enhanced features. ERwin Data Modeler offers comprehensive modeling with forward and reverse engineering. Oracle SQL Developer Data Modeler integrates with Oracle databases. These tools distinguish between conceptual, logical, and physical models, managing transformations between levels.
Ruby gems for schema definition translate conceptual models into database schemas. ActiveRecord migrations define tables, columns, and constraints:
class CreateEcommerceSchema < ActiveRecord::Migration[7.0]
def change
create_table :customers do |t|
t.string :email, null: false
t.string :name, null: false
t.date :registration_date
t.timestamps
t.index :email, unique: true
end
create_table :orders do |t|
t.references :customer, null: false, foreign_key: true
t.date :order_date, null: false
t.string :status
t.decimal :total_amount, precision: 10, scale: 2
t.timestamps
end
create_table :products do |t|
t.references :category, foreign_key: true
t.string :name, null: false
t.text :description
t.decimal :base_price, precision: 10, scale: 2
t.integer :stock_quantity
t.timestamps
end
create_table :order_items do |t|
t.references :order, null: false, foreign_key: true
t.references :product, null: false, foreign_key: true
t.integer :line_number
t.integer :quantity, null: false
t.decimal :unit_price, precision: 10, scale: 2, null: false
t.decimal :line_total, precision: 10, scale: 2
t.timestamps
end
end
end
The migration syntax maps conceptual constructs to physical schema. The t.references method creates foreign keys for relationships. Constraints like null: false enforce mandatory attributes from the conceptual model.
Schema.rb analyzers reverse-engineer existing schemas into conceptual understanding. The rails-erd gem generates ER diagrams from ActiveRecord models:
# Gemfile
gem 'rails-erd', group: :development
# Generate diagram
bundle exec erd --attributes=foreign_keys,content
The generated diagram shows entities (models), attributes, and relationships inferred from ActiveRecord associations and foreign keys. This helps visualize implemented data structures and compare them against conceptual designs.
Documentation generators create reference documentation from conceptual models. Tools like SchemaSpy analyze database schemas and generate HTML documentation with entity descriptions, relationship diagrams, and constraint information. While these tools work with physical schemas, they reveal the underlying conceptual structure.
Model validation tools check conceptual model quality. Database normalization analyzers identify redundancy and anomalies. Relationship consistency checkers verify cardinality matches business rules. Naming convention validators ensure consistent entity and attribute naming.
Version control for models tracks conceptual model evolution. Storing ER diagrams as text-based formats (GraphML, PlantUML) enables diff comparisons. Git tracks changes to model files, providing history and collaboration features:
@startuml
entity Customer {
* customer_id : int
* email : string
* name : string
registration_date : date
}
entity Order {
* order_id : int
* customer_id : int <<FK>>
* order_date : date
status : string
total_amount : decimal
}
Customer ||--o{ Order
@enduml
Text-based formats integrate with development workflows, allowing model reviews in pull requests alongside code changes.
Database design patterns libraries provide reusable conceptual model fragments. Martin Fowler's "Analysis Patterns" and David Hay's "Data Model Patterns" catalog proven solutions for common modeling situations. These references accelerate modeling by providing starting points for party relationships, product hierarchies, transaction patterns, and temporal modeling.
Ruby-specific resources include ActiveRecord pattern documentation and Rails guide database sections. These resources show how conceptual patterns map to ActiveRecord implementations, bridging conceptual design and Ruby code.
Reference
Entity Relationship Notation
| Symbol | Meaning | Usage |
|---|---|---|
| Rectangle | Entity | Represents a thing or concept |
| Diamond | Relationship | Shows association between entities |
| Oval | Attribute | Property of entity or relationship |
| Underline | Identifier | Uniquely identifies instances |
| Double rectangle | Weak entity | Depends on parent entity for identity |
| Double diamond | Identifying relationship | Links weak entity to parent |
Cardinality Markers (Crow's Foot)
| Symbol | Meaning | Description |
|---|---|---|
| Single line | One | Exactly one instance |
| Crow's foot (three lines) | Many | Multiple instances |
| Circle | Optional | Zero or more |
| Perpendicular line | Mandatory | One or more |
| Double line | Identifying | Relationship part of identity |
Common Cardinality Patterns
| Pattern | Notation | Example |
|---|---|---|
| One-to-one mandatory | Entity1 mandatory line mandatory line Entity2 | Employee to Desk assignment |
| One-to-many optional | Entity1 single line crow's foot with circle Entity2 | Customer to Orders (customer may have no orders) |
| One-to-many mandatory | Entity1 single line crow's foot with line Entity2 | Order to OrderItems (order must have items) |
| Many-to-many | Entity1 crow's foot crow's foot Entity2 | Students to Courses |
Entity Types
| Type | Definition | Example |
|---|---|---|
| Strong entity | Exists independently | Customer, Product, Employee |
| Weak entity | Depends on parent entity | OrderLineItem, DependentFamilyMember |
| Associative entity | Represents many-to-many relationship | Enrollment (Student-Course), Assignment (Employee-Project) |
| Supertype | General entity with subtypes | Vehicle, Content, Person |
| Subtype | Specialized entity inheriting from supertype | Car (Vehicle), Article (Content), Employee (Person) |
Attribute Types
| Type | Definition | Example |
|---|---|---|
| Simple | Atomic value | age, price, email |
| Composite | Multiple components | address (street, city, postal_code) |
| Single-valued | One value per instance | date_of_birth, customer_id |
| Multi-valued | Multiple values per instance | phone_numbers, email_addresses |
| Derived | Calculated from other attributes | age (from date_of_birth), total_price |
| Optional | May be null | middle_name, fax_number |
| Mandatory | Must have value | customer_id, order_date |
Identifier Types
| Type | Description | Characteristics |
|---|---|---|
| Natural | Business-meaningful attribute | Social security number, email address, ISBN |
| Surrogate | Artificial key | Auto-increment ID, UUID, sequence number |
| Composite | Multiple attributes combined | country_code + phone_number, order_id + line_number |
| Candidate | Potential identifier | Multiple options exist, one selected as primary |
Relationship Types
| Type | Description | Implementation Note |
|---|---|---|
| Binary | Two entities | Most common relationship type |
| Ternary | Three entities | Often decomposed into binary relationships |
| Unary (recursive) | Entity to itself | Employee manages Employee, Category parent Category |
| Identifying | Determines weak entity identity | Order identifies OrderLineItem |
| Non-identifying | Does not affect identity | Customer places Order |
Modeling Decisions Checklist
| Decision | Considerations | Impact |
|---|---|---|
| Entity vs Attribute | Independent existence, multiple instances, relationships to other entities | Model complexity, query patterns |
| Natural vs Surrogate Key | Stability, meaningfulness, uniqueness guarantees | Data integrity, migration complexity |
| Subtype vs Separate Entities | Shared attributes, is-a relationship, behavior inheritance | Query complexity, schema flexibility |
| Normalized vs Denormalized | Redundancy, update anomalies, query performance | Data integrity, maintenance overhead |
| Current State vs Historical | Audit requirements, reporting needs, compliance | Storage requirements, query complexity |
| Weak vs Strong Entity | Identity dependency, existence dependency | Implementation complexity, cascade rules |
Common Modeling Patterns
| Pattern | Structure | Use Case |
|---|---|---|
| Party | Person and Organization as subtypes of Party | Customer relationship management |
| Product Hierarchy | Product with recursive category relationship | E-commerce catalogs |
| Transaction | Transaction with multiple line items | Financial systems, order processing |
| Temporal | Entity with effective date and end date | Historical tracking, versioning |
| Audit Trail | Change log with old and new values | Compliance, debugging |
| Type-Instance | Type entity defining instances | Product types, document templates |
| Role | Person in multiple roles | Employee who is also customer |
ActiveRecord Association Methods
| Method | Cardinality | Usage |
|---|---|---|
| belongs_to | Many-to-one | Order belongs_to Customer |
| has_one | One-to-one | User has_one Profile |
| has_many | One-to-many | Customer has_many Orders |
| has_many :through | Many-to-many | Student has_many Courses through Enrollments |
| has_and_belongs_to_many | Many-to-many | Student has_and_belongs_to_many Courses |
Normalization Levels (Conceptual Relevance)
| Level | Requirement | Conceptual Impact |
|---|---|---|
| 1NF | Atomic attributes, no repeating groups | Entity attribute structure |
| 2NF | No partial dependencies on composite keys | Weak entity identification |
| 3NF | No transitive dependencies | Entity responsibility boundaries |
| BCNF | Every determinant is candidate key | Relationship refinement |
Model Quality Metrics
| Metric | Good Practice | Warning Sign |
|---|---|---|
| Entities per model | 10-30 for focused domain | Over 50 indicates scope issues |
| Attributes per entity | 5-15 typical | Over 20 suggests decomposition needed |
| Relationships per entity | 3-7 average | Over 12 indicates central entity problems |
| Many-to-many relationships | Use associative entities | Direct many-to-many hides attributes |
| Weak entities | Minimal depth | Multi-level weakness complicates implementation |
| Subtype depth | 2-3 levels maximum | Deep hierarchies become rigid |