Overview
Static Site Generation (SSG) produces complete HTML pages during a build process rather than generating them dynamically for each request. The build system reads source content, applies templates, processes assets, and outputs a directory of static files ready for deployment to any web server or CDN.
SSG addresses the performance and security limitations of traditional dynamic websites. Dynamic sites execute server-side code for every request, querying databases and rendering templates repeatedly. Static sites eliminate this overhead by performing all processing once during the build phase. The resulting files contain no server-side code and require no database connections.
The approach originated in the early web when all sites were static HTML files created manually. As content management systems emerged, dynamic generation became dominant despite its computational costs. Modern SSG revived static generation by automating the build process while maintaining developer-friendly workflows with templates, data files, and content transformation.
# Basic static site generation concept
require 'fileutils'

class StaticSiteGenerator
  def initialize(content_dir, output_dir)
    @content_dir = content_dir
    @output_dir = output_dir
  end

  def build
    Dir.glob("#{@content_dir}/**/*.md").each do |file|
      content = File.read(file)
      html = render_markdown(content)
      # Swap the directory prefix and the extension only where they belong
      output_path = file.sub(/\A#{Regexp.escape(@content_dir)}/, @output_dir)
                        .sub(/\.md\z/, '.html')
      FileUtils.mkdir_p(File.dirname(output_path))
      File.write(output_path, html)
    end
  end

  def render_markdown(content)
    # Transform markdown to HTML with templates
  end
end
SSG frameworks handle content in multiple formats including Markdown, YAML, JSON, and structured data files. They apply templating engines to generate HTML, process CSS and JavaScript assets, optimize images, and create complete deployable sites. The output directory contains only static files with no dependencies on specific server technology.
Key Principles
Static Site Generation separates content authoring from content delivery through a build-time transformation process. Content creators work with source files in formats like Markdown or data files. The build system transforms these sources into HTML during compilation, producing files that web servers deliver without modification.
Build-time vs Runtime Processing
The fundamental distinction between static and dynamic sites occurs at processing time. Dynamic sites execute code when users request pages, generating HTML from templates and data during each request. Static sites execute this code once during the build phase, storing the results as HTML files. Web servers deliver these pre-generated files directly without executing any application code.
This build-time processing creates an immutable output directory. Once built, the site content cannot change until the next build runs. Updates require rebuilding the entire site or incrementally building changed pages. The build output becomes a snapshot of the site at a specific point in time.
Content Transformation Pipeline
SSG systems implement a multi-stage pipeline that converts source content into deployable HTML:
- Content Loading: Read source files from the filesystem, parse frontmatter metadata, extract content bodies
- Data Processing: Load data files (YAML, JSON, CSV), fetch external data from APIs or databases during build
- Template Rendering: Apply layout templates using template engines, inject content into template placeholders
- Asset Processing: Compile Sass/SCSS to CSS, bundle and minify JavaScript, optimize and transform images
- Output Writing: Write HTML files to output directory, copy static assets, generate additional files like sitemaps
Each stage can access data from previous stages. Templates can reference data files, content can include processed assets, and output generation can use computed metadata.
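The stages above can be sketched as a chain of callables that each receive the context built so far and return an extended copy. This is only an illustration under stated assumptions, not how any particular generator is implemented: the stage names, the context keys, and the in-memory sources are all hypothetical, and the asset-processing stage is left out to keep the example short.

```ruby
# Hypothetical pipeline: every stage takes the context hash produced by the
# previous stages and returns a new hash with its own results merged in.
LOAD_CONTENT = lambda do |ctx|
  pages = ctx[:sources].map do |path, raw|
    title, body = raw.split("\n", 2)
    { path: path, title: title, body: body.to_s.strip }
  end
  ctx.merge(pages: pages)
end

# Stand-in for reading YAML/JSON data files.
LOAD_DATA = ->(ctx) { ctx.merge(data: { site_name: "Example Site" }) }

RENDER = lambda do |ctx|
  rendered = ctx[:pages].map do |page|
    html = "<title>#{page[:title]} | #{ctx[:data][:site_name]}</title>" \
           "<main>#{page[:body]}</main>"
    page.merge(html: html)
  end
  ctx.merge(pages: rendered)
end

# Collects output paths and HTML instead of touching the filesystem.
WRITE_OUTPUT = lambda do |ctx|
  output = ctx[:pages].to_h do |page|
    [page[:path].sub(/\.md\z/, ".html"), page[:html]]
  end
  ctx.merge(output: output)
end

PIPELINE = [LOAD_CONTENT, LOAD_DATA, RENDER, WRITE_OUTPUT].freeze

def run_pipeline(sources)
  PIPELINE.reduce({ sources: sources }) { |ctx, stage| stage.call(ctx) }
end

result = run_pipeline({ "blog/hello.md" => "Hello\nFirst post." })
puts result[:output]["blog/hello.html"]
```

Because each stage only reads keys written by earlier stages, the render step can use both the loaded pages and the loaded data, which mirrors the way later pipeline stages build on earlier ones.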
Directory Structure Conventions
Static site generators adopt consistent directory structures that separate concerns:
project/
├── content/ # Source content (Markdown, data files)
├── layouts/ # HTML templates
├── assets/ # CSS, JS, images
├── data/ # Structured data (YAML, JSON)
└── output/ # Generated site (git-ignored)
The output directory mirrors the content structure. A file at content/blog/post.md generates output/blog/post.html (or output/blog/post/index.html for clean URLs). This predictable mapping allows direct correspondence between source files and generated URLs.
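The mapping described above fits in a few lines. The sketch below assumes the content/ and output/ directory names from the tree shown earlier; the method name and the clean_urls option are hypothetical, since each generator exposes this behavior under its own setting.

```ruby
# Map a source path to its output path. `clean_urls` is a hypothetical
# option name used only for this sketch.
def output_path_for(source, content_dir: "content", output_dir: "output", clean_urls: false)
  relative = source.delete_prefix("#{content_dir}/")
  base = relative.sub(/\.md\z/, "")
  # Index pages already sit at their directory root, so they are never nested again.
  if clean_urls && File.basename(base) != "index"
    File.join(output_dir, base, "index.html")
  else
    File.join(output_dir, "#{base}.html")
  end
end

puts output_path_for("content/blog/post.md")                    # output/blog/post.html
puts output_path_for("content/blog/post.md", clean_urls: true)  # output/blog/post/index.html
puts output_path_for("content/index.md", clean_urls: true)      # output/index.html
```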
Incremental vs Full Rebuilds
Build strategies balance speed against correctness. Full rebuilds process all content files, guaranteeing output consistency but taking longer as sites grow. Incremental rebuilds process only changed files, improving speed but requiring dependency tracking to catch indirect changes.
A changed layout template affects all pages using that template. Modified data files impact pages that reference those data. Sophisticated generators track these dependencies to minimize unnecessary rebuilds while maintaining correctness.
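One simple way to decide whether a page needs rebuilding is the timestamp comparison used by make-style tools: the output is stale if it is missing or older than any of its inputs. The sketch below assumes the caller already knows each page's dependency list; the method name and the temporary files are illustrative only, and real generators often use checksums or richer tracking instead.

```ruby
require "tmpdir"

# A page is stale when its output is missing or older than any input it
# depends on (its source file, its layout, referenced data files).
def stale?(output, dependencies)
  return true unless File.exist?(output)

  output_time = File.mtime(output)
  dependencies.any? { |dep| File.mtime(dep) > output_time }
end

Dir.mktmpdir do |dir|
  source = File.join(dir, "post.md")
  layout = File.join(dir, "layout.html")
  output = File.join(dir, "post.html")
  File.write(source, "# Post")
  File.write(layout, "<html></html>")

  puts stale?(output, [source, layout])   # true: never built

  File.write(output, "<h1>Post</h1>")
  puts stale?(output, [source, layout])   # false: output is up to date

  # Simulate a later layout edit by moving its modification time forward.
  later = Time.now + 60
  File.utime(later, later, layout)
  puts stale?(output, [source, layout])   # true: the layout changed
end
```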
Data Flow Architecture
Content flows through the system in a directed graph. Source files provide raw content. Data files supply structured information. Templates receive both content and data as variables. Helper functions transform data during rendering. The build process traverses this graph, resolving dependencies and generating output.
# Data flow in static generation
class SiteBuilder
attr_reader :content, :data, :config
def initialize(source_dir)
@source_dir = source_dir
@content = {}
@data = {}
@config = load_config
end
def build
load_data_files
load_content_files
render_pages
end
def load_data_files
Dir.glob("#{@source_dir}/_data/**/*.yml").each do |file|
key = File.basename(file, '.yml')
@data[key] = YAML.load_file(file)
end
end
def load_content_files
Dir.glob("#{@source_dir}/**/*.md").each do |file|
@content[file] = parse_content(file)
end
end
def render_pages
@content.each do |path, page|
html = render_template(page[:layout], page: page, data: @data)
write_output(path, html)
end
end
end
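Resolving a dependency graph into a build order is a topological sort, which Ruby's standard library provides through the TSort module. The sketch below is one possible way to model it; the BuildGraph class and the file names are hypothetical and do not correspond to the SiteBuilder above.

```ruby
require "tsort"

# Hypothetical build graph: each key lists the nodes it depends on.
class BuildGraph
  include TSort

  def initialize(dependencies)
    @dependencies = dependencies
  end

  # TSort calls these two methods to walk the graph.
  def tsort_each_node(&block)
    @dependencies.each_key(&block)
  end

  def tsort_each_child(node, &block)
    @dependencies.fetch(node, []).each(&block)
  end
end

graph = BuildGraph.new({
  "output/index.html"    => ["content/index.md", "layouts/default.html", "data/nav.yml"],
  "content/index.md"     => [],
  "layouts/default.html" => ["data/nav.yml"],
  "data/nav.yml"         => []
})

# Dependencies always appear before the nodes that need them.
puts graph.tsort
```

TSort raises TSort::Cyclic when the graph contains a cycle, which gives a build a natural place to report circular includes.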
Ruby Implementation
Ruby hosts several mature static site generators with distinct design philosophies. Jekyll prioritizes simplicity and blog-focused workflows. Middleman targets application-like sites with complex asset pipelines. Nanoc provides maximum flexibility through explicit compilation rules.
Jekyll Site Structure
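Jekyll follows convention over configuration, inferring behavior from directory structure. Where conventions fall short, plugins hook into the build. A generator plugin runs after Jekyll has read the site's content and before it renders pages: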
# Jekyll plugin for custom content processing
module Jekyll
class CustomGenerator < Generator
safe true
priority :high
def generate(site)
site.pages.each do |page|
if page.data['custom_process']
page.content = process_content(page.content)
end
end
end
private
def process_content(content)
# Custom transformation logic
content.gsub(/\{\{(\w+)\}\}/) do |match|
fetch_dynamic_value($1)
end
end
end
end
Jekyll uses Liquid templating with built-in filters and tags. Templates access site-wide data through the site object and page-specific data through page variables:
# Custom Jekyll filter
module Jekyll
module CustomFilters
def excerpt(text, length = 150)
return text if text.length <= length
text[0...length].gsub(/\s+\S*$/, '') + '...'
end
def reading_time(text)
words_per_minute = 200
word_count = text.split.size
minutes = (word_count / words_per_minute.to_f).ceil
"#{minutes} min read"
end
end
end
Liquid::Template.register_filter(Jekyll::CustomFilters)
Middleman Application Structure
Middleman treats static sites as applications with explicit configuration:
# config.rb - Middleman configuration
activate :blog do |blog|
blog.prefix = "articles"
blog.permalink = "{year}/{month}/{title}.html"
blog.sources = "{year}-{month}-{day}-{title}.html"
blog.layout = "article"
end
configure :build do
activate :minify_css
activate :minify_javascript
activate :asset_hash
activate :relative_assets
end
# Custom helpers for templates
helpers do
def article_summary(article, length = 250)
strip_tags(article.body).slice(0, length)
end
def format_date(date)
date.strftime("%B %e, %Y")
end
end
# Proxy pages for dynamic routes
data.products.each do |slug, product|
proxy "/products/#{slug}.html", "/templates/product.html",
locals: { product: product },
ignore: true
end
Middleman provides a development server with live reloading and asset compilation. The build process applies configured optimizations automatically.
Nanoc Compilation Rules
Nanoc uses explicit rules to define compilation behavior:
# Rules file - Nanoc compilation rules
compile '/articles/**/*' do
filter :kramdown
filter :colorize_syntax
layout '/article.*'
if item.identifier =~ '**/index.*'
write item.identifier.to_s
else
write item.identifier.without_ext + '/index.html'
end
end
compile '/assets/styles/**/*.scss' do
filter :sass, syntax: :scss, style: :compressed
write item.identifier.without_ext + '.css'
end
# Custom filter for content processing
require 'nokogiri'

class AddToc < Nanoc::Filter
  identifier :add_toc

  def run(content, params = {})
    # Parse as a fragment so the output is not wrapped in <html> and <body>
    doc = Nokogiri::HTML.fragment(content)
    toc = generate_toc(doc)
    # Insert TOC after first heading
    first_heading = doc.at_css('h2')
    first_heading.add_next_sibling(toc) if first_heading && toc
    doc.to_html
  end

  private

  def generate_toc(doc)
    headings = doc.css('h2, h3')
    # Build TOC structure from headings
  end
end
Nanoc separates data loading, filtering, and layout application into distinct pipeline stages. Items flow through filters specified in rules, with each filter transforming content independently.
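The idea of content flowing through independent filters can be shown without Nanoc itself. The sketch below is plain Ruby, not Nanoc's API: the three filters and the apply_filters helper are hypothetical, and each filter is just a callable that takes a string and returns a string.

```ruby
# Illustrative filters: each one transforms the content on its own and knows
# nothing about the filters that run before or after it.
STRIP_TRAILING  = ->(content) { content.gsub(/[ \t]+$/, "") }
UPCASE_HEADINGS = ->(content) { content.gsub(/^# (.+)$/) { "# #{$1.upcase}" } }
WRAP_ARTICLE    = ->(content) { "<article>\n#{content}\n</article>" }

# Feed the output of each filter into the next one, in order.
def apply_filters(content, filters)
  filters.reduce(content) { |text, filter| filter.call(text) }
end

source = "# hello  \nBody text.   "
puts apply_filters(source, [STRIP_TRAILING, UPCASE_HEADINGS, WRAP_ARTICLE])
```

Because every filter has the same string-in, string-out shape, filters can be reordered or reused across rules without changes to the filters themselves.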
Content Processing Patterns
Ruby's text processing capabilities enable sophisticated content transformations:
# Advanced frontmatter parsing
require 'yaml'
require 'date'

class ContentParser
  FRONTMATTER_REGEX = /\A---\s*\n(.*?\n)---\s*\n/m

  def self.parse(raw_content)
    if raw_content =~ FRONTMATTER_REGEX
      # Permit dates, which front matter commonly contains;
      # a block with no keys parses to nil, so fall back to {}
      frontmatter = YAML.safe_load($1, permitted_classes: [Date, Time]) || {}
      content = raw_content.sub(FRONTMATTER_REGEX, '')
      # Process computed fields
      frontmatter['word_count'] = content.split.size
      frontmatter['excerpt'] ||= extract_excerpt(content)
      { frontmatter: frontmatter, content: content }
    else
      { frontmatter: {}, content: raw_content }
    end
  end

  def self.extract_excerpt(content)
    # Find first paragraph
    paragraphs = content.split(/\n\n+/)
    strip_markdown(paragraphs.first.to_s)
  end

  def self.strip_markdown(text)
    text.gsub(/[*_`\[\]()#]/, '').strip
  end
end
Data Loading and Caching
Build-time data fetching requires caching to avoid repeated API calls:
# Data loader with caching
require 'fileutils'

class DataLoader
  def initialize(cache_dir = '.cache')
    @cache_dir = cache_dir
    FileUtils.mkdir_p(@cache_dir)
  end

  def fetch(key, ttl: 3600, &block)
    cache_file = File.join(@cache_dir, "#{key}.cache")
    if File.exist?(cache_file) &&
       (Time.now - File.mtime(cache_file)) < ttl
      # Marshal data is binary; only load cache files this build wrote itself
      return Marshal.load(File.binread(cache_file))
    end
    data = block.call
    File.binwrite(cache_file, Marshal.dump(data))
    data
  end
end
# Usage in site builder
loader = DataLoader.new
github_data = loader.fetch('github_repos', ttl: 3600) do
# Expensive API call
fetch_github_repositories
end
Tools & Ecosystem
The Ruby ecosystem includes multiple static site generators with different design goals and feature sets. Jekyll dominates in popularity and GitHub integration. Middleman serves application-style sites with complex build requirements. Nanoc offers maximum control through programmatic configuration. Bridgetown modernizes Jekyll's architecture with improved performance.
Jekyll
Jekyll integrates tightly with GitHub Pages, providing free hosting for Jekyll sites pushed to GitHub repositories. The generator emphasizes simplicity with sensible defaults. Configuration happens through YAML files rather than Ruby code. The plugin ecosystem extends functionality without requiring generator modifications.
Jekyll organizes content through collections, allowing structured content beyond blog posts. Collections define custom content types with their own directories and output settings:
# _config.yml
collections:
  products:
    output: true
    permalink: /products/:name/
  team:
    output: false

defaults:
  - scope:
      path: ""        # required; an empty string matches every file
      type: products
    values:
      layout: product
The generator processes Markdown with Kramdown by default, supporting extended syntax for tables, footnotes, and definition lists. Liquid templates provide logic and iteration without Ruby code execution.
Jekyll's incremental build mode, enabled with the --incremental flag and still marked experimental, tracks file dependencies to rebuild only affected pages. The development server watches for changes and regenerates modified content automatically. Production builds optimize output with minification and asset fingerprinting through plugins.
Middleman
Middleman structures sites as Ruby applications with a configuration file defining behavior. Version 3 shipped an asset pipeline built on Sprockets; since version 4, Sprockets is a separate extension and the external pipeline feature hands asset builds to frontend tools such as Webpack. Image optimization is likewise provided by extensions rather than the core.
Extensions activate optional features. The blog extension adds blogging functionality. The asset hash extension fingerprints assets for cache invalidation. The minify_css and minify_javascript extensions compress stylesheets and scripts:
# Middleman extension example
class CustomExtension < Middleman::Extension
option :setting, 'default_value', 'Description'
def initialize(app, options_hash = {}, &block)
super
app.before_build do |builder|
# Run before build starts
prepare_build_environment
end
end
def manipulate_resource_list(resources)
# Modify resource list during compilation
resources.map do |resource|
if resource.path.end_with?('.html')
add_metadata(resource)
else
resource
end
end
end
end
Middleman::Extensions.register(:custom, CustomExtension)
Middleman supports dynamic pages through proxying. The configuration file creates pages programmatically from data files, enabling template reuse across similar pages:
# Generate pages from data
data.authors.each do |author_id, author|
proxy "/authors/#{author_id}.html",
"/templates/author.html",
locals: {
author: author,
posts: blog.articles.select { |a| a.data.author == author_id }
}
end
Nanoc
Nanoc provides complete control over compilation through explicit rules. The Rules file defines which items compile, which filters apply, and where output writes. This explicitness trades convenience for flexibility.
Filters transform content in a pipeline. Built-in filters handle Markdown, ERB, Haml, and Sass. Custom filters implement domain-specific transformations:
# Nanoc item representation
class Item
attr_reader :identifier, :content, :attributes
def initialize(content, attributes, identifier)
@content = content
@attributes = attributes
@identifier = identifier
end
def [](key)
@attributes[key]
end
end
# Compilation rule matching
compile '/blog/**/*.md' do
filter :kramdown, input: 'GFM'
filter :relativize_urls
layout '/blog_post.*'
write ext: 'html'
end
Nanoc separates data sources from compilation. Items load from filesystem directories by default, but custom data sources can load from databases, APIs, or other storage systems. This separation enables complex content workflows.
The generator provides dependency tracking at granular levels. Helper methods declare dependencies on items, attributes, or external files. Nanoc rebuilds dependent items when dependencies change:
# Dependency declaration
def articles_by_year
  depend_on '/articles/**/*'
  # A distinct local name avoids shadowing the items helper
  articles = items.find_all('/articles/**/*.md')
  articles.group_by { |i| i[:published_at].year }
end
Bridgetown
Bridgetown forks Jekyll to modernize architecture and improve performance. The generator adds a JavaScript bundling pipeline, component-based templating, and Ruby-based configuration. It maintains compatibility with many Jekyll plugins while introducing new features.
Early Bridgetown releases bundled assets with Webpack; current releases default to esbuild. Modern JavaScript workflows integrate naturally. The generator supports React, Vue, and Lit components within content. Bundler settings live in the frontend tool's own configuration file rather than in Bridgetown's Ruby configuration:
# config/initializers.rb - Ruby equivalent of bridgetown.config.yml
Bridgetown.configure do |config|
  config.url = "https://example.com"
  config.timezone = "America/New_York"
  # Bundling is configured outside Ruby, in esbuild.config.js
  # (or webpack.config.js on older sites), with
  # ./frontend/javascript/index.js as the entry point
end
# Resource extension
class AddExcerptTransform < Bridgetown::Resource::Transform
def transform
return unless resource.data.type == "post"
resource.data.excerpt ||= generate_excerpt(resource.content)
end
def generate_excerpt(content)
doc = Nokogiri::HTML(content)
doc.css('p').first&.text&.slice(0, 200)
end
end
Ecosystem Comparison
Different generators suit different use cases based on their design priorities:
Jekyll excels for documentation sites and blogs with straightforward requirements. GitHub Pages integration provides free hosting. The large plugin ecosystem covers common needs. Limited configuration options constrain complex use cases.
Middleman handles application-style sites with sophisticated asset requirements. The Ruby-based configuration enables programmatic site generation. The asset pipeline integrates modern frontend tools. Higher complexity requires more learning.
Nanoc provides maximum flexibility for complex content transformations. Explicit rules give complete control over compilation. Custom data sources enable unusual content workflows. The learning curve steepens without helpful defaults.
Bridgetown modernizes Jekyll for contemporary web development. Modern JavaScript tooling integrates seamlessly. Component-based development patterns work naturally. Smaller ecosystem means fewer ready-made plugins.
Design Considerations
Static Site Generation trades dynamic flexibility for performance and simplicity. The approach suits content that changes infrequently and requires no per-user customization. Understanding when SSG fits requires evaluating content update frequency, personalization needs, and deployment constraints.
Content Update Patterns
SSG works best when content updates happen on human timescales measured in hours or days rather than seconds. Blog posts, documentation, marketing pages, and project sites change infrequently enough that rebuild delays remain acceptable. News sites or social feeds requiring second-by-second updates fit poorly.
Build time grows with site size. Small sites with hundreds of pages rebuild in seconds. Large sites with thousands of pages may take minutes. Incremental builds reduce this time by processing only changed content, but complex dependency graphs limit optimization effectiveness.
Content updated by non-technical users requires additional tooling. Headless CMS systems provide editing interfaces that trigger rebuilds on save. Git-based workflows require comfort with version control. These tools add complexity compared to logging into a WordPress admin panel.
Personalization Requirements
Static sites deliver identical HTML to all users. Personalization requires client-side JavaScript loading user-specific data after page load. This two-phase approach works for basic customization like logged-in state or shopping cart contents. Complex personalization like recommendation engines or dynamic pricing fits poorly.
Authentication and authorization are enforced by the API, not by the static pages. The static HTML contains no sensitive data. Client-side JavaScript sends the user's credentials or session token with each request, and the API verifies them before returning protected content. This pattern separates public content (static) from private content (dynamic API).
Infrastructure Implications
Static sites deploy to any web server without special requirements. No application server, no database connections, no server-side runtime needed. CDNs can cache entire sites at edge locations worldwide. This simplicity reduces infrastructure costs and operational complexity.
Traditional hosting separates application servers (expensive, complex) from static file servers (cheap, simple). Static sites eliminate application servers entirely. A site serving 10,000 requests per second needs only CDN bandwidth, not server scaling.
The build process requires computational resources. Continuous deployment pipelines run builds on dedicated servers. Build time and frequency determine required capacity. Large sites may need powerful build servers despite simple runtime requirements.
Hybrid Approaches
Static sites can incorporate dynamic elements through client-side fetching. The initial HTML loads instantly from CDN. JavaScript then requests fresh data from APIs for dynamic sections. This pattern combines static performance with dynamic functionality.
# API endpoint for dynamic data
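# Separate from static site; mount it under /api (for example with
# map '/api' in config.ru) so its routes match the client-side fetch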
class CommentsAPI < Sinatra::Base
get '/comments/:page_id' do
content_type :json
Comment.where(page_id: params[:page_id]).to_json
end
post '/comments/:page_id' do
comment = Comment.create(
page_id: params[:page_id],
content: params[:content],
author: params[:author]
)
status 201
comment.to_json
end
end
The static site includes JavaScript loading comments client-side:
// Embedded in static page
fetch(`/api/comments/${pageId}`)
.then(response => response.json())
.then(comments => renderComments(comments));
This hybrid maintains static performance for content while adding dynamic features like comments, live data, or user interactions.
SSG vs Server-Side Rendering
Server-Side Rendering (SSR) generates HTML on-demand for each request. SSG generates HTML once during build. SSR handles dynamic content naturally but requires server infrastructure. SSG delivers better performance but updates require rebuilds.
SSR suits applications with per-user content, frequent updates, or complex data requirements. E-commerce sites with inventory updates, social feeds, or collaborative tools work better with SSR. Static sites suit content-focused sites with infrequent changes.
The rebuild cycle creates latency between content changes and published updates. Push-button rebuilds take minutes to propagate. Automatic rebuilds on content changes reduce but don't eliminate this delay. SSR reflects changes immediately.
SSG vs Client-Side Rendering
Single Page Applications (SPAs) render content entirely client-side. The server delivers a minimal HTML shell and a JavaScript bundle. Client code fetches data and renders views. SPAs provide app-like experiences but suffer from slower initial loads and SEO challenges.
SSG delivers complete HTML on first request. Content appears immediately without JavaScript execution. Search engines index static HTML easily. SPAs require JavaScript execution to show content, complicating search indexing.
Static sites can adopt SPA patterns for sections requiring rich interaction. The initial page loads as static HTML. Client-side routing takes over for subsequent navigation. This progressive enhancement maintains static performance while enabling SPA features where needed.
Implementation Approaches
Building a static site generator requires solving content loading, template rendering, asset processing, and output generation. Different architectural approaches balance flexibility, performance, and maintainability.
Content Loading Strategies
Filesystem-based loading reads content from organized directories. The directory structure maps directly to URL structure. Content files contain frontmatter metadata and body content. This approach prioritizes simplicity and developer familiarity with file-based workflows.
# Filesystem-based content loader
class FileContentLoader
def initialize(content_dir)
@content_dir = content_dir
end
def load_all
Dir.glob("#{@content_dir}/**/*.{md,html}").map do |path|
load_file(path)
end
end
private
def load_file(path)
raw = File.read(path)
frontmatter, content = parse_frontmatter(raw)
{
path: path,
slug: generate_slug(path),
frontmatter: frontmatter,
content: content
}
end
def parse_frontmatter(raw)
if raw =~ /\A---\s*\n(.*?\n)---\s*\n/m
[YAML.load($1), raw.sub(/\A---\s*\n.*?\n---\s*\n/m, '')]
else
[{}, raw]
end
end
def generate_slug(path)
path.sub(@content_dir, '')
.sub(/\.(md|html)$/, '')
.sub(/\/$/, '/index')
end
end
Database-backed loading separates content storage from site structure. Content lives in databases queried during build. This enables complex filtering and relationships but requires database infrastructure for builds:
# Database-backed content loader
class DatabaseContentLoader
def initialize(database_url)
@db = Sequel.connect(database_url)
end
def load_all
@db[:posts]
.where(published: true)
.order(:published_at)
.map { |row| transform_row(row) }
end
private
def transform_row(row)
{
slug: row[:slug],
frontmatter: {
title: row[:title],
date: row[:published_at],
author: fetch_author(row[:author_id])
},
content: row[:content]
}
end
end
API-based loading fetches content from external services during build. Headless CMS platforms, content APIs, or custom services provide content. This centralizes content management across multiple sites:
# API-based content loader with caching
require 'fileutils'
require 'json'
require 'http' # http gem

class APIContentLoader
  def initialize(api_url, cache_dir = '.cache')
    @api_url = api_url
    @cache_dir = cache_dir
    # File.write fails if the cache directory is missing
    FileUtils.mkdir_p(@cache_dir)
  end

  def load_all
    cache_file = File.join(@cache_dir, 'content.json')
    if File.exist?(cache_file) &&
       (Time.now - File.mtime(cache_file)) < 300
      return JSON.parse(File.read(cache_file))
    end
    response = HTTP.get("#{@api_url}/content")
    content = JSON.parse(response.body.to_s)
    File.write(cache_file, JSON.generate(content))
    content
  end
end
Template Rendering Approaches
Template rendering transforms content and data into HTML. Different engines balance power and safety. Liquid provides safe templating with sandboxed execution. ERB enables full Ruby but risks security issues with untrusted content. Template selection depends on trust levels and complexity needs.
Liquid restricts template capabilities to prevent arbitrary code execution:
# Liquid template rendering
require 'liquid'
template = Liquid::Template.parse(template_string)
output = template.render(
'page' => page_data,
'site' => site_data
)
ERB provides full Ruby access in templates:
# ERB template rendering
require 'erb'

# Exposes @page and @site to the template; defined before it is used below
class TemplateBinding
  def initialize(page, site)
    @page = page
    @site = site
  end

  def get_binding
    binding
  end
end

template = ERB.new(template_string)
binding_context = TemplateBinding.new(page_data, site_data)
output = template.result(binding_context.get_binding)
Component-based rendering composes pages from reusable components:
# Component-based rendering
class Component
def initialize(props)
@props = props
end
def render
raise NotImplementedError
end
end
class ArticleCard < Component
def render
<<~HTML
<article>
<h2>#{@props[:title]}</h2>
<p>#{@props[:excerpt]}</p>
<a href="#{@props[:url]}">Read more</a>
</article>
HTML
end
end
# Usage
cards = articles.map { |a| ArticleCard.new(a).render }
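The card above interpolates its props directly into markup, which is fine for trusted content but breaks, or becomes unsafe, when a title contains characters such as < or &. A minimal sketch of an escaping variant follows; the EscapedArticleCard class and its h helper are hypothetical names, and ERB::Util.html_escape from the standard library does the escaping.

```ruby
require "erb"

# Hypothetical variant of a card component that escapes every prop before
# interpolating it into markup.
class EscapedArticleCard
  def initialize(props)
    @props = props
  end

  def render
    <<~HTML
      <article>
        <h2>#{h(@props[:title])}</h2>
        <p>#{h(@props[:excerpt])}</p>
        <a href="#{h(@props[:url])}">Read more</a>
      </article>
    HTML
  end

  private

  # Escapes &, <, >, and quote characters so props cannot inject markup.
  def h(value)
    ERB::Util.html_escape(value.to_s)
  end
end

card = EscapedArticleCard.new({
  title: "Tips & <tricks>",
  excerpt: "Use \"quotes\" safely.",
  url: "/posts/tips/"
})
puts card.render
```

Escaping at the component boundary keeps the decision in one place, so individual templates do not have to remember it.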
Build Process Orchestration
Build systems coordinate content loading, transformation, and output generation. Full builds process every file on each run, which wastes time reprocessing unchanged content. Incremental builds track dependencies to minimize work:
# Incremental build system
require 'digest'

class IncrementalBuilder
  def initialize
    @dependency_graph = DependencyGraph.new
    @checksums = load_checksums
  end

  def build(content_files)
    changed = content_files.select { |f| changed?(f) }
    affected = @dependency_graph.find_affected(changed)
    (changed + affected).uniq.each do |file|
      process_file(file)
      # Record the new checksum so the next build sees this file as unchanged
      @checksums[file] = checksum(file)
    end
    save_checksums
  end

  private

  def checksum(file)
    Digest::SHA256.file(file).hexdigest
  end

  def changed?(file)
    @checksums[file] != checksum(file)
  end

  def process_file(file)
    # Transform and write output
    update_dependencies(file)
  end
end
Parallel builds process independent files concurrently:
# Parallel build processing
require 'concurrent'
class ParallelBuilder
def build(content_files)
pool = Concurrent::FixedThreadPool.new(4)
futures = content_files.map do |file|
Concurrent::Future.execute(executor: pool) do
process_file(file)
end
end
futures.each(&:value!) # Wait for completion; value! re-raises task errors
pool.shutdown
end
end
Asset Pipeline Integration
Modern sites require asset processing for CSS compilation, JavaScript bundling, and image optimization. Integration strategies range from external tools to embedded pipelines.
External tool integration invokes separate build tools:
# External asset tool integration
class AssetBuilder
def build_assets
compile_sass
bundle_javascript
optimize_images
end
private
def compile_sass
system("sass assets/styles:output/css --style compressed")
end
def bundle_javascript
system("esbuild assets/js/main.js --bundle --minify --outfile=output/js/main.js")
end
def optimize_images
Dir.glob("assets/images/**/*.{jpg,png}").each do |image|
system("imageoptim", image) # separate argument, so the shell never parses the path
end
end
end
Embedded pipelines process assets within the build system:
# Embedded asset pipeline
class EmbeddedAssetPipeline
def process_asset(asset_path)
case File.extname(asset_path)
when '.scss'
compile_scss(asset_path)
when '.js'
bundle_javascript(asset_path)
when '.jpg', '.png'
optimize_image(asset_path)
end
end
private
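def compile_scss(path)
# Sass::Engine belongs to the ruby-sass gem, which reached end of life in 2019;
# SassC::Engine from the sassc gem accepts the same arguments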
Sass::Engine.new(
File.read(path),
syntax: :scss,
style: :compressed
).render
end
end
Performance Considerations
Static sites achieve exceptional performance through pre-rendering and aggressive caching. Eliminating server-side processing reduces time to first byte. CDN distribution places content near users globally. Optimization focuses on build performance and runtime delivery.
Build Performance Optimization
Build time increases with site size. Large sites with thousands of pages require optimization to maintain reasonable build times. Incremental builds track file changes and rebuild only affected pages. Dependency graphs determine which pages depend on changed files.
A modified template affects all pages using that template. Changed data files impact pages referencing that data. Sophisticated tracking minimizes unnecessary rebuilds:
# Dependency tracking for incremental builds
require 'set'
class DependencyTracker
def initialize
@dependencies = Hash.new { |h, k| h[k] = Set.new }
@reverse_dependencies = Hash.new { |h, k| h[k] = Set.new }
end
def add_dependency(target, source)
@dependencies[target].add(source)
@reverse_dependencies[source].add(target)
end
def find_affected(changed_files)
affected = Set.new
queue = changed_files.dup
while file = queue.shift
affected.add(file)
dependents = @reverse_dependencies[file]
dependents.each do |dependent|
queue.push(dependent) unless affected.include?(dependent)
end
end
affected.to_a
end
end
Parallel processing builds multiple pages simultaneously. CRuby's global VM lock (GVL) allows only one thread to execute Ruby code at a time, so threads help with I/O-bound work while separate processes suit CPU-bound work such as rendering:
# Parallel builds using processes
require 'parallel'
class ParallelPageBuilder
def build_pages(pages)
Parallel.map(pages, in_processes: 4) do |page|
build_page(page)
end
end
private
def build_page(page)
html = render_template(page)
write_output(page[:path], html)
end
end
Caching expensive operations avoids repeated work. Markdown rendering, syntax highlighting, and image processing cache results keyed by content hash:
# Operation caching
require 'digest'
require 'fileutils'
require 'kramdown'
class CachedRenderer
def initialize(cache_dir = '.cache/render')
@cache_dir = cache_dir
FileUtils.mkdir_p(@cache_dir)
end
def render_markdown(content)
key = Digest::SHA256.hexdigest(content)
cache_file = "#{@cache_dir}/#{key}.html"
return File.read(cache_file) if File.exist?(cache_file)
html = Kramdown::Document.new(content).to_html
File.write(cache_file, html)
html
end
end
Runtime Delivery Performance
Static files enable aggressive HTTP caching. Assets with fingerprinted filenames never change under a given name, so they can be served with a long-lived header such as Cache-Control: max-age=31536000, immutable. HTML is cached with shorter TTLs that balance freshness and performance. CDN edge caching serves content from locations near users.
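A deploy script might choose caching headers per file along these lines. This is a sketch under stated assumptions rather than a recommendation: the fingerprint pattern (a dash followed by at least eight hex characters before the extension), the method name, and the TTL values are all hypothetical and would need tuning per site and per CDN.

```ruby
# Assumed fingerprint shape: name-<8+ hex chars>.ext
FINGERPRINTED = /-[0-9a-f]{8,}\.\w+\z/

# Pick a Cache-Control value for a built file.
def cache_control_for(path)
  if path.match?(FINGERPRINTED)
    "public, max-age=31536000, immutable"   # content never changes under this name
  elsif File.extname(path) == ".html"
    "public, max-age=0, must-revalidate"    # always check for a fresh build
  else
    "public, max-age=3600"                  # unfingerprinted assets: short TTL
  end
end

puts cache_control_for("css/site-3f9a1c2e.css")
puts cache_control_for("blog/post/index.html")
puts cache_control_for("favicon.ico")
```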
Asset fingerprinting adds content hashes to filenames. Changed files get new names, bypassing stale caches:
# Asset fingerprinting
require 'digest'

class AssetFingerprinter
  def fingerprint_assets(output_dir)
    assets = Dir.glob("#{output_dir}/**/*.{css,js,jpg,png}")
    assets.each do |asset|
      # Hash the file's raw bytes
      hash = Digest::SHA256.file(asset).hexdigest[0..7]
      ext = File.extname(asset)
      # Replace only the trailing extension, not an earlier match in the path
      new_name = "#{asset.delete_suffix(ext)}-#{hash}#{ext}"
      File.rename(asset, new_name)
      update_references(asset, new_name)
    end
  end

  private

  def update_references(old_path, new_path)
    # Update HTML files referencing this asset
  end
end
Critical CSS inlining embeds above-the-fold styles directly in HTML. Pages render immediately without waiting for external stylesheets:
# Critical CSS extraction
require 'nokogiri'

class CriticalCSSInliner
  def inline_critical(html, critical_css)
    doc = Nokogiri::HTML(html)
    # Add inline critical CSS
    style = Nokogiri::XML::Node.new('style', doc)
    style.content = critical_css
    doc.at_css('head')&.add_child(style)
    # Keep the full stylesheets but load them without blocking first render;
    # removing them would drop every non-critical style
    doc.css('link[rel="stylesheet"]').each do |link|
      link['media'] = 'print'
      link['onload'] = "this.media='all'"
    end
    doc.to_html
  end
end
Image optimization reduces file sizes without visible quality loss. Responsive images let the browser choose an appropriately sized version for the device's viewport and pixel density:
# Responsive image generation
require 'mini_magick'
class ResponsiveImages
  SIZES = [320, 640, 1024, 1920].freeze
  def generate_responsive(image_path)
    original_width = MiniMagick::Image.open(image_path).width
    SIZES.map do |width|
      next if original_width < width # Never upscale
      # Open a fresh working copy for each size: MiniMagick edits its file in
      # place, so a clone would share the already-resized file
      resized = MiniMagick::Image.open(image_path)
      resized.resize "#{width}x"
      resized.strip # Remove EXIF data
      output_path = image_path.sub(/\.(\w+)$/, "-#{width}w.\\1")
      resized.write(output_path)
      { width: width, path: output_path }
    end.compact
  end
end
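The generated variants only help once the markup advertises them. The following sketch turns a list of `{ width:, path: }` hashes, like the one returned above, into an `img` tag with a `srcset`. It assumes each `path` is already a URL the browser can request; the helper names and the `100vw` default for `sizes` are illustrative.

```ruby
require 'erb'

# Build the srcset value: "url 320w, url 640w, ..." ordered by width.
def srcset_for(variants)
  variants
    .sort_by { |variant| variant[:width] }
    .map { |variant| "#{variant[:path]} #{variant[:width]}w" }
    .join(', ')
end

# Build an <img> tag that falls back to the largest variant.
def responsive_img_tag(variants, alt:, sizes: '100vw')
  fallback = variants.max_by { |variant| variant[:width] }
  escaped_alt = ERB::Util.html_escape(alt)
  %(<img src="#{fallback[:path]}" srcset="#{srcset_for(variants)}" sizes="#{sizes}" alt="#{escaped_alt}">)
end
```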
Build vs Runtime Performance Trade-offs
Build-time work improves runtime performance. Expensive processing during build produces optimized output served efficiently. Complex rendering, image transformation, and asset optimization happen once during build rather than repeatedly at runtime.
This trade-off has limits. Build times growing to hours make iterative development painful. Balancing build complexity against runtime benefits requires measuring both. Fast builds with adequate runtime performance beat slow builds with marginal runtime improvements.
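Measuring the build side can start with a per-stage timer. The `BuildTimer` class below is a hypothetical helper, not part of any generator, and the stage names are up to the caller:

```ruby
# Record how long each build stage takes, using a monotonic clock so the
# measurement is unaffected by system clock adjustments.
class BuildTimer
  attr_reader :timings

  def initialize
    @timings = {}
  end

  # Run the block, store its duration in seconds, and return its result.
  def measure(stage)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    @timings[stage] = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    result
  end

  # One line per stage, slowest first.
  def report
    @timings
      .sort_by { |_, seconds| -seconds }
      .map { |stage, seconds| format('%-20s %8.3fs', stage, seconds) }
      .join("\n")
  end
end
```

Wrapping each stage, for example `timer.measure(:images) { optimize_images }` with a caller-supplied `optimize_images`, shows which optimization dominates the build and whether its runtime benefit justifies the cost.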
Incremental deploys update only changed files. CDN purge patterns invalidate caches selectively. Deployment strategies affect how quickly changes reach users:
# Incremental deployment
require 'digest'
class IncrementalDeployer
  def deploy(output_dir, previous_manifest)
    current_manifest = generate_manifest(output_dir)
    changed = find_changed_files(current_manifest, previous_manifest)
    upload_files(changed)
    invalidate_cache(changed.keys)
    save_manifest(current_manifest)
  end
  private
  # Map each output file to a SHA-256 digest of its contents
  def generate_manifest(dir)
    Dir.glob("#{dir}/**/*").each_with_object({}) do |file, manifest|
      next if File.directory?(file)
      manifest[file] = Digest::SHA256.file(file).hexdigest
    end
  end
  # New files have no previous hash, so they count as changed too
  def find_changed_files(current, previous)
    current.select { |path, hash| previous[path] != hash }
  end
  # upload_files, invalidate_cache, and save_manifest depend on the hosting
  # provider and are left undefined here
end
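Comparing manifests in only one direction misses files that were deleted since the last deploy, which would then linger on the host. A fuller comparison, sketched here with a hypothetical `diff_manifests` helper over the same path-to-digest hashes, separates the three cases:

```ruby
# Compare two manifests (path => content digest) and classify every path.
def diff_manifests(current, previous)
  {
    # Present now, absent before: upload.
    added: current.keys - previous.keys,
    # Present in both with a different digest: upload and purge from the CDN.
    changed: current.keys.select { |path| previous.key?(path) && previous[path] != current[path] },
    # Present before, absent now: delete from the host and purge.
    removed: previous.keys - current.keys
  }
end
```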
Reference
Static Site Generator Comparison
| Generator | Primary Use Case | Configuration | Template Engine | Build Speed |
|---|---|---|---|---|
| Jekyll | Blogs, documentation | YAML | Liquid | Moderate |
| Middleman | Application sites | Ruby | ERB/Haml/Slim | Moderate |
| Nanoc | Complex content workflows | Ruby rules | Multiple | Fast |
| Bridgetown | Modern web apps | Ruby | Liquid/ERB | Fast |
Content Organization Patterns
| Pattern | Structure | Use Case |
|---|---|---|
| Flat | All content in single directory | Small sites |
| Hierarchical | Nested directories mirror URL structure | Documentation sites |
| Collection-based | Content types in separate directories | Multi-content-type sites |
| Date-based | YYYY/MM/DD directory structure | Blogs, news sites |
Build Process Stages
| Stage | Input | Output | Purpose |
|---|---|---|---|
| Content Loading | Source files | Parsed content objects | Read and parse source files |
| Data Loading | Data files, APIs | Data structures | Load external data |
| Template Rendering | Content, templates, data | HTML strings | Generate HTML from templates |
| Asset Processing | CSS, JS, images | Optimized assets | Compile and optimize assets |
| Output Writing | HTML, assets | File system | Write final output files |
Common File Extensions
| Extension | Purpose | Processing |
|---|---|---|
| .md | Markdown content | Markdown rendering |
| .html | HTML content | Template rendering |
| .erb | Embedded Ruby templates | ERB processing |
| .liquid | Liquid templates | Liquid rendering |
| .yml, .yaml | YAML data files | YAML parsing |
| .json | JSON data files | JSON parsing |
Template Variables
| Variable | Scope | Contains |
|---|---|---|
| site | Global | Site-wide configuration and data |
| page | Current page | Current page metadata and content |
| content | Current page | Rendered page content |
| layout | Current layout | Layout-specific metadata |
Deployment Strategies
| Strategy | Method | Considerations |
|---|---|---|
| CDN Push | Upload to CDN storage | Fast global delivery |
| Git-based | Push to GitHub/GitLab | Automatic builds |
| FTP/SFTP | Traditional file transfer | Legacy compatibility |
| Rsync | Incremental file sync | Efficient updates |
| Object Storage | Upload to S3/GCS | Scalable hosting |
Caching Headers
| Header | Value | Purpose |
|---|---|---|
| Cache-Control | max-age=31536000, immutable | Fingerprinted assets |
| Cache-Control | max-age=3600, must-revalidate | HTML pages |
| ETag | Content hash | Cache validation |
| Last-Modified | File timestamp | Conditional requests |
Build Optimization Techniques
| Technique | Benefit | Trade-off |
|---|---|---|
| Incremental builds | Faster rebuild times | Dependency tracking complexity |
| Parallel processing | Faster builds | Memory usage |
| Content caching | Skip re-rendering | Cache invalidation complexity |
| Asset fingerprinting | Aggressive caching | Build complexity |
| CDN caching | Fast delivery | Cache invalidation delays |
Jekyll Directory Structure
| Directory | Purpose | Output |
|---|---|---|
| _posts | Blog posts | Generated as pages |
| _drafts | Unpublished posts | Not generated unless built with --drafts |
| _layouts | HTML templates | Not directly output |
| _includes | Reusable snippets | Not directly output |
| _data | YAML, JSON, CSV, and TSV data files | Not output; available in templates as site.data |
| _site | Build output | Published files |
| assets | CSS, JS, images | Copied to output |
Middleman Helpers
| Helper | Purpose | Result |
|---|---|---|
| link_to | Generate links | Creates anchor tags |
| image_tag | Generate image tags | Creates img tags |
| stylesheet_link_tag | Include stylesheets | Links CSS files |
| javascript_include_tag | Include scripts | Links JS files |
| current_page | Access current page | Page metadata |
Common Build Errors
| Error | Cause | Solution |
|---|---|---|
| Template not found | Missing layout file | Create layout or update reference |
| Invalid front matter | YAML syntax error | Validate YAML syntax |
| Missing dependency | Gem not installed | Install required gem |
| Encoding error | Non-UTF8 characters | Fix file encoding |
| Memory exhaustion | Large site, insufficient RAM | Increase memory or optimize |
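Front matter errors from the table above can be caught before a full build. The sketch below assumes the common convention of a YAML block between two `---` lines at the top of the file; `front_matter_errors` is a hypothetical helper, and allowing `Date` and `Time` values is an assumption that fits typical post metadata.

```ruby
require 'yaml'
require 'date'

# Leading block delimited by lines containing only ---
FRONT_MATTER = /\A---\s*\n(.*?)\n---\s*(?:\n|\z)/m

# Return a list of problems found in a source file's front matter.
# An empty list means the front matter parsed (or the file has none).
def front_matter_errors(source)
  match = source.match(FRONT_MATTER)
  return [] unless match

  data = YAML.safe_load(match[1], permitted_classes: [Date, Time])
  data.nil? || data.is_a?(Hash) ? [] : ['front matter must be a YAML mapping']
rescue Psych::SyntaxError => e
  ["invalid YAML at line #{e.line}: #{e.problem}"]
end
```

Running this over every source file at the start of a build reports the offending file and line directly instead of failing somewhere inside template rendering.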