CrackedRuby - Static Site Generation

Overview

Static Site Generation (SSG) produces complete HTML pages during a build process rather than generating them dynamically for each request. The build system reads source content, applies templates, processes assets, and outputs a directory of static files ready for deployment to any web server or CDN.

SSG addresses the performance and security limitations of traditional dynamic websites. Dynamic sites execute server-side code for every request, querying databases and rendering templates repeatedly. Static sites eliminate this overhead by performing all processing once during the build phase. The resulting files contain no server-side code and require no database connections.

The approach originated in the early web when all sites were static HTML files created manually. As content management systems emerged, dynamic generation became dominant despite its computational costs. Modern SSG revived static generation by automating the build process while maintaining developer-friendly workflows with templates, data files, and content transformation.

# Basic static site generation concept
require 'fileutils'

class StaticSiteGenerator
  def initialize(content_dir, output_dir)
    @content_dir = content_dir
    @output_dir = output_dir
  end
  
  def build
    Dir.glob("#{@content_dir}/**/*.md").each do |file|
      content = File.read(file)
      html = render_markdown(content)
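      # Anchor both substitutions so only the leading directory and the final extension change
      output_path = file.sub(/\A#{Regexp.escape(@content_dir)}/, @output_dir).sub(/\.md\z/, '.html')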
      
      FileUtils.mkdir_p(File.dirname(output_path))
      File.write(output_path, html)
    end
  end
  
  def render_markdown(content)
    # Transform markdown to HTML with templates
  end
end

SSG frameworks handle content in multiple formats including Markdown, YAML, JSON, and structured data files. They apply templating engines to generate HTML, process CSS and JavaScript assets, optimize images, and create complete deployable sites. The output directory contains only static files with no dependencies on specific server technology.

Key Principles

Static Site Generation separates content authoring from content delivery through a build-time transformation process. Content creators work with source files in formats like Markdown or data files. The build system transforms these sources into HTML during compilation, producing files that web servers deliver without modification.

Build-time vs Runtime Processing

The fundamental distinction between static and dynamic sites is when processing happens. Dynamic sites execute code when users request pages, generating HTML from templates and data during each request. Static sites execute this code once during the build phase, storing the results as HTML files. Web servers deliver these pre-generated files directly without executing any application code.

This build-time processing creates an immutable output directory. Once built, the site content cannot change until the next build runs. Updates require rebuilding the entire site or incrementally building changed pages. The build output becomes a snapshot of the site at a specific point in time.

Content Transformation Pipeline

SSG systems implement a multi-stage pipeline that converts source content into deployable HTML:

Content Loading: Read source files from the filesystem, parse frontmatter metadata, extract content bodies
Data Processing: Load data files (YAML, JSON, CSV), fetch external data from APIs or databases during build
Template Rendering: Apply layout templates using template engines, inject content into template placeholders
Asset Processing: Compile Sass/SCSS to CSS, bundle and minify JavaScript, optimize and transform images
Output Writing: Write HTML files to output directory, copy static assets, generate additional files like sitemaps

Each stage can access data from previous stages. Templates can reference data files, content can include processed assets, and output generation can use computed metadata.
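The stages above can be sketched end to end with nothing but Ruby's standard library. This is a minimal illustration under stated assumptions, not how any particular generator is implemented: the MiniPipeline module, its method names, and the inline layout are hypothetical, paragraph wrapping stands in for a real Markdown renderer, and asset processing is left out.

```ruby
require 'erb'
require 'fileutils'
require 'yaml'

# Hypothetical module: every name here is illustrative, not a real generator's API.
module MiniPipeline
  FRONT_MATTER = /\A---\s*\n(.*?\n)---\s*\n/m
  LAYOUT = ERB.new(<<~HTML)
    <html>
    <head><title><%= page['title'] %> | <%= data['site']['name'] %></title></head>
    <body>
    <%= content %>
    </body>
    </html>
  HTML

  module_function

  # Content loading: read each file and split front matter from the body.
  def load_content(content_dir)
    Dir.glob(File.join(content_dir, '**', '*.md')).sort.map do |path|
      raw = File.read(path)
      match = FRONT_MATTER.match(raw)
      meta = match ? (YAML.safe_load(match[1]) || {}) : {}
      { source: path, meta: meta, body: match ? match.post_match : raw }
    end
  end

  # Data processing: load YAML data files, keyed by file name.
  def load_data(data_dir)
    Dir.glob(File.join(data_dir, '*.yml')).to_h do |path|
      [File.basename(path, '.yml'), YAML.safe_load(File.read(path))]
    end
  end

  # Template rendering: convert the body, then inject it into the layout.
  def render(page, data)
    content = page[:body].strip.split(/\n{2,}/).map { |para| "<p>#{para}</p>" }.join("\n")
    LAYOUT.result_with_hash(page: page[:meta], data: data, content: content)
  end

  # Output writing: mirror the source tree under output_dir.
  def write(page, html, content_dir, output_dir)
    relative = page[:source].delete_prefix(content_dir).sub(/\.md\z/, '.html')
    target = File.join(output_dir, relative)
    FileUtils.mkdir_p(File.dirname(target))
    File.write(target, html)
    target
  end

  # Runs the stages in order and returns the list of written paths.
  def build(content_dir:, data_dir:, output_dir:)
    data = load_data(data_dir)
    load_content(content_dir).map do |page|
      write(page, render(page, data), content_dir, output_dir)
    end
  end
end
```

Calling MiniPipeline.build(content_dir: 'content', data_dir: 'data', output_dir: 'output') would render every Markdown file once and return the paths it wrote. Because data is loaded before rendering, the layout can reference it, which mirrors the stage ordering described above.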

Directory Structure Conventions

Static site generators adopt consistent directory structures that separate concerns:

project/
├── content/          # Source content (Markdown, data files)
├── layouts/          # HTML templates
├── assets/           # CSS, JS, images
├── data/             # Structured data (YAML, JSON)
└── output/           # Generated site (git-ignored)

The output directory mirrors the content structure. A file at content/blog/post.md generates output/blog/post.html (or output/blog/post/index.html for clean URLs). This predictable mapping allows direct correspondence between source files and generated URLs.
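The mapping can be expressed as a small pure function. The sketch below is illustrative only: output_path_for and its keyword arguments are hypothetical names, and it assumes that index files are never nested further when clean URLs are enabled.

```ruby
# Hypothetical helper illustrating the source-to-output mapping.
def output_path_for(source, content_dir: 'content', output_dir: 'output', clean_urls: false)
  relative = source.delete_prefix(content_dir).delete_prefix('/')
  base = relative.sub(/\.(md|html)\z/, '')

  # Index pages already sit at a directory URL, so they are not nested again
  if clean_urls && File.basename(base) != 'index'
    File.join(output_dir, base, 'index.html')
  else
    File.join(output_dir, "#{base}.html")
  end
end

output_path_for('content/blog/post.md')                    # => "output/blog/post.html"
output_path_for('content/blog/post.md', clean_urls: true)  # => "output/blog/post/index.html"
output_path_for('content/index.md', clean_urls: true)      # => "output/index.html"
```

Keeping this mapping in one function makes it easy to derive a page's public URL from the same rule that decides where the file is written.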

Incremental vs Full Rebuilds

Build strategies balance speed against correctness. Full rebuilds process all content files, guaranteeing output consistency but taking longer as sites grow. Incremental rebuilds process only changed files, improving speed but requiring dependency tracking to catch indirect changes.

A changed layout template affects all pages using that template. Modified data files impact pages that reference those data. Sophisticated generators track these dependencies to minimize unnecessary rebuilds while maintaining correctness.

Data Flow Architecture

Content flows through the system in a directed graph. Source files provide raw content. Data files supply structured information. Templates receive both content and data as variables. Helper functions transform data during rendering. The build process traverses this graph, resolving dependencies and generating output.

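# Data flow in static generation
# (load_config, parse_content, render_template, and write_output are left to the implementer)
require 'yaml'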
class SiteBuilder
  attr_reader :content, :data, :config
  
  def initialize(source_dir)
    @source_dir = source_dir
    @content = {}
    @data = {}
    @config = load_config
  end
  
  def build
    load_data_files
    load_content_files
    render_pages
  end
  
  def load_data_files
    Dir.glob("#{@source_dir}/_data/**/*.yml").each do |file|
      key = File.basename(file, '.yml')
      @data[key] = YAML.load_file(file)
    end
  end
  
  def load_content_files
    Dir.glob("#{@source_dir}/**/*.md").each do |file|
      @content[file] = parse_content(file)
    end
  end
  
  def render_pages
    @content.each do |path, page|
      html = render_template(page[:layout], page: page, data: @data)
      write_output(path, html)
    end
  end
end

Ruby Implementation

Ruby hosts several mature static site generators with distinct design philosophies. Jekyll prioritizes simplicity and blog-focused workflows. Middleman targets application-like sites with complex asset pipelines. Nanoc provides maximum flexibility through explicit compilation rules.

Jekyll Site Structure

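Jekyll follows convention over configuration, inferring behavior from directory structure. Where conventions fall short, plugins hook into the build. A generator such as the following runs after Jekyll has read the site's content and before it renders pages: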

# Jekyll plugin for custom content processing
module Jekyll
  class CustomGenerator < Generator
    safe true
    priority :high
    
    def generate(site)
      site.pages.each do |page|
        if page.data['custom_process']
          page.content = process_content(page.content)
        end
      end
    end
    
    private
    
    def process_content(content)
      # Custom transformation logic
      content.gsub(/\{\{(\w+)\}\}/) do |match|
        fetch_dynamic_value($1)
      end
    end
  end
end

Jekyll uses Liquid templating with built-in filters and tags. Templates access site-wide data through the site object and page-specific data through page variables:

# Custom Jekyll filter
module Jekyll
  module CustomFilters
    def excerpt(text, length = 150)
      return text if text.length <= length
      text[0...length].gsub(/\s+\S*$/, '') + '...'
    end
    
    def reading_time(text)
      words_per_minute = 200
      word_count = text.split.size
      minutes = (word_count / words_per_minute.to_f).ceil
      "#{minutes} min read"
    end
  end
end

Liquid::Template.register_filter(Jekyll::CustomFilters)

Middleman Application Structure

Middleman treats static sites as applications with explicit configuration:

# config.rb - Middleman configuration
activate :blog do |blog|
  blog.prefix = "articles"
  blog.permalink = "{year}/{month}/{title}.html"
  blog.sources = "{year}-{month}-{day}-{title}.html"
  blog.layout = "article"
end

configure :build do
  activate :minify_css
  activate :minify_javascript
  activate :asset_hash
  activate :relative_links
end

# Custom helpers for templates
helpers do
  def article_summary(article, length = 250)
    strip_tags(article.body).slice(0, length)
  end
  
  def format_date(date)
    # %-d prints the day without padding (%e would pad single digits with a space)
    date.strftime("%B %-d, %Y")
  end
end

# Proxy pages for dynamic routes
data.products.each do |slug, product|
  proxy "/products/#{slug}.html", "/templates/product.html", 
    locals: { product: product },
    ignore: true
end

Middleman provides a development server with live reloading and asset compilation. The build process applies configured optimizations automatically.

Nanoc Compilation Rules

Nanoc uses explicit rules to define compilation behavior:

# Rules file - Nanoc compilation rules
compile '/articles/**/*' do
  filter :kramdown
  filter :colorize_syntax
  layout '/article.*'
  
  if item.identifier =~ '**/index.*'
    # The source is Markdown, so replace its extension rather than keeping it
    write item.identifier.without_ext + '.html'
  else
    write item.identifier.without_ext + '/index.html'
  end
end

compile '/assets/styles/**/*.scss' do
  filter :sass, syntax: :scss, style: :compressed
  write item.identifier.without_ext + '.css'
end

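# Custom filter for content processing
require 'cgi'
require 'nokogiri'

class AddToc < Nanoc::Filter
  identifier :add_toc
  
  def run(content, params = {})
    # Parse as a fragment so to_html does not add <html>/<body> wrappers
    doc = Nokogiri::HTML.fragment(content)
    toc = generate_toc(doc)
    
    # Insert TOC after first heading
    first_heading = doc.at_css('h2')
    first_heading.add_next_sibling(toc) if first_heading && toc
    
    doc.to_html
  end
  
  private
  
  def generate_toc(doc)
    headings = doc.css('h2, h3')
    return if headings.empty?
    
    # Build a flat list of links; assumes headings already carry id attributes
    items = headings.map do |heading|
      %(<li><a href="##{heading['id']}">#{CGI.escapeHTML(heading.text)}</a></li>)
    end
    %(<ul class="toc">#{items.join}</ul>)
  end
end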

Nanoc separates data loading, filtering, and layout application into distinct pipeline stages. Items flow through filters specified in rules, with each filter transforming content independently.
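The idea of independent filters applied in sequence can be shown without Nanoc at all. The following is a plain-Ruby sketch, not Nanoc's API: the FILTERS registry, the filter names, and run_filters are hypothetical, and each filter is simply a callable that takes a string and returns a new one.

```ruby
# Hypothetical filter registry: each filter maps a string to a new string.
FILTERS = {
  strip_comments: ->(text) { text.gsub(/<!--.*?-->/m, '') },
  squeeze_blank_lines: ->(text) { text.gsub(/\n{3,}/, "\n\n") },
  wrap_paragraphs: lambda do |text|
    text.strip.split(/\n{2,}/).map { |para| "<p>#{para}</p>" }.join("\n")
  end
}.freeze

# Applies the named filters in order, feeding each one the previous result.
def run_filters(content, names)
  names.reduce(content) { |text, name| FILTERS.fetch(name).call(text) }
end

source = "Intro<!-- draft note -->\n\n\n\nBody"
run_filters(source, %i[strip_comments squeeze_blank_lines wrap_paragraphs])
# => "<p>Intro</p>\n<p>Body</p>"
```

Because no filter knows about the others, reordering or removing one only changes the list passed to run_filters, which is the property the rule-based approach relies on.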

Content Processing Patterns

Ruby's text processing capabilities enable sophisticated content transformations:

# Advanced frontmatter parsing
require 'yaml'
require 'date'

class ContentParser
  FRONTMATTER_REGEX = /\A---\s*\n(.*?\n)---\s*\n/m
  
  def self.parse(raw_content)
    if raw_content =~ FRONTMATTER_REGEX
      # safe_load refuses arbitrary objects; dates are permitted explicitly.
      # A block with no keys parses to nil, hence the fallback to {}.
      frontmatter = YAML.safe_load($1, permitted_classes: [Date, Time]) || {}
      content = raw_content.sub(FRONTMATTER_REGEX, '')
      
      # Process computed fields
      frontmatter['word_count'] = content.split.size
      frontmatter['excerpt'] ||= extract_excerpt(content)
      
      { frontmatter: frontmatter, content: content }
    else
      { frontmatter: {}, content: raw_content }
    end
  end
  
  def self.extract_excerpt(content)
    # Find first paragraph; to_s guards against an empty body
    paragraphs = content.split(/\n\n+/)
    strip_markdown(paragraphs.first.to_s)
  end
  
  def self.strip_markdown(text)
    text.gsub(/[*_`\[\]()#]/, '').strip
  end
end

Data Loading and Caching

Build-time data fetching requires caching to avoid repeated API calls:

# Data loader with caching
require 'fileutils'

class DataLoader
  def initialize(cache_dir = '.cache')
    @cache_dir = cache_dir
    FileUtils.mkdir_p(@cache_dir)
  end
  
  def fetch(key, ttl: 3600)
    cache_file = File.join(@cache_dir, "#{key}.cache")
    
    if File.exist?(cache_file) && 
       (Time.now - File.mtime(cache_file)) < ttl
      # Marshal data is binary; only load cache files this build wrote itself
      return Marshal.load(File.binread(cache_file))
    end
    
    data = yield
    File.binwrite(cache_file, Marshal.dump(data))
    data
  end
end

# Usage in site builder
loader = DataLoader.new
github_data = loader.fetch('github_repos', ttl: 3600) do
  # Expensive API call
  fetch_github_repositories
end

Tools & Ecosystem

The Ruby ecosystem includes multiple static site generators with different design goals and feature sets. Jekyll dominates in popularity and GitHub integration. Middleman serves application-style sites with complex build requirements. Nanoc offers maximum control through programmatic configuration. Bridgetown modernizes Jekyll's architecture with improved performance.

Jekyll

Jekyll integrates tightly with GitHub Pages, providing free hosting for Jekyll sites pushed to GitHub repositories. The generator emphasizes simplicity with sensible defaults. Configuration happens through YAML files rather than Ruby code. The plugin ecosystem extends functionality without requiring generator modifications.

Jekyll organizes content through collections, allowing structured content beyond blog posts. Collections define custom content types with their own directories and output settings:

# _config.yml
collections:
  products:
    output: true
    permalink: /products/:name/
  team:
    output: false

defaults:
  - scope:
      path: ""        # Jekyll's docs call for a path; an empty string matches every file
      type: products
    values:
      layout: product

The generator processes Markdown with Kramdown by default, supporting extended syntax for tables, footnotes, and definition lists. Liquid templates provide logic and iteration without Ruby code execution.

Jekyll's incremental build mode (the --incremental flag, which the Jekyll documentation labels experimental) tracks file dependencies to rebuild only affected pages. The development server watches for changes and regenerates modified content automatically. Production builds optimize output with minification and asset fingerprinting through plugins.

Middleman

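Middleman structures sites as Ruby applications with a configuration file defining behavior. Middleman 3 bundled a Sprockets-based asset pipeline. Middleman 4 moved Sprockets into the separate middleman-sprockets extension and added an external pipeline feature that delegates asset builds to frontend tools. Image optimization is available through extensions rather than the core.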

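Extensions activate optional features. The blog extension adds blogging functionality. The asset hash extension fingerprints assets for cache invalidation. The minify_css and minify_javascript extensions compress stylesheets and scripts, and HTML minification is available through a separate gem. Custom extensions follow the same pattern: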

# Middleman extension example
class CustomExtension < Middleman::Extension
  option :setting, 'default_value', 'Description'
  
  def initialize(app, options_hash = {}, &block)
    super
    
    app.before_build do |builder|
      # Run before build starts
      prepare_build_environment
    end
  end
  
  def manipulate_resource_list(resources)
    # Modify resource list during compilation
    resources.map do |resource|
      if resource.path.end_with?('.html')
        add_metadata(resource)
      else
        resource
      end
    end
  end
end

Middleman::Extensions.register(:custom, CustomExtension)

Middleman supports dynamic pages through proxying. The configuration file creates pages programmatically from data files, enabling template reuse across similar pages:

# Generate pages from data
data.authors.each do |author_id, author|
  proxy "/authors/#{author_id}.html", 
        "/templates/author.html",
        locals: { 
          author: author,
          posts: blog.articles.select { |a| a.data.author == author_id }
        }
end

Nanoc

Nanoc provides complete control over compilation through explicit rules. The Rules file defines which items compile, which filters apply, and where output writes. This explicitness trades convenience for flexibility.

Filters transform content in a pipeline. Built-in filters handle Markdown, ERB, Haml, and Sass. Custom filters implement domain-specific transformations:

# Nanoc item representation
class Item
  attr_reader :identifier, :content, :attributes
  
  def initialize(content, attributes, identifier)
    @content = content
    @attributes = attributes
    @identifier = identifier
  end
  
  def [](key)
    @attributes[key]
  end
end

# Compilation rule matching
compile '/blog/**/*.md' do
  filter :kramdown, input: 'GFM'
  filter :relativize_urls
  layout '/blog_post.*'
  write ext: 'html'
end

Nanoc separates data sources from compilation. Items load from filesystem directories by default, but custom data sources can load from databases, APIs, or other storage systems. This separation enables complex content workflows.

The generator provides dependency tracking at granular levels. Helper methods declare dependencies on items, attributes, or external files. Nanoc rebuilds dependent items when dependencies change:

# Dependency declaration
def articles_by_year
  depend_on '/articles/**/*'
  
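  # Use @items and a distinct local name: assigning to a local called
  # `items` would shadow the collection and make the right-hand side nil
  articles = @items.find_all('/articles/**/*.md')
  articles.group_by { |i| i[:published_at].year }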
end

Bridgetown

Bridgetown forks Jekyll to modernize architecture and improve performance. The generator adds frontend bundler integration, component-based templating, and Ruby-based configuration. It maintains compatibility with many Jekyll plugins while introducing new features.

Early Bridgetown releases bundled assets with Webpack, and esbuild became the default bundler in version 1.0. Jekyll itself ships no JavaScript bundler, so this integration lets modern JavaScript workflows fit in naturally. The generator supports React, Vue, and Lit components within content:

# bridgetown.config.yml equivalent in Ruby
Bridgetown.configure do |config|
  config.url = "https://example.com"
  config.timezone = "America/New_York"
  
  # Frontend bundling is configured outside this file, in the bundler's own
  # JavaScript config at the project root (for example esbuild.config.js)
end

# Resource extension
class AddExcerptTransform < Bridgetown::Resource::Transform
  def transform
    return unless resource.data.type == "post"
    
    resource.data.excerpt ||= generate_excerpt(resource.content)
  end
  
  def generate_excerpt(content)
    doc = Nokogiri::HTML(content)
    doc.css('p').first&.text&.slice(0, 200)
  end
end

Ecosystem Comparison

Different generators suit different use cases based on their design priorities:

Jekyll excels for documentation sites and blogs with straightforward requirements. GitHub Pages integration provides free hosting. The large plugin ecosystem covers common needs. Limited configuration options constrain complex use cases.

Middleman handles application-style sites with sophisticated asset requirements. The Ruby-based configuration enables programmatic site generation. The asset pipeline integrates modern frontend tools. Higher complexity requires more learning.

Nanoc provides maximum flexibility for complex content transformations. Explicit rules give complete control over compilation. Custom data sources enable unusual content workflows. The learning curve steepens without helpful defaults.

Bridgetown modernizes Jekyll for contemporary web development. Modern JavaScript tooling integrates seamlessly. Component-based development patterns work naturally. Smaller ecosystem means fewer ready-made plugins.

Design Considerations

Static Site Generation trades dynamic flexibility for performance and simplicity. The approach suits content that changes infrequently and requires no per-user customization. Understanding when SSG fits requires evaluating content update frequency, personalization needs, and deployment constraints.

Content Update Patterns

SSG works best when content updates happen on human timescales measured in hours or days rather than seconds. Blog posts, documentation, marketing pages, and project sites change infrequently enough that rebuild delays remain acceptable. News sites or social feeds requiring second-by-second updates fit poorly.

Build time grows with site size. Small sites with hundreds of pages rebuild in seconds. Large sites with thousands of pages may take minutes. Incremental builds reduce this time by processing only changed content, but complex dependency graphs limit optimization effectiveness.

Content updated by non-technical users requires additional tooling. Headless CMS systems provide editing interfaces that trigger rebuilds on save. Git-based workflows require comfort with version control. These tools add complexity compared to logging into a WordPress admin panel.

Personalization Requirements

Static sites deliver identical HTML to all users. Personalization requires client-side JavaScript loading user-specific data after page load. This two-phase approach works for basic customization like logged-in state or shopping cart contents. Complex personalization like recommendation engines or dynamic pricing fits poorly.

Authentication and authorization are enforced by the API, not by the static site. The static HTML contains no sensitive data. JavaScript sends the user's credentials or session token with each API request, and the API verifies them before returning protected content. This pattern separates public content (static) from private content (dynamic API).

Infrastructure Implications

Static sites deploy to any web server without special requirements. No application server, no database connections, no server-side runtime needed. CDNs can cache entire sites at edge locations worldwide. This simplicity reduces infrastructure costs and operational complexity.

Traditional hosting separates application servers (expensive, complex) from static file servers (cheap, simple). Static sites eliminate application servers entirely. A site serving 10,000 requests per second needs only CDN bandwidth, not server scaling.

The build process requires computational resources. Continuous deployment pipelines run builds on dedicated servers. Build time and frequency determine required capacity. Large sites may need powerful build servers despite simple runtime requirements.

Hybrid Approaches

Static sites can incorporate dynamic elements through client-side fetching. The initial HTML loads instantly from CDN. JavaScript then requests fresh data from APIs for dynamic sections. This pattern combines static performance with dynamic functionality.

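# API endpoint for dynamic data
# Separate from static site; assumes a Comment model (for example ActiveRecord)
# and that the app is mounted under /api
require 'sinatra/base'
require 'json'

class CommentsAPI < Sinatra::Base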
  get '/comments/:page_id' do
    content_type :json
    Comment.where(page_id: params[:page_id]).to_json
  end
  
  post '/comments/:page_id' do
    comment = Comment.create(
      page_id: params[:page_id],
      content: params[:content],
      author: params[:author]
    )
    status 201
    comment.to_json
  end
end

The static site includes JavaScript loading comments client-side:

// Embedded in static page
fetch(`/api/comments/${pageId}`)
  .then(response => response.json())
  .then(comments => renderComments(comments));

This hybrid maintains static performance for content while adding dynamic features like comments, live data, or user interactions.

SSG vs Server-Side Rendering

Server-Side Rendering (SSR) generates HTML on-demand for each request. SSG generates HTML once during build. SSR handles dynamic content naturally but requires server infrastructure. SSG delivers better performance but updates require rebuilds.

SSR suits applications with per-user content, frequent updates, or complex data requirements. E-commerce sites with inventory updates, social feeds, or collaborative tools work better with SSR. Static sites suit content-focused sites with infrequent changes.

The rebuild cycle creates latency between content changes and published updates. Manually triggered rebuilds can take minutes to propagate, depending on site size and the deployment pipeline. Automatic rebuilds on content changes reduce but don't eliminate this delay. SSR reflects changes immediately.

SSG vs Client-Side Rendering

Single Page Applications (SPAs) render content entirely client-side. The server delivers minimal HTML and a JavaScript bundle. Client code fetches data and renders views. SPAs provide app-like experiences but suffer from slower initial loads and SEO challenges.

SSG delivers complete HTML on first request. Content appears immediately without JavaScript execution. Search engines index static HTML easily. SPAs require JavaScript execution to show content, complicating search indexing.

Static sites can adopt SPA patterns for sections requiring rich interaction. The initial page loads as static HTML. Client-side routing takes over for subsequent navigation. This progressive enhancement maintains static performance while enabling SPA features where needed.

Implementation Approaches

Building a static site generator requires solving content loading, template rendering, asset processing, and output generation. Different architectural approaches balance flexibility, performance, and maintainability.

Content Loading Strategies

Filesystem-based loading reads content from organized directories. The directory structure maps directly to URL structure. Content files contain frontmatter metadata and body content. This approach prioritizes simplicity and developer familiarity with file-based workflows.

# Filesystem-based content loader
require 'yaml'
require 'date'

class FileContentLoader
  def initialize(content_dir)
    @content_dir = content_dir
  end
  
  def load_all
    Dir.glob("#{@content_dir}/**/*.{md,html}").map do |path|
      load_file(path)
    end
  end
  
  private
  
  def load_file(path)
    raw = File.read(path)
    frontmatter, content = parse_frontmatter(raw)
    
    {
      path: path,
      slug: generate_slug(path),
      frontmatter: frontmatter,
      content: content
    }
  end
  
  def parse_frontmatter(raw)
    if raw =~ /\A---\s*\n(.*?\n)---\s*\n/m
      # safe_load refuses arbitrary objects; dates are permitted explicitly.
      # $~.post_match is everything after the closing --- line.
      [YAML.safe_load($1, permitted_classes: [Date, Time]) || {}, $~.post_match]
    else
      [{}, raw]
    end
  end
  
  def generate_slug(path)
    path.sub(@content_dir, '')
        .sub(/\.(md|html)$/, '')
        .sub(/\/$/, '/index')
  end
end

Database-backed loading separates content storage from site structure. Content lives in databases queried during build. This enables complex filtering and relationships but requires database infrastructure for builds:

# Database-backed content loader
class DatabaseContentLoader
  def initialize(database_url)
    @db = Sequel.connect(database_url)
  end
  
  def load_all
    @db[:posts]
      .where(published: true)
      .order(:published_at)
      .map { |row| transform_row(row) }
  end
  
  private
  
  def transform_row(row)
    {
      slug: row[:slug],
      frontmatter: {
        title: row[:title],
        date: row[:published_at],
        author: fetch_author(row[:author_id])
      },
      content: row[:content]
    }
  end
end

API-based loading fetches content from external services during build. Headless CMS platforms, content APIs, or custom services provide content. This centralizes content management across multiple sites:

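# API-based content loader with caching
require 'fileutils'
require 'json'
require 'http' # http.rb gem

class APIContentLoader
  def initialize(api_url, cache_dir = '.cache')
    @api_url = api_url
    @cache_dir = cache_dir
    # File.write below fails if the cache directory does not exist yet
    FileUtils.mkdir_p(@cache_dir)
  end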
  
  def load_all
    cache_file = "#{@cache_dir}/content.json"
    
    if File.exist?(cache_file) && 
       (Time.now - File.mtime(cache_file)) < 300
      return JSON.parse(File.read(cache_file))
    end
    
    response = HTTP.get("#{@api_url}/content")
    content = JSON.parse(response.body)
    
    File.write(cache_file, JSON.generate(content))
    content
  end
end

Template Rendering Approaches

Template rendering transforms content and data into HTML. Different engines balance power and safety. Liquid provides safe templating with sandboxed execution. ERB enables full Ruby but risks security issues with untrusted content. Template selection depends on trust levels and complexity needs.

Liquid restricts template capabilities to prevent arbitrary code execution:

# Liquid template rendering
require 'liquid'

template = Liquid::Template.parse(template_string)
output = template.render(
  'page' => page_data,
  'site' => site_data
)

ERB provides full Ruby access in templates:

# ERB template rendering
require 'erb'

# Exposes page and site data to the template as instance variables
class TemplateBinding
  def initialize(page, site)
    @page = page
    @site = site
  end
  
  def get_binding
    binding
  end
end

# The class must be defined before it is used
template = ERB.new(template_string)
binding_context = TemplateBinding.new(page_data, site_data)
output = template.result(binding_context.get_binding)

Component-based rendering composes pages from reusable components:

# Component-based rendering
class Component
  def initialize(props)
    @props = props
  end
  
  def render
    raise NotImplementedError
  end
end

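require 'cgi'

class ArticleCard < Component
  def render
    # Escape interpolated values so titles containing < or & stay valid HTML
    <<~HTML
      <article>
        <h2>#{h(@props[:title])}</h2>
        <p>#{h(@props[:excerpt])}</p>
        <a href="#{h(@props[:url])}">Read more</a>
      </article>
    HTML
  end
  
  private
  
  def h(value)
    CGI.escapeHTML(value.to_s)
  end
end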

# Usage
cards = articles.map { |a| ArticleCard.new(a).render }

Build Process Orchestration

Build systems coordinate content loading, transformation, and output generation. Sequential builds process files in order but waste time reprocessing unchanged content. Incremental builds track dependencies to minimize work:

# Incremental build system
class IncrementalBuilder
  def initialize
    @dependency_graph = DependencyGraph.new
    @checksums = load_checksums
  end
  
  def build(content_files)
    changed = content_files.select { |f| changed?(f) }
    affected = @dependency_graph.find_affected(changed)
    
    (changed + affected).uniq.each do |file|
      process_file(file)
    end
    
    save_checksums
  end
  
  private
  
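  def changed?(file)
    @checksums[file] != checksum(file)
  end
  
  def checksum(file)
    Digest::SHA256.file(file).hexdigest
  end
  
  def process_file(file)
    # Transform and write output
    update_dependencies(file)
    # Record the new checksum so save_checksums persists the current state
    @checksums[file] = checksum(file)
  end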
end
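The builder above leaves load_checksums and save_checksums undefined. One possible way to back them is a small JSON file on disk. The sketch below is an assumption-laden illustration: the ChecksumStore class, its method names, and the default path .cache/checksums.json are all hypothetical.

```ruby
require 'digest'
require 'fileutils'
require 'json'

# Hypothetical checksum store persisted as JSON between builds.
class ChecksumStore
  def initialize(path = '.cache/checksums.json')
    @path = path
    @checksums = File.exist?(path) ? JSON.parse(File.read(path)) : {}
  end

  # True when the file is new or its content differs from the recorded digest.
  def changed?(file)
    @checksums[file] != digest(file)
  end

  # Remembers the file's current digest; call after processing succeeds.
  def record(file)
    @checksums[file] = digest(file)
  end

  # Writes all recorded digests to disk for the next build.
  def save
    FileUtils.mkdir_p(File.dirname(@path))
    File.write(@path, JSON.pretty_generate(@checksums))
  end

  private

  def digest(file)
    Digest::SHA256.file(file).hexdigest
  end
end
```

Recording a digest only after a file has been processed means a build that fails halfway will retry the unfinished files next time, at the cost of keeping the store and the output directory in step manually.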

Parallel builds process independent files concurrently:

# Parallel build processing
require 'concurrent'

class ParallelBuilder
  def build(content_files)
    pool = Concurrent::FixedThreadPool.new(4)
    futures = content_files.map do |file|
      Concurrent::Future.execute(executor: pool) do
        process_file(file)
      end
    end
    
    futures.each(&:value!) # Wait for completion; value! re-raises errors that value would swallow
    pool.shutdown
    pool.wait_for_termination
  end
end

Asset Pipeline Integration

Modern sites require asset processing for CSS compilation, JavaScript bundling, and image optimization. Integration strategies range from external tools to embedded pipelines.

External tool integration invokes separate build tools:

# External asset tool integration
class AssetBuilder
  def build_assets
    compile_sass
    bundle_javascript
    optimize_images
  end
  
  private
  
  def compile_sass
    system("sass assets/styles:output/css --style compressed")
  end
  
  def bundle_javascript
    system("esbuild assets/js/main.js --bundle --minify --outfile=output/js/main.js")
  end
  
  def optimize_images
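    Dir.glob("assets/images/**/*.{jpg,png}").each do |image|
      # Pass the path as a separate argument so spaces and shell
      # metacharacters in file names are not interpreted by a shell
      system("imageoptim", image)
    end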
  end
end

Embedded pipelines process assets within the build system:

# Embedded asset pipeline
class EmbeddedAssetPipeline
  def process_asset(asset_path)
    case File.extname(asset_path)
    when '.scss'
      compile_scss(asset_path)
    when '.js'
      bundle_javascript(asset_path)
    when '.jpg', '.png'
      optimize_image(asset_path)
    end
  end
  
  private
  
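  def compile_scss(path)
    # Sass::Engine comes from the original `sass` gem (Ruby Sass), which has
    # reached end of life; check which Sass implementation the project uses
    Sass::Engine.new(
      File.read(path),
      syntax: :scss,
      style: :compressed
    ).render
  end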
end

Performance Considerations

Static sites achieve exceptional performance through pre-rendering and aggressive caching. Eliminating server-side processing reduces time to first byte. CDN distribution places content near users globally. Optimization focuses on build performance and runtime delivery.

Build Performance Optimization

Build time increases with site size. Large sites with thousands of pages require optimization to maintain reasonable build times. Incremental builds track file changes and rebuild only affected pages. Dependency graphs determine which pages depend on changed files.

A modified template affects all pages using that template. Changed data files impact pages referencing that data. Sophisticated tracking minimizes unnecessary rebuilds:

# Dependency tracking for incremental builds
require 'set'

class DependencyTracker
  def initialize
    @dependencies = Hash.new { |h, k| h[k] = Set.new }
    @reverse_dependencies = Hash.new { |h, k| h[k] = Set.new }
  end
  
  def add_dependency(target, source)
    @dependencies[target].add(source)
    @reverse_dependencies[source].add(target)
  end
  
  def find_affected(changed_files)
    affected = Set.new
    queue = changed_files.dup
    
    while file = queue.shift
      affected.add(file)
      dependents = @reverse_dependencies[file]
      dependents.each do |dependent|
        queue.push(dependent) unless affected.include?(dependent)
      end
    end
    
    affected.to_a
  end
end

Parallel processing builds multiple pages simultaneously. CRuby's Global VM Lock (GVL) lets only one thread execute Ruby code at a time, so threads help with I/O-bound work while separate processes work better for CPU-bound rendering:

# Parallel builds using processes
require 'parallel'

class ParallelPageBuilder
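  # process_count defaults to the number of CPU cores reported by the
  # parallel gem
  def build_pages(pages, process_count: Parallel.processor_count)
    Parallel.map(pages, in_processes: process_count) do |page|
      build_page(page)
    end
  end
  
  private
  
  # render_template and write_output are site-specific and omitted here
  def build_page(page)
    html = render_template(page)
    write_output(page[:path], html)
  end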
end

Caching expensive operations avoids repeated work. The results of Markdown rendering, syntax highlighting, and image processing can be stored under a key derived from a hash of the input content, so unchanged content is never processed twice:

# Operation caching
require 'digest'
require 'fileutils'
require 'kramdown'

class CachedRenderer
  def initialize(cache_dir = '.cache/render')
    @cache_dir = cache_dir
    FileUtils.mkdir_p(@cache_dir)
  end

  def render_markdown(content)
    # Identical content always maps to the same cache file
    key = Digest::SHA256.hexdigest(content)
    cache_file = File.join(@cache_dir, "#{key}.html")

    return File.read(cache_file) if File.exist?(cache_file)

    html = Kramdown::Document.new(content).to_html
    File.write(cache_file, html)
    html
  end
end
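
A key built from the content alone returns stale output if renderer settings change between builds. The sketch below folds those settings into the key. The `render_cache_key` helper, the `options` hash, and the `renderer_version` string are all hypothetical names chosen for illustration.

```ruby
require 'digest'
require 'json'

# Hypothetical helper: derives a cache key from the content plus anything
# else that influences the rendered output.
def render_cache_key(content, options: {}, renderer_version: '1')
  # Sort options by name so that hash ordering does not change the digest
  normalized = options.sort_by { |name, _value| name.to_s }
  Digest::SHA256.hexdigest(JSON.generate([renderer_version, normalized, content]))
end
```

Bumping `renderer_version` after upgrading the Markdown library would then invalidate every cached entry without deleting the cache directory.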

Runtime Delivery Performance

Static files enable aggressive HTTP caching. Assets with fingerprinted filenames never change under the same name, so browsers and CDNs can cache them for a year or more. HTML needs shorter TTLs that balance freshness against performance. CDN edge caching serves content from locations near users.

Asset fingerprinting adds content hashes to filenames. Changed files get new names, bypassing stale caches:

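# Asset fingerprinting
require 'digest'

class AssetFingerprinter
  def fingerprint_assets(output_dir)
    assets = Dir.glob("#{output_dir}/**/*.{css,js,jpg,png}")
    
    assets.each do |asset|
      # Digest the file directly so binary images are hashed byte-for-byte
      hash = Digest::SHA256.file(asset).hexdigest[0..7]
      
      # chomp strips only the trailing extension; sub would replace the
      # first occurrence of it anywhere in the path
      ext = File.extname(asset)
      new_name = "#{asset.chomp(ext)}-#{hash}#{ext}"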
      
      File.rename(asset, new_name)
      update_references(asset, new_name)
    end
  end
  
  private
  
  def update_references(old_path, new_path)
    # Update HTML files referencing this asset
  end
end
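
The `update_references` method above is left as a stub. One way it might work, shown as a sketch and not a complete implementation, is a plain string substitution over the generated HTML. The `rewrite_asset_references` name and the assumption that pages reference assets by site-relative URL (for example `/css/site.css`) are illustrative only.

```ruby
# Hypothetical helper: replaces references to old_url with new_url in every
# HTML file under output_dir. Returns the files that were modified.
def rewrite_asset_references(output_dir, old_url, new_url)
  Dir.glob(File.join(output_dir, '**', '*.html')).select do |html_file|
    html = File.read(html_file)
    next false unless html.include?(old_url)

    # Block form keeps backslashes in new_url from being read as
    # backreferences
    File.write(html_file, html.gsub(old_url) { new_url })
    true
  end
end
```

Plain substitution also rewrites longer URLs that merely start with `old_url`, such as a `.map` file next to a stylesheet. A stricter version would parse the HTML and rewrite only `href` and `src` attributes.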

Critical CSS inlining embeds above-the-fold styles directly in HTML. Pages render immediately without waiting for external stylesheets:

# Critical CSS extraction
require 'nokogiri'

class CriticalCSSInliner
  def inline_critical(html, critical_css)
    doc = Nokogiri::HTML(html)
    
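    # Remove the first stylesheet link, if one exists. The full stylesheet
    # still has to be loaded some other way (for example, deferred), or any
    # rules missing from critical_css are lost.
    doc.at_css('link[rel="stylesheet"]')&.remove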
    
    # Add inline critical CSS
    style = Nokogiri::XML::Node.new('style', doc)
    style.content = critical_css
    doc.at_css('head').add_child(style)
    
    doc.to_html
  end
end

Image optimization reduces file sizes without visible quality loss. Responsive images serve appropriately sized versions based on device capabilities:

# Responsive image generation
require 'mini_magick'

class ResponsiveImages
  SIZES = [320, 640, 1024, 1920].freeze
  
  def generate_responsive(image_path)
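    original_width = MiniMagick::Image.open(image_path).width
    
    SIZES.map do |width|
      next if original_width < width
      
      # Image.open works on a fresh temporary copy, so every size is resized
      # from the original and not from a previously resized result
      resized = MiniMagick::Image.open(image_path)
      resized.resize "#{width}x"
      resized.strip # Remove EXIF data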
      
      output_path = image_path.sub(/\.(\w+)$/, "-#{width}w.\\1")
      resized.write(output_path)
      
      { width: width, path: output_path }
    end.compact
  end
end
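
The array returned by `generate_responsive` maps naturally onto the HTML `srcset` attribute. The sketch below assumes a hypothetical `srcset_for` helper and treats the stored paths as usable URLs; a real site would probably translate file paths to public URLs first.

```ruby
# Hypothetical helper: builds a srcset value from entries shaped like
# { width: 640, path: 'images/hero-640w.jpg' }.
def srcset_for(variants)
  variants
    .sort_by { |variant| variant[:width] }
    .map { |variant| "#{variant[:path]} #{variant[:width]}w" }
    .join(', ')
end

variants = [
  { width: 640, path: 'images/hero-640w.jpg' },
  { width: 320, path: 'images/hero-320w.jpg' }
]
puts %(<img src="images/hero.jpg" srcset="#{srcset_for(variants)}" sizes="100vw" alt="">)
```

The browser then picks the smallest candidate that satisfies the layout width and device pixel ratio.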

Build vs Runtime Performance Trade-offs

Build-time work improves runtime performance. Expensive processing during build produces optimized output served efficiently. Complex rendering, image transformation, and asset optimization happen once during build rather than repeatedly at runtime.

This trade-off has limits. Build times growing to hours make iterative development painful. Balancing build complexity against runtime benefits requires measuring both. Fast builds with adequate runtime performance beat slow builds with marginal runtime improvements.
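
Measuring the build side of this trade-off can start with timing each stage. The sketch below uses Ruby's standard `Benchmark` module; the `time_stages` helper and the stage names are placeholders, and the lambdas stand in for real build work.

```ruby
require 'benchmark'

# Hypothetical helper: runs each stage (any callable) and returns a Hash of
# stage name => elapsed wall-clock seconds.
def time_stages(stages)
  stages.each_with_object({}) do |(name, stage), timings|
    timings[name] = Benchmark.realtime { stage.call }
  end
end

timings = time_stages(
  'render pages' => -> { 1_000.times { |i| "<p>#{i}</p>" } },
  'process assets' => -> { (1..1_000).sum }
)
timings.each { |name, seconds| puts format('%-15s %.4fs', name, seconds) }
```

Recording these numbers per build makes it easier to see which optimization actually pays for its added build time.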

Incremental deploys update only changed files. CDN purge patterns invalidate caches selectively. Deployment strategies affect how quickly changes reach users:

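# Incremental deployment
require 'digest'

class IncrementalDeployer
  # previous_manifest maps file path => SHA-256 digest from the last deploy.
  # upload_files, invalidate_cache, and save_manifest depend on the hosting
  # provider and are omitted here.
  def deploy(output_dir, previous_manifest)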
    current_manifest = generate_manifest(output_dir)
    changed = find_changed_files(current_manifest, previous_manifest)
    
    upload_files(changed)
    invalidate_cache(changed.keys)
    
    save_manifest(current_manifest)
  end
  
  private
  
  def generate_manifest(dir)
    Dir.glob("#{dir}/**/*").each_with_object({}) do |file, manifest|
      next if File.directory?(file)
      manifest[file] = Digest::SHA256.file(file).hexdigest
    end
  end
  
  def find_changed_files(current, previous)
    current.select { |path, hash| previous[path] != hash }
  end
end
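
`find_changed_files` reports only new and modified files, so files deleted since the previous deploy stay on the host. A sketch of a fuller comparison follows, using a hypothetical `diff_manifests` helper that works on the same path => digest hashes.

```ruby
# Hypothetical helper: compares two manifests (path => digest) and reports
# which files to upload and which to delete from the host.
def diff_manifests(current, previous)
  {
    upload: current.reject { |path, digest| previous[path] == digest }.keys,
    delete: previous.keys - current.keys
  }
end

previous = { 'index.html' => 'aaa', 'old.html' => 'bbb', 'about.html' => 'ccc' }
current  = { 'index.html' => 'aaa', 'about.html' => 'ddd', 'new.html' => 'eee' }
p diff_manifests(current, previous)
```

With this sample data, `about.html` and `new.html` are uploaded and `old.html` is deleted. Cache invalidation would need to cover both lists.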

Reference

Static Site Generator Comparison

Generator	Primary Use Case	Configuration	Template Engine	Build Speed
Jekyll	Blogs, documentation	YAML	Liquid	Moderate
Middleman	Application sites	Ruby	ERB/Haml/Slim	Moderate
Nanoc	Complex content workflows	Ruby rules	Multiple	Fast
Bridgetown	Modern web apps	YAML/Ruby	Liquid/ERB	Fast

Content Organization Patterns

Pattern	Structure	Use Case
Flat	All content in single directory	Small sites
Hierarchical	Nested directories mirror URL structure	Documentation sites
Collection-based	Content types in separate directories	Multi-content-type sites
Date-based	YYYY/MM/DD directory structure	Blogs, news sites

Build Process Stages

Stage	Input	Output	Purpose
Content Loading	Source files	Parsed content objects	Read and parse source files
Data Loading	Data files, APIs	Data structures	Load external data
Template Rendering	Content, templates, data	HTML strings	Generate HTML from templates
Asset Processing	CSS, JS, images	Optimized assets	Compile and optimize assets
Output Writing	HTML, assets	File system	Write final output files

Common File Extensions

Extension	Purpose	Processing
.md	Markdown content	Markdown rendering
.html	HTML content	Template rendering
.erb	Embedded Ruby templates	ERB processing
.liquid	Liquid templates	Liquid rendering
.yml, .yaml	YAML data files	YAML parsing
.json	JSON data files	JSON parsing

Template Variables

Variable	Scope	Contains
site	Global	Site-wide configuration and data
page	Current page	Current page metadata and content
content	Current page	Rendered page content
layout	Current layout	Layout-specific metadata
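
To show how these variables typically reach a layout, here is a sketch using Ruby's standard ERB library. The `render_layout` helper is hypothetical and the `site`, `page`, and `content` values are made-up sample data. Real generators expose richer objects than plain hashes.

```ruby
require 'erb'

# Hypothetical helper: renders a layout string with the three variables
# available as locals inside the template.
def render_layout(layout, site:, page:, content:)
  ERB.new(layout).result_with_hash(site: site, page: page, content: content)
end

layout = <<~HTML
  <title><%= page[:title] %> | <%= site[:title] %></title>
  <main><%= content %></main>
HTML

puts render_layout(
  layout,
  site: { title: 'Example Site' },
  page: { title: 'About' },
  content: '<p>Hello</p>'
)
```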

Deployment Strategies

Strategy	Method	Considerations
CDN Push	Upload to CDN storage	Fast global delivery
Git-based	Push to GitHub/GitLab	Automatic builds
FTP/SFTP	Traditional file transfer	Legacy compatibility
Rsync	Incremental file sync	Efficient updates
Object Storage	Upload to S3/GCS	Scalable hosting

Caching Headers

Header	Value	Purpose
Cache-Control	max-age=31536000, immutable	Fingerprinted assets
Cache-Control	max-age=3600, must-revalidate	HTML pages
ETag	Content hash	Cache validation
Last-Modified	File timestamp	Conditional requests
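
A deployment script could choose between the two `Cache-Control` values per file. The sketch below assumes fingerprinted filenames end in a hyphen plus eight hex characters before the extension, matching the fingerprinting example earlier. `cache_control_for` is a hypothetical helper, and it applies the shorter HTML policy to everything that is not fingerprinted.

```ruby
# Hypothetical helper: picks a Cache-Control value from the file name.
def cache_control_for(path)
  # Assumed naming scheme: name-<8 hex chars>.ext, e.g. site-1a2b3c4d.css
  if path.match?(/-\h{8}\.\w+\z/)
    'max-age=31536000, immutable'
  else
    'max-age=3600, must-revalidate'
  end
end

%w[css/site-1a2b3c4d.css index.html].each do |path|
  puts "#{path}: #{cache_control_for(path)}"
end
```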

Build Optimization Techniques

Technique	Benefit	Trade-off
Incremental builds	Faster rebuild times	Dependency tracking complexity
Parallel processing	Faster builds	Memory usage
Content caching	Skip re-rendering	Cache invalidation complexity
Asset fingerprinting	Aggressive caching	Build complexity
CDN caching	Fast delivery	Cache invalidation delays

Jekyll Directory Structure

Directory	Purpose	Output
_posts	Blog posts	Generated as pages
_drafts	Unpublished posts	Not generated
_layouts	HTML templates	Not directly output
_includes	Reusable snippets	Not directly output
_data	YAML/JSON data files	Available in templates
_site	Build output	Published files
assets	CSS, JS, images	Copied to output

Middleman Helpers

Helper	Purpose	Result
link_to	Generate links	Creates anchor tags
image_tag	Generate image tags	Creates img tags
stylesheet_link_tag	Include stylesheets	Links CSS files
javascript_include_tag	Include scripts	Links JS files
current_page	Access current page	Page metadata

Common Build Errors

Error	Cause	Solution
Template not found	Missing layout file	Create layout or update reference
Invalid frontmatter	YAML syntax error	Validate YAML syntax
Missing dependency	Gem not installed	Install required gem
Encoding error	Non-UTF8 characters	Fix file encoding
Memory exhaustion	Large site, insufficient RAM	Increase memory or optimize
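
Some of these errors can be caught before a full build. As one example, this sketch checks front matter with Ruby's standard YAML library. It assumes the common convention of a YAML block between `---` lines at the top of the file, and `front_matter_error` is a hypothetical helper.

```ruby
require 'yaml'
require 'date'

# Hypothetical helper: returns nil when the front matter parses cleanly,
# otherwise a short description of the problem.
def front_matter_error(source)
  match = source.match(/\A---\s*\n(.*?)\n---\s*(\n|\z)/m)
  return 'no front matter block found' unless match

  # Date and Time are permitted because front matter often carries dates
  YAML.safe_load(match[1], permitted_classes: [Date, Time])
  nil
rescue Psych::SyntaxError => e
  "invalid YAML: #{e.message}"
end
```

Running a check like this over every content file before rendering reports the offending file early instead of failing midway through the build.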
