CrackedRuby

Overview

Code metrics provide objective, numerical measurements of software code characteristics. These measurements transform subjective assessments of code quality into quantifiable data that teams can track, analyze, and improve over time. Metrics range from simple counts like lines of code to complex calculations like cyclomatic complexity and maintainability indices.

The practice of measuring code emerged from software engineering research in the 1970s when organizations needed systematic approaches to evaluate software quality. Early metrics focused on size and complexity measurements. Modern code metrics encompass structural properties, test coverage, duplication, documentation completeness, and technical debt indicators.

Code metrics serve multiple purposes in software development. They identify problem areas requiring refactoring, track quality trends across releases, enforce coding standards during code review, inform architectural decisions, and provide data for project planning and estimation. Metrics become particularly valuable when measured consistently over time, revealing patterns and trends that spot-checks cannot detect.

Different metrics measure different aspects of code quality. Complexity metrics assess how difficult code is to understand and maintain. Coverage metrics measure test thoroughness. Duplication metrics identify redundant code. Coupling and cohesion metrics evaluate architectural quality. No single metric provides complete insight—teams use combinations of metrics to build comprehensive quality pictures.

# Simple metric: counting file lines by category
def calculate_metrics(file_path)
  lines = File.readlines(file_path)
  blank_lines = lines.count { |line| line.strip.empty? }
  comment_lines = lines.count { |line| line.strip.start_with?('#') }

  {
    total_lines: lines.count,
    code_lines: lines.count - blank_lines - comment_lines,
    comment_lines: comment_lines,
    blank_lines: blank_lines
  }
end

Code metrics exist at multiple granularity levels. Method-level metrics measure individual function complexity and size. Class-level metrics assess object design quality through coupling and cohesion. Module-level metrics evaluate component organization. System-level metrics aggregate data across entire codebases. Each level provides different insights appropriate for different decisions.

Key Principles

Quantification transforms subjective assessment into objective measurement. Rather than describing code as "complex" or "maintainable," metrics assign numerical values that enable comparison and tracking. This quantification supports data-driven decision making and removes ambiguity from quality discussions.

Different metrics measure orthogonal quality dimensions. Cyclomatic complexity measures branching logic. Lines of code measures size. Code coverage measures test thoroughness. Coupling measures dependencies. Each metric captures a distinct aspect of code quality. High scores in one dimension do not guarantee quality in others—a small function can have high complexity; well-tested code can have poor design.

Thresholds distinguish acceptable from problematic code. Most metrics become actionable through threshold values that trigger attention or enforcement. Cyclomatic complexity above 10 suggests refactoring opportunities. Test coverage below 80% indicates insufficient testing. Duplication above 5% signals maintainability risks. These thresholds vary by team, domain, and context but provide clear quality gates.

Trends matter more than absolute values. A single metric snapshot provides limited insight. Tracking metrics over time reveals whether quality improves, degrades, or remains stable. Increasing complexity over sprints signals accumulating technical debt. Declining coverage indicates growing test gaps. Stable metrics suggest consistent quality practices.
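A trend check over stored metric snapshots can be as small as comparing successive deltas. The sketch below is illustrative; the snapshot values and the 0.5 noise tolerance are assumptions, not part of any tool's API:

```ruby
# Classify the direction of a metric across chronological snapshots.
# `values` might be average complexity scores recorded each sprint.
def metric_trend(values)
  deltas = values.each_cons(2).map { |earlier, later| later - earlier }
  return :stable if deltas.all? { |d| d.abs < 0.5 }  # tolerance for noise
  deltas.sum.positive? ? :rising : :falling
end

metric_trend([12.0, 13.5, 15.2])  # => :rising
metric_trend([15.2, 15.0, 15.3])  # => :stable
```

A `:rising` result over several sprints is the signal of accumulating technical debt described above, even while every individual snapshot still looks acceptable.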

Context determines metric interpretation. A cyclomatic complexity of 15 might be acceptable in complex domain logic but problematic in utility functions. 60% test coverage might suffice for proof-of-concept code but be inadequate for critical payment processing. Teams must interpret metrics within the context of risk, criticality, and development phase.

Metrics guide investigation, not judgment. High complexity or low coverage identifies code requiring attention but does not automatically indicate bad code. Complex domain logic sometimes requires complex implementations. Legacy code might lack tests while functioning reliably. Metrics highlight where to investigate deeper, not what to condemn immediately.

Aggregation obscures important details. System-wide average complexity of 8 might hide individual functions with complexity of 40. Overall test coverage of 85% might mask critical modules with 20% coverage. Metrics must be examined at appropriate granularity to reveal actionable insights. Averages provide overview; distributions reveal problems.
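The gap between an average and the underlying distribution is easy to demonstrate with a few lines of Ruby (the scores here are hypothetical, not real measurements):

```ruby
# Hypothetical per-method complexity scores for one module
scores = [3, 4, 5, 6, 40]

average = scores.sum.to_f / scores.size
sorted  = scores.sort
median  = sorted[scores.size / 2]
maximum = sorted.last

puts "average: #{average}"   # 11.6 -- looks borderline
puts "median: #{median}"     # 5 -- most methods are simple
puts "maximum: #{maximum}"   # 40 -- one method is the real problem
```

The average suggests a mildly complex module; the maximum reveals the single method that actually needs attention.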

Measurement changes behavior. Once teams track metrics, developers optimize for those metrics. This can improve quality when metrics align with quality goals. However, poorly chosen metrics drive dysfunctional behavior—maximizing test coverage through trivial tests, reducing complexity through inappropriate decomposition, or gaming measurements without improving actual quality.

Ruby Implementation

Ruby's dynamic nature and metaprogramming capabilities enable sophisticated code analysis tools. Multiple gems provide metric calculation, analysis, and reporting. These tools parse Ruby source code, build abstract syntax trees, and compute various measurements.

SimpleCov measures test coverage by tracking which lines execute during test runs. It integrates with test frameworks through Ruby's Coverage module, producing detailed reports showing covered and uncovered code.

# spec/spec_helper.rb
require 'simplecov'

SimpleCov.start do
  add_filter '/spec/'
  add_filter '/vendor/'
  
  add_group 'Models', 'app/models'
  add_group 'Controllers', 'app/controllers'
  add_group 'Services', 'app/services'
  
  minimum_coverage 80
  refuse_coverage_drop
end

SimpleCov generates coverage data after test execution, calculating line coverage percentages for each file and the overall project. The minimum_coverage setting enforces quality gates, failing builds when coverage drops below thresholds.
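SimpleCov builds on Ruby's standard-library Coverage module. A minimal stdlib-only sketch shows the raw per-line data SimpleCov works from; the target file here is a throwaway temp file created for illustration:

```ruby
require 'coverage'
require 'tempfile'

# Write a tiny throwaway Ruby file to measure
target = Tempfile.new(['sample', '.rb'])
target.write(<<~RUBY)
  def double(x)
    x * 2
  end
  double(3)
RUBY
target.close

Coverage.start
load target.path

# Per-line execution counts; nil marks non-executable lines (e.g. `end`)
counts = Coverage.result[target.path]

executable = counts.compact
covered_percent = 100.0 * executable.count(&:positive?) / executable.size
puts "line coverage: #{covered_percent.round(1)}%"
```

SimpleCov layers filtering, grouping, reporting, and threshold enforcement on top of exactly this kind of counts array.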

# Custom coverage formatter
class MetricsFormatter
  def format(result)
    result.files.each do |file|
      coverage_percent = file.covered_percent.round(2)
      missed_lines = file.missed_lines.count
      
      puts "#{file.filename}: #{coverage_percent}% (#{missed_lines} missed lines)"
    end
  end
end

SimpleCov.formatter = MetricsFormatter

Flog calculates complexity scores based on the ABC metric (Assignments, Branches, Calls). It assigns points to different language constructs, producing an aggregate complexity score per method.

# Using Flog programmatically
require 'flog'

flogger = Flog.new
flogger.flog('app/models/order.rb')

flogger.calculate_total_scores

flogger.each_by_score do |class_method, score|
  puts "#{class_method}: #{score.round(2)}" if score > 10
end

Flog's scoring considers assignment operations, branches, method calls, and other complexity factors. Higher scores indicate more complex code requiring closer review or refactoring.

RuboCop enforces style guidelines and detects potential problems through static analysis. While primarily a linter, it calculates metrics like method length, class length, and cyclomatic complexity.

# .rubocop.yml
Metrics/MethodLength:
  Max: 15
  CountComments: false

Metrics/ClassLength:
  Max: 100
  
Metrics/CyclomaticComplexity:
  Max: 8

Metrics/PerceivedComplexity:
  Max: 10

RuboCop's metric cops enforce length and complexity limits, failing checks when code exceeds configured thresholds. This integration into continuous integration pipelines prevents problematic code from merging.

# Collecting metric offenses from RuboCop's JSON formatter output
require 'json'

report = JSON.parse(`rubocop --format json app`)

report['files'].each do |file|
  metric_offenses = file['offenses'].select { |o| o['cop_name'].start_with?('Metrics/') }
  metric_offenses.each do |offense|
    line = offense.dig('location', 'line')
    puts "#{file['path']}:#{line} - #{offense['cop_name']}: #{offense['message']}"
  end
end

Reek detects code smells—patterns indicating potential design problems. It analyzes code structure for duplication, long parameter lists, feature envy, and other smell categories.

# Using Reek programmatically
require 'reek'
require 'pathname'

# Examiner accepts source code, an IO, or a Pathname
examiner = Reek::Examiner.new(Pathname.new('app/services/payment_processor.rb'))

examiner.smells.each do |smell|
  puts "#{smell.smell_type}: #{smell.message}"
  puts "  Lines: #{smell.lines.join(', ')}"
  puts "  Context: #{smell.context}"
end

Flay detects duplicate code by analyzing structural similarity. Unlike simple text comparison, Flay understands Ruby syntax and identifies functionally similar code even when variable names or formatting differs.

# Detecting duplication with Flay
require 'flay'

flay = Flay.new(Flay.default_options.merge(fuzzy: false, liberal: false))
flay.process(*Dir.glob('app/**/*.rb'))

flay.analyze

# After analysis, flay.masses maps each structural hash to its
# duplication mass; flay.hashes maps it to the matching code nodes
flay.masses.sort_by { |_, mass| -mass }.each do |structural_hash, mass|
  next if mass < 50  # Skip low-mass duplications

  puts "Duplication mass: #{mass}"
  flay.hashes[structural_hash].each do |node|
    puts "  #{node.file}:#{node.line}"
  end
end

MetricFu aggregates multiple metric tools into unified reports. It runs Flog, Flay, Reek, RuboCop, and other analyzers, combining results into comprehensive dashboards.

# metric_fu configuration (the DSL below is version-dependent; check your metric_fu release)
MetricFu::Configuration.run do |config|
  config.configure_metrics do |metrics|
    metrics.enabled = [:flog, :flay, :reek, :roodi]
  end
  
  config.configure_metric(:flog) do |flog|
    flog.continue_on_failure = true
    flog.dirs_to_flog = ['app', 'lib']
  end
  
  config.configure_metric(:flay) do |flay|
    flay.minimum_score = 50
    flay.dirs_to_flay = ['app', 'lib']
  end
end

Tools & Ecosystem

The Ruby ecosystem provides extensive tooling for code metrics across different quality dimensions. Tools range from focused single-metric analyzers to comprehensive quality platforms.

Coverage Analysis Tools track test execution to measure how thoroughly tests exercise code. SimpleCov dominates Ruby coverage analysis with widespread adoption and framework integration. Deep-Cover provides more detailed coverage analysis including branch coverage and execution counts per line. These tools integrate into test suites transparently, requiring minimal configuration.

Complexity Analyzers calculate various complexity metrics. Flog computes ABC complexity scores. RuboCop's metric cops measure cyclomatic and perceived complexity. Saikuro generates complexity reports with HTML visualization. Each tool uses slightly different algorithms and thresholds, so teams often use multiple tools to cross-validate complexity assessments.

Code Smell Detectors identify design problems and anti-patterns. Reek remains the primary Ruby smell detector, analyzing code for feature envy, long parameter lists, duplicate code, and other smells. RuboCop's lint cops detect some smells alongside style violations. Flay specializes in duplication detection through structural analysis.

Static Analysis Platforms combine multiple metrics into unified reports. MetricFu aggregates coverage, complexity, duplication, and smell detection. RubyCritic combines Flog, Reek, and churn data to generate code quality grades with trend analysis. Code Climate provides commercial hosted analysis integrating numerous metrics with GitHub workflows.

Churn Analysis Tools measure code change frequency. Churn detects files frequently modified, indicating instability or hotspots. MetricFu includes churn analysis. Git-based scripts calculate churn from repository history. High churn combined with high complexity identifies the riskiest code requiring attention.
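A git-based churn script can be little more than a wrapper around `git log --name-only`; the parsing side looks like this (the `churn_counts` helper name and the sample log are illustrative, not from any tool):

```ruby
# Count how often each Ruby file appears in `git log --name-only` output,
# e.g. from: git log --since='90 days ago' --name-only --pretty=format:
def churn_counts(log_output)
  log_output.lines
            .map(&:strip)
            .select { |line| line.end_with?('.rb') }
            .tally
            .sort_by { |_, count| -count }
end

sample = <<~LOG
  app/models/order.rb
  app/services/payment.rb

  app/models/order.rb
LOG

churn_counts(sample)
# => [["app/models/order.rb", 2], ["app/services/payment.rb", 1]]
```

Cross-referencing the top of this list against complexity scores surfaces the hotspot files mentioned above.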

Maintainability Calculators generate composite scores representing overall code health. RubyCritic calculates letter grades (A-F) based on complexity, smells, and churn. These scores simplify communication with non-technical stakeholders but obscure underlying metric details.

Continuous Integration Integration embeds metrics into development workflows. RuboCop runs in pre-commit hooks and CI pipelines, blocking merges that violate standards. SimpleCov fails builds when coverage drops. Code Climate automatically comments on pull requests with metric changes. Pronto provides incremental analysis, reviewing only changed files.

IDE Integration surfaces metrics during development. RubyMine shows complexity indicators inline with code. VS Code extensions display coverage and smell information. This real-time feedback helps developers address issues before committing code.

Custom Metric Tools address domain-specific needs. Teams build custom analyzers using Ruby's parser library to measure application-specific metrics like security pattern compliance or architectural rule adherence.

# Custom metric collector
require 'parser/current'

class ApiEndpointMetrics
  def initialize(file_path)
    @file_path = file_path
    @endpoints = []
  end
  
  def analyze
    code = File.read(@file_path)
    ast = Parser::CurrentRuby.parse(code)

    process_node(ast)
    return { total_endpoints: 0 } if @endpoints.empty?

    {
      total_endpoints: @endpoints.count,
      endpoints_with_auth: @endpoints.count { |e| e[:has_auth] },
      endpoints_with_validation: @endpoints.count { |e| e[:has_validation] },
      average_complexity: @endpoints.sum { |e| e[:complexity] } / @endpoints.count.to_f
    }
  end
  
  private
  
  def process_node(node)
    return unless node.is_a?(Parser::AST::Node)
    
    if endpoint_definition?(node)
      @endpoints << analyze_endpoint(node)
    end
    
    node.children.each { |child| process_node(child) }
  end

  # endpoint_definition? and analyze_endpoint are application-specific
  # helpers that pattern-match on the AST; their bodies are elided here
end

Practical Examples

Establishing Coverage Baselines requires measuring current coverage before enforcing thresholds. Teams inherit legacy codebases with unknown coverage and must establish realistic starting points.

# Coverage audit script
require 'simplecov'
require 'json'

SimpleCov.start do
  add_filter '/spec/'
  track_files 'app/**/*.rb'
end

# Run full test suite
require_relative '../spec/spec_helper'
RSpec.configure do |config|
  config.after(:suite) do
    result = SimpleCov.result
    
    coverage_by_directory = result.files.group_by do |f|
      f.filename.delete_prefix("#{SimpleCov.root}/").split('/').first
    end
    
    report = coverage_by_directory.transform_values do |files|
      {
        average_coverage: files.sum(&:covered_percent) / files.count,
        total_files: files.count,
        fully_covered: files.count { |f| f.covered_percent == 100 },
        poorly_covered: files.count { |f| f.covered_percent < 50 }
      }
    end
    
    File.write('coverage_baseline.json', JSON.pretty_generate(report))
  end
end

This audit generates baseline data showing which components have acceptable coverage and which require improvement. Teams use this data to set incremental improvement goals rather than arbitrary universal thresholds.

Tracking Complexity Growth identifies when refactoring becomes necessary. Regular complexity measurement reveals gradual degradation before it becomes crisis.

# Complexity trend tracker
require 'flog'
require 'yaml'
require 'time'
require 'date'

class ComplexityTracker
  THRESHOLD = 20
  
  def self.track(directory, output_file)
    flogger = Flog.new(continue: true)
    flogger.flog(*Dir.glob("#{directory}/**/*.rb"))
    
    results = {}
    flogger.totals.each do |class_method, score|
      file_path = flogger.method_locations[class_method]
      results[class_method] = {
        score: score.round(2),
        file: file_path,
        timestamp: Time.now.iso8601,
        exceeds_threshold: score > THRESHOLD
      }
    end
    
    # Append to historical log
    history = if File.exist?(output_file)
                YAML.load_file(output_file, permitted_classes: [Symbol])
              else
                []
              end
    history << { date: Date.today.to_s, metrics: results }
    
    File.write(output_file, YAML.dump(history))
    
    # Report current violations and fail the run when any exist
    violations = results.select { |_, data| data[:exceeds_threshold] }
    if violations.any?
      puts "#{violations.count} methods exceed complexity threshold:"
      violations.each do |method, data|
        puts "  #{method}: #{data[:score]} (#{data[:file]})"
      end
      exit 1
    end
  end
end

ComplexityTracker.track('app', 'complexity_history.yml')

Running this script regularly (daily in CI) builds historical complexity data. Teams spot upward trends before complexity becomes unmanageable. The script also enforces thresholds, failing builds when complexity exceeds limits.

Identifying High-Risk Code combines multiple metrics to find code requiring immediate attention. Code that is complex, frequently changed, and poorly tested represents maximum risk.

# Risk assessment combining metrics
require 'flog'
require 'git'
require 'simplecov'

class RiskAnalyzer
  def initialize(repo_path, coverage_data)
    @repo = Git.open(repo_path)
    @coverage = coverage_data
  end
  
  def analyze_file(file_path)
    # Calculate complexity
    flogger = Flog.new
    flogger.flog(file_path)
    complexity = flogger.total_score
    
    # Calculate churn: commits touching this file in the last 90 days
    commits = @repo.log(10_000).since('90 days ago').path(file_path).count
    
    # Get coverage
    coverage = @coverage[file_path] || 0
    
    # Risk score formula
    risk_score = (complexity * 0.3) + (commits * 2) + ((100 - coverage) * 0.5)
    
    {
      file: file_path,
      complexity: complexity.round(2),
      churn: commits,
      coverage: coverage.round(2),
      risk_score: risk_score.round(2)
    }
  end
  
  def highest_risk_files(count = 10)
    files = Dir.glob('app/**/*.rb')
    results = files.map { |f| analyze_file(f) }
    results.sort_by { |r| -r[:risk_score] }.take(count)
  end
end

This analysis identifies the most problematic code for prioritized refactoring. High-risk files receive additional review scrutiny and test coverage improvements.

Enforcing Metric Standards in Code Review automates quality gates during pull request workflows. Automated checks prevent problematic code from merging while providing feedback to authors.

# CI script for PR metric checks
require 'simplecov'
require 'json'

class PRMetricChecker
  def initialize(changed_files)
    @changed_files = changed_files
    @violations = []
  end
  
  def check
    check_rubocop_metrics
    check_coverage_changes
    
    if @violations.any?
      puts "Metric violations detected:"
      @violations.each do |violation|
        puts "  [#{violation[:severity]}] #{violation[:message]}"
      end
      exit 1
    else
      puts "All metric checks passed"
    end
  end
  
  private
  
  def check_rubocop_metrics
    return if @changed_files.empty?

    # Use RuboCop's JSON formatter output, a stable machine-readable interface
    report = JSON.parse(`rubocop --format json --force-exclusion #{@changed_files.join(' ')}`)

    report['files'].each do |file|
      file['offenses'].select { |o| o['cop_name'].start_with?('Metrics/') }.each do |offense|
        @violations << {
          severity: offense['severity'],
          message: "#{file['path']}:#{offense.dig('location', 'line')} - #{offense['message']}"
        }
      end
    end
  end
  
  def check_coverage_changes
    # Assumes the test suite already ran in this process with SimpleCov started
    current_coverage = SimpleCov.result.covered_percent
    baseline_coverage = JSON.parse(File.read('.coverage_baseline.json'))['total']

    if current_coverage < baseline_coverage - 1
      @violations << {
        severity: :error,
        message: "Coverage decreased from #{baseline_coverage}% to #{current_coverage}%"
      }
    end
  end
end

changed_files = `git diff --name-only origin/main`.split("\n").select { |f| f.end_with?('.rb') }
PRMetricChecker.new(changed_files).check

This script runs in CI, checking only modified files. It fails builds that introduce metric violations or decrease coverage, maintaining quality standards without blocking all development.

Common Pitfalls

Optimizing metrics instead of quality occurs when developers game measurements without improving actual code quality. Splitting complex methods into many small methods might reduce complexity metrics while making code harder to follow. Adding trivial tests increases coverage without improving test quality. Metrics measure proxies for quality, not quality itself—improving the proxy does not guarantee improving the underlying quality.

Applying universal thresholds without context treats all code identically regardless of risk or criticality. Payment processing code requires higher test coverage than logging utilities. Complex domain logic justifies higher complexity than simple CRUD operations. One-size-fits-all thresholds either block legitimate code or allow problematic code through.

Ignoring metric combinations examines metrics in isolation rather than holistically. A file with 95% test coverage might have poor tests that only execute code without asserting behavior. Low complexity might indicate over-decomposition or shallow functionality. High complexity combined with low test coverage and high churn indicates critical risk; any single metric alone provides incomplete information.

Measuring without acting collects metric data but never uses it for improvement. Teams generate reports, post dashboards, track trends, but never address identified problems. Metrics consume effort without providing value when teams lack processes to act on findings. Effective measurement includes defined responses to metric violations.

Chasing perfect scores pursues 100% coverage or zero complexity at the expense of pragmatism. Perfect coverage requires testing trivial code, vendored libraries, and generated files—effort better spent on meaningful tests. Eliminating all complexity might necessitate awkward abstractions that obscure logic. Diminishing returns set in well before perfection.

Misinterpreting statistical metrics treats aggregates as representative when distribution matters more. Average complexity of 8 seems acceptable but might hide methods with complexity of 40. Median, maximum, and percentile distributions reveal problems that averages obscure. Box plots and histograms provide better insight than single summary statistics.

Comparing incomparable metrics evaluates projects or teams using raw metric values without accounting for context. A legacy Rails application reasonably has different metrics than a greenfield microservice. Comparing test coverage percentages between a stateless API and a stateful workflow engine ignores fundamental differences in testability. Metrics enable comparison only within similar contexts.

Trusting metric accuracy blindly assumes tools measure correctly without verification. Coverage tools miss branches in complex conditionals. Complexity calculators use different algorithms producing different scores. Duplication detectors generate false positives from boilerplate code. Manual inspection of flagged code separates legitimate concerns from tool limitations.

# Example of metric gaming through method extraction
# Original: High complexity but readable
def process_order(order)
  if order.amount > 1000 && order.customer.premium? && order.items.all?(&:in_stock?)
    apply_premium_discount(order)
    priority_queue(order)
  elsif order.amount > 1000 && order.customer.premium?
    apply_premium_discount(order)
  elsif order.items.all?(&:in_stock?)
    standard_queue(order)
  end
end

# Refactored: Lower complexity but less readable
def process_order(order)
  handle_premium_in_stock(order) if premium_in_stock?(order)
  handle_premium_no_stock(order) if premium_no_stock?(order)
  handle_standard_in_stock(order) if standard_in_stock?(order)
end

def premium_in_stock?(order)
  order.amount > 1000 && order.customer.premium? && order.items.all?(&:in_stock?)
end

def premium_no_stock?(order)
  order.amount > 1000 && order.customer.premium? && !order.items.all?(&:in_stock?)
end

def standard_in_stock?(order)
  order.amount <= 1000 && order.items.all?(&:in_stock?)
end

This refactoring reduces cyclomatic complexity through method extraction but fragments logic across many methods. The complexity still exists but is now distributed and harder to understand. Metrics improved while maintainability decreased.

Ignoring temporal aspects measures code at single points without tracking changes over time. A high-complexity legacy module might be stable and well-understood, requiring no attention. A recently added module with moderate complexity but rapid growth deserves scrutiny. Change velocity and trend direction matter as much as current values.

Focusing on code metrics exclusively neglects broader quality indicators. Code metrics measure implementation quality but ignore design quality, documentation completeness, operational stability, or user satisfaction. Teams need balanced quality frameworks covering code structure, system architecture, operational metrics, and business outcomes.

Implementation Approaches

Incremental Adoption introduces metrics gradually rather than enforcing comprehensive measurement immediately. Teams start with one or two high-value metrics, establish processes around them, then expand. Initial focus on test coverage provides immediate feedback and visible improvement. Once coverage tracking becomes routine, adding complexity or duplication measurement builds on existing practices.

Incremental adoption allows teams to learn metric interpretation before committing to enforcement. Early metrics inform threshold selection for later metrics. Teams discover which metrics provide value and which create noise in their specific context. Gradual rollout prevents metric fatigue from overwhelming simultaneous changes.

Threshold Ratcheting progressively tightens quality standards as code improves. Rather than enforcing ideal thresholds immediately, teams set achievable initial thresholds based on current state, then gradually increase requirements. A codebase with 40% test coverage starts with a 45% minimum threshold. After reaching 45%, the threshold increases to 50%. This approach provides steady improvement without blocking development.

Ratcheting works in CI pipelines through configuration updates after achieving current thresholds. Teams track threshold progression over time, celebrating incremental improvements. The approach accommodates legacy code—old code remains unchanged while new code meets higher standards. Eventually, refactoring brings legacy code up to current standards.
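A minimal ratchet can live in a small script run by CI. This is a sketch; the floor file name, its JSON shape, and the `ratchet_coverage!` helper are assumptions for illustration:

```ruby
require 'json'

# Ratchet: coverage may never fall below the recorded floor,
# and the floor rises whenever coverage improves.
def ratchet_coverage!(current, floor_file: '.coverage_ratchet.json')
  floor = File.exist?(floor_file) ? JSON.parse(File.read(floor_file))['minimum'] : 0.0
  raise "Coverage #{current}% fell below ratchet floor of #{floor}%" if current < floor

  File.write(floor_file, JSON.generate('minimum' => [current, floor].max))
end
```

CI would call something like `ratchet_coverage!(SimpleCov.result.covered_percent)` after the suite finishes; committing the updated floor file locks in each improvement automatically instead of requiring manual threshold edits.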

Metric-Driven Code Review incorporates metrics into pull request evaluation. Automated tools comment on pull requests with metric data for changed files. Reviewers use this data to guide review focus—high-complexity changes receive detailed scrutiny. Metric changes appear alongside code changes, making quality impact visible during review.

This approach surfaces metrics when developers can most easily address issues. Catching complexity increases during review costs less than fixing them later. Reviewers might request refactoring or additional tests based on metric data. The process becomes educational, helping developers internalize quality standards.

Continuous Monitoring tracks metrics in production systems, not just during development. Monitoring dashboards display current metrics and historical trends. Teams review metrics during planning to allocate refactoring time. Sudden metric changes trigger investigation—spike in complexity might indicate rushed feature implementation requiring cleanup.

Monitoring systems alert on threshold violations or significant changes. Teams establish ownership of metric improvement—specific developers or teams own reducing duplication in particular modules. Regular metric reviews become part of retrospectives, ensuring quality remains a continuous focus.

Pre-commit Quality Gates enforce metrics before code enters version control. Git hooks run metric checks locally, rejecting commits that violate standards. This provides immediate feedback with minimal disruption. Developers fix issues in their working environment rather than after committing and pushing.

Local enforcement requires careful threshold selection—overly strict checks frustrate developers and encourage bypassing hooks. Focusing on egregious violations (complexity over 25, new files under 50% coverage) balances quality improvement with developer experience. Teams often make pre-commit checks advisory (warning) while CI checks remain mandatory (blocking).

Differential Measurement analyzes only changed code rather than entire codebases. This approach works well for legacy systems where fixing all existing problems proves impractical. New and modified code must meet current standards while legacy code remains unchanged. Over time, as changes touch legacy code, quality gradually improves.

Differential tools compare metrics before and after changes, reporting only impacts of current work. A pull request increasing overall complexity by 50 points gets flagged even if individual methods stay under thresholds. This prevents gradual quality erosion from many small degradations.
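Comparing per-file metric snapshots before and after a change reduces to a hash diff. The snapshot shape here (file path mapped to a score) is hypothetical, as is the `metric_delta` helper:

```ruby
# Report only files whose metric changed between two snapshots.
def metric_delta(before, after)
  (before.keys | after.keys).each_with_object({}) do |file, delta|
    change = (after[file] || 0) - (before[file] || 0)
    delta[file] = change unless change.zero?
  end
end

before = { 'app/models/order.rb' => 24.0, 'app/models/user.rb' => 8.0 }
after  = { 'app/models/order.rb' => 31.5, 'app/models/user.rb' => 8.0,
           'app/models/invoice.rb' => 12.0 }

metric_delta(before, after)
# => {"app/models/order.rb"=>7.5, "app/models/invoice.rb"=>12.0}
```

Summing the deltas gives the aggregate impact of a pull request, which is how many small per-method increases can still be caught as one large overall degradation.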

Metric Budgets allocate acceptable complexity or technical debt across a project. Each module has a complexity budget based on its purpose and history. Simple utility modules get low budgets. Complex domain logic receives higher budgets. Teams track budget consumption and refactor before exceeding limits.

Budgets enable tradeoffs—accepting higher complexity in one area while maintaining lower complexity elsewhere. They acknowledge that perfect uniformity is unrealistic. Teams periodically review and adjust budgets based on changing requirements and lessons learned.

Reference

Common Code Metrics

Metric | Measures | Typical Threshold | Tools
Lines of Code (LOC) | Code size | Class: 100 lines, Method: 15 lines | RuboCop, SonarQube
Cyclomatic Complexity | Number of independent paths | 10 per method | RuboCop, Saikuro
ABC Complexity | Assignments + Branches + Calls | 20 per method | Flog
Test Coverage | Percentage of code executed by tests | 80% minimum | SimpleCov, Deep-Cover
Code Duplication | Repeated code structures | Under 5% | Flay, RuboCop
Method Length | Number of lines in methods | 10-15 lines | RuboCop
Class Length | Number of lines in classes | 100-200 lines | RuboCop
Churn | Commit frequency per file | Context-dependent | MetricFu, Git
Coupling | Number of dependencies between classes | Low coupling preferred | Reek
Cohesion | Relatedness of class responsibilities | High cohesion preferred | Reek

RuboCop Metric Cops

Cop | Description | Default Threshold
Metrics/AbcSize | ABC complexity metric | 17
Metrics/BlockLength | Length of blocks | 25 lines
Metrics/ClassLength | Length of classes | 100 lines
Metrics/CyclomaticComplexity | Cyclomatic complexity | 7
Metrics/MethodLength | Length of methods | 10 lines
Metrics/ModuleLength | Length of modules | 100 lines
Metrics/ParameterLists | Number of parameters | 5 parameters
Metrics/PerceivedComplexity | Perceived complexity | 8

Flog Complexity Scoring

Construct | Points | Example
Assignment | 1.0 | x = value
Branch | 1.0 | if, unless, while
Call | 1.0 | Method invocation
Condition | 1.0 | Conditional expression
Yield | 1.0 | Block yield
Assignment + Branch | 2.0 | x = if condition

SimpleCov Configuration Options

Option | Purpose | Example
minimum_coverage | Fail build below threshold | 80
refuse_coverage_drop | Fail if coverage decreases | true/false
add_filter | Exclude paths from coverage | /spec/, /vendor/
add_group | Group files in reports | Models, Controllers
merge_timeout | Merge coverage across runs | 3600 seconds
coverage_criterion | Coverage type to measure | line, branch

Reek Code Smell Categories

Category | Description | Examples
Control Couple | Methods that rely on control flag parameters | Boolean parameters
Data Clump | Frequently co-occurring parameters | Same 3+ params in multiple methods
Feature Envy | Method uses more features of another class | Excessive delegation
Long Parameter List | Methods with many parameters | More than 4 parameters
Repeated Conditional | Same conditional in multiple places | if user.admin? appears 5+ times
Too Many Statements | Method contains too many statements | 10+ statements
Utility Function | Method that does not use instance variables | Could be a class method

Metric Threshold Recommendations

Code Type | Coverage | Complexity | Duplication | Method Lines
Critical Business Logic | 95%+ | Under 8 | 0% | Under 10
Standard Application Code | 80%+ | Under 12 | Under 3% | Under 15
Infrastructure/Utilities | 70%+ | Under 15 | Under 5% | Under 20
Legacy/Stable Code | 60%+ | Context-dependent | Under 10% | Context-dependent

Tool Selection Matrix

Need | Recommended Tool | Alternative | Integration
Test Coverage | SimpleCov | Deep-Cover | CI, Git Hooks
Complexity Analysis | Flog, RuboCop | Saikuro | CI, IDE
Duplication Detection | Flay | RuboCop | CI
Code Smell Detection | Reek | RuboCop Lint | CI, Code Review
Comprehensive Analysis | MetricFu, RubyCritic | Code Climate | CI, Dashboard
Style Enforcement | RuboCop | StandardRB | Git Hooks, CI, IDE

Continuous Integration Metric Workflow

Stage | Action | Tools | Outcome
Pre-commit | Run fast local checks | RuboCop (selective cops) | Block obvious issues
Commit | Track file changes | Git | Identify modified files
PR Creation | Analyze changed files only | All metric tools | Comment on PR
CI Build | Run full metric suite | All tools | Fail if thresholds violated
Merge | Update baselines | Coverage, Complexity | Track trends
Post-merge | Generate reports | MetricFu, RubyCritic | Dashboard updates

Metric Interpretation Guidelines

Metric | Range | Interpretation | Action
Coverage | 90-100% | Excellent test coverage | Maintain current practices
Coverage | 80-90% | Good coverage with some gaps | Identify untested critical paths
Coverage | 60-80% | Moderate coverage | Prioritize testing critical code
Coverage | 0-60% | Insufficient coverage | Urgent testing needed
Complexity | 0-10 | Simple, maintainable code | No action needed
Complexity | 11-20 | Moderately complex | Consider refactoring opportunities
Complexity | 21-40 | High complexity | Refactor recommended
Complexity | 40+ | Very high complexity | Urgent refactoring required