CGI Module

Overview

Ruby's CGI module provides classes and methods for creating Common Gateway Interface scripts that handle HTTP requests and responses. The module includes the core CGI class for parsing form data and managing HTTP interactions, along with specialized classes for HTML generation, cookie handling, and session management.

The CGI class serves as the primary interface, automatically parsing query strings, form data, and multipart uploads from HTTP requests. It provides methods for accessing form parameters, managing HTTP headers, and generating proper responses. Ruby's implementation handles character encoding, URL decoding, and multipart parsing transparently.

require 'cgi'

# Basic CGI initialization
cgi = CGI.new
name = cgi['username']  # Access form parameter
puts "Content-Type: text/html\n\n"
puts "<h1>Hello #{CGI.escapeHTML(name)}</h1>"

The module includes several utility methods for web development tasks. CGI.escape and CGI.unescape handle URL encoding and decoding, while CGI.escapeHTML and CGI.unescapeHTML manage HTML entity encoding for security and display purposes.

# URL and HTML escaping
url_safe = CGI.escape("hello world & more")  
# => "hello+world+%26+more"

html_safe = CGI.escapeHTML("<script>alert('xss')</script>")
# => "&lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;"
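
As a quick check of how these helpers pair up, the following sketch round-trips a value through each encoder and its matching decoder. The helper names `url_round_trip` and `html_round_trip` are made up for this illustration and are not part of the CGI module.

```ruby
require 'cgi'

# Round-trip a value through URL encoding and back.
def url_round_trip(text)
  encoded = CGI.escape(text)          # spaces become "+", "&" becomes "%26"
  [encoded, CGI.unescape(encoded)]
end

# Round-trip a value through HTML entity encoding and back.
def html_round_trip(text)
  escaped = CGI.escapeHTML(text)      # escapes & < > " and '
  [escaped, CGI.unescapeHTML(escaped)]
end

encoded, decoded = url_round_trip("hello world & more")
puts encoded
puts decoded
```

Decoding should only be applied to values the application encoded itself; request parameters arrive already URL-decoded from `CGI.new`.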

Ruby's CGI implementation supports cookies through the CGI::Cookie class, which handles cookie creation, parsing, and attribute management. The module also provides session management capabilities and HTML generation utilities for dynamic content creation.

Basic Usage

Creating a CGI script begins with requiring the module and instantiating a CGI object. The constructor automatically parses incoming request data, making form parameters accessible through hash-like syntax or method calls.

#!/usr/bin/env ruby
require 'cgi'

cgi = CGI.new
user_input = cgi['message']
email = cgi.params['email'].first  # params returns arrays

# Generate response headers
puts cgi.header('type' => 'text/html', 'charset' => 'utf-8')
puts "<html><body><h1>Message: #{CGI.escapeHTML(user_input)}</h1></body></html>"

Form parameter access supports both single values and arrays for fields that may contain multiple values. The params method returns a hash whose values are always arrays, while bracket notation returns the first value, or an empty string when the field is absent.

cgi = CGI.new

# Single value access
username = cgi['username']           # String ("" when the field is absent)
categories = cgi['category']         # First value only

# Array access for multiple values  
all_categories = cgi.params['category']  # Array of all values
selected_options = cgi.params['options']  # [] when the field is absent

File uploads require multipart form encoding. Each uploaded file arrives as a Tempfile or StringIO object extended with extra methods that expose the original filename and content type alongside the file content.

cgi = CGI.new

uploaded_file = cgi['attachment']
if uploaded_file.respond_to?(:read)
  filename = uploaded_file.original_filename
  content_type = uploaded_file.content_type
  file_size = uploaded_file.size
  
  # Process file content; File.basename strips directory parts from the client-supplied name
  File.open(File.join('/uploads', File.basename(filename)), 'wb') do |f|
    f.write(uploaded_file.read)
  end
end

Cookie management involves creating CGI::Cookie objects with specified names, values, and attributes. Cookies can be set in response headers and accessed from incoming requests through the cookies hash.

# Creating and setting cookies
session_cookie = CGI::Cookie.new(
  'name' => 'session_id',
  'value' => 'abc123',
  'expires' => Time.now + 86400,  # 24 hours
  'path' => '/',
  'secure' => true
)

puts cgi.header('cookie' => session_cookie)

# Reading incoming cookies
existing_session = cgi.cookies['session_id'].first  # nil when the cookie is absent

Error Handling & Debugging

CGI applications must handle various input validation scenarios and encoding issues. Malformed requests, missing parameters, and invalid file uploads can cause runtime errors that require graceful handling.

require 'cgi'

begin
  cgi = CGI.new
  
  # Validate required parameters
  username = cgi['username']
  raise ArgumentError, "Username required" if username.nil? || username.empty?
  
  # Validate parameter format
  email = cgi['email']
  unless email =~ /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i
    raise ArgumentError, "Invalid email format"
  end
  
rescue ArgumentError => e
  puts cgi.header('status' => '400 Bad Request')
  puts "<h1>Error: #{CGI.escapeHTML(e.message)}</h1>"
rescue StandardError => e
  # cgi is nil if CGI.new itself failed, so write the headers by hand
  print "Status: 500 Internal Server Error\r\nContent-Type: text/html\r\n\r\n"
  puts "<h1>Server Error</h1>"
  # Log actual error for debugging
  File.open('/var/log/cgi_errors.log', 'a') do |log|
    log.puts "#{Time.now}: #{e.class} - #{e.message}"
    log.puts e.backtrace.join("\n")
  end
end

File upload validation requires checking multiple attributes to ensure security and prevent resource exhaustion. Size limits, content type validation, and filename sanitization prevent common attack vectors.

def validate_upload(uploaded_file)
  return nil unless uploaded_file.respond_to?(:read)
  
  # Check file size
  if uploaded_file.size > 5_000_000  # 5MB limit
    raise ArgumentError, "File too large"
  end
  
  # Validate content type
  allowed_types = ['image/jpeg', 'image/png', 'image/gif']
  unless allowed_types.include?(uploaded_file.content_type)
    raise ArgumentError, "Invalid file type"
  end
  
  # Sanitize filename
  original_name = uploaded_file.original_filename
  sanitized_name = File.basename(original_name).gsub(/[^\w.-]/, '_')
  
  if sanitized_name.empty?
    raise ArgumentError, "Invalid filename"
  end
  
  { content: uploaded_file.read, filename: sanitized_name }
end

# Usage with error handling
begin
  file_data = validate_upload(cgi['upload'])
  # Process validated file
rescue ArgumentError => e
  puts cgi.header('status' => '422 Unprocessable Entity')
  puts "Upload error: #{CGI.escapeHTML(e.message)}"
end
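
The filename sanitization step can be tried in isolation. The following is a minimal sketch of that step only, with a hypothetical `sanitize_filename` helper; it additionally treats backslashes as path separators and rejects names made up solely of dots and underscores, both of which are assumptions beyond the code above.

```ruby
# Reduce a client-supplied filename to a safe base name (hypothetical helper).
def sanitize_filename(original_name)
  # Treat backslashes as separators too, so Windows-style paths are stripped
  base = File.basename(original_name.to_s.tr('\\', '/'))
  cleaned = base.gsub(/[^\w.-]/, '_')
  # Reject empty names and names such as "." or ".."
  raise ArgumentError, 'Invalid filename' if cleaned.delete('._').empty?
  cleaned
end

puts sanitize_filename('../../etc/passwd')
puts sanitize_filename('my photo (1).png')
```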

Debugging CGI scripts requires logging capabilities since standard output is reserved for HTTP responses. Environment variable inspection and request debugging help identify issues in production environments.

def debug_request(cgi)
  debug_info = {
    'REQUEST_METHOD' => ENV['REQUEST_METHOD'],
    'CONTENT_TYPE' => ENV['CONTENT_TYPE'],
    'CONTENT_LENGTH' => ENV['CONTENT_LENGTH'],
    'QUERY_STRING' => ENV['QUERY_STRING'],
    'HTTP_USER_AGENT' => ENV['HTTP_USER_AGENT'],
    'REMOTE_ADDR' => ENV['REMOTE_ADDR']
  }
  
  File.open('/var/log/cgi_debug.log', 'a') do |log|
    log.puts "=== Request Debug #{Time.now} ==="
    debug_info.each { |k, v| log.puts "#{k}: #{v}" }
    log.puts "Parameters: #{cgi.params.inspect}"
    log.puts "Cookies: #{cgi.cookies.keys}"
  end
end
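
An alternative to hand-rolled log files is the standard library Logger pointed at standard error, which many web servers route to their error log (this depends on server configuration). The sketch below assumes that setup; `build_debug_logger` and `log_request_env` are illustrative names.

```ruby
require 'logger'
require 'stringio'

# Build a logger that writes to the given IO (stderr by default), keeping
# standard output free for the HTTP response.
def build_debug_logger(io = $stderr)
  logger = Logger.new(io)
  logger.progname = 'cgi-debug'
  logger
end

# Log a fixed set of request variables from an environment-like hash.
def log_request_env(logger, env)
  %w[REQUEST_METHOD QUERY_STRING REMOTE_ADDR].each do |key|
    logger.debug("#{key}=#{env[key].inspect}")
  end
end

log_request_env(build_debug_logger, ENV)
```

Passing a StringIO instead of `$stderr` makes the same helpers usable in tests.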

Production Patterns

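Production CGI deployments require proper security headers, session management, and integration with web server configurations. Apache can execute CGI scripts directly through its CGI modules, while Nginx needs an external wrapper such as fcgiwrap; in both cases the script must be executable and its file permissions must be set correctly.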

require 'cgi'
require 'cgi/session'
require 'digest/sha2'

class SecureCGIApplication
  def initialize
    @cgi = CGI.new
    @session = CGI::Session.new(@cgi,
      'database_manager' => CGI::Session::FileStore,
      'tmpdir' => '/tmp/sessions',  # directory where FileStore keeps session files
      'session_expires' => Time.now + 1800,  # 30 minutes
      'prefix' => 'webapp_'
    )
  end

  def authenticate_user
    username = @cgi['username']
    password = @cgi['password']
    
    # cgi[] returns "" for missing fields, so check for empty strings
    return false if username.empty? || password.empty?
    
    # Hash password for comparison (use proper password hashing in production)
    password_hash = Digest::SHA256.hexdigest(password + 'salt_value')
    
    # Verify against database/file
    authenticated = verify_credentials(username, password_hash)
    
    if authenticated
      @session['user_id'] = username
      @session['last_activity'] = Time.now.to_i
      true
    else
      false
    end
  end

  def require_authentication
    user_id = @session['user_id']
    last_activity = @session['last_activity']
    
    unless user_id && last_activity && 
           (Time.now.to_i - last_activity.to_i) < 1800
      redirect_to_login
      return false
    end
    
    @session['last_activity'] = Time.now.to_i
    true
  end

  def generate_response
    headers = {
      'type' => 'text/html',
      'charset' => 'utf-8',
      'X-Frame-Options' => 'DENY',
      'X-Content-Type-Options' => 'nosniff',
      'X-XSS-Protection' => '1; mode=block',
      'Strict-Transport-Security' => 'max-age=31536000; includeSubDomains'
    }
    
    puts @cgi.header(headers)
  end
  
  private
  
  def verify_credentials(username, password_hash)
    # Implementation depends on storage mechanism
    # This is a simplified example
    users = load_user_database
    users[username] == password_hash
  end
  
  def redirect_to_login
    puts @cgi.header('status' => '302 Found', 'location' => '/login.html')
    exit
  end
end
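
The class above hashes passwords with a single SHA-256 pass and a fixed salt, and its own comment recommends proper password hashing for production. One possible replacement, sketched here with PBKDF2 from the standard openssl library, uses a random per-user salt and a byte-by-byte comparison. The iteration count, key length, and helper names are assumptions for illustration, not values taken from the original.

```ruby
require 'openssl'
require 'securerandom'

ITERATIONS = 100_000   # assumed work factor; tune for your hardware
KEY_LENGTH = 32        # bytes of derived key

# Derive a digest from the password with a per-user random salt.
def hash_password(password, salt = SecureRandom.random_bytes(16))
  digest = OpenSSL::PKCS5.pbkdf2_hmac(
    password, salt, ITERATIONS, KEY_LENGTH, OpenSSL::Digest.new('SHA256')
  )
  { salt: salt, digest: digest }
end

# Recompute the digest with the stored salt and compare all bytes.
def password_matches?(password, stored)
  candidate = hash_password(password, stored[:salt])[:digest]
  return false unless candidate.bytesize == stored[:digest].bytesize
  # OR together the XOR of each byte pair so timing does not reveal the first mismatch
  candidate.bytes.zip(stored[:digest].bytes).reduce(0) { |acc, (a, b)| acc | (a ^ b) }.zero?
end

record = hash_password('s3cret')
puts password_matches?('s3cret', record)
puts password_matches?('wrong', record)
```

Both the salt and the digest would need to be stored per user for this scheme to work.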

Load balancing and scaling considerations affect session storage and state management. File-based sessions work for single-server deployments, but distributed applications require shared session storage.

# Database-backed session management for scalability
require 'pg'  # or preferred database adapter

class DatabaseSessionStore < CGI::Session::NullStore
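  def initialize(session, options = {})
    super
    @session_id = session.session_id
    @db_config = options['database_config']
    @table_name = options['table_name'] || 'cgi_sessions'
    @h = {}
  end

  def restore
    conn = PG.connect(@db_config)
    result = conn.exec_params(
      "SELECT data FROM #{@table_name} WHERE session_id = $1 AND expires_at > NOW()",
      [@session_id]
    )
    
    # Keep the hash in @h so that update can persist later changes.
    # Assumes the data column is text holding Base64-encoded Marshal output.
    @h = if result.ntuples > 0
      Marshal.load(result[0]['data'].unpack1('m'))
    else
      {}
    end
  ensure
    conn&.close
  end

  def update
    data = [Marshal.dump(@h)].pack('m0')  # Base64 keeps binary data text-safe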
    expires_at = Time.now + 1800  # 30 minutes
    
    conn = PG.connect(@db_config)
    conn.exec_params(
      "INSERT INTO #{@table_name} (session_id, data, expires_at) VALUES ($1, $2, $3)
       ON CONFLICT (session_id) DO UPDATE SET data = $2, expires_at = $3",
      [@session_id, data, expires_at]
    )
  ensure
    conn&.close
  end

  def delete
    conn = PG.connect(@db_config)
    conn.exec_params("DELETE FROM #{@table_name} WHERE session_id = $1", [@session_id])
  ensure
    conn&.close
  end
end

Monitoring production CGI applications involves logging performance metrics, error rates, and security events. Integration with system monitoring tools provides visibility into application health and performance characteristics.

require 'json'  # for to_json
require 'time'  # for Time#iso8601

class CGIMonitoring
  def self.log_request(cgi, start_time, status = '200')
    duration = Time.now - start_time
    log_entry = {
      timestamp: Time.now.iso8601,
      remote_addr: ENV['REMOTE_ADDR'],
      method: ENV['REQUEST_METHOD'],
      path: ENV['SCRIPT_NAME'],
      query: ENV['QUERY_STRING'],
      status: status,
      duration_ms: (duration * 1000).round(2),
      content_length: ENV['CONTENT_LENGTH']&.to_i || 0,
      user_agent: ENV['HTTP_USER_AGENT']
    }
    
    File.open('/var/log/cgi_access.json', 'a') do |f|
      f.puts log_entry.to_json
    end
  end
  
  def self.log_security_event(event_type, details)
    event = {
      timestamp: Time.now.iso8601,
      type: event_type,
      remote_addr: ENV['REMOTE_ADDR'],
      details: details,
      user_agent: ENV['HTTP_USER_AGENT']
    }
    
    File.open('/var/log/cgi_security.json', 'a') do |f|
      f.puts event.to_json
    end
  end
end

Common Pitfalls

Cross-site scripting vulnerabilities occur when user input is displayed without proper escaping. The CGI.escapeHTML method must be used consistently for all user-generated content displayed in HTML responses.

# VULNERABLE - Direct output of user input
username = cgi['username']
puts "<h1>Welcome #{username}!</h1>"  # XSS vulnerability

# SECURE - Proper HTML escaping
username = cgi['username']
puts "<h1>Welcome #{CGI.escapeHTML(username)}!</h1>"

# VULNERABLE - Building HTML with concatenation
message = cgi['message']
html = "<div class='message'>" + message + "</div>"

# SECURE - Escape all dynamic content
message = cgi['message']
html = "<div class='message'>#{CGI.escapeHTML(message)}</div>"
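
Escaping matters just as much inside attribute values, where an unescaped double quote lets input break out of the attribute. The sketch below uses a hypothetical `text_input` helper to show the quote being neutralized.

```ruby
require 'cgi'

# Render a text input whose value attribute holds user-supplied data.
def text_input(name, value)
  %(<input type="text" name="#{CGI.escapeHTML(name)}" value="#{CGI.escapeHTML(value)}">)
end

# The payload tries to close the value attribute and add an event handler
puts text_input('q', %q{" onmouseover="alert(1)})
```

Because CGI.escapeHTML converts double quotes to `&quot;`, the payload stays inside the value attribute as plain text.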

Parameter pollution attacks exploit how CGI handles multiple parameters with the same name. Applications expecting single values may receive arrays, leading to unexpected behavior or security issues.

# Vulnerable to parameter pollution
user_id = cgi['user_id']
# If URL contains ?user_id=123&user_id=456, user_id gets the first value (123)
# But cgi.params['user_id'] contains ['123', '456']

# Secure parameter handling
def get_single_param(cgi, name, pattern = nil)
  values = cgi.params[name]
  return nil if values.nil? || values.empty?
  
  if values.size > 1
    raise ArgumentError, "Multiple values for parameter #{name}"
  end
  
  value = values.first
  if pattern && !(value =~ pattern)
    raise ArgumentError, "Invalid format for parameter #{name}"
  end
  
  value
end

# Usage with validation
user_id = get_single_param(cgi, 'user_id', /\A\d+\z/)

File upload security requires careful handling of filenames, content types, and storage locations. Attackers can manipulate these attributes to overwrite system files or execute malicious code.

# VULNERABLE - Using original filename directly
uploaded_file = cgi['upload']
File.open("/uploads/#{uploaded_file.original_filename}", 'wb') do |f|
  f.write(uploaded_file.read)
end

# SECURE - Filename sanitization and validation
require 'securerandom'

def secure_upload_handling(uploaded_file, allowed_extensions = ['.jpg', '.png', '.pdf'])
  return nil unless uploaded_file.respond_to?(:read)
  
  # Generate safe filename
  original_name = uploaded_file.original_filename
  extension = File.extname(original_name).downcase
  
  unless allowed_extensions.include?(extension)
    raise ArgumentError, "File type not allowed"
  end
  
  # Create unique, safe filename; SecureRandom avoids collisions and guessable names
  timestamp = Time.now.strftime('%Y%m%d_%H%M%S')
  random_id = SecureRandom.hex(8)
  safe_filename = "upload_#{timestamp}_#{random_id}#{extension}"
  
  # Store in secure directory with proper permissions
  upload_dir = '/var/uploads'
  file_path = File.join(upload_dir, safe_filename)
  
  File.open(file_path, 'wb', 0644) do |f|
    f.write(uploaded_file.read)
  end
  
  safe_filename
end
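
When an application does need to keep part of a client-supplied name, an extra containment check can confirm that the resolved path stays inside the upload directory. The sketch below is one way to do that with File.expand_path; `resolve_upload_path` is a hypothetical helper, and it does not resolve symbolic links.

```ruby
# Resolve +name+ under +base_dir+ and refuse anything that escapes it.
def resolve_upload_path(base_dir, name)
  base = File.expand_path(base_dir)
  full = File.expand_path(name, base)   # collapses "..", honors absolute paths
  unless full.start_with?(base + File::SEPARATOR)
    raise ArgumentError, 'Path escapes upload directory'
  end
  full
end

puts resolve_upload_path('/var/uploads', 'report.pdf')
begin
  resolve_upload_path('/var/uploads', '../../etc/passwd')
rescue ArgumentError => e
  puts e.message
end
```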

Session fixation vulnerabilities occur when session IDs are not regenerated after authentication. CGI::Session requires explicit session ID regeneration to prevent session hijacking attacks.

# VULNERABLE - Keeping same session ID after login
def login_user(cgi, username, password)
  if authenticate(username, password)
    session = CGI::Session.new(cgi)
    session['user_id'] = username  # Session ID remains same
    return true
  end
  false
end

# SECURE - Regenerate session after authentication
def secure_login(cgi, username, password)
  if authenticate(username, password)
    # Destroy existing session
    old_session = CGI::Session.new(cgi)
    old_session.delete
    
    # Create new session with different ID
    new_session = CGI::Session.new(cgi, 'new_session' => true)
    new_session['user_id'] = username
    new_session['authenticated_at'] = Time.now.to_i
    
    return new_session
  end
  nil
end

Character encoding issues arise when processing form data with different encodings. Ruby's CGI module requires explicit encoding handling to prevent data corruption and security vulnerabilities.

# Handle encoding issues properly
def process_multilingual_input(cgi)
  input_text = cgi['message']
  return '' if input_text.nil?
  
  # Force UTF-8 encoding and handle invalid bytes
  if input_text.encoding != Encoding::UTF_8
    begin
      input_text = input_text.encode('UTF-8', invalid: :replace, undef: :replace)
    rescue Encoding::InvalidByteSequenceError, Encoding::UndefinedConversionError
      input_text = input_text.force_encoding('UTF-8')
      input_text = input_text.scrub('?')  # Replace invalid bytes
    end
  end
  
  # Validate that string is valid UTF-8
  unless input_text.valid_encoding?
    input_text = input_text.scrub('?')
  end
  
  input_text
end
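
The effect of scrubbing is easy to demonstrate on a byte string that mixes a valid multibyte character with a stray byte. This sketch assumes the input is meant to be UTF-8; `to_clean_utf8` is an illustrative helper name.

```ruby
# Interpret raw request bytes as UTF-8, replacing invalid sequences with "?".
def to_clean_utf8(bytes)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  text.valid_encoding? ? text : text.scrub('?')
end

raw = "caf\xC3\xA9 \xFF".b   # valid "é" followed by a stray 0xFF byte
puts to_clean_utf8(raw)
```

Unlike String#encode, force_encoding only relabels the bytes, so valid UTF-8 sequences in the input are preserved rather than replaced.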

Performance & Memory

Large form data and file uploads can consume significant memory and processing time. Ruby's CGI implementation keeps ordinary form fields in memory and spools larger multipart uploads to temporary files, so applications handling substantial uploads still need careful resource management when they read that data back.

# Memory-efficient file upload handling
def stream_large_upload(cgi, max_size = 100_000_000)  # 100MB limit
  uploaded_file = cgi['large_file']
  return nil unless uploaded_file.respond_to?(:read)
  
  if uploaded_file.size > max_size
    raise ArgumentError, "File exceeds maximum size"
  end
  
  # Stream file in chunks to reduce memory usage
  output_path = "/tmp/upload_#{Time.now.to_i}_#{rand(1000)}"
  total_written = 0
  
  File.open(output_path, 'wb') do |output|
    while chunk = uploaded_file.read(8192)  # 8KB chunks
      output.write(chunk)
      total_written += chunk.bytesize
      
      # Safety check during streaming
      if total_written > max_size
        File.unlink(output_path)
        raise ArgumentError, "File size exceeded during upload"
      end
    end
  end
  
  { path: output_path, size: total_written }
ensure
  uploaded_file.close if uploaded_file.respond_to?(:close)
end
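
The chunked loop can also be expressed with IO.copy_stream, which copies between IO-like objects without loading the whole source into memory. The following is a sketch of that alternative, not the method used above; `copy_with_limit` is a hypothetical helper and the StringIO stands in for an uploaded file.

```ruby
require 'stringio'
require 'tempfile'

# Copy at most max_size bytes from +source+ to +dest_path+.
# Raises if the source holds more than max_size bytes.
def copy_with_limit(source, dest_path, max_size)
  # Ask for one byte more than allowed so oversize input is detectable
  copied = IO.copy_stream(source, dest_path, max_size + 1)
  if copied > max_size
    File.unlink(dest_path)
    raise ArgumentError, 'File exceeds maximum size'
  end
  copied
end

Tempfile.create('upload') do |tmp|
  puts copy_with_limit(StringIO.new('x' * 1024), tmp.path, 5_000_000)
end
```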

Parameter parsing performance degrades with large numbers of form fields or deeply nested parameter structures. Optimization strategies include parameter count limits and selective parsing.

# Performance monitoring for parameter parsing
class OptimizedCGI
  def initialize(max_params = 1000)
    @max_params = max_params
    @start_time = Time.now
    @cgi = parse_with_limits
  end
  
  def [](key)
    @cgi[key]
  end
  
  def params
    @cgi.params
  end
  
  private
  
  def parse_with_limits
    # Check content length before parsing
    content_length = ENV['CONTENT_LENGTH'].to_i
    if content_length > 50_000_000  # 50MB limit
      raise ArgumentError, "Request too large"
    end
    
    cgi = CGI.new
    
    # Count total parameters
    param_count = cgi.params.values.sum(&:size)
    if param_count > @max_params
      raise ArgumentError, "Too many parameters (#{param_count} > #{@max_params})"
    end
    
    parse_time = Time.now - @start_time
    if parse_time > 5  # 5 second parsing limit
      File.open('/var/log/slow_parsing.log', 'a') do |f|
        f.puts "Slow parsing: #{parse_time}s, params: #{param_count}, size: #{content_length}"
      end
    end
    
    cgi
  end
end

Memory usage optimization requires careful management of large strings and temporary files. Ruby's garbage collection can be tuned for CGI applications with specific memory patterns.

# Memory-conscious CGI response generation
class MemoryOptimizedResponse
  def initialize(cgi)
    @cgi = cgi
  end
  
  def generate_large_response(data_array)
    # Stream response instead of building large strings
    # Content-Disposition is passed through as a custom header
    print @cgi.header('type' => 'text/csv', 'Content-Disposition' => 'attachment')
    
    # Process data in batches to control memory usage
    data_array.each_slice(1000) do |batch|
      csv_chunk = batch.map { |row| row.join(',') }.join("\n")
      puts csv_chunk  # puts ends every batch, including the last, with a newline
      
      # Force garbage collection periodically
      GC.start if batch.size == 1000 && rand < 0.1
    end
  end
  
  def memory_stats
    {
      process_memory: `ps -o rss= -p #{Process.pid}`.strip.to_i * 1024,
      ruby_memory: GC.stat[:heap_live_slots] * 40,  # Approximate bytes per slot
      gc_count: GC.count
    }
  end
end

# Usage with memory monitoring
response_handler = MemoryOptimizedResponse.new(cgi)
start_memory = response_handler.memory_stats[:process_memory]

response_handler.generate_large_response(large_dataset)

end_memory = response_handler.memory_stats[:process_memory]
memory_used = end_memory - start_memory

File.open('/var/log/memory_usage.log', 'a') do |f|
  f.puts "Memory used: #{memory_used / 1024}KB"
end

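Database connection pooling and resource management become critical for high-traffic CGI applications. Connection reuse and proper cleanup prevent resource exhaustion. Note that under classic CGI each request runs in a fresh process, so a class-level pool like the one below is only reused when the script runs inside a persistent process such as FastCGI.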

# Resource-efficient database integration
class PooledDatabaseCGI
  @connection_pool = []
  @pool_mutex = Mutex.new
  
  def self.get_connection(config)
    @pool_mutex.synchronize do
      connection = @connection_pool.pop
      return connection if connection && connection.status == PG::CONNECTION_OK
      
      PG.connect(config)
    end
  end
  
  def self.return_connection(connection)
    @pool_mutex.synchronize do
      @connection_pool.push(connection) if connection && connection.status == PG::CONNECTION_OK
      # Limit pool size
      @connection_pool.pop.close if @connection_pool.size > 10
    end
  end
  
  def process_request_with_db(cgi)
    conn = self.class.get_connection(database_config)
    
    # Process request with database
    user_id = cgi['user_id']
    result = conn.exec_params('SELECT name FROM users WHERE id = $1', [user_id])
    
    puts cgi.header
    puts "<h1>Hello #{CGI.escapeHTML(result[0]['name'])}</h1>"
    
  ensure
    self.class.return_connection(conn) if conn
  end
  
  private
  
  def database_config
    {
      host: ENV['DB_HOST'] || 'localhost',
      port: ENV['DB_PORT'] || 5432,
      dbname: ENV['DB_NAME'],
      user: ENV['DB_USER'],
      password: ENV['DB_PASSWORD'],
      connect_timeout: 5,
      options: '-c statement_timeout=30000'  # 30 seconds, passed as a server setting
    }
  end
end

Reference

CGI Class Methods

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| CGI.new(options = {}) | options (String or Hash) | CGI | Creates a new CGI instance; a String selects the HTML tag maker, a Hash sets options such as tag_maker and accept_charset |
| CGI.escape(string) | string (String) | String | URL-encodes string for use in URLs |
| CGI.unescape(string) | string (String) | String | URL-decodes previously encoded string |
| CGI.escapeHTML(string) | string (String) | String | HTML-encodes string to prevent XSS attacks |
| CGI.unescapeHTML(string) | string (String) | String | HTML-decodes previously encoded string |
| CGI.escape_element(string, *elements) | string (String), elements (Array) | String | Escapes specified HTML elements only |
| CGI.unescape_element(string, *elements) | string (String), elements (Array) | String | Unescapes specified HTML elements only |
| CGI.parse(query) | query (String) | Hash | Parses query string into parameter hash |
CGI Instance Methods

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| #[](key) | key (String) | String | Returns first value for parameter key ("" if absent) |
| #params | None | Hash | Returns hash of all parameters as arrays |
| #keys | None | Array | Returns array of all parameter keys |
| #has_key?(key) | key (String) | Boolean | Checks if parameter exists |
| #header(options = {}) | options (Hash) | String | Generates HTTP response headers (alias of #http_header) |
| #cookies | None | Hash | Returns hash of cookies from request |
| #accept_charset | None | String | Returns accepted character sets |
| #auth_type | None | String | Returns authentication type |
| #content_length | None | Integer | Returns content length of request |
| #content_type | None | String | Returns content type of request |
| #gateway_interface | None | String | Returns CGI version information |
| #path_info | None | String | Returns additional path information |
| #path_translated | None | String | Returns translated path information |
| #query_string | None | String | Returns query string from URL |
| #remote_addr | None | String | Returns client IP address |
| #remote_host | None | String | Returns client hostname |
| #remote_ident | None | String | Returns client identity information |
| #remote_user | None | String | Returns authenticated username |
| #request_method | None | String | Returns HTTP request method |
| #script_name | None | String | Returns script path |
| #server_name | None | String | Returns server hostname |
| #server_port | None | Integer | Returns server port number |
| #server_protocol | None | String | Returns HTTP protocol version |
| #server_software | None | String | Returns web server software |
| #user_agent | None | String | Returns client user agent string |

CGI::Cookie Class Methods

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| CGI::Cookie.new(options) | options (Hash) | CGI::Cookie | Creates new cookie with specified attributes |
| CGI::Cookie.parse(raw_cookie) | raw_cookie (String) | Hash | Parses cookie string into cookie objects |

CGI::Cookie Instance Methods

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| #name | None | String | Returns cookie name |
| #value | None | Array | Returns cookie values as array |
| #domain | None | String | Returns cookie domain |
| #path | None | String | Returns cookie path |
| #expires | None | Time | Returns cookie expiration time |
| #secure | None | Boolean | Returns secure flag status |
| #httponly | None | Boolean | Returns HTTP-only flag status |
| #to_s | None | String | Returns formatted cookie string |

Header Options

| Option | Type | Description |
| --- | --- | --- |
| type | String | Content-Type header value (default: "text/html") |
| charset | String | Character encoding appended to Content-Type (omitted when not set) |
| status | String | HTTP status code and message |
| server | String | Server header value |
| connection | String | Connection header value |
| length | Integer | Content-Length header value |
| language | String | Content-Language header value |
| expires | Time | Expires header value |
| cookie | CGI::Cookie or Array | Cookie objects to set |
| location | String | Location header for redirects |
| cache-control | String | Cache-Control header value |
| pragma | String | Pragma header value |

Common Environment Variables

| Variable | Description |
| --- | --- |
| REQUEST_METHOD | HTTP method (GET, POST, PUT, DELETE) |
| CONTENT_TYPE | MIME type of request body |
| CONTENT_LENGTH | Size of request body in bytes |
| QUERY_STRING | URL parameters after ? character |
| HTTP_USER_AGENT | Client browser identification |
| HTTP_ACCEPT | MIME types accepted by client |
| HTTP_COOKIE | Cookies sent by client |
| REMOTE_ADDR | Client IP address |
| REMOTE_HOST | Client hostname |
| SERVER_NAME | Server hostname |
| SERVER_PORT | Server port number |
| SCRIPT_NAME | CGI script path |
| PATH_INFO | Additional path information |

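CGI Tag Maker Values

The string passed to CGI.new (or the tag_maker option) selects which HTML generation methods are loaded. When it is omitted, no HTML generation methods are loaded. Multipart form data is detected automatically from the request's Content-Type and needs no special value.

| Value | Description | Usage |
| --- | --- | --- |
| (omitted) | Parse query string and form data only | Default for most requests |
| "html3" | Load HTML 3.x generation methods | For scripts generating HTML 3 |
| "html4" | Load HTML 4.0 generation methods | For scripts generating strict HTML 4 |
| "html4Tr" | Load HTML 4.0 Transitional generation methods | For scripts generating transitional HTML 4 |
| "html4Fr" | Load HTML 4.0 Frameset generation methods | For scripts generating framesets |
| "html5" | Load HTML 5 generation methods | For scripts generating HTML 5 |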