CrackedRuby - Output Encoding

Overview

Output encoding converts special characters in data into their safe encoded equivalents before rendering content to users. This transformation prevents browsers, databases, or other systems from interpreting user-supplied data as executable code or commands. The encoding process maintains the display value while neutralizing potentially malicious content.

When an application displays user input without encoding, attackers can inject malicious scripts that execute in other users' browsers, steal credentials, manipulate DOM elements, or perform unauthorized actions. Output encoding breaks this attack vector by ensuring user data remains data rather than becoming executable code.

The encoding mechanism varies by context. HTML contexts require different encoding than JavaScript contexts, which differ from URL contexts. Each context has specific characters that carry special meaning and require transformation. The character < becomes &lt; in HTML, \u003c in JavaScript strings, and %3C in URLs.

# Without encoding - vulnerable to XSS
user_input = "<script>alert('XSS')</script>"
html = "<div>#{user_input}</div>"
# => "<div><script>alert('XSS')</script></div>"

# With HTML encoding - safe
require 'cgi'
encoded = CGI.escapeHTML(user_input)
html = "<div>#{encoded}</div>"
# => "<div>&lt;script&gt;alert(&#39;XSS&#39;)&lt;/script&gt;</div>"

The encoded output displays the literal text to users while preventing script execution. Browsers render &lt;script&gt; as the visible characters "<script>" rather than parsing it as a script tag.

Output encoding differs from input validation. Input validation rejects or sanitizes data at entry points, while output encoding transforms data at display points. Both techniques serve complementary purposes in defense-in-depth security strategies. Validation cannot anticipate all contexts where data might eventually appear, making output encoding the last line of defense against injection attacks.

Key Principles

Output encoding operates on the principle of context-aware transformation. Each output context defines a set of special characters that require encoding. The encoder transforms these characters into representations that lose their special meaning in that specific context while preserving their display value.

Context Determination

The encoding strategy depends entirely on where the data appears in the output. HTML element content, HTML attributes, JavaScript strings, JSON responses, CSS values, and URLs each constitute distinct contexts with different special characters and encoding requirements. Mismatched encoding applies the wrong transformation, leaving applications vulnerable despite encoding attempts.

Consider inserting user data into an HTML attribute versus an onclick handler:

# HTML attribute context
user_data = 'value"onclick="alert(1)'
safe_attr = CGI.escapeHTML(user_data)
# <input value="#{safe_attr}">
# => <input value="value&quot;onclick=&quot;alert(1)">

# JavaScript context - HTML encoding insufficient
user_data = "'; alert(1); '"
html_escaped = CGI.escapeHTML(user_data)
# <div onclick="doSomething('#{html_escaped}')">
# Still vulnerable! The browser decodes &#39; back to ' before running the
# handler, so the value needs JavaScript escaping first

Character Transformation

Encoding replaces special characters with safe equivalents that browsers or parsers treat as literal data. HTML encoding uses named character references (&lt;) or numeric character references (&#60;). JavaScript encoding uses escape sequences (\u003c). URL encoding uses percent-encoding (%3C). Each system interprets these representations as literal characters rather than control characters.

The transformation must be complete and consistent. Partial encoding creates bypasses where attackers craft inputs that slip through incomplete transformations. Some encoders only handle ASCII characters, leaving Unicode-based attacks unaddressed. Robust encoding handles the full Unicode range and accounts for character composition and normalization.
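
A minimal sketch of how a partial encoder leaves a bypass, assuming a hypothetical naive_escape helper (not part of any library) that handles only angle brackets. The payload contains no tags, so the naive version passes it through unchanged and the quote breaks out of the attribute, while CGI.escapeHTML keeps it inside the value:

```ruby
require 'cgi'

# Hypothetical incomplete encoder for illustration: escapes angle brackets only.
def naive_escape(text)
  text.gsub('<', '&lt;').gsub('>', '&gt;')
end

payload = %q{" onmouseover="alert(1)}

# The double quote survives, closes the attribute, and adds an event handler.
naive_attr = %(<input value="#{naive_escape(payload)}">)

# CGI.escapeHTML also encodes quotes, so the payload stays inside the value.
full_attr = %(<input value="#{CGI.escapeHTML(payload)}">)

puts naive_attr
puts full_attr
```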

Bidirectional Safety

Encoding maintains bidirectional safety: the encoded value can be decoded back to the original without data loss, and the encoded form cannot be interpreted as code in the target context. This property allows applications to store raw data while ensuring safe display across contexts.

original = "<script>alert('test')</script>"
encoded = CGI.escapeHTML(original)
decoded = CGI.unescapeHTML(encoded)
# decoded == original
# But encoded is safe in HTML context

Minimal Transformation

Effective encoding transforms only what the specific context requires. Over-encoding reduces readability and may cause display issues. Under-encoding leaves vulnerabilities. The encoder should transform the minimal set of characters that could trigger code execution or parsing errors in the target context.
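
As a small illustration, CGI.escapeHTML changes only the characters that matter in HTML and leaves everything else, including non-ASCII text, as it was. The sample string is made up for the example:

```ruby
require 'cgi'

text = "Café & Bar <open 24/7>"
encoded = CGI.escapeHTML(text)

# Only &, <, and > are rewritten; the accented letter, digits, and slash stay as-is.
puts encoded
```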

Encoding Position

Encoding must occur at the output boundary immediately before rendering to the target context. Encoding too early means data passes through multiple processing stages where it might be decoded, manipulated, or combined with other data. Encoding at the last possible moment ensures the transformation applies to the final data state.
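
The following sketch shows why early encoding causes trouble, using an invented sample value. The raw string is encoded separately for each output context at the point of use; encoding it for HTML first and then reusing it in a URL carries HTML entities into the query string:

```ruby
require 'cgi'
require 'uri'

raw_name = "Tom & Jerry <3"

# Encode at each output boundary, from the raw value.
html_fragment = "<p>#{CGI.escapeHTML(raw_name)}</p>"
query_string  = "name=#{URI.encode_www_form_component(raw_name)}"

# Encoding too early: the HTML-encoded value is reused in a URL context.
stored_encoded = CGI.escapeHTML(raw_name)
polluted_query = "name=#{URI.encode_www_form_component(stored_encoded)}"

puts html_fragment
puts query_string
puts polluted_query
```

The receiving server would decode the last query string to the text "Tom &amp; Jerry &lt;3" rather than the original value.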

Ruby Implementation

Ruby provides multiple encoding mechanisms through standard library modules and framework-specific helpers. The choice depends on the output context and whether the application uses a framework like Rails.

CGI Module

The CGI module offers basic HTML encoding through CGI.escapeHTML and CGI.unescapeHTML:

require 'cgi'

user_input = %q{<img src=x onerror="alert('XSS')">}
safe_output = CGI.escapeHTML(user_input)
# => "&lt;img src=x onerror=&quot;alert(&#39;XSS&#39;)&quot;&gt;"

# Decoding when needed
original = CGI.unescapeHTML(safe_output)
# => "<img src=x onerror=\"alert('XSS')\">"

The method transforms five characters critical for HTML safety: <, >, &, ", and '. This covers the minimal set needed to prevent tag injection and attribute breaking.
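
A quick way to see the five replacements is to run each character through the method on its own:

```ruby
require 'cgi'

specials = ['<', '>', '&', '"', "'"]

# Build a character => entity table from CGI.escapeHTML itself.
mapping = specials.to_h { |char| [char, CGI.escapeHTML(char)] }

mapping.each { |char, entity| puts "#{char} -> #{entity}" }
```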

ERB Escaping

ERB::Util provides html_escape and its alias h for explicit escaping. Plain ERB does not escape <%= %> output on its own:

require 'erb'
include ERB::Util

user_content = "<script>malicious()</script>"
safe_content = html_escape(user_content)
# => "&lt;script&gt;malicious()&lt;/script&gt;"

# Alias available
safe_content = h(user_content)
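
Outside Rails, a plain ERB template inserts values verbatim, so escaping has to be requested in the template. A minimal sketch with the standard library only (the template strings are invented for the example):

```ruby
require 'erb'

name = "<script>malicious()</script>"

# Plain ERB inserts the value as-is.
unescaped = ERB.new("<p><%= name %></p>").result_with_hash(name: name)

# Calling ERB::Util.h inside the template escapes it explicitly.
escaped = ERB.new("<p><%= ERB::Util.h(name) %></p>").result_with_hash(name: name)

puts unescaped
puts escaped
```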

Rails templates automatically escape output by default. The h helper explicitly escapes strings, while raw bypasses escaping for trusted content:

# In Rails view - automatically escaped
<%= user.bio %>

# Explicit escaping
<%= h(user.bio) %>

# Bypass escaping - dangerous unless content is trusted
<%= raw(admin_generated_html) %>

URL Encoding

URI module handles URL encoding for query parameters and path segments:

require 'uri'

user_search = "category=books&author=o'reilly"
encoded = URI.encode_www_form_component(user_search)
# => "category%3Dbooks%26author%3Do%27reilly"

url = "https://example.com/search?q=#{encoded}"

The method encodes characters that have special meaning in URLs, including &, =, ?, /, and space characters. Different URI methods handle different encoding needs:

# Query parameter encoding
params = {search: "ruby & rails", page: 2}
query_string = URI.encode_www_form(params)
# => "search=ruby+%26+rails&page=2"

# Path segment encoding (URI.encode_uri_component requires Ruby 3.2+)
user_file = "report (final).pdf"
safe_path = URI.encode_uri_component(user_file)
# => "report%20%28final%29.pdf"
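
The difference between form-style and percent-style encoding is mostly the space character. As a sketch that also runs on Ruby versions older than 3.2, ERB::Util.url_encode from the standard library can stand in for path-segment encoding; treating it as a substitute here is an assumption of this example, not something the text above prescribes:

```ruby
require 'erb'
require 'uri'

filename = "report (final).pdf"

# Form encoding: spaces become "+", which only makes sense in query strings.
form_style = URI.encode_www_form_component(filename)

# Percent encoding: spaces become "%20", which is what path segments need.
percent_style = ERB::Util.url_encode(filename)

puts form_style
puts percent_style
```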

JavaScript Escaping

Rails provides JavaScript escaping through escape_javascript or its alias j:

user_message = "User said: \"Hello\"\nNew line"
safe_js = escape_javascript(user_message)
# => "User said: \\\"Hello\\\"\\nNew line"

# In a Rails view
<script>
  var message = "<%= j user_message %>";
</script>

This escaping handles backslashes, quotes, newlines, and other characters that would break JavaScript string literals.

JSON Encoding

JSON encoding handles JavaScript object contexts safely:

require 'json'

user_data = {
  name: "</script><script>alert(1)</script>",
  bio: "User's \"bio\" with special chars"
}

json_output = user_data.to_json
# => {"name":"</script><script>alert(1)</script>","bio":"User's \"bio\" with special chars"}

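JSON encoding escapes quotes, backslashes, and control characters. The standard library generator does not escape <, >, or &, however, so a value containing </script> (as in the output above) would still terminate an inline script block. In Rails, ActiveSupport's to_json escapes these characters as \u003c, \u003e, and \u0026 by default, which keeps the JSON safe to embed in a script element:

# In Rails view - safe because ActiveSupport's to_json escapes <, >, and & by default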
<script>
  var userData = <%= raw @user.to_json %>;
</script>
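
Outside Rails, one possible approach is to post-process the generated JSON so that no markup-significant character reaches the page. The helper below, script_safe_json, is a hypothetical name for this sketch and not a library API; it rewrites <, >, &, and the two Unicode line separators as \uXXXX escapes, which JSON parsers and JavaScript decode back to the original characters:

```ruby
require 'json'

# Characters that can end a script block or a JavaScript line, mapped to
# their literal \uXXXX escape sequences.
JSON_HTML_ESCAPES = {
  '<' => '\u003c',
  '>' => '\u003e',
  '&' => '\u0026',
  "\u2028" => '\u2028',
  "\u2029" => '\u2029'
}.freeze

# Hypothetical helper: generate JSON, then escape markup-significant characters.
def script_safe_json(value)
  JSON.generate(value).gsub(/[<>&\u2028\u2029]/, JSON_HTML_ESCAPES)
end

output = script_safe_json({ "name" => "</script>" })
puts output
```

Because these characters can only occur inside string values in valid JSON, replacing them after generation does not change the structure of the document.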

Rails Content Security Helpers

Rails provides helper methods that combine encoding with HTML generation:

# link_to automatically encodes URL parameters
<%= link_to "Search", search_path(q: user_query) %>

# content_tag encodes content
<%= content_tag :div, user_content, class: "message" %>

# Safe concatenation
safe_html = safe_join([
  content_tag(:h1, user.name),
  content_tag(:p, user.bio)
])

These helpers apply appropriate encoding for their specific HTML generation contexts.

Security Implications

Output encoding serves as the primary defense against cross-site scripting attacks. XSS occurs when an application includes unencoded user data in web pages, allowing attackers to inject malicious scripts that execute in victims' browsers.

Attack Vectors Mitigated

Reflected XSS attacks occur when applications immediately echo user input without encoding. An attacker crafts a malicious URL containing JavaScript, and when victims click the link, the application reflects the script into the page where it executes:

# Vulnerable code
get '/search' do
  query = params[:q]
  "<html><body>Results for: #{query}</body></html>"
end

# Attack URL: /search?q=<script>steal_cookies()</script>
# Page displays: Results for: <script>steal_cookies()</script>
# Script executes in victim's browser

# Protected code
get '/search' do
  query = CGI.escapeHTML(params[:q])
  "<html><body>Results for: #{query}</body></html>"
end

# Same attack URL now displays safely:
# Results for: &lt;script&gt;steal_cookies()&lt;/script&gt;

Stored XSS attacks persist malicious data in databases and execute when the application displays that data to users. Without output encoding, stored profile information, comments, or messages can contain executable scripts:

# Storing user data - no encoding needed at storage
user = User.create(bio: params[:bio])

# Displaying without encoding - vulnerable
def show_profile
  "<div class='bio'>#{user.bio}</div>"
end

# Displaying with encoding - safe
def show_profile
  "<div class='bio'>#{CGI.escapeHTML(user.bio)}</div>"
end

DOM-based XSS exploits client-side JavaScript that inserts unencoded data into the page. Server-side output encoding helps by ensuring data embedded in page scripts arrives pre-encoded:

# Vulnerable pattern
<script>
  var userName = "<%= @user.name %>";
  document.getElementById('greeting').innerHTML = "Hello " + userName;
</script>

# Protected pattern
<script>
  var userName = "<%= escape_javascript(@user.name) %>";
  document.getElementById('greeting').textContent = "Hello " + userName;
</script>

Context-Specific Vulnerabilities

Encoding mismatches create vulnerabilities despite encoding attempts. HTML encoding protects HTML contexts but fails in JavaScript, CSS, or URL contexts:

# HTML encoding in a JavaScript context - insufficient
user_data = "'); alert(1); //"
html_escaped = CGI.escapeHTML(user_data)
# => "&#39;); alert(1); //"

# Still vulnerable in an event handler: the browser decodes &#39; back to '
# before the JavaScript engine parses the attribute value
<button onclick="processData('#{html_escaped}')">Go</button>
# JavaScript sees: processData(''); alert(1); //')
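
The mismatch can be reproduced without a browser. In this sketch CGI.unescapeHTML stands in for the HTML parser's decoding of an attribute value, which is a simplifying assumption; the handler name handle is invented:

```ruby
require 'cgi'

payload = "'); alert(1); //"

# What the server writes into an onclick attribute after HTML encoding.
attribute_value = "handle('#{CGI.escapeHTML(payload)}')"

# What the JavaScript engine receives once the attribute value is decoded.
javascript_source = CGI.unescapeHTML(attribute_value)

puts attribute_value
puts javascript_source
```

The decoded source closes the string and the call early, then runs the injected statement.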

Double Encoding Issues

Applying multiple encoding layers creates double-encoding where encoded entities get encoded again, displaying encoded characters instead of the original content:

user_input = "AT&T"
first_encoding = CGI.escapeHTML(user_input)
# => "AT&amp;T"

# Encoding again - double encoding
second_encoding = CGI.escapeHTML(first_encoding)
# => "AT&amp;amp;T"

# Displays: AT&amp;T (showing the encoded entities)
# Instead of: AT&T

Applications must track data state to avoid redundant encoding. Rails addresses this with html_safe marking and SafeBuffer objects that prevent double encoding.
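
The idea behind tracking data state can be sketched in plain Ruby. EscapedHTML and escape_once below are hypothetical names for illustration and only imitate the concept; they are not how Rails implements SafeBuffer:

```ruby
require 'cgi'

# Hypothetical marker class: a string known to be HTML-encoded already.
class EscapedHTML < String; end

# Encode a value unless it already carries the marker.
def escape_once(value)
  return value if value.is_a?(EscapedHTML)

  EscapedHTML.new(CGI.escapeHTML(value.to_s))
end

once  = escape_once("AT&T")
twice = escape_once(once)

puts once
puts twice
```

The second call returns the same object untouched, so the ampersand is encoded exactly once.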

Incomplete Character Set Coverage

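Encoding only protects when the browser decodes the page with the character set the encoder assumed. Older browsers that guessed the charset could be tricked into reading a page as UTF-7, where plain ASCII sequences decode to markup characters:

# UTF-7 based attack example (legacy browsers)
attack = "+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# HTML encoding leaves these ASCII characters untouched
# If the browser interprets the page as UTF-7, it becomes: <script>alert(1)</script>

Modern browsers no longer support UTF-7, but the general defense still applies: declare the charset explicitly (for example, Content-Type: text/html; charset=utf-8) and make sure strings are valid in that encoding before encoding them for output.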

Trust Boundaries

Output encoding assumes all external data is untrusted. This includes user input, database content, API responses, file contents, and environment variables. Even data from internal sources requires encoding if that data originated externally at any point:

# Database content requires encoding despite being "internal"
comment = Comment.find(params[:id])
safe_display = CGI.escapeHTML(comment.text)

# API responses require encoding
api_data = HTTParty.get('https://api.example.com/data')
safe_value = CGI.escapeHTML(api_data['user_supplied_field'])

Practical Examples

Web Form Display

Displaying user-submitted form data requires HTML encoding to prevent stored XSS:

class CommentsController < ApplicationController
  def create
    @comment = Comment.new(comment_params)
    if @comment.save
      redirect_to post_path(@comment.post_id)
    else
      render :new
    end
  end

  def show
    @comment = Comment.find(params[:id])
  end
end

# app/views/comments/show.html.erb
<div class="comment">
  <div class="author">
    <%= @comment.author_name %>
  </div>
  <div class="content">
    <%= @comment.text %>
  </div>
</div>

# Rails automatically encodes @comment.author_name and @comment.text
# If author_name contains: <b>Bold Name</b>
# Output HTML: &lt;b&gt;Bold Name&lt;/b&gt; (the browser shows the literal text "<b>Bold Name</b>")

Search Results with Query Display

Search interfaces often display the search query to users. Without encoding, reflected XSS vulnerabilities occur:

class SearchController < ApplicationController
  def results
    @query = params[:q]
    @results = Product.where("name LIKE ?", "%#{@query}%")
  end
end

# app/views/search/results.html.erb
<h2>Search results for: <%= @query %></h2>

<% @results.each do |product| %>
  <div class="product">
    <h3><%= product.name %></h3>
    <p><%= product.description %></p>
  </div>
<% end %>

# Query: <script>alert(document.cookie)</script>
# Rails encodes: Search results for: &lt;script&gt;alert(document.cookie)&lt;/script&gt;
# Displays the literal text safely

Building URLs with User Data

Constructing URLs with user-supplied parameters requires URL encoding:

class ReportsController < ApplicationController
  def download
    report_type = params[:type]
    date_range = params[:range]
    
    # URL encode parameters for redirect
    encoded_type = URI.encode_www_form_component(report_type)
    encoded_range = URI.encode_www_form_component(date_range)
    
    redirect_to "/api/reports?type=#{encoded_type}&range=#{encoded_range}"
  end
end

# Safer approach using URI builder
class ReportsController < ApplicationController
  def download
    uri = URI('https://api.example.com/reports')
    uri.query = URI.encode_www_form(
      type: params[:type],
      range: params[:range],
      format: 'pdf'
    )
    
    redirect_to uri.to_s
  end
end

Dynamic JavaScript with Server Data

Embedding server-side data in JavaScript requires JavaScript-specific encoding:

class DashboardController < ApplicationController
  def show
    @user_preferences = current_user.preferences.to_json
    @recent_activity = current_user.activities.limit(10)
  end
end

# app/views/dashboard/show.html.erb
<script>
  // JSON embedding: relies on ActiveSupport's to_json escaping <, >, and & by default
  var preferences = <%= raw @user_preferences %>;
  
  // Safe string embedding with JavaScript escaping
  <% @recent_activity.each do |activity| %>
    displayActivity({
      message: "<%= j activity.message %>",
      timestamp: "<%= j activity.created_at.to_s %>"
    });
  <% end %>
</script>

# activity.message: He said "Hello"
# Output: message: "He said \"Hello\""
# The escape_javascript helper (j) escapes quotes, backslashes, newlines

RSS Feed Generation

XML contexts including RSS/Atom feeds require XML encoding:

class FeedController < ApplicationController
  def rss
    @posts = Post.published.order(created_at: :desc).limit(20)
    
    respond_to do |format|
      format.rss { render layout: false }
    end
  end
end

# app/views/feed/rss.rss.builder
xml.instruct! :xml, version: "1.0"
xml.rss version: "2.0" do
  xml.channel do
    xml.title "Site Blog"
    xml.description "Latest posts"
    
    @posts.each do |post|
      xml.item do
        xml.title post.title
        xml.description post.summary
        xml.pubDate post.created_at.rfc822
        xml.link post_url(post)
      end
    end
  end
end

# Builder automatically handles XML encoding
# post.title: "Ruby & Rails <Tips>"
# Encoded to: Ruby &amp; Rails &lt;Tips&gt;
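
When Builder is not available, Ruby's core String#encode accepts an xml: option that performs the same kind of escaping. A short sketch with invented sample strings:

```ruby
title = "Ruby & Rails <Tips>"

# xml: :text escapes &, <, and > for element content.
text_node = title.encode(xml: :text)

# xml: :attr also escapes double quotes and wraps the result in quotes.
attr_value = 'He said "hi" & left'.encode(xml: :attr)

puts "<title>#{text_node}</title>"
puts "<item note=#{attr_value}/>"
```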

CSV Export with User Content

CSV exports containing user data require CSV-specific escaping:

require 'csv'

class UsersController < ApplicationController
  def export
    @users = User.all
    
    respond_to do |format|
      format.csv do
        csv_string = CSV.generate do |csv|
          csv << ["Name", "Email", "Bio"]
          
          @users.each do |user|
            csv << [user.name, user.email, user.bio]
          end
        end
        
        send_data csv_string, filename: "users-#{Date.today}.csv"
      end
    end
  end
end

# CSV library automatically handles encoding
# user.bio: "Loves coding, "Ruby" & Rails"
# Encoded: "Loves coding, ""Ruby"" & Rails"
# Double quotes escaped as double-double quotes
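
The quoting rules can be checked directly with CSV.generate_line, which formats a single row. The sample row is invented for the example:

```ruby
require 'csv'

row = ["Ann", 'Loves coding, "Ruby" & Rails']

# Fields containing commas or quotes are wrapped in quotes,
# and embedded quotes are doubled.
line = CSV.generate_line(row)

# Parsing the line recovers the original fields.
parsed = CSV.parse_line(line)

puts line
```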

Common Pitfalls

Wrong Context Encoding

Applying HTML encoding to non-HTML contexts leaves vulnerabilities. Each context requires appropriate encoding:

# Pitfall: HTML encoding in a JavaScript context
user_input = "'); maliciousFunction(); //"
html_escaped = CGI.escapeHTML(user_input)

# Still vulnerable: the browser decodes &#39; in the attribute value
# before JavaScript parses it
<button onclick="handle('#{html_escaped}')">Go</button>
# JavaScript sees: handle(''); maliciousFunction(); //')

# Correct: JavaScript-escape first, then let the template HTML-escape the attribute
<button onclick="handle('<%= j user_input %>')">Go</button>
# JavaScript sees: handle('\'); maliciousFunction(); //')

Encoding Inside Attributes Without Quotes

HTML attributes without quotes create injection points despite encoding:

# Vulnerable: unquoted attribute with HTML encoding
user_class = "highlight onmouseover=alert(1)"
encoded = CGI.escapeHTML(user_class)
# => "highlight onmouseover=alert(1)" (nothing to escape)

# Still vulnerable due to missing quotes
"<div class=#{encoded}></div>"
# Renders: <div class=highlight onmouseover=alert(1)></div>
# Browser parses onmouseover as a separate attribute

# Correct: always quote attributes
"<div class=\"#{encoded}\"></div>"
# Renders: <div class="highlight onmouseover=alert(1)"></div>
# Browser treats the entire value as the class attribute

Late String Interpolation with Already-Escaped Content

Marking strings as HTML safe too early tells Rails to trust everything built from them later, which bypasses automatic encoding:

# Pitfall: escaping manually and marking as html_safe before final output
def format_user_content(text)
  CGI.escapeHTML(text).html_safe
end

def render_comment(comment)
  content = format_user_content(comment.text)
  # Marking the whole string html_safe trusts every interpolated value,
  # including any added later that was never escaped
  "<div class='comment'>#{content}</div>".html_safe
end

# Correct: pass raw text and let the helper escape it at render time
def render_comment(comment)
  content_tag(:div, comment.text, class: 'comment')
end

Forgetting URL Parameter Encoding

Building URLs by string concatenation without encoding creates malformed URLs and injection points:

# Pitfall: unencoded URL parameters
user_query = "category=books&sort=price"
url = "https://shop.example.com/search?q=#{user_query}"
# Result: /search?q=category=books&sort=price
# Server sees two parameters: q=category=books and sort=price

# Correct: encode parameters
encoded_query = URI.encode_www_form_component(user_query)
url = "https://shop.example.com/search?q=#{encoded_query}"
# Result: /search?q=category%3Dbooks%26sort%3Dprice
# Server correctly sees one parameter: q=category=books&sort=price
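
To see how a server would split each query string, URI.decode_www_form can parse both versions:

```ruby
require 'uri'

user_query = "category=books&sort=price"

unencoded_query = "q=#{user_query}"
encoded_query   = "q=#{URI.encode_www_form_component(user_query)}"

# The unencoded "&" splits the value into a second, unintended parameter.
unencoded_pairs = URI.decode_www_form(unencoded_query)

# The encoded version arrives as a single parameter with the full value.
encoded_pairs = URI.decode_www_form(encoded_query)

p unencoded_pairs
p encoded_pairs
```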

Inconsistent Encoding Across Code Paths

Encoding on some code paths but not on others creates security gaps:

# Pitfall: inconsistent encoding
class PostsController < ApplicationController
  def show
    @post = Post.find(params[:id])
    
    if @post.featured?
      render :featured_template  # Has manual encoding
    else
      render :standard_template  # Missing encoding
    end
  end
end

# Correct: consistent encoding everywhere
class PostsController < ApplicationController
  def show
    @post = Post.find(params[:id])
    # Let Rails auto-escape in both templates
  end
end

# Both templates use <%= %> for automatic escaping

Using raw for User Content

Bypassing Rails automatic escaping with raw for user-generated content exposes XSS:

# Pitfall: using raw with user content
<div class="bio">
  <%= raw @user.bio %>
</div>

# If @user.bio contains: <img src=x onerror="alert(1)">
# Script executes

# Correct: let Rails auto-escape or explicitly encode
<div class="bio">
  <%= @user.bio %>
</div>

# Or if you need HTML from trusted source
<div class="bio">
  <%= sanitize(@admin_approved_bio) %>
</div>

Not Handling Encoding Errors

Invalid byte sequences in strings cause encoding errors:

# Pitfall: not handling encoding errors
def display_uploaded_file(file_content)
  CGI.escapeHTML(file_content)
end

# If file_content has invalid UTF-8 bytes, later string operations
# (regex matching, concatenation, template rendering) can raise
# ArgumentError or Encoding::CompatibilityError

# Correct: handle encoding issues
def display_uploaded_file(file_content)
  safe_content = file_content.encode(
    'UTF-8',
    invalid: :replace,
    undef: :replace,
    replace: '?'
  )
  CGI.escapeHTML(safe_content)
end
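
String#scrub is a shorter alternative when the string is already labeled UTF-8. A minimal sketch with a deliberately broken sample string:

```ruby
require 'cgi'

# "\xE9" is a Latin-1 byte that is not valid on its own in UTF-8.
raw = "caf\xE9 <b>menu</b>"
valid_before = raw.valid_encoding?

# Replace invalid bytes with "?" and then encode for HTML.
clean = raw.scrub('?')
html = CGI.escapeHTML(clean)

puts html
```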

Encoding Loss Through Database Operations

Encoding data before storing it leads to double encoding at display time and ties the stored value to a single output context:

# Pitfall: encoding before storage
user_input = "<script>test</script>"
encoded = CGI.escapeHTML(user_input)
User.create(bio: encoded)  # Stores encoded HTML

# Later retrieval displays double-encoded
user = User.find(id)
display = CGI.escapeHTML(user.bio)
# Shows: &amp;lt;script&amp;gt;test&amp;lt;/script&amp;gt;

# Correct: store raw, encode at display
user_input = "<script>test</script>"
User.create(bio: user_input)  # Store raw

# Encode when displaying
user = User.find(id)
display = CGI.escapeHTML(user.bio)
# Shows: &lt;script&gt;test&lt;/script&gt;

Reference

HTML Encoding Methods

Method	Context	Special Characters	Example
CGI.escapeHTML	HTML element content	< > & " '	User display text
ERB::Util.html_escape	HTML element content	< > & " '	Template output
ERB::Util.h	HTML element content	< > & " '	Template shorthand
ActionView sanitize	HTML with limited tags	Removes dangerous tags/attrs	User-formatted content

URL Encoding Methods

Method	Context	Use Case
URI.encode_www_form	Full query string	Parameter name-value pairs
URI.encode_www_form_component	Single query value	Individual parameter value
URI.encode_uri_component	Path segment	URL path components
CGI.escape	Query string value	Legacy URL encoding

JavaScript Encoding

Method	Context	Characters Handled
escape_javascript	JavaScript string literals	Backslash, quotes, newlines, tags
JSON.generate	JavaScript objects	Quotes, backslash, control chars
to_json	JavaScript objects	All JSON-unsafe characters

Context-Specific Character Requirements

Context	Required Encoding	Critical Characters
HTML element	HTML entities	< > &
HTML attribute	HTML entities + quotes	< > & " '
JavaScript string	JavaScript escapes	\ " ' newline
URL query	Percent encoding	& = ? # / space
URL path	Percent encoding	/ ? # space
CSS value	CSS escapes	Quotes, newlines, backslash
JSON string	JSON escapes	\ " control characters
XML content	XML entities	< > & " '

Rails Helper Methods

Helper	Purpose	Encoding Applied
link_to	Generate links	URL parameters
content_tag	Generate HTML tags	Element content
text_field	Form inputs	Attribute values
text_area	Form textareas	Element content
select	Form selects	Option text and values
raw	Bypass encoding	None - marks as safe
safe_join	Join strings safely	Escapes any element not already marked html_safe

Encoding Decision Matrix

Data Source	Display Context	Required Encoding
User input	HTML element	CGI.escapeHTML
User input	HTML attribute	CGI.escapeHTML + quoted attribute
User input	JavaScript string	escape_javascript
User input	URL parameter	URI.encode_www_form_component
Database content	HTML element	CGI.escapeHTML
API response	JSON	to_json
File upload name	URL path	URI.encode_uri_component
User input	CSS value	Custom CSS escaping
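
One way to apply the matrix in code is a single dispatch point that picks the encoder by context. The encode_for helper and its context names are hypothetical, chosen for this sketch, and cover only contexts the standard library can handle:

```ruby
require 'cgi'
require 'json'
require 'uri'

# Hypothetical helper: choose an encoder based on the output context.
def encode_for(context, value)
  case context
  when :html, :html_attribute
    CGI.escapeHTML(value.to_s)
  when :url_param
    URI.encode_www_form_component(value.to_s)
  when :json
    JSON.generate(value)
  else
    # Fail loudly rather than fall back to an encoder that may not fit.
    raise ArgumentError, "no encoder registered for context #{context.inspect}"
  end
end

puts encode_for(:html, "<b>bold</b>")
puts encode_for(:url_param, "ruby & rails")
puts encode_for(:json, 'say "hi"')
```

Raising on unknown contexts is a deliberate choice in this sketch: silently applying HTML encoding to, say, a CSS value would recreate the mismatch the matrix is meant to prevent.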

Common XSS Attack Patterns

Attack Vector	Malicious Input	Without Encoding	With Encoding
Tag injection	<script>alert(1)</script>	Executes script	Displays literal text
Attribute breaking	" onclick="alert(1)	Adds event handler	Encodes quote, prevents break
JavaScript breaking	'; alert(1); '	Breaks string, runs code	Escapes quote
URL injection	javascript:alert(1)	Executes JavaScript	Not stopped by encoding; validate the URL scheme
CSS expression	expression(alert(1))	IE executes	Context prevents parsing
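
HTML encoding leaves a value such as javascript:alert(1) unchanged, so links built from user data also need a scheme check. A minimal sketch, assuming a hypothetical safe_link_target? helper and an allowlist of http and https only (relative URLs are rejected by this version):

```ruby
require 'uri'

ALLOWED_SCHEMES = %w[http https].freeze

# Hypothetical helper: accept only absolute http(s) URLs as link targets.
def safe_link_target?(url)
  uri = URI.parse(url.to_s.strip)
  ALLOWED_SCHEMES.include?(uri.scheme&.downcase)
rescue URI::InvalidURIError
  false
end

puts safe_link_target?("https://example.com/profile")
puts safe_link_target?("javascript:alert(1)")
```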

Character Entity Reference

Character	HTML Entity	Numeric Entity	URL Encoded	JavaScript
<	&lt;	&#60;	%3C	\u003c
>	&gt;	&#62;	%3E	\u003e
&	&amp;	&#38;	%26	\u0026
"	&quot;	&#34;	%22	\"
'	&apos;	&#39;	%27	\'
/	&sol;	&#47;	%2F	\/
Space	(none)	&#32;	%20 or +	\u0020
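
Most of these representations can be produced from the character itself. The encodings_for helper below is a hypothetical name for this sketch; the numeric entity and the JavaScript escape are both derived from the character's code point:

```ruby
require 'cgi'
require 'uri'

# Hypothetical helper: show one character in each encoded form.
def encodings_for(char)
  {
    html: CGI.escapeHTML(char),
    numeric: format('&#%d;', char.ord),
    url: URI.encode_www_form_component(char),
    javascript: format('\u%04x', char.ord)
  }
end

p encodings_for('<')
p encodings_for('&')
```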

Ruby String Encoding Methods

Method	Purpose	Example
encode	Convert encoding	str.encode('UTF-8')
force_encoding	Change encoding label	str.force_encoding('ASCII-8BIT')
valid_encoding?	Check encoding validity	str.valid_encoding?
scrub	Replace invalid bytes	str.scrub('?')
encoding	Get current encoding	str.encoding.name

Safe Buffer Detection

Check	Purpose	Example
html_safe?	Test if marked safe	str.html_safe?
html_safe	Mark string as safe	str.html_safe
to_str	Convert SafeBuffer to String	safe_str.to_str
