Overview
Output encoding converts special characters in data into their safe encoded equivalents before rendering content to users. This transformation prevents browsers, databases, or other systems from interpreting user-supplied data as executable code or commands. The encoding process maintains the display value while neutralizing potentially malicious content.
When an application displays user input without encoding, attackers can inject malicious scripts that execute in other users' browsers, steal credentials, manipulate DOM elements, or perform unauthorized actions. Output encoding breaks this attack vector by ensuring user data remains data rather than becoming executable code.
The encoding mechanism varies by context. HTML contexts require different encoding than JavaScript contexts, which differ from URL contexts. Each context has specific characters that carry special meaning and require transformation. The character < becomes &lt; in HTML, \u003c in JavaScript strings, and %3C in URLs.
# Without encoding - vulnerable to XSS
user_input = "<script>alert('XSS')</script>"
html = "<div>#{user_input}</div>"
# => "<div><script>alert('XSS')</script></div>"
# With HTML encoding - safe
require 'cgi'
encoded = CGI.escapeHTML(user_input)
html = "<div>#{encoded}</div>"
# => "<div>&lt;script&gt;alert(&#39;XSS&#39;)&lt;/script&gt;</div>"
The encoded output displays the literal text to users while preventing script execution. Browsers render &lt;script&gt; as the visible characters "<script>" rather than parsing it as a script tag.
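The context dependence described above can be checked with the standard library. This is a minimal sketch: the HTML and URL forms come from CGI and URI, while the JavaScript form is built by hand with format, because plain Ruby has no dedicated JavaScript string escaper (an assumption of this example, not a recommended production encoder).

```ruby
require 'cgi'
require 'uri'

char = '<'

html_form = CGI.escapeHTML(char)                 # HTML element or attribute context
url_form  = URI.encode_www_form_component(char)  # URL query value context
js_form   = format('\\u%04x', char.ord)          # JavaScript string escape, built by hand

puts html_form  # &lt;
puts url_form   # %3C
puts js_form    # \u003c
```

The same input yields three different outputs, and none of them is interchangeable with the others.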
Output encoding differs from input validation. Input validation rejects or sanitizes data at entry points, while output encoding transforms data at display points. Both techniques serve complementary purposes in defense-in-depth security strategies. Validation cannot anticipate all contexts where data might eventually appear, making output encoding the last line of defense against injection attacks.
Key Principles
Output encoding operates on the principle of context-aware transformation. Each output context defines a set of special characters that require encoding. The encoder transforms these characters into representations that lose their special meaning in that specific context while preserving their display value.
Context Determination
The encoding strategy depends entirely on where the data appears in the output. HTML element content, HTML attributes, JavaScript strings, JSON responses, CSS values, and URLs each constitute distinct contexts with different special characters and encoding requirements. Mismatched encoding applies the wrong transformation, leaving applications vulnerable despite encoding attempts.
Consider inserting user data into an HTML attribute versus an onclick handler:
# HTML attribute context
user_data = 'value"onclick="alert(1)'
safe_attr = CGI.escapeHTML(user_data)
# <input value="#{safe_attr}">
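# => <input value="value&quot;onclick=&quot;alert(1)">
# JavaScript context - HTML encoding insufficient
user_data = "'); alert(1); //"
html_escaped = CGI.escapeHTML(user_data)
# <div onclick="doSomething('#{html_escaped}')">
# Still vulnerable! The browser decodes &#39; back to ' before running the handler:
# doSomething(''); alert(1); //')
# Needs JavaScript escaping before the HTML encoding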
Character Transformation
Encoding replaces special characters with safe equivalents that browsers or parsers treat as literal data. HTML encoding uses named character references (&lt;) or numeric character references (&#60;). JavaScript encoding uses escape sequences (\u003c). URL encoding uses percent-encoding (%3C). Each system interprets these representations as literal characters rather than control characters.
The transformation must be complete and consistent. Partial encoding creates bypasses where attackers craft inputs that slip through incomplete transformations. Some encoders only handle ASCII characters, leaving Unicode-based attacks unaddressed. Robust encoding handles the full Unicode range and accounts for character composition and normalization.
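To illustrate how partial encoding creates a bypass, the sketch below compares a deliberately incomplete encoder with CGI.escapeHTML. The naive_escape function and the payload are hypothetical and exist only for this demonstration.

```ruby
require 'cgi'

# Hypothetical incomplete encoder: handles angle brackets only.
def naive_escape(text)
  text.gsub('<', '&lt;').gsub('>', '&gt;')
end

# Payload with no angle brackets, aimed at a quoted attribute.
payload = '" onmouseover="alert(1)'

naive_html = %(<input value="#{naive_escape(payload)}">)
full_html  = %(<input value="#{CGI.escapeHTML(payload)}">)

puts naive_html  # <input value="" onmouseover="alert(1)">
puts full_html   # <input value="&quot; onmouseover=&quot;alert(1)">
```

The naive encoder lets the quote through, so the payload closes the value attribute and adds an event handler. The complete encoder keeps the whole payload inside the attribute value.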
Bidirectional Safety
Encoding maintains bidirectional safety: the encoded value can be decoded back to the original without data loss, and the encoded form cannot be interpreted as code in the target context. This property allows applications to store raw data while ensuring safe display across contexts.
original = "<script>alert('test')</script>"
encoded = CGI.escapeHTML(original)
decoded = CGI.unescapeHTML(encoded)
# decoded == original
# But encoded is safe in HTML context
Minimal Transformation
Effective encoding transforms only what the specific context requires. Over-encoding reduces readability and may cause display issues. Under-encoding leaves vulnerabilities. The encoder should transform the minimal set of characters that could trigger code execution or parsing errors in the target context.
Encoding Position
Encoding must occur at the output boundary immediately before rendering to the target context. Encoding too early means data passes through multiple processing stages where it might be decoded, manipulated, or combined with other data. Encoding at the last possible moment ensures the transformation applies to the final data state.
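One concrete way early encoding goes wrong is truncation after encoding. The sketch below assumes a hypothetical four-character display limit and compares encoding before and after the cut.

```ruby
require 'cgi'

name  = 'AT&T Wireless'
limit = 4  # hypothetical display limit, in characters

# Encoding early, then truncating, can cut an entity in half.
early = CGI.escapeHTML(name)[0, limit]

# Truncating the raw value first and encoding last keeps the output well formed.
late = CGI.escapeHTML(name[0, limit])

puts early  # AT&a
puts late   # AT&amp;T
```

The early variant leaves a dangling fragment of an entity in the page, while the late variant shows the intended four characters.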
Ruby Implementation
Ruby provides multiple encoding mechanisms through standard library modules and framework-specific helpers. The choice depends on the output context and whether the application uses a framework like Rails.
CGI Module
The CGI module offers basic HTML encoding through CGI.escapeHTML and CGI.unescapeHTML:
require 'cgi'
user_input = %q{<img src=x onerror="alert('XSS')">}
safe_output = CGI.escapeHTML(user_input)
# => "&lt;img src=x onerror=&quot;alert(&#39;XSS&#39;)&quot;&gt;"
# Decoding when needed
original = CGI.unescapeHTML(safe_output)
# => "<img src=x onerror=\"alert('XSS')\">"
The method transforms five characters critical for HTML safety: <, >, &, ", and '. This covers the minimal set needed to prevent tag injection and attribute breaking.
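The five replacements can be listed directly. This short sketch builds the mapping with CGI.escapeHTML and also shows that characters outside the set pass through unchanged; the sample path string is an arbitrary illustration.

```ruby
require 'cgi'

special = ['<', '>', '&', '"', "'"]

# Map each special character to the entity CGI.escapeHTML produces for it.
mapping = special.to_h { |char| [char, CGI.escapeHTML(char)] }
mapping.each { |char, entity| puts "#{char} becomes #{entity}" }

# Characters outside the set are left alone.
untouched = CGI.escapeHTML('path/to?x=1')
puts untouched  # path/to?x=1
```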
ERB Escaping
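Plain ERB does not escape <%= %> output on its own; Rails adds automatic escaping by tracking safe strings with SafeBuffer objects. For explicit escaping, ERB::Util provides html_escape and its alias h: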
require 'erb'
include ERB::Util
user_content = "<script>malicious()</script>"
safe_content = html_escape(user_content)
# => "&lt;script&gt;malicious()&lt;/script&gt;"
# Alias available
safe_content = h(user_content)
Rails templates automatically escape output by default. The h helper explicitly escapes strings, while raw bypasses escaping for trusted content:
# In Rails view - automatically escaped
<%= user.bio %>
# Explicit escaping
<%= h(user.bio) %>
# Bypass escaping - dangerous unless content is trusted
<%= raw(admin_generated_html) %>
URL Encoding
The URI module handles URL encoding for query parameters and path segments:
require 'uri'
user_search = "category=books&author=o'reilly"
encoded = URI.encode_www_form_component(user_search)
# => "category%3Dbooks%26author%3Do%27reilly"
url = "https://example.com/search?q=#{encoded}"
The method encodes characters that have special meaning in URLs, including &, =, ?, /, and space characters. Different URI methods handle different encoding needs:
# Query parameter encoding
params = {search: "ruby & rails", page: 2}
query_string = URI.encode_www_form(params)
# => "search=ruby+%26+rails&page=2"
# Path segment encoding (URI.encode_uri_component requires Ruby 3.2 or later)
user_file = "report (final).pdf"
safe_path = URI.encode_uri_component(user_file)
# => "report%20%28final%29.pdf"
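The difference between query-value and path-segment encoding comes down to how spaces are written. The sketch below uses a hypothetical encode_path_segment helper that works on Ruby versions without URI.encode_uri_component by rewriting the + signs that URI.encode_www_form_component emits for spaces.

```ruby
require 'uri'

file_name = 'report (final).pdf'

# Query values: spaces become "+".
query_value = URI.encode_www_form_component(file_name)

# Hypothetical helper for path segments: spaces must be "%20".
# A literal "+" in the input is already encoded as %2B, so only spaces are rewritten.
def encode_path_segment(segment)
  URI.encode_www_form_component(segment).gsub('+', '%20')
end

path_segment = encode_path_segment(file_name)

puts query_value   # report+%28final%29.pdf
puts path_segment  # report%20%28final%29.pdf
```

Decoding the query value with URI.decode_www_form_component returns the original file name, which is a quick check that nothing was lost.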
JavaScript Escaping
Rails provides JavaScript escaping through escape_javascript or its alias j:
user_message = "User said: \"Hello\"\nNew line"
safe_js = escape_javascript(user_message)
# => "User said: \\\"Hello\\\"\\nNew line"
# In a Rails view (raw keeps ERB from HTML-escaping the JavaScript-escaped string a second time)
<script>
var message = "<%= raw j(user_message) %>";
</script>
This escaping handles backslashes, quotes, newlines, and other characters that would break JavaScript string literals.
JSON Encoding
JSON encoding handles JavaScript object contexts safely:
require 'json'
user_data = {
name: "</script><script>alert(1)</script>",
bio: "User's \"bio\" with special chars"
}
json_output = user_data.to_json
# => {"name":"</script><script>alert(1)</script>","bio":"User's \"bio\" with special chars"}
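JSON encoding automatically escapes quotes, backslashes, and control characters. The plain json library leaves <, >, and & untouched, so the output above could still close a surrounding <script> block. In Rails, ActiveSupport's to_json also escapes <, >, and & as \u003c, \u003e, and \u0026 in a default configuration, and the json_escape helper applies the same escaping explicitly:
# In Rails view - json_escape keeps </script> inside the data from closing the script block
<script>
var userData = <%= raw json_escape(@user.to_json) %>;
</script>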
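Outside Rails, the same protection can be approximated by post-processing the JSON text. The script_safe_json helper below is a hypothetical sketch, not part of the json library: it rewrites <, >, &, and the two Unicode line separators as \u escapes, which JSON parsers and JavaScript engines read back as the original characters.

```ruby
require 'json'

# Hypothetical helper: JSON text that is also safe inside an HTML <script> block.
def script_safe_json(object)
  JSON.generate(object).gsub(/[<>&\u2028\u2029]/) { |char| format('\\u%04x', char.ord) }
end

user_data = { name: '</script><script>alert(1)</script>' }

plain = JSON.generate(user_data)
safe  = script_safe_json(user_data)

puts plain.include?('</script>')                   # true
puts safe.include?('</script>')                    # false
puts JSON.parse(safe)['name'] == user_data[:name]  # true
```

The data survives a round trip unchanged, but the serialized text no longer contains any sequence the HTML parser could mistake for a closing tag.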
Rails Content Security Helpers
Rails provides helper methods that combine encoding with HTML generation:
# link_to automatically encodes URL parameters
<%= link_to "Search", search_path(q: user_query) %>
# content_tag encodes content
<%= content_tag :div, user_content, class: "message" %>
# Safe concatenation
safe_html = safe_join([
content_tag(:h1, user.name),
content_tag(:p, user.bio)
])
These helpers apply appropriate encoding for their specific HTML generation contexts.
Security Implications
Output encoding serves as the primary defense against cross-site scripting attacks. XSS occurs when an application includes unencoded user data in web pages, allowing attackers to inject malicious scripts that execute in victims' browsers.
Attack Vectors Mitigated
Reflected XSS attacks occur when applications immediately echo user input without encoding. An attacker crafts a malicious URL containing JavaScript, and when victims click the link, the application reflects the script into the page where it executes:
# Vulnerable code
get '/search' do
query = params[:q]
"<html><body>Results for: #{query}</body></html>"
end
# Attack URL: /search?q=<script>steal_cookies()</script>
# Page displays: Results for: <script>steal_cookies()</script>
# Script executes in victim's browser
# Protected code
get '/search' do
query = CGI.escapeHTML(params[:q])
"<html><body>Results for: #{query}</body></html>"
end
# Same attack URL now renders harmlessly. Page source:
# Results for: &lt;script&gt;steal_cookies()&lt;/script&gt;
Stored XSS attacks persist malicious data in databases and execute when the application displays that data to users. Without output encoding, stored profile information, comments, or messages can contain executable scripts:
# Storing user data - no encoding needed at storage
user = User.create(bio: params[:bio])
# Displaying without encoding - vulnerable
def show_profile
"<div class='bio'>#{user.bio}</div>"
end
# Displaying with encoding - safe
def show_profile
"<div class='bio'>#{CGI.escapeHTML(user.bio)}</div>"
end
DOM-based XSS exploits client-side JavaScript that inserts unencoded data into the page. Server-side output encoding helps by ensuring data embedded in page scripts arrives pre-encoded:
# Vulnerable pattern
<script>
var userName = "<%= @user.name %>";
document.getElementById('greeting').innerHTML = "Hello " + userName;
</script>
# Protected pattern
<script>
var userName = "<%= raw escape_javascript(@user.name) %>";
document.getElementById('greeting').textContent = "Hello " + userName;
</script>
Context-Specific Vulnerabilities
Encoding mismatches create vulnerabilities despite encoding attempts. HTML encoding protects HTML contexts but fails in JavaScript, CSS, or URL contexts:
# HTML encoding in JavaScript context - insufficient
first_name = "\\"               # a single backslash
last_name = "; alert(1); //"
# CGI.escapeHTML leaves backslashes, semicolons, and slashes untouched
<script>
var first = '#{CGI.escapeHTML(first_name)}', last = '#{CGI.escapeHTML(last_name)}';
</script>
# Becomes: var first = '\', last = '; alert(1); //';
# The backslash escapes the closing quote, so alert(1) runs as code
Double Encoding Issues
Applying multiple encoding layers creates double-encoding where encoded entities get encoded again, displaying encoded characters instead of the original content:
user_input = "AT&T"
first_encoding = CGI.escapeHTML(user_input)
# => "AT&amp;T"
# Encoding again - double encoding
second_encoding = CGI.escapeHTML(first_encoding)
# => "AT&amp;amp;T"
# Displays: AT&amp;T (showing the encoded entities)
# Instead of: AT&T
Applications must track data state to avoid redundant encoding. Rails addresses this with html_safe marking and SafeBuffer objects that prevent double encoding.
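The idea of tracking data state can be sketched without Rails. The EscapedHtml class and escape_once function below are hypothetical names for this illustration; they mimic the concept behind SafeBuffer in a few lines and are not how Rails implements it.

```ruby
require 'cgi'

# Hypothetical marker type for text that has already been HTML-encoded.
class EscapedHtml
  attr_reader :value

  def initialize(value)
    @value = value.dup.freeze
  end

  def to_s
    value
  end
end

# Encode plain strings once; pass already-encoded values through unchanged.
def escape_once(input)
  return input if input.is_a?(EscapedHtml)

  EscapedHtml.new(CGI.escapeHTML(input.to_s))
end

once  = escape_once('AT&T')
twice = escape_once(once)

puts once   # AT&amp;T
puts twice  # AT&amp;T
```

Because the encoded state travels with the value as a type, a second pass through the encoder is harmless.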
Incomplete Character Set Coverage
Encoding implementations that assume one character set can miss attacks that arrive in another. When a response does not declare its charset, a legacy browser that guesses wrong may interpret otherwise harmless bytes as markup:
# UTF-7 based attack example
attack = "+ADw-script+AD4-alert(1)+ADw-/script+AD4-"
# If the browser interprets the page as UTF-7 and the encoder does not account for it
# Becomes: <script>alert(1)</script>
Robust encoding handles the full Unicode range and normalizes character compositions before encoding. Declaring the charset explicitly (for example, Content-Type: text/html; charset=utf-8) keeps the browser from guessing.
Trust Boundaries
Output encoding assumes all external data is untrusted. This includes user input, database content, API responses, file contents, and environment variables. Even data from internal sources requires encoding if that data originated externally at any point:
# Database content requires encoding despite being "internal"
comment = Comment.find(params[:id])
safe_display = CGI.escapeHTML(comment.text)
# API responses require encoding
api_data = HTTParty.get('https://api.example.com/data')
safe_value = CGI.escapeHTML(api_data['user_supplied_field'])
Practical Examples
Web Form Display
Displaying user-submitted form data requires HTML encoding to prevent stored XSS:
class CommentsController < ApplicationController
def create
@comment = Comment.new(comment_params)
if @comment.save
redirect_to post_path(@comment.post_id)
else
render :new
end
end
def show
@comment = Comment.find(params[:id])
end
end
# app/views/comments/show.html.erb
<div class="comment">
<div class="author">
<%= @comment.author_name %>
</div>
<div class="content">
<%= @comment.text %>
</div>
</div>
# Rails automatically encodes @comment.author_name and @comment.text
# If author_name contains: <b>Bold Name</b>
# Page source: &lt;b&gt;Bold Name&lt;/b&gt; (visible as "<b>Bold Name</b>")
Search Results with Query Display
Search interfaces often display the search query to users. Without encoding, reflected XSS vulnerabilities occur:
class SearchController < ApplicationController
def results
@query = params[:q]
@results = Product.where("name LIKE ?", "%#{@query}%")
end
end
# app/views/search/results.html.erb
<h2>Search results for: <%= @query %></h2>
<% @results.each do |product| %>
<div class="product">
<h3><%= product.name %></h3>
<p><%= product.description %></p>
</div>
<% end %>
# Query: <script>alert(document.cookie)</script>
# Rails encodes: Search results for: &lt;script&gt;alert(document.cookie)&lt;/script&gt;
# Displays the literal text safely
Building URLs with User Data
Constructing URLs with user-supplied parameters requires URL encoding:
class ReportsController < ApplicationController
def download
report_type = params[:type]
date_range = params[:range]
# URL encode parameters for redirect
encoded_type = URI.encode_www_form_component(report_type)
encoded_range = URI.encode_www_form_component(date_range)
redirect_to "/api/reports?type=#{encoded_type}&range=#{encoded_range}"
end
end
# Safer approach using URI builder
class ReportsController < ApplicationController
def download
uri = URI('https://api.example.com/reports')
uri.query = URI.encode_www_form(
type: params[:type],
range: params[:range],
format: 'pdf'
)
redirect_to uri.to_s
end
end
Dynamic JavaScript with Server Data
Embedding server-side data in JavaScript requires JavaScript-specific encoding:
class DashboardController < ApplicationController
def show
@user_preferences = current_user.preferences.to_json
@recent_activity = current_user.activities.limit(10)
end
end
# app/views/dashboard/show.html.erb
<script>
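// Safe JSON embedding: json_escape neutralizes <, >, and & so the data cannot close the script block
var preferences = <%= raw json_escape(@user_preferences) %>;
// Safe string embedding: j escapes for JavaScript, raw stops ERB from HTML-escaping the result again
<% @recent_activity.each do |activity| %>
displayActivity({
message: "<%= raw j(activity.message) %>",
timestamp: "<%= raw j(activity.created_at.to_s) %>"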
});
<% end %>
</script>
# activity.message: He said "Hello"
# Output: message: "He said \"Hello\""
# The escape_javascript helper (j) escapes quotes, backslashes, newlines
RSS Feed Generation
XML contexts including RSS/Atom feeds require XML encoding:
class FeedController < ApplicationController
def rss
@posts = Post.published.order(created_at: :desc).limit(20)
respond_to do |format|
format.rss { render layout: false }
end
end
end
# app/views/feed/rss.rss.builder
xml.instruct! :xml, version: "1.0"
xml.rss version: "2.0" do
xml.channel do
xml.title "Site Blog"
xml.description "Latest posts"
@posts.each do |post|
xml.item do
xml.title post.title
xml.description post.summary
xml.pubDate post.created_at.rfc822
xml.link post_url(post)
end
end
end
end
# Builder automatically handles XML encoding
# post.title: "Ruby & Rails <Tips>"
# Encoded to: Ruby &amp; Rails &lt;Tips&gt;
CSV Export with User Content
CSV exports containing user data require CSV-specific escaping:
require 'csv'
class UsersController < ApplicationController
def export
@users = User.all
respond_to do |format|
format.csv do
csv_string = CSV.generate do |csv|
csv << ["Name", "Email", "Bio"]
@users.each do |user|
csv << [user.name, user.email, user.bio]
end
end
send_data csv_string, filename: "users-#{Date.today}.csv"
end
end
end
end
# CSV library automatically handles encoding
# user.bio: Loves coding, "Ruby" & Rails
# Encoded: "Loves coding, ""Ruby"" & Rails"
# Double quotes escaped as double-double quotes
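The quoting behavior can be observed outside a controller with the csv library alone. In this sketch the name Ada is a placeholder value; the bio is the one from the example above.

```ruby
require 'csv'

bio  = 'Loves coding, "Ruby" & Rails'
line = CSV.generate_line(['Ada', bio])

puts line  # Ada,"Loves coding, ""Ruby"" & Rails"

# Parsing the line restores the original field values.
parsed = CSV.parse_line(line)
puts parsed[1] == bio  # true
```

Only the field that contains a comma or a quote is wrapped in quotes; the ampersand needs no treatment because it has no special meaning in CSV.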
Common Pitfalls
Wrong Context Encoding
Applying HTML encoding to non-HTML contexts leaves vulnerabilities. Each context requires appropriate encoding:
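# Pitfall: HTML encoding alone for data inside an inline event handler
user_input = "'); maliciousFunction(); //"
html_escaped = CGI.escapeHTML(user_input)
# Still vulnerable: the browser decodes &#39; back to ' before running the handler
<button onclick="track('#{html_escaped}')">Go</button>
# Handler runs: track(''); maliciousFunction(); //')
# Correct: JavaScript escaping first, then HTML encoding for the attribute
# (in a Rails view, <%= %> applies the HTML encoding to the output of j)
<button onclick="track('<%= j user_input %>')">Go</button>
# Handler runs: track('\'); maliciousFunction(); //')
# The quote stays inside the string literal, so no injected code executes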
Encoding Inside Attributes Without Quotes
HTML attributes without quotes create injection points despite encoding:
# Vulnerable: unquoted attribute with HTML encoding
user_class = "highlight onmouseover=alert(1)"
encoded = CGI.escapeHTML(user_class)
# Encoding changes nothing here: the payload has no characters that HTML encoding touches
"<div class=#{encoded}></div>"
# Renders: <div class=highlight onmouseover=alert(1)></div>
# Browser parses onmouseover as a separate attribute
# Correct: always quote attributes
"<div class=\"#{encoded}\"></div>"
# Renders: <div class="highlight onmouseover=alert(1)"></div>
# Browser treats entire value as class attribute
Late String Interpolation with Already-Escaped Content
Marking strings as HTML safe too early bypasses encoding:
# Pitfall: marking as html_safe before final output
def format_user_content(text)
CGI.escapeHTML(text).html_safe
end
def render_comment(comment)
content = format_user_content(comment.text)
# Later interpolation doesn't re-encode
"<div class='comment'>#{content}</div>".html_safe
end
# Correct: keep strings unsafe and let the helper escape them at render time
def render_comment(comment)
content_tag(:div, comment.text, class: 'comment')
end
Forgetting URL Parameter Encoding
Building URLs by string concatenation without encoding creates malformed URLs and injection points:
# Pitfall: unencoded URL parameters
user_query = "category=books&sort=price"
url = "https://shop.example.com/search?q=#{user_query}"
# Result: /search?q=category=books&sort=price
# Server sees two parameters: q with the value "category=books", and sort with the value "price"
# Correct: encode parameters
encoded_query = URI.encode_www_form_component(user_query)
url = "https://shop.example.com/search?q=#{encoded_query}"
# Result: /search?q=category%3Dbooks%26sort%3Dprice
# Server correctly sees one parameter: q with the value "category=books&sort=price"
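What the receiving side parses can be reproduced with URI.decode_www_form. This is a minimal sketch using only the query strings from the example above, with no real server involved.

```ruby
require 'uri'

user_query = 'category=books&sort=price'

unencoded = "q=#{user_query}"                    # naive string concatenation
encoded   = URI.encode_www_form(q: user_query)   # value is percent-encoded

p URI.decode_www_form(unencoded)  # [["q", "category=books"], ["sort", "price"]]
p URI.decode_www_form(encoded)    # [["q", "category=books&sort=price"]]
```

In the unencoded form the user's text leaks into the parameter structure and introduces a sort parameter the application never intended to accept.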
Inconsistent Encoding Across Code Paths
When some code paths encode output and others skip it, security gaps appear:
# Pitfall: inconsistent encoding
class PostsController < ApplicationController
def show
@post = Post.find(params[:id])
if @post.featured?
render :featured_template # Has manual encoding
else
render :standard_template # Missing encoding
end
end
end
# Correct: consistent encoding everywhere
class PostsController < ApplicationController
def show
@post = Post.find(params[:id])
# Let Rails auto-escape in both templates
end
end
# Both templates use <%= %> for automatic escaping
Using raw for User Content
Bypassing Rails automatic escaping with raw for user-generated content exposes the application to XSS:
# Pitfall: using raw with user content
<div class="bio">
<%= raw @user.bio %>
</div>
# If @user.bio contains: <img src=x onerror="alert(1)">
# Script executes
# Correct: let Rails auto-escape or explicitly encode
<div class="bio">
<%= @user.bio %>
</div>
# Or, if limited HTML must be allowed, strip dangerous tags and attributes with sanitize
<div class="bio">
<%= sanitize(@admin_approved_bio) %>
</div>
Not Handling Encoding Errors
Invalid byte sequences in strings cause encoding errors:
# Pitfall: not handling encoding errors
def display_uploaded_file(file_content)
CGI.escapeHTML(file_content)
end
# If file_content has invalid UTF-8 bytes, escaping or later string operations
# may raise ArgumentError (invalid byte sequence in UTF-8) or emit broken output
# Correct: handle encoding issues
def display_uploaded_file(file_content)
safe_content = file_content.encode(
'UTF-8',
invalid: :replace,
undef: :replace,
replace: '?'
)
CGI.escapeHTML(safe_content)
end
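String#scrub offers a shorter route to the same result when the data is already tagged as UTF-8. The safe_html_text helper and the sample bytes below are hypothetical and serve only to show the sequence: repair first, encode second.

```ruby
require 'cgi'

# Hypothetical helper: replace invalid bytes, then HTML-encode the repaired text.
def safe_html_text(text)
  CGI.escapeHTML(text.scrub('?'))
end

upload = "caf\xC3 <b>menu</b>"  # \xC3 starts a UTF-8 sequence that never completes

puts upload.valid_encoding?  # false
puts safe_html_text(upload)  # caf? &lt;b&gt;menu&lt;/b&gt;
```

Scrubbing before encoding means the encoder and every later string operation only ever see valid UTF-8.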
Encoding Loss Through Database Operations
Encoding data before storing it means the stored value no longer matches what the user entered, and the display layer then encodes it a second time:
# Pitfall: encoding before storage
user_input = "<script>test</script>"
encoded = CGI.escapeHTML(user_input)
User.create(bio: encoded) # Stores encoded HTML
# Later retrieval displays double-encoded
user = User.find(id)
display = CGI.escapeHTML(user.bio)
# Shows: &lt;script&gt;test&lt;/script&gt;
# Correct: store raw, encode at display
user_input = "<script>test</script>"
User.create(bio: user_input) # Store raw
# Encode when displaying
user = User.find(id)
display = CGI.escapeHTML(user.bio)
# Shows: <script>test</script>
Reference
HTML Encoding Methods
| Method | Context | Special Characters | Example |
|---|---|---|---|
| CGI.escapeHTML | HTML element content | < > & " ' | User display text |
| ERB::Util.html_escape | HTML element content | < > & " ' | Template output |
| ERB::Util.h | HTML element content | < > & " ' | Template shorthand |
| ActionView sanitize | HTML with limited tags | Removes dangerous tags/attrs | User-formatted content |
URL Encoding Methods
| Method | Context | Use Case |
|---|---|---|
| URI.encode_www_form | Full query string | Parameter name-value pairs |
| URI.encode_www_form_component | Single query value | Individual parameter value |
| URI.encode_uri_component | Path segment | URL path components |
| CGI.escape | Query string value | Legacy URL encoding |
JavaScript Encoding
| Method | Context | Characters Handled |
|---|---|---|
| escape_javascript | JavaScript string literals | Backslash, quotes, newlines, tags |
| JSON.generate | JavaScript objects | Quotes, backslash, control chars |
| to_json | JavaScript objects | All JSON-unsafe characters |
Context-Specific Character Requirements
| Context | Required Encoding | Critical Characters |
|---|---|---|
| HTML element | HTML entities | < > & |
| HTML attribute | HTML entities + quotes | < > & " ' |
| JavaScript string | JavaScript escapes | \ " ' newline |
| URL query | Percent encoding | & = ? # / space |
| URL path | Percent encoding | / ? # space |
| CSS value | CSS escapes | Quotes, newlines, backslash |
| JSON string | JSON escapes | \ " control characters |
| XML content | XML entities | < > & " ' |
Rails Helper Methods
| Helper | Purpose | Encoding Applied |
|---|---|---|
| link_to | Generate links | URL parameters |
| content_tag | Generate HTML tags | Element content |
| text_field | Form inputs | Attribute values |
| text_area | Form textareas | Element content |
| select | Form selects | Option text and values |
| raw | Bypass encoding | None - marks as safe |
| safe_join | Join strings into safe HTML | Escapes every item (and the separator) not already marked html_safe |
Encoding Decision Matrix
| Data Source | Display Context | Required Encoding |
|---|---|---|
| User input | HTML element | CGI.escapeHTML |
| User input | HTML attribute | CGI.escapeHTML + quoted attribute |
| User input | JavaScript string | escape_javascript |
| User input | URL parameter | URI.encode_www_form_component |
| Database content | HTML element | CGI.escapeHTML |
| API response | JSON | to_json |
| File upload name | URL path | URI.encode_uri_component |
| User input | CSS value | Custom CSS escaping |
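The matrix above can be expressed as a small lookup so that the context, not the caller, selects the encoder. The ENCODERS registry and encode_for function are hypothetical names for this sketch, which covers only three of the contexts listed.

```ruby
require 'cgi'
require 'uri'
require 'json'

# Hypothetical registry mapping an output context to its encoder.
ENCODERS = {
  html:      ->(value) { CGI.escapeHTML(value) },
  url_query: ->(value) { URI.encode_www_form_component(value) },
  json:      ->(value) { JSON.generate(value) }
}.freeze

# Look up the encoder for a context; unknown contexts raise KeyError.
def encode_for(context, value)
  ENCODERS.fetch(context).call(value.to_s)
end

input = %q{Tom & "Jerry"}

puts encode_for(:html, input)       # Tom &amp; &quot;Jerry&quot;
puts encode_for(:url_query, input)  # Tom+%26+%22Jerry%22
puts encode_for(:json, input)       # "Tom & \"Jerry\""
```

Using fetch makes an unsupported context fail loudly instead of silently falling back to the wrong encoder, which is the mismatch this document warns about.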
Common XSS Attack Patterns
| Attack Vector | Malicious Input | Without Encoding | With Encoding |
|---|---|---|---|
| Tag injection | <script>alert(1)</script> | Executes script | Displays literal text |
| Attribute breaking | " onclick="alert(1) | Adds event handler | Encodes quote, prevents break |
| JavaScript breaking | '; alert(1); ' | Breaks string, runs code | Escapes quote |
| URL injection | javascript:alert(1) | Executes JavaScript when used as a link target | Not stopped by encoding alone; also validate the URL scheme |
| CSS expression | expression(alert(1)) | IE executes | Context prevents parsing |
Character Entity Reference
| Character | HTML Entity | Numeric Entity | URL Encoded | JavaScript |
|---|---|---|---|---|
| < | &lt; | &#60; | %3C | \u003c |
| > | &gt; | &#62; | %3E | \u003e |
| & | &amp; | &#38; | %26 | \u0026 |
| " | &quot; | &#34; | %22 | \" |
| ' | &apos; | &#39; | %27 | \' |
| / | &sol; | &#47; | %2F | \/ |
| Space | (none) | &#32; | %20 or + | \u0020 |
Ruby String Encoding Methods
| Method | Purpose | Example |
|---|---|---|
| encode | Convert encoding | str.encode('UTF-8') |
| force_encoding | Change encoding label | str.force_encoding('ASCII-8BIT') |
| valid_encoding? | Check encoding validity | str.valid_encoding? |
| scrub | Replace invalid bytes | str.scrub('?') |
| encoding | Get current encoding | str.encoding.name |
Safe Buffer Detection
| Check | Purpose | Example |
|---|---|---|
| html_safe? | Test if marked safe | str.html_safe? |
| html_safe | Mark string as safe | str.html_safe |
| to_str | Convert SafeBuffer to String | safe_str.to_str |