CrackedRuby

String Literals

Overview

Ruby provides several string literal syntaxes that create String objects with varying behaviors for interpolation, escape sequences, and delimiter handling. The primary forms include single-quoted strings, double-quoted strings, percent literals, and heredoc syntax.

Single-quoted strings interpret only the escape sequences \' and \\, treating all other characters literally. Double-quoted strings support full escape sequence processing and string interpolation through #{} syntax. Percent literals use customizable delimiters: %Q and bare % follow double-quote rules, while %q follows single-quote rules. Heredocs provide multi-line string creation with configurable indentation and interpolation behavior.

single = 'literal text with minimal escaping'
double = "interpolated text with #{variable} and \n escapes"
percent = %Q{custom delimited string with #{interpolation}}

String literals create a new String object each time they execute. The exception is a file that enables the frozen_string_literal pragma, where literals without interpolation are frozen and reused. Ruby validates literal syntax at parse time and creates the String objects during execution.
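A minimal sketch of that allocation behavior, assuming the code runs without the frozen_string_literal pragma. The method name status_label is hypothetical.

```ruby
# Hypothetical helper: the literal below is evaluated on every call.
def status_label
  "processing"
end

first = status_label
second = status_label

same_content = (first == second)      # compares characters
same_object  = first.equal?(second)   # compares object identity

puts same_content   # true
puts same_object    # false when the pragma is off
```

Equal content does not imply the same object, which is why literals inside hot code paths add up to many short-lived allocations.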

The encoding of a string literal follows the source file encoding, which defaults to UTF-8 and can be overridden with an encoding magic comment. All literal forms produce instances of the same String class and differ only in how Ruby processes them at compile time.

Basic Usage

Single-quoted strings provide literal text with minimal escape processing. Only backslash-quote (\') and backslash-backslash (\\) sequences receive interpretation, making single quotes ideal for strings containing many backslashes or special characters.

path = 'C:\Users\name\file.txt'
regex = 'pattern with \d+ and \w* without escaping'
message = 'String with \'embedded quotes\' stays readable'
# => "String with 'embedded quotes' stays readable"
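The difference is easy to confirm by counting characters. The following is an illustrative sketch with throwaway variable names.

```ruby
single = 'a\nb'   # backslash and "n" stay as two separate characters
double = "a\nb"   # \n becomes one newline character

puts single.length   # 4
puts double.length   # 3

# The two escapes that single quotes do recognize
quote     = 'it\'s'   # the backslash is dropped, leaving it's
backslash = 'a\\b'    # a, one backslash, b

puts quote              # it's
puts backslash.length   # 3
```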

Double-quoted strings enable interpolation and full escape sequence processing. String interpolation executes Ruby expressions within #{} and converts results to strings using to_s.

name = "Alice"
age = 30
greeting = "Hello, #{name}! You are #{age} years old."
# => "Hello, Alice! You are 30 years old."

formatted = "Line 1\nLine 2\tTabbed content"
puts formatted
# Line 1
# Line 2    Tabbed content
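Because interpolation calls to_s on the result of each expression, any object can control how it appears inside a string. A short sketch, where point_class is a hypothetical Struct defined only for this illustration:

```ruby
# Hypothetical value type that overrides to_s
point_class = Struct.new(:x, :y) do
  def to_s
    "(#{x}, #{y})"
  end
end

point_text = "Point: #{point_class.new(1, 2)}"   # uses the custom to_s
nil_text   = "Value: [#{nil}]"                   # nil.to_s is an empty string
sym_text   = "Key: #{:name}"                     # symbols lose the leading colon
math_text  = "Sum: #{1 + 2}"                     # any expression is allowed

puts point_text   # Point: (1, 2)
puts nil_text     # Value: []
puts sym_text     # Key: name
puts math_text    # Sum: 3
```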

Percent literals use % followed by a delimiter character, supporting various quote-like behaviors. The %Q form behaves like double quotes with interpolation, while %q behaves like single quotes without interpolation.

mixed_quotes = %Q{String with "double" and 'single' quotes}
literal_percent = %q!Raw string with #{no_interpolation} preserved!
custom_delimiter = %(Parentheses as delimiters work too)
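As a quick comparison, the same content written with each form shows which ones interpolate. The variable names here are illustrative only.

```ruby
count = 3

raw     = %q(Total: #{count} "items")   # no interpolation
cooked  = %Q(Total: #{count} "items")   # interpolates
default = %(Total: #{count} 'items')    # bare % behaves like %Q

puts raw       # Total: #{count} "items"
puts cooked    # Total: 3 "items"
puts default   # Total: 3 'items'
```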

Heredoc syntax creates multi-line strings using << followed by an identifier. The string content continues until a line containing only the identifier appears; with <<~ (and <<-) the closing identifier may be indented. Heredocs support interpolation by default, unless the identifier appears in single quotes.

sql_query = <<~SQL
  SELECT users.name, profiles.bio
  FROM users
  JOIN profiles ON users.id = profiles.user_id
  WHERE users.active = true
SQL

template = <<~HTML
  <div class="user">
    <h1>#{user.name}</h1>
    <p>#{user.description}</p>
  </div>
HTML
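A small sketch of the quoting rule, plus the fact that methods can be called on the opening identifier. The names and content are made up for illustration.

```ruby
name = "Alice"

interpolated = <<~TEXT
  Hello, #{name}!
TEXT

raw = <<~'TEXT'
  Hello, #{name}!
TEXT

# A method chain attaches to the opening identifier, not the closing one
lines = <<~LIST.lines.map(&:chomp)
  first
  second
LIST

puts interpolated   # Hello, Alice!
puts raw            # Hello, #{name}!
p lines             # ["first", "second"]
```

Note that a heredoc ends with a newline, so interpolated is "Hello, Alice!\n" rather than "Hello, Alice!".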

Advanced Usage

Heredoc squiggly syntax (<<~) removes leading whitespace from each line based on the least-indented line, enabling clean multi-line strings within indented code blocks.

class EmailTemplate
  def welcome_message(user)
    <<~MESSAGE
      Dear #{user.name},
      
      Welcome to our service! Your account has been created with
      the email address #{user.email}.
      
      Best regards,
      The Team
    MESSAGE
  end
end
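Only the common leading whitespace is removed, so indentation relative to the least-indented line survives. A brief sketch with made-up content:

```ruby
outline = <<~TEXT
  Chapter 1
    Section 1.1
    Section 1.2

  Chapter 2
TEXT

# The two-space base indent is stripped; the nested lines keep their extra
# two spaces, and the empty line does not affect the calculation.
p outline
# "Chapter 1\n  Section 1.1\n  Section 1.2\n\nChapter 2\n"
```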

Percent notation supports multiple delimiters and alternative forms. Each form serves specific use cases where certain characters appear frequently in the string content.

# Different delimiters for different content types
json_template = %Q{"name": "#{user.name}", "active": #{user.active?}}
regex_pattern = %r{/api/v\d+/users/\d+}
file_path = %q{C:\Program Files\Application\config.ini}
shell_command = %x{ls -la #{directory}}

# Nested delimiters work when balanced
nested = %Q{Outer (contains (nested) parentheses) structure}
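The array and regex forms can be checked the same way. This sketch leaves out %x, since it runs a shell command; the variable names are illustrative.

```ruby
version = 2

words       = %w[apple banana cherry]        # array of strings
symbols     = %i[draft published archived]   # array of symbols
with_interp = %W[v#{version} stable]         # %W interpolates each word
route       = %r{/api/v\d+/users/\d+}        # slashes need no escaping

# Balanced pairs of the delimiter may appear unescaped in the content
nested = %q{outer {inner} outer}

p words         # ["apple", "banana", "cherry"]
p symbols       # [:draft, :published, :archived]
p with_interp   # ["v2", "stable"]
puts route.match?("/api/v2/users/42")   # true
puts nested     # outer {inner} outer
```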

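Frozen string literals reduce memory allocation by reusing identical string objects. The frozen_string_literal: true pragma must appear in the comment block at the top of the file and applies to every string literal in that file that has no interpolation (interpolated strings stay unfrozen as of Ruby 3.0).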

# frozen_string_literal: true

def process_data
  status = "processing"  # Same object reused each call
  log_message = "Data processing started at #{Time.now}"  # New object each call due to interpolation
end

# Explicit freezing for individual strings
CONSTANT_MESSAGE = "System initialized".freeze
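A sketch of what freezing means in practice, using explicit freeze so it does not depend on the pragma. It assumes Ruby 2.5 or later for FrozenError and for deduplication through unary minus.

```ruby
frozen = "System initialized".freeze

begin
  frozen << "!"                      # mutation of a frozen string is rejected
rescue FrozenError => error
  puts "cannot modify: #{error.class}"
end

# dup returns a mutable copy; the original stays frozen
copy = frozen.dup
copy << "!"
puts copy             # System initialized!
puts frozen.frozen?   # true
puts copy.frozen?     # false

# Unary minus returns a frozen, deduplicated string,
# so equal literals resolve to the same object
a = -"shared label"
b = -"shared label"
puts a.equal?(b)      # true
```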

String literals support method chaining directly on the literal syntax, enabling concise string processing pipelines.

processed = "  MIXED case text  "
  .strip
  .downcase
  .gsub(/\s+/, "_")
  .capitalize
# => "Mixed_case_text"

formatted_list = %w[apple banana cherry]
  .map(&:capitalize)
  .join(", ")
# => "Apple, Banana, Cherry"

Character escape sequences provide precise control over string content, including Unicode codepoints and byte values.

unicode_string = "Unicode: \u{1F600} \u{2764} \u{1F44D}"
# => "Unicode: 😀 ❤ 👍"

byte_string = "Hex bytes: \xFF\x00\x42"
control_chars = "Bell: \a Tab: \t Newline: \n"
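A few quick checks make the escape forms concrete. This is an illustrative sketch that assumes the default UTF-8 source encoding.

```ruby
puts "\u00e9" == "\u{e9}"   # true: both spell the same codepoint
puts "\u{48 69}"            # Hi: several codepoints in one brace pair
puts "\x41" == "\101"       # true: hex 41 and octal 101 are both "A"

emoji = "\u{1F600}"
puts emoji.length           # 1: one character
puts emoji.bytesize         # 4: four bytes in UTF-8

puts "\t".ord               # 9
puts "\a".ord               # 7
```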

Common Pitfalls

String interpolation creates new String objects on each execution, even when interpolated expressions return identical values. This behavior impacts performance in tight loops and memory-sensitive applications.

# Memory inefficient - creates new string each iteration
1000.times do |i|
  log_message = "Processing item #{i}"  # New string object each time
  # process(log_message)
end

# Alternatives - each still allocates a new string per iteration,
# so measure before assuming one is cheaper than interpolation
base_message = "Processing item "
1000.times do |i|
  log_message = base_message + i.to_s  # Concatenation builds a new string
  # Or use String#% for formatting
  log_message = "Processing item %d" % i
end
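When the goal is one accumulated result rather than many separate messages, appending in place avoids rebuilding the accumulator. A minimal sketch with illustrative names; the interpolated fragments are still new objects, only the buffer is reused.

```ruby
# += builds a new String and rebinds the variable
report = "items:"
original_id = report.object_id
report += " 1"
puts report.object_id == original_id   # false

# << appends in place, so the same buffer object grows
buffer = +""                           # unary plus returns a mutable string
buffer_id = buffer.object_id
3.times { |i| buffer << "item #{i};" }
puts buffer.object_id == buffer_id     # true
puts buffer                            # item 0;item 1;item 2;
```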

Escape sequence interpretation differs significantly between single and double quotes, leading to unexpected behavior when switching between literal forms.

# Single quotes preserve backslashes literally
single_path = 'C:\new\file.txt'
# => "C:\\new\\file.txt" (literal backslashes)

# Double quotes interpret escape sequences
double_path = "C:\new\file.txt"
# => "C:\new\file.txt" (\n is read as a newline and \f as a form feed)

# Correct approach for file paths
correct_path = "C:\\new\\file.txt"  # Escape backslashes in double quotes
unix_style = "C:/new/file.txt"     # Use forward slashes

Heredoc indentation behavior changes between << and <<~ forms, affecting string content in unexpected ways.

def indented_content
  if true
    standard_heredoc = <<TEXT
    This content preserves
    the leading spaces
    in the final string
TEXT
    
    squiggly_heredoc = <<~TEXT
      This content removes
      leading whitespace based
      on the least indented line
    TEXT
  end
end

# standard_heredoc contains leading spaces
# squiggly_heredoc has clean, unindented content

String interpolation evaluates expressions at string creation time, not when the string gets used. This timing affects variable access and method execution.

def create_message
  counter = 0
  message = "Counter value: #{counter += 1}"
  # counter is now 1, expression evaluated immediately
  
  lambda { message }  # Captures already-interpolated string
end

proc_message = create_message
puts proc_message.call  # "Counter value: 1"
puts proc_message.call  # "Counter value: 1" (same string, no re-evaluation)
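If the value should be read at the time of use, the interpolation itself has to be deferred. One way to sketch this is with a lambda or a format template; the names are illustrative.

```ruby
counter = 0

eager    = "Counter value: #{counter}"          # evaluated once, right here
lazy     = -> { "Counter value: #{counter}" }   # evaluated on every call
template = "Counter value: %d"                  # placeholder filled in later

counter = 5

puts eager                       # Counter value: 0
puts lazy.call                   # Counter value: 5
puts format(template, counter)   # Counter value: 5
```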

Percent literal delimiter selection affects parsing when the chosen delimiter appears within the string content.

# Problematic - unbalanced delimiters confuse parser
# broken = %(String with ) middle parenthesis)  # Syntax error

# Solutions: choose different delimiters or escape
fixed1 = %{String with ) middle parenthesis}
fixed2 = %(String with \) middle parenthesis)
fixed3 = %Q!String with ) middle parenthesis!
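All three workarounds produce the same content, which a short check confirms (variable names are illustrative):

```ruby
plain       = "String with ) middle parenthesis"
with_brace  = %{String with ) middle parenthesis}
with_escape = %(String with \) middle parenthesis)   # backslash is consumed
with_bang   = %Q!String with ) middle parenthesis!

results = [with_brace, with_escape, with_bang]
puts results.all? { |text| text == plain }   # true
```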

Encoding issues arise when string literals contain characters incompatible with the source file encoding or when mixing strings with different encodings.

# Source file encoding affects literal interpretation
# With UTF-8 source encoding:
utf8_string = "Café résumé 🎉"  # Works correctly

# With US-ASCII source encoding (for example via a "# encoding: US-ASCII" comment):
# ascii_string = "Café résumé"  # Rejected at parse time: invalid multibyte char (US-ASCII)

# Explicit encoding specification
binary_string = "Binary data".force_encoding("ASCII-8BIT")
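The mixing problem can be reproduced in a few lines. This sketch uses \u and \x escapes so that it does not depend on the source file encoding; the variable names are illustrative.

```ruby
text   = "caf\u00e9"      # \u escapes always yield a UTF-8 string
binary = "\xFF\xFE".b     # String#b returns a copy tagged as binary (ASCII-8BIT)

puts text.encoding                         # UTF-8
puts text.valid_encoding?                  # true
puts binary.encoding == Encoding::BINARY   # true

# Joining two strings that both hold non-ASCII bytes in different encodings fails
outcome =
  begin
    joined = text + binary
    "combined #{joined.bytesize} bytes"
  rescue Encoding::CompatibilityError
    "incompatible"
  end
puts outcome              # incompatible

# An ASCII-only string combines with any ASCII-compatible encoding
mixed = "abc".b + text
puts mixed.encoding       # UTF-8
```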

Reference

String Literal Syntax

| Syntax | Interpolation | Escape Sequences | Use Case |
| --- | --- | --- | --- |
| 'text' | No | \' and \\ only | Literal strings, minimal processing |
| "text" | Yes | Full escape sequences | General purpose, interpolated content |
| %q{text} | No | \\ and the escaped delimiter only | Alternative to single quotes |
| %Q{text} | Yes | Full escape sequences | Alternative to double quotes |
| %(text) | Yes | Full escape sequences | Shorthand for %Q |
| <<IDENTIFIER | Yes | Full escape sequences | Multi-line strings |
| <<~IDENTIFIER | Yes | Full escape sequences | Multi-line with indent removal |
| <<'IDENTIFIER' | No | None | Literal multi-line strings |

Escape Sequences

| Sequence | Result | Description |
| --- | --- | --- |
| \" | " | Double quote |
| \' | ' | Single quote |
| \\ | \ | Backslash |
| \n | Newline | Line feed character |
| \r | Carriage return | Carriage return character |
| \t | Tab | Horizontal tab |
| \s | Space | Space character |
| \a | Bell | Bell/alert character |
| \b | Backspace | Backspace character |
| \f | Form feed | Form feed character |
| \v | Vertical tab | Vertical tab character |
| \0 | Null | Null character |
| \nnn | Byte value | Octal byte value (1-3 digits) |
| \xHH | Byte value | Hexadecimal byte value (1-2 digits) |
| \uHHHH | Unicode | Unicode codepoint (4 hex digits) |
| \u{HHHHH} | Unicode | Unicode codepoint (1-6 hex digits) |

Percent Literal Forms

| Form | Equivalent | Interpolation | Typical Use |
| --- | --- | --- | --- |
| %q | Single quotes | No | Literal strings with special chars |
| %Q | Double quotes | Yes | Interpolated strings with special chars |
| % | Double quotes | Yes | Shorthand for %Q |
| %w | Array of strings | No | Word arrays without interpolation |
| %W | Array of strings | Yes | Word arrays with interpolation |
| %r | Regular expression | Yes | Regex patterns |
| %x | Backtick command | Yes | Shell command execution |
| %s | Symbol | No | Symbol creation |
| %i | Array of symbols | No | Symbol arrays |
| %I | Array of symbols | Yes | Symbol arrays with interpolation |

Delimiter Options

| Delimiter Type | Examples | Behavior |
| --- | --- | --- |
| Paired | () [] {} <> | Must be balanced within content |
| Unpaired | ! @ # $ % ^ & * - _ + = \| : ; " ' ? / ~ | Opening and closing delimiter identical |

Heredoc Variants

| Syntax | Indentation | Interpolation | Common Usage |
| --- | --- | --- | --- |
| <<WORD | Preserved | Yes | SQL queries, templates |
| <<~WORD | Removed | Yes | Clean multi-line strings in methods |
| <<'WORD' | Preserved | No | Literal multi-line content |
| <<~'WORD' | Removed | No | Clean literal multi-line content |

Performance Characteristics

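| Operation | Relative Cost | Notes |
| --- | --- | --- |
| Single quote literal | Fast | No escape or interpolation work at runtime |
| Double quote without interpolation | Fast | Escapes are resolved at parse time, so runtime cost matches single quotes |
| Double quote with interpolation | Moderate | Expression evaluation overhead |
| Heredoc | Fast to Moderate | Same as an equivalent quoted literal; depends on interpolation |
| Percent literal | Fast to Moderate | Depends on content and form |
| Frozen literal | Variable | Reuses objects, saves allocation |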

Memory Behavior

| Feature | Object Creation | Memory Impact |
| --- | --- | --- |
| String literals | New object per execution | High in loops |
| Frozen literals | Reused objects | Reduced allocation |
| Interpolation | Always new object | Not frozen by the pragma (Ruby 3.0+); freeze explicitly if needed |
| Heredoc | Single object per execution | Moderate |
| Concatenation | New object | Additive memory usage |