CrackedRuby

StringIO

Overview

StringIO creates an IO-like object that reads from and writes to strings instead of files or network connections. Ruby implements StringIO as part of the standard library, providing the same interface as File and IO classes but operating entirely in memory. The class maintains an internal string buffer and position pointer, mimicking file descriptor behavior without filesystem interaction.

StringIO is not a subclass of IO. It is a separate class that duck-types most of the IO interface, including reading, writing, positioning, and mode management. The internal string grows dynamically as data is written, and the position pointer tracks the current read/write location within the buffer.

require 'stringio'

# Create empty StringIO object
io = StringIO.new
io.write("Hello World")
io.rewind
puts io.read  # prints: Hello World
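
A quick way to check how StringIO relates to IO at runtime is to inspect its ancestors and the methods it responds to. The sketch below also shows why code written against the IO interface usually accepts a StringIO unchanged; `copy_upcased` is a hypothetical helper introduced only for illustration.

```ruby
require 'stringio'

io = StringIO.new

# StringIO mirrors much of IO's interface without descending from IO.
is_io_subclass = StringIO.ancestors.include?(IO)
quacks_like_io = %i[read write gets puts rewind].all? { |m| io.respond_to?(m) }

# Hypothetical helper: relies only on #each_line and #write, so it works
# with File, $stdin, StringIO, or any other object offering those methods.
def copy_upcased(input, output)
  input.each_line { |line| output.write(line.upcase) }
  output
end

result = copy_upcased(StringIO.new("a\nb\n"), StringIO.new).string
# is_io_subclass => false, quacks_like_io => true, result => "A\nB\n"
```

Code that checks `is_a?(IO)` will reject a StringIO, so interface checks such as `respond_to?(:read)` are the safer test.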

Ruby creates StringIO instances in read-write mode by default (read-only when the initial string is frozen) and accepts the same mode strings as File.open. The class supports text and binary modes, positioning operations, and buffer manipulation methods that mirror filesystem I/O behavior.

# Initialize with content and mode
io = StringIO.new("Initial content", "r")
io.read(7)  # => "Initial"

# Binary mode
binary_io = StringIO.new("".b)
binary_io.write([0xFF, 0xFE].pack('C*'))

StringIO serves three primary purposes: testing I/O operations without files, building strings using IO methods, and providing IO interfaces to string processing pipelines. The class maintains internal state including current position, mode flags, and the underlying string buffer accessible through the string method.
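
The string method returns the underlying String object itself rather than a copy. A minimal sketch of what that implies (the variable names are illustrative only):

```ruby
require 'stringio'

buffer = +"abc"                 # unary plus yields a mutable String
io = StringIO.new(buffer)

same_object = io.string.equal?(buffer)  # true: no copy is made

io.seek(0, IO::SEEK_END)
io.write("def")
caller_view = buffer.dup        # "abcdef": the caller's string was modified

snapshot = io.string.dup        # take a copy when isolation is needed
io.write("!")
# snapshot => "abcdef", io.string => "abcdef!"
```

Duplicating the result of `string` is a reasonable habit whenever the StringIO keeps receiving writes after the content is handed to other code.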

Basic Usage

StringIO construction accepts an optional string argument and mode specification. When no string is provided, Ruby creates an empty internal buffer. The mode parameter controls read/write permissions and text/binary handling using the same format as File operations.

require 'stringio'

# Empty StringIO
empty = StringIO.new
empty.write("First line\n")
empty.write("Second line\n")
empty.rewind
empty.read  # => "First line\nSecond line\n"

Write operations place data at the current position, overwriting any existing content there. The internal string expands automatically when a write extends past its end. Position tracking behaves like a file handle, advancing after each read or write operation.

# Writing at specific positions
io = StringIO.new("ABCDEFGH")
io.pos = 3
io.write("XYZ")
io.string  # => "ABCXYZGH"

# Position advances after operations
io.pos  # => 6
io.write("123")
io.string  # => "ABCXYZ123"

Reading methods extract data from the current position, returning partial content when fewer bytes remain than requested. The read method without arguments returns all data from the current position to the end (an empty string at EOF), while read(n) returns up to n bytes, or nil when no data is available.

io = StringIO.new("Hello World Programming")
io.read(5)    # => "Hello"
io.read(6)    # => " World"
io.read       # => " Programming"
io.read       # => ""
io.read(1)    # => nil

Line-oriented operations support text processing workflows. The gets method reads until newline characters, while readline raises EOFError when no more lines exist. The each_line method provides enumeration over line boundaries.

text = "Line 1\nLine 2\nLine 3\n"
io = StringIO.new(text)

# Reading individual lines
io.gets      # => "Line 1\n"
io.gets      # => "Line 2\n"

# Enumerating all lines
io.rewind
io.each_line { |line| puts "Processing: #{line.chomp}" }
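
The line-oriented methods also accept a `chomp: true` keyword, which removes the need for a separate `chomp` call. A short sketch with illustrative data:

```ruby
require 'stringio'

io = StringIO.new("alpha\nbeta\ngamma")

# chomp: true strips the line terminator from each yielded line.
lines = io.each_line(chomp: true).to_a

io.rewind                                # also resets the line counter
first = io.gets(chomp: true)
line_number = io.lineno                  # gets increments the counter
rest = io.readlines(chomp: true)
at_end = io.gets                         # nil once the buffer is exhausted
# lines => ["alpha", "beta", "gamma"], first => "alpha", line_number => 1
# rest => ["beta", "gamma"], at_end => nil
```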

Position management uses familiar file operations. The rewind method resets position to zero, seek moves to absolute or relative positions, and tell returns current position. These operations enable random access patterns within the string buffer.
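
The positioning calls described above can be sketched as follows, using a string of digits so that each offset is easy to verify by eye:

```ruby
require 'stringio'

io = StringIO.new("0123456789")

io.seek(4)                      # absolute; IO::SEEK_SET is the default whence
a = io.read(2)                  # "45"
pos_after_read = io.tell        # 6

io.seek(-3, IO::SEEK_CUR)       # relative to the current position (6 - 3 = 3)
b = io.read(1)                  # "3"

io.seek(-2, IO::SEEK_END)       # relative to the end (10 - 2 = 8)
c = io.read                     # "89"

io.rewind
d = io.pos                      # 0
```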

Error Handling & Debugging

StringIO raises IOError exceptions when operations conflict with current mode settings. Attempting to write to read-only instances or read from write-only instances triggers immediate errors. The class validates mode permissions before executing operations, providing clear error messages for debugging.

# Read-only mode restrictions
read_only = StringIO.new("Content", "r")
begin
  read_only.write("More")
rescue IOError => e
  puts e.message  # => "not opened for writing"
end

# Write-only mode restrictions  
write_only = StringIO.new("", "w")
begin
  write_only.read
rescue IOError => e
  puts e.message  # => "not opened for reading"
end

Position errors occur when a seek would produce a negative position or uses an invalid whence parameter. StringIO accepts the same seek constants as File (IO::SEEK_SET, IO::SEEK_CUR, IO::SEEK_END) and raises Errno::EINVAL in both cases. Seeking past the end of the string is permitted.

io = StringIO.new("Short")
begin
  io.seek(-10, IO::SEEK_SET)
rescue Errno::EINVAL => e
  puts "Invalid seek position"
end

# Valid seek operations
io.seek(0, IO::SEEK_END)   # Move to end
io.seek(-2, IO::SEEK_CUR)  # Move back 2 positions
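
Seeking past the end of the buffer does not raise. A minimal sketch of what happens on a subsequent write, alongside the negative-position case that does raise:

```ruby
require 'stringio'

io = StringIO.new(+"ab")

io.seek(5)                 # past the end: allowed, no error
io.write("X")              # the gap between old end and position is NUL-filled
padded = io.string         # "ab\0\0\0X"

negative_seek_error =
  begin
    io.seek(-1, IO::SEEK_SET)
    nil
  rescue Errno::EINVAL => e
    e.class
  end
```

The NUL padding is easy to miss in text output, so `io.string.inspect` is the more reliable way to look at a buffer after sparse writes.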

EOFError exceptions occur when certain read methods are called with no data remaining. The readline, readchar, and readbyte methods raise EOFError at end-of-file, whereas gets and read(n) return nil and readlines returns an empty array, so callers can choose between exception-based and nil-based EOF handling.

io = StringIO.new("Single line")
io.read  # Consume all content

begin
  io.readline
rescue EOFError => e
  puts "Reached end of string"
end

# Check EOF state
io.eof?  # => true
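
For comparison, this sketch collects what each read method returns once the buffer is exhausted:

```ruby
require 'stringio'

io = StringIO.new("only line")
io.read                          # consume everything

gets_result      = io.gets       # nil
read_n_result    = io.read(10)   # nil
read_all_result  = io.read       # ""
readlines_result = io.readlines  # []

readline_raised =
  begin
    io.readline
    false
  rescue EOFError
    true
  end
```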

Debugging StringIO operations requires understanding internal state changes. The pos, eof?, and closed? methods provide visibility into current object state. Logging position changes and mode flags helps identify unexpected behavior in complex I/O sequences.

def debug_stringio(io, operation)
  puts "Before #{operation}: pos=#{io.pos}, eof=#{io.eof?}, closed=#{io.closed?}"
  yield
  puts "After #{operation}: pos=#{io.pos}, eof=#{io.eof?}, closed=#{io.closed?}"
  puts "Content: #{io.string.inspect}"
  puts "---"
end

io = StringIO.new("Debug content")
debug_stringio(io, "read(5)") { io.read(5) }
debug_stringio(io, "write('X')") { io.write('X') }

Encoding issues manifest when mixing binary and text operations or handling non-ASCII content. StringIO preserves string encoding throughout operations, but binary writes can corrupt text content. Always specify binary mode for non-text data.

# Encoding preservation
utf8_io = StringIO.new("UTF-8 content: ñoño")
utf8_io.string.encoding  # => #<Encoding:UTF-8>

# Binary mode for mixed content
binary_io = StringIO.new("".b)
binary_io.write("Text")
binary_io.write([0x80, 0xFF].pack('C*'))
binary_io.string.encoding  # => #<Encoding:ASCII-8BIT>

Performance & Memory

StringIO can be considerably faster than file I/O for temporary data and testing scenarios, because memory-based operations avoid filesystem overhead, system call costs, and disk I/O latency. The size of the gain depends on the platform, the filesystem, and the amount of data, so measure with a representative workload rather than assuming a fixed speedup.

require 'benchmark'
require 'stringio'
require 'tempfile'

data = "Line of data\n" * 10000

Benchmark.bm(15) do |x|
  x.report("StringIO write") do
    1000.times do
      io = StringIO.new
      io.write(data)
      io.string
    end
  end
  
  x.report("File write") do
    1000.times do
      file = Tempfile.new
      file.write(data)
      file.rewind
      file.read
      file.close
      file.unlink
    end
  end
end

Memory consumption scales linearly with content size. The internal string grows as needed when writes exceed its current capacity; the exact growth strategy is an implementation detail of Ruby's String. When the final size is roughly known, reserving capacity up front with String.new(capacity: n) can reduce reallocations.

require 'objspace'
require 'stringio'

# Reserve capacity up front when the final size is roughly known
buffer = String.new(capacity: 1_000_000, encoding: Encoding::UTF_8)
io = StringIO.new(buffer)

# Measure the buffer's memory footprint around a write
before_bytes = ObjectSpace.memsize_of(io.string)
io.write("A" * 500_000)
after_bytes = ObjectSpace.memsize_of(io.string)
puts "Buffer footprint: #{before_bytes} -> #{after_bytes} bytes"

Each write call carries a small fixed cost, and building its argument with string interpolation allocates a temporary string per call. For a very large number of tiny writes, joining the pieces first and writing once may be faster, although the joined string is itself an extra allocation. Measure with representative data before optimizing.

# Many small writes, each allocating an interpolated temporary
def build_content_slow(lines)
  io = StringIO.new
  lines.each { |line| io.write("#{line}\n") }
  io.string
end

# One large write of pre-joined content
def build_content_fast(lines)
  content = lines.map { |line| "#{line}\n" }.join
  io = StringIO.new
  io.write(content)
  io.string
end
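
A third option avoids the interpolated temporary altogether: write accepts several arguments, and puts appends the newline itself. This is a sketch of the alternatives rather than a measured recommendation.

```ruby
require 'stringio'

lines = %w[alpha beta gamma]

# write takes multiple arguments, so the line and the newline
# can be passed separately instead of being interpolated first.
write_io = StringIO.new
lines.each { |line| write_io.write(line, "\n") }
with_write = write_io.string

# puts adds a newline only when the line does not already end with one.
puts_io = StringIO.new
lines.each { |line| puts_io.puts(line) }
with_puts = puts_io.string
# Both produce "alpha\nbeta\ngamma\n"
```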

Positioning operations within StringIO execute in constant time since no disk seeking occurs. Random access patterns perform consistently regardless of string size, unlike file-based I/O where disk seek times vary with distance. This characteristic makes StringIO ideal for algorithms requiring frequent position changes.

# Performance comparison: sequential vs random access
content = "0123456789" * 100000
io = StringIO.new(content)

# Sequential reading (fast for both StringIO and File)
Benchmark.measure { 1000.times { io.rewind; io.read(100) } }

# Random seeking (constant-time for StringIO; no disk seek involved)
Benchmark.measure do
  1000.times do
    pos = rand(content.length - 100)
    io.seek(pos)
    io.read(100)
  end
end

Testing Strategies

StringIO serves as a primary tool for testing I/O-dependent code without creating temporary files or network connections. Test isolation improves when replacing actual I/O streams with StringIO objects, eliminating filesystem dependencies and external resource coordination.

class FileProcessor
  def initialize(input_stream, output_stream)
    @input = input_stream
    @output = output_stream
  end
  
  def process_lines
    @input.each_line do |line|
      processed = line.strip.upcase
      @output.write("#{processed}\n")
    end
  end
end

# Testing without files
def test_file_processor
  input = StringIO.new("hello\nworld\n")
  output = StringIO.new
  
  processor = FileProcessor.new(input, output)
  processor.process_lines
  
  expected = "HELLO\nWORLD\n"
  assert_equal expected, output.string
end

Mock object integration uses StringIO to simulate various I/O conditions including partial reads, write failures, and positioning errors. Custom StringIO subclasses can introduce controlled failures for testing error handling paths.

class FailingStringIO < StringIO
  def initialize(string = "", fail_after: nil)
    super(string)
    @fail_after = fail_after
    @operation_count = 0
  end
  
  def read(*args)
    @operation_count += 1
    raise IOError, "Simulated failure" if @fail_after && @operation_count > @fail_after
    super
  end
end

# Test error handling
def test_read_failure_handling
  failing_io = FailingStringIO.new("content", fail_after: 1)
  
  # First read succeeds
  result1 = failing_io.read(4)
  assert_equal "cont", result1
  
  # Second read fails
  assert_raises(IOError) { failing_io.read(4) }
end

Captured output testing directs program output to StringIO objects, either by injecting a StringIO where STDOUT or STDERR would normally be passed or by temporarily reassigning $stdout or $stderr. This enables verification of program output without polluting test output streams, and works particularly well for testing command-line utilities and logging functionality.

class Logger
  def initialize(output = STDOUT)
    @output = output
  end
  
  def log(level, message)
    @output.puts "[#{level.upcase}] #{Time.now}: #{message}"
  end
end

def test_logger_output
  captured_output = StringIO.new
  logger = Logger.new(captured_output)
  
  logger.log(:info, "Test message")
  logger.log(:error, "Error occurred")
  
  lines = captured_output.string.lines
  assert lines[0].include?("[INFO]")
  assert lines[0].include?("Test message")
  assert lines[1].include?("[ERROR]")
  assert lines[1].include?("Error occurred")
end
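
When the code under test writes to standard output directly and cannot be given a stream, the global $stdout can be swapped for the duration of a block. `capture_stdout` below is a hypothetical helper, not part of StringIO or any test framework.

```ruby
require 'stringio'

# Hypothetical helper: temporarily points $stdout at a StringIO
# and returns everything written to it while the block ran.
def capture_stdout
  original = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  $stdout = original   # always restore, even if the block raises
end

captured = capture_stdout { puts "hello from the block" }
# captured => "hello from the block\n"
```

This only intercepts writes that go through the $stdout global. Code that writes to the STDOUT constant, or output from child processes, is not captured.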

Data-driven testing scenarios benefit from StringIO's ability to simulate various input formats and sizes. Test cases can programmatically generate input data, process it through StringIO objects, and verify output without external file dependencies.

# CSVProcessor stands in for the class under test; it is not defined here.
def test_csv_processing_with_various_inputs
  test_cases = [
    { input: "name,age\nAlice,30\nBob,25", expected_count: 2 },
    { input: "name,age\n", expected_count: 0 },
    { input: "invalid,data,format\ntest", expected_count: 0 },
  ]
  
  test_cases.each_with_index do |test_case, index|
    input_io = StringIO.new(test_case[:input])
    output_io = StringIO.new
    
    processor = CSVProcessor.new(input_io, output_io)
    processor.process
    
    actual_count = output_io.string.lines.count
    assert_equal test_case[:expected_count], actual_count, 
                 "Test case #{index} failed"
  end
end

Production Patterns

Web applications frequently use StringIO for generating downloadable content without creating temporary files. CSV exports, PDF generation, and data serialization benefit from in-memory string construction before streaming to HTTP responses. This pattern reduces disk I/O and simplifies cleanup operations.

require 'csv'

class ReportController < ApplicationController
  def download_csv
    output = StringIO.new

    # Write CSV header
    output.write(CSV.generate_line(["Name", "Email", "Created At"]))

    # Buffer user rows in batches; CSV.generate_line quotes fields
    # that contain commas, quotes, or newlines
    User.find_each(batch_size: 1000) do |user|
      output.write(CSV.generate_line([user.name, user.email, user.created_at]))
    end

    send_data output.string,
              filename: "users_#{Date.current}.csv",
              type: 'text/csv'
  end
end

Template rendering systems utilize StringIO for building complex documents from multiple sources. The pattern allows incremental content construction while maintaining clean separation between data processing and output formatting.

class DocumentBuilder
  def initialize
    @output = StringIO.new
    @section_count = 0
  end
  
  def add_header(title, level = 1)
    @output.write("#{'#' * level} #{title}\n\n")
  end
  
  def add_section(content)
    @section_count += 1
    @output.write("## Section #{@section_count}\n\n")
    @output.write("#{content}\n\n")
  end
  
  def add_code_block(code, language = nil)
    lang = language.to_s
    @output.write("```#{lang}\n#{code}\n```\n\n")
  end
  
  def build
    @output.string
  end
end

# Usage in production
builder = DocumentBuilder.new
builder.add_header("API Documentation")
builder.add_section("This document describes the REST API endpoints.")
builder.add_code_block("GET /api/users", "http")
document = builder.build

Log aggregation and processing pipelines use StringIO for buffering log entries before batch processing or transmission. The pattern provides memory-efficient buffering while maintaining compatibility with existing I/O-based processing tools.

class LogBuffer
  def initialize(max_size: 64_000, flush_callback: nil)
    @buffer = StringIO.new
    @max_size = max_size
    @flush_callback = flush_callback
  end
  
  def write_entry(timestamp, level, message)
    entry = "#{timestamp} [#{level}] #{message}\n"
    @buffer.write(entry)
    
    flush if @buffer.size >= @max_size
  end
  
  def flush
    return if @buffer.size == 0
    
    content = @buffer.string.dup
    @buffer.rewind
    @buffer.truncate(0)
    
    @flush_callback&.call(content)
    content
  end
end

# Production usage
log_buffer = LogBuffer.new(
  max_size: 32_768,
  flush_callback: ->(content) { 
    LogShipper.send_to_aggregator(content) 
  }
)

Data serialization workflows leverage StringIO for building complex data structures before persistence or transmission. The approach enables incremental construction with rollback capabilities through position management.

require 'json'      # provides #to_json
require 'stringio'

class DataExporter
  def initialize(format: :json)
    @format = format
    @buffer = StringIO.new
  end
  
  def export_dataset(records)
    case @format
    when :json
      export_json(records)
    when :csv
      export_csv(records)
    else
      # Only the formats implemented below are supported
      raise ArgumentError, "unsupported format: #{@format.inspect}"
    end
    
    @buffer.string
  end
  
  private
  
  def export_json(records)
    @buffer.write("[\n")
    records.each_with_index do |record, index|
      @buffer.write("  #{record.to_json}")
      @buffer.write(",\n") unless index == records.length - 1
    end
    @buffer.write("\n]")
  end
  
  def export_csv(records)
    return if records.empty?
    
    headers = records.first.keys
    @buffer.write("#{headers.join(',')}\n")
    
    records.each do |record|
      values = headers.map { |h| record[h] }
      @buffer.write("#{values.join(',')}\n")
    end
  end
end
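
The rollback idea mentioned above can be sketched by saving the position before a unit of work and truncating back to it on failure. `write_with_rollback` is a hypothetical helper, and the sketch assumes all writes are appended at the end of the buffer.

```ruby
require 'stringio'

# Hypothetical helper: runs the block and, if it raises, discards
# whatever the block wrote by truncating back to the saved position.
def write_with_rollback(io)
  checkpoint = io.pos
  yield io
  true
rescue StandardError
  io.truncate(checkpoint)
  io.pos = checkpoint
  false
end

buffer = StringIO.new
buffer.write("header\n")

ok = write_with_rollback(buffer) { |io| io.write("row 1\n") }
failed = write_with_rollback(buffer) do |io|
  io.write("partial ro")
  raise "serialization failed"
end
final = buffer.string
# ok => true, failed => false, final => "header\nrow 1\n"
```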

Reference

Core Methods

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| StringIO.new(string="", mode="r+") | string (String), mode (String) | StringIO | Creates new StringIO instance with optional content and mode |
| #<<(obj) | obj (Object) | StringIO | Appends object string representation and returns self |
| #close | None | nil | Closes StringIO for both reading and writing |
| #close_read | None | nil | Closes StringIO for reading only |
| #close_write | None | nil | Closes StringIO for writing only |
| #closed? | None | Boolean | Returns true if StringIO is closed for both reading and writing |
| #each_line(sep=$/,limit=nil) {block} | sep (String), limit (Integer), block | Enumerator or StringIO | Iterates over lines with optional separator and limit |
| #eof? | None | Boolean | Returns true if positioned at end of string |
| #getc | None | String or nil | Reads next character, returns nil at EOF |
| #gets(sep=$/,limit=nil) | sep (String), limit (Integer) | String or nil | Reads next line with optional separator and limit |
| #length | None | Integer | Returns current string length (alias for size) |
| #pos | None | Integer | Returns current position within string |
| #pos=(position) | position (Integer) | Integer | Sets current position within string |
| #read(length=nil, outbuf=nil) | length (Integer), outbuf (String) | String or nil | Reads specified length or all remaining data |
| #readlines(sep=$/,limit=nil) | sep (String), limit (Integer) | Array<String> | Returns array of all remaining lines |
| #rewind | None | Integer | Sets position to 0 and returns 0 |
| #seek(offset, whence=IO::SEEK_SET) | offset (Integer), whence (Integer) | Integer | Sets position relative to whence constant |
| #size | None | Integer | Returns current string length |
| #string | None | String | Returns underlying string object |
| #string=(new_string) | new_string (String) | String | Replaces underlying string and resets position |
| #tell | None | Integer | Returns current position (alias for pos) |
| #truncate(length) | length (Integer) | Integer | Truncates string to specified length |
| #write(obj, *objs) | obj (Object), *objs (Array) | Integer | Writes objects to string, returns bytes written |

Mode Specifications

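| Mode | Read | Write | Position | Truncate | Create |
| --- | --- | --- | --- | --- | --- |
| "r" | Yes | No | Start | No | No |
| "r+" | Yes | Yes | Start | No | No |
| "w" | No | Yes | Start | Yes | Yes |
| "w+" | Yes | Yes | Start | Yes | Yes |
| "a" | No | Yes | End | No | Yes |
| "a+" | Yes | Yes | End | No | Yes |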

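The effect of the write modes on a string passed to the constructor can be sketched as follows. The strings are created with unary plus so they are mutable.

```ruby
require 'stringio'

# "w" truncates the string that is passed in.
truncated = +"old content"
StringIO.new(truncated, "w").write("new")
after_w = truncated.dup        # "new"

# "a" keeps existing content; every write lands at the end.
appended = +"log:"
StringIO.new(appended, "a").write(" entry")
after_a = appended.dup         # "log: entry"

# "r+" starts at position 0 and overwrites in place.
overwritten = +"abcdef"
StringIO.new(overwritten, "r+").write("XY")
after_r_plus = overwritten.dup # "XYcdef"
```

Because "w" and "w+" empty the caller's string at construction time, pass a copy when the original content still matters.
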
Seek Constants

| Constant | Value | Description |
| --- | --- | --- |
| IO::SEEK_SET | 0 | Absolute position from start |
| IO::SEEK_CUR | 1 | Relative position from current |
| IO::SEEK_END | 2 | Relative position from end |

Common Exceptions

| Exception | Condition | Resolution |
| --- | --- | --- |
| IOError | Writing to read-only or reading from write-only | Check mode permissions |
| Errno::EINVAL | Invalid seek parameters | Use valid whence constants and non-negative positions |
| EOFError | Reading past end with readline, readchar, or readbyte | Check eof? before reading |
| ArgumentError | Invalid mode string | Use documented mode specifications |
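
Two further error cases come up often in practice: frozen strings and closed streams. The sketch below records the exception class raised in each case.

```ruby
require 'stringio'

# A frozen string yields a read-only StringIO when no mode is given.
frozen_io = StringIO.new("constant".freeze)
frozen_write_error =
  begin
    frozen_io.write("x")
    nil
  rescue IOError => e
    e.class
  end

# Explicitly requesting write access to a frozen string fails at construction.
frozen_mode_error =
  begin
    StringIO.new("constant".freeze, "w")
    nil
  rescue SystemCallError => e
    e.class                      # Errno::EACCES
  end

# Reading from a closed stream raises IOError.
closed_io = StringIO.new(+"data")
closed_io.close
closed_read_error =
  begin
    closed_io.read
    nil
  rescue IOError => e
    e.class
  end
```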

State Query Methods

| Method | Returns | Description |
| --- | --- | --- |
| #closed? | Boolean | True if closed for both read and write |
| #eof? | Boolean | True if position is at end of string |
| #pos | Integer | Current byte position within string |
| #size | Integer | Total string length in bytes |

Binary Mode Operations

Binary mode handles raw byte data without encoding conversion. Use "b" suffix with mode strings for binary operations.

# Binary mode example
binary_io = StringIO.new("".b, "w+b")
binary_io.write([0x48, 0x65, 0x6C, 0x6C, 0x6F].pack('C*'))
binary_io.string.encoding  # => #<Encoding:ASCII-8BIT>
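
Reading binary data back follows the same pattern as reading from a socket or file. A minimal sketch, assuming a made-up record layout of a 4-byte big-endian length followed by the payload:

```ruby
require 'stringio'

# Build a tiny record: 4-byte big-endian length, then the payload.
payload = "Hello".b
packet  = StringIO.new("".b)
packet.write([payload.bytesize].pack("N"), payload)

# Parse it back with the same IO-style calls a socket or file would use.
packet.rewind
length = packet.read(4).unpack1("N")   # 5
body   = packet.read(length)           # "Hello", as a binary string
```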