
CrackedRuby

Tempfile

Ruby's Tempfile class creates and manages temporary files with automatic cleanup capabilities.


Overview

Tempfile wraps File to create temporary files that are deleted automatically when the Tempfile object is garbage-collected or the program exits, or immediately when you call close! or unlink. Ruby's implementation handles the fiddly parts: generating unique filenames, placing files in the appropriate system directory, and managing cleanup.

The class creates files in the system's temporary directory, as returned by Dir.tmpdir (which honors the TMPDIR environment variable and is typically /tmp on Unix-like systems), with automatically generated unique names. Each Tempfile instance keeps both the open file handle and the filesystem path, so standard File operations work while the object tracks its own cleanup.

require 'tempfile'

# Create a temporary file
temp = Tempfile.new('myapp')
temp.write('temporary data')
temp.rewind
puts temp.read  # => "temporary data"
temp.close

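Tempfile is not a subclass of File: it is built on DelegateClass(File) and forwards method calls to a wrapped File object, so standard file operations such as reading, writing, seeking, and truncating work as usual. The key distinction lies in lifecycle management. Each Tempfile registers a finalizer (via ObjectSpace.define_finalizer) that deletes the file when the object is garbage-collected or the interpreter exits, and it also provides explicit cleanup methods.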

# File operations work normally
temp = Tempfile.new(['prefix_', '.txt'])
temp.puts "Line 1"
temp.puts "Line 2" 
temp.rewind
temp.each_line { |line| puts "Read: #{line}" }
temp.close
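Because Tempfile delegates to File rather than subclassing it, type checks against File do not behave the way its File-like API might suggest. A quick check using only the standard library (__getobj__ is the accessor DelegateClass provides for the wrapped object):

```ruby
require 'tempfile'

temp = Tempfile.new('delegation_demo')

# Tempfile's ancestry comes from DelegateClass(File), not from File itself
puts Tempfile.ancestors.include?(File)  # false

# The wrapped object is a real File, and File methods are forwarded to it
puts temp.__getobj__.is_a?(File)        # true
puts temp.respond_to?(:each_line)       # true

temp.close!
```

Code that checks `is_a?(File)` therefore rejects Tempfiles; duck typing (respond_to?) or `to_path` works with both.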

The constructor accepts either a simple basename string or an array containing a prefix and suffix. Ruby generates the unique portion of the filename automatically, ensuring no conflicts with existing files. The temporary directory location follows system conventions but can be overridden through the constructor's directory parameter.

Basic Usage

Tempfile.new takes an optional basename (default ""): either a string used as the name's prefix, or a two-element array specifying a prefix and a suffix. Ruby generates the unique identifier automatically.

require 'tempfile'

# Simple basename
basic_temp = Tempfile.new('logfile')
puts basic_temp.path  # => "/tmp/logfile20241130-12345-abcdef"

# Prefix and suffix
named_temp = Tempfile.new(['data_', '.json'])
puts named_temp.path  # => "/tmp/data_20241130-12345-ghijkl.json"

# Custom directory
custom_temp = Tempfile.new('cache', '/var/tmp')
puts custom_temp.path  # => "/var/tmp/cache20241130-12345-mnopqr"
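The basename argument is optional, and keyword options other than mode: are forwarded to File.open. The sketch below relies on both. binmode: is a standard File.open option. The name shape it prints (an eight-digit date stamp at the start of an unprefixed name) reflects how current Ruby versions build temporary names, so treat it as an observation rather than a guarantee.

```ruby
require 'tempfile'

# No basename: the generated name starts with the date stamp (YYYYMMDD)
anon = Tempfile.new
name = File.basename(anon.path)
puts name

# binmode: true is passed through to File.open
bin = Tempfile.new('blob', binmode: true)
puts bin.binmode?  # true

[anon, bin].each(&:close!)
```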

Writing data to temporary files follows standard File patterns. The file remains open for operations until explicitly closed or the program terminates. Ruby buffers write operations according to standard I/O buffering rules.

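temp = Tempfile.new('output')
temp.write("Initial content\n")  # write adds no newline of its own
temp.puts 'Additional line'       # puts appends "\n"
temp.print 'More ', 'content'     # print writes its arguments back to back

# Push Ruby's internal buffer to the operating system
temp.flush

# Read back the data
temp.rewind
content = temp.read
puts content
# => Initial content
# => Additional line
# => More content
temp.close!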

Reading operations require positioning the file pointer appropriately. The rewind method returns to the beginning, while seek provides precise positioning. Ruby maintains the file position across read operations like any standard file handle.

temp = Tempfile.new('data')
temp.puts 'First line'
temp.puts 'Second line' 
temp.puts 'Third line'

# Read from beginning
temp.rewind
first_line = temp.gets.chomp  # => "First line"

# Read remaining content
remaining = temp.read  # => "Second line\nThird line\n"

# Position-based reading: seek to an absolute byte offset
temp.seek(11, IO::SEEK_SET)  # Skip "First line\n" (11 bytes)
second_line = temp.gets.chomp  # => "Second line"
temp.close!

Cleanup offers both automatic and manual control over file deletion. close closes the file handle but leaves the file on disk; close! (equivalent to close(true)) closes the handle and deletes the file immediately; unlink deletes the file whether or not the handle is still open (on POSIX systems). Any file you do not delete yourself is removed when the Tempfile object is garbage-collected or the program exits.

temp = Tempfile.new('temp_data')
temp.write('some data')
path = temp.path

# File exists and is accessible
puts File.exist?(path)  # => true

# Close but keep file
temp.close
puts File.exist?(path)  # => true

# Delete the file
temp.unlink
puts File.exist?(path)  # => false
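When the temporary file is only needed within one scope, Tempfile.create with a block ties cleanup to that scope: it yields a plain File (not a Tempfile) and closes and deletes it when the block exits, even if the block raises. A minimal example:

```ruby
require 'tempfile'

path = nil
line_count = Tempfile.create(['notes_', '.txt']) do |file|
  path = file.path            # file is a plain File, not a Tempfile
  file.puts 'alpha', 'beta'   # puts writes each argument on its own line
  file.rewind
  file.each_line.count        # the block's value becomes the return value
end

puts line_count          # 2
puts File.exist?(path)   # false: the file was deleted when the block exited
```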

Error Handling & Debugging

Tempfile operations raise exceptions for filesystem and permission problems. The most common are Errno::EACCES for permission problems, Errno::ENOENT when the target directory does not exist, Errno::ENOSPC for insufficient disk space, and Errno::EROFS for read-only filesystems.

Directory permission issues occur when the temporary directory lacks write access or when specifying custom directories without appropriate permissions. Ruby raises Errno::EACCES in these scenarios, requiring fallback strategies or permission corrections.

require 'tempfile'
require 'tmpdir'

def create_temp_file_safely(basename, preferred_dir = nil)
  dirs_to_try = [preferred_dir, Dir.tmpdir, '/tmp', '.'].compact.uniq

  dirs_to_try.each do |dir|
    begin
      return Tempfile.new(basename, dir)
    rescue Errno::EACCES, Errno::EROFS => e
      warn "Cannot write to #{dir}: #{e.message}"
    rescue Errno::ENOENT, Errno::ENOTDIR => e
      # A missing directory raises ENOENT; a path component that is a file raises ENOTDIR
      warn "Not a usable directory #{dir}: #{e.message}"
    end
  end

  raise "No writable directory found for temporary file"
end

# Usage with fallback
begin
  temp = create_temp_file_safely('myapp', '/restricted/tmp')
  puts "Created #{temp.path}"
rescue => e
  puts "Failed to create temporary file: #{e.message}"
ensure
  temp&.close!
end
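A quick way to see which exception a bad directory produces is to point Tempfile.new at a directory that does not exist; the directory name below is made up for the demonstration. Ruby does not create missing directories, so the open fails with Errno::ENOENT, which means a fallback strategy should rescue ENOENT alongside the permission errors.

```ruby
require 'tempfile'
require 'tmpdir'

# A directory name that almost certainly does not exist
missing_dir = File.join(Dir.tmpdir, "no_such_dir_#{Process.pid}_#{rand(1_000_000)}")

error = begin
  Tempfile.new('probe', missing_dir)
  nil
rescue SystemCallError => e
  e
end

puts error.class  # Errno::ENOENT
```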

Running out of disk space raises Errno::ENOSPC. Because Ruby buffers writes, the error may surface at write, flush, or close rather than at the call that produced the data, so handling it means cleaning up a file that may exist but contain incomplete data.

def write_with_space_check(tempfile, data)
  begin
    tempfile.write(data)
    tempfile.flush  # Ensure data reaches disk
  rescue Errno::ENOSPC => e
    # Clean up partial file
    tempfile.close!
    raise "Insufficient disk space: #{e.message}"
  rescue => e
    tempfile.close! if tempfile && !tempfile.closed?
    raise
  end
end

temp = Tempfile.new('large_data')
begin
  write_with_space_check(temp, "x" * 1_000_000)
rescue => e
  puts "Write failed: #{e.message}"
ensure
  temp.close!  # Safe even if write_with_space_check already called close!
end

File descriptor exhaustion becomes a problem when a program creates many temporary files without closing them. The operating system limits how many descriptors a process may hold open (the default soft limit is commonly 1,024 on Linux and 256 on macOS), and exceeding the limit raises Errno::EMFILE.

def process_multiple_files_safely(count)
  files = []

  begin
    count.times do |i|
      temp = Tempfile.new("batch_#{i}")
      temp.write("data for file #{i}")
      files << temp

      # Keep at most 100 files open; close! also deletes the oldest one
      files.shift.close! if files.length > 100
    end
  rescue Errno::EMFILE => e
    warn "Too many open files: #{e.message}"
    raise
  ensure
    # Close and delete whatever is still open, whether or not an error occurred
    files.each(&:close!)
  end
end
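Two practical follow-ups: check the actual descriptor limit with Process.getrlimit (Unix-like systems; it raises NotImplementedError where unsupported), and structure batch work so only one temporary file is open at a time. In this sketch, checksum_batch is a hypothetical helper, and String#sum (a simple 16-bit byte checksum) stands in for real processing.

```ruby
require 'tempfile'

# Soft and hard limits on open file descriptors for this process
soft, hard = Process.getrlimit(:NOFILE)
puts "open-file limit: soft=#{soft}, hard=#{hard}"

# Hypothetical helper: each item gets its own temp file, which Tempfile.create
# closes and deletes before the next one is opened, so at most one
# descriptor is in use regardless of batch size.
def checksum_batch(items)
  items.map do |item|
    Tempfile.create('batch_') do |file|
      file.write(item)
      file.rewind
      file.read.sum
    end
  end
end

p checksum_batch(%w[a b c])  # [97, 98, 99]
```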

Debugging temporary file issues requires tracking file paths, permissions, and cleanup status. The path method returns the filesystem location (or nil once the file has been unlinked), while closed? reports whether the handle is closed. Combined with filesystem checks, these give a clear picture of a Tempfile's state.

def debug_tempfile_state(tempfile)
  path = tempfile.path  # nil after unlink or close!
  puts "Path: #{path.inspect}"
  puts "Closed?: #{tempfile.closed?}"

  if path.nil? || !File.exist?(path)
    puts "File no longer exists on disk"
    return
  end

  stat = File.stat(path)
  puts "File size: #{stat.size} bytes"
  puts "Permissions: #{format('%o', stat.mode & 0o777)}"
  puts "Owner: #{stat.uid}"
  puts "Modified: #{stat.mtime}"
rescue SystemCallError => e
  puts "Debug error: #{e.message}"
end

Thread Safety & Concurrency

Creating Tempfiles concurrently is safe. Each generated name combines the date, the process ID, and a random component, and the file is opened with File::EXCL, so creation fails rather than reusing an existing file; on a collision Ruby retries with a new name. Multiple threads can therefore create temporary files simultaneously without conflicts.

require 'tempfile'  # Thread and Mutex are core classes; require 'thread' is unnecessary

threads = 10.times.map do |i|
  Thread.new do
    temp = Tempfile.new("thread_#{i}")
    temp.write("Data from thread #{i}")
    puts "Thread #{i}: #{temp.path}"
    temp.close!
  end
end

threads.each(&:join)
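To check the naming guarantee empirically, the sketch below creates Tempfiles from many threads using just two prefixes, keeps them all open, and verifies that every path is distinct.

```ruby
require 'tempfile'

# Create 20 Tempfiles concurrently; Thread#value returns each thread's Tempfile
temps = 20.times.map { |i| Thread.new { Tempfile.new("thread_#{i % 2}") } }.map(&:value)
paths = temps.map(&:path)

puts paths.uniq.size == paths.size  # true: every thread got its own file
temps.each(&:close!)
```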

However, sharing one Tempfile instance across threads requires synchronization. Writes from different threads can interleave and corrupt the file contents, and multi-step operations such as rewind followed by read are never atomic because all threads share a single file position.

require 'tempfile'  # Mutex is a core class; require 'thread' is unnecessary

temp = Tempfile.new('shared')
mutex = Mutex.new

threads = 5.times.map do |i|
  Thread.new do
    10.times do |j|
      mutex.synchronize do
        temp.puts "Thread #{i}, iteration #{j}"
        temp.flush  # Ensure immediate write
      end
    end
  end
end

threads.each(&:join)

temp.rewind
puts temp.read
temp.close!

Cleanup in concurrent code needs coordination as well. If one thread calls close! while another is still using the same Tempfile, the second thread's next I/O call raises IOError (closed stream), and path-based access fails because path returns nil and the file no longer exists on disk (Errno::ENOENT). Guarding the handle and a closed flag with a mutex turns these races into a predictable error.

class ThreadSafeTempfile
  def initialize(basename)
    @tempfile = Tempfile.new(basename)
    @mutex = Mutex.new
    @closed = false
  end

  def write(data)
    @mutex.synchronize do
      raise "File already closed" if @closed
      @tempfile.write(data)
    end
  end

  def read
    @mutex.synchronize do
      raise "File already closed" if @closed
      @tempfile.rewind
      @tempfile.read
    end
  end

  def close!
    @mutex.synchronize do
      return if @closed
      @tempfile.close!
      @closed = true
    end
  end

  def path
    @tempfile.path
  end
end

# Usage across threads
safe_temp = ThreadSafeTempfile.new('concurrent')

writer = Thread.new do
  100.times { |i| safe_temp.write("Line #{i}\n") }
  safe_temp.close!
end

reader = Thread.new do
  sleep 0.1  # Let some writing happen
  begin
    content = safe_temp.read
    puts "Read #{content.lines.count} lines"
  rescue => e
    puts "Read error: #{e.message}"
  end
end

[writer, reader].each(&:join)
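The failure mode that the wrapper guards against is easy to reproduce in a single thread: after close!, any further I/O on the same Tempfile raises IOError, and path returns nil. A small demonstration:

```ruby
require 'tempfile'

temp = Tempfile.new('closed_demo')
temp.close!

begin
  temp.write('too late')
rescue IOError => e
  puts "#{e.class}: #{e.message}"
end

p temp.path  # nil once the file has been unlinked
```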

Background cleanup threads can monitor and remove abandoned temporary files, but require careful lifecycle management to avoid removing files still in use. Implementing reference counting or explicit registration prevents premature deletion.

class TempfileManager
  def initialize
    @files = {}
    @mutex = Mutex.new
    start_cleanup_thread
  end

  def create_temp(basename)
    temp = Tempfile.new(basename)
    @mutex.synchronize do
      @files[temp.path] = { file: temp, created: Time.now }
    end
    temp
  end

  def remove_temp(tempfile)
    @mutex.synchronize do
      @files.delete(tempfile.path)
    end
    tempfile.close!
  end

  private

  def start_cleanup_thread
    @cleanup_thread = Thread.new do
      loop do
        sleep 60  # Check every minute
        cleanup_old_files
      end
    end
  end

  def cleanup_old_files
    cutoff = Time.now - 3600  # 1 hour old
    
    @mutex.synchronize do
      @files.select { |_, info| info[:created] < cutoff }.each do |path, info|
        begin
          info[:file].close!
          @files.delete(path)
          puts "Cleaned up old temp file: #{path}"
        rescue => e
          puts "Cleanup error for #{path}: #{e.message}"
        end
      end
    end
  end
end

Production Patterns

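Web applications commonly use temporary files for upload processing, report generation, and data transformation. Proper lifecycle management is critical in production, where leaked files and descriptors can exhaust disk space and degrade service availability. The following example assumes a Rails application: it uses Rails.root, Rails.logger, and ActiveSupport helpers such as 100.megabytes, 1.hour, and Time.current.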

class FileUploadProcessor
  def initialize(max_size: 100.megabytes, cleanup_age: 1.hour)
    @max_size = max_size
    @cleanup_age = cleanup_age
    @active_files = {}
    @mutex = Mutex.new  # guards @active_files, which the monitor thread also reads
    setup_cleanup_monitoring
  end

  def process_upload(uploaded_file)
    validate_file_size(uploaded_file)

    temp = Tempfile.new(['upload_', '.tmp'], Rails.root.join('tmp'))
    @mutex.synchronize { @active_files[temp.path] = Time.current }

    begin
      # Copy the uploaded content byte-for-byte
      temp.binmode
      uploaded_file.rewind
      IO.copy_stream(uploaded_file, temp)
      temp.flush

      # Perform processing operations
      result = transform_file_content(temp)

      # Store the result; the ensure clause deletes the temp file
      store_processed_result(result)
    ensure
      cleanup_temp_file(temp)
    end
  end

  private

  def validate_file_size(file)
    if file.size > @max_size
      raise "File too large: #{file.size} bytes exceeds #{@max_size} bytes"
    end
  end

  def transform_file_content(tempfile)
    tempfile.rewind
    tempfile.read.upcase  # Example transformation
  end

  def store_processed_result(result)
    # Placeholder: persist the result (database, object storage, etc.)
  end

  def cleanup_temp_file(tempfile)
    path = tempfile.path  # capture first: path becomes nil after close!
    @mutex.synchronize { @active_files.delete(path) }
    tempfile.close!       # also deletes files that were closed but not unlinked
  rescue => e
    Rails.logger.error "Tempfile cleanup failed: #{e.message}"
  end

  def setup_cleanup_monitoring
    Thread.new do
      loop do
        sleep 300  # Check every 5 minutes
        cleanup_stale_files
      end
    end
  end

  def cleanup_stale_files
    cutoff = Time.current - @cleanup_age
    # Snapshot the stale paths under the lock, then touch the filesystem outside it
    stale_paths = @mutex.synchronize do
      @active_files.select { |_, created_at| created_at < cutoff }.keys
    end

    stale_paths.each do |path|
      begin
        File.unlink(path) if File.exist?(path)
        @mutex.synchronize { @active_files.delete(path) }
        Rails.logger.info "Cleaned up stale temp file: #{path}"
      rescue => e
        Rails.logger.error "Failed to cleanup #{path}: #{e.message}"
      end
    end
  end
end
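Outside Rails, the same copy-process-delete flow can be expressed with Tempfile.create, which deletes the file when the block exits without an explicit ensure. This is a minimal sketch: with_spooled_copy is a hypothetical name, and StringIO stands in for an uploaded file.

```ruby
require 'tempfile'
require 'stringio'

# Hypothetical helper: copy any readable IO into a binary temp file,
# yield it for processing, and delete it when the block exits.
def with_spooled_copy(io)
  Tempfile.create(['upload_', '.bin']) do |temp|
    temp.binmode
    io.rewind if io.respond_to?(:rewind)
    IO.copy_stream(io, temp)
    temp.flush
    temp.rewind
    yield temp
  end
end

upload = StringIO.new('hello upload')  # stands in for an uploaded file
size = with_spooled_copy(upload) { |file| file.size }
puts size  # 12
```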

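Report generation systems require careful resource management when creating large temporary files. Writing rows to disk in batches keeps memory usage bounded regardless of report size. If the caller receives only the file path, something must still hold a reference to the Tempfile object: once that object is garbage-collected, its finalizer deletes the file.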

require 'csv'  # Array#to_csv comes from the csv library
require 'tempfile'

class ReportGenerator
  def generate_csv_report(query_params)
    temp = Tempfile.new(['report_', '.csv'])
    
    begin
      # Write CSV headers
      temp.puts generate_headers(query_params).to_csv
      
      # Stream data in batches to control memory usage
      batch_size = 1000
      offset = 0
      
      loop do
        records = fetch_records(query_params, limit: batch_size, offset: offset)
        break if records.empty?
        
        records.each do |record|
          temp.puts format_record_as_csv(record)
        end
        
        temp.flush  # Ensure data reaches disk
        offset += batch_size
        
        # Memory management
        GC.start if offset % 10000 == 0
      end
      
      # Finalize file
      temp.rewind
      file_size = temp.size
      
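      # Return file info for download. Include the Tempfile itself: if it were
      # garbage-collected, its finalizer would delete the file behind the path.
      {
        tempfile: temp,
        path: temp.path,
        size: file_size,
        filename: "report_#{Time.current.strftime('%Y%m%d_%H%M%S')}.csv"
      }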
      
    rescue => e
      temp.close! if temp && !temp.closed?
      raise "Report generation failed: #{e.message}"
    end
  end

  def cleanup_report_file(file_path)
    File.unlink(file_path) if File.exist?(file_path)
  rescue => e
    Rails.logger.error "Failed to cleanup report file #{file_path}: #{e.message}"
  end

  private

  def generate_headers(params)
    ['ID', 'Name', 'Created At', 'Status']  # Example headers
  end

  def fetch_records(params, limit:, offset:)
    # Database query with pagination
    # This is a placeholder - implement actual query logic
    []
  end

  def format_record_as_csv(record)
    [record.id, record.name, record.created_at, record.status].to_csv.chomp
  end
end
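Instead of calling to_csv per row, a CSV object can write straight into the temporary file and handle quoting and line endings itself. A minimal sketch, with made-up records standing in for fetch_records:

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample records standing in for database rows
records = [[1, 'alpha', 'active'], [2, 'beta', 'inactive'], [3, 'gamma', 'active']]

row_count = Tempfile.create(['report_', '.csv']) do |file|
  csv = CSV.new(file)
  csv << %w[ID Name Status]
  # Write in batches; with a real query each batch would be fetched separately
  records.each_slice(2) { |batch| batch.each { |row| csv << row } }
  file.flush
  file.rewind
  CSV.new(file).read.size  # header plus data rows
end

puts row_count  # 4
```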

Monitoring temporary file usage helps prevent disk space issues and identifies resource leaks. Implementing metrics collection and alerting provides operational visibility into temporary file patterns.

class TempfileMonitor
  def self.collect_metrics
    temp_dir = Dir.tmpdir
    pattern = File.join(temp_dir, '*')
    
    files = Dir.glob(pattern)
    # Heuristic: optional prefix, YYYYMMDD date stamp, PID, random suffix
    ruby_tempfiles = files.select { |f| File.basename(f).match?(/\A\w*\d{8}-\d+-\w+/) }
    
    total_size = ruby_tempfiles.sum { |f| File.size(f) rescue 0 }
    file_count = ruby_tempfiles.count
    
    oldest_file_age = if ruby_tempfiles.any?
      Time.current - ruby_tempfiles.map { |f| File.mtime(f) rescue Time.current }.min
    else
      0
    end

    {
      temp_file_count: file_count,
      temp_files_size_bytes: total_size,
      oldest_temp_file_age_seconds: oldest_file_age.to_i,
      temp_directory: temp_dir
    }
  end

  def self.alert_on_excessive_usage(max_files: 1000, max_size_mb: 500)
    metrics = collect_metrics
    
    if metrics[:temp_file_count] > max_files
      alert("Too many temporary files: #{metrics[:temp_file_count]} > #{max_files}")
    end
    
    size_mb = metrics[:temp_files_size_bytes] / 1_048_576
    if size_mb > max_size_mb
      alert("Temporary files using too much space: #{size_mb}MB > #{max_size_mb}MB")
    end
  end

  def self.alert(message)
    Rails.logger.error "[TEMPFILE ALERT] #{message}"
    # Send to monitoring system, email, etc.
  end
end

Performance & Memory

Tempfile performance depends primarily on the underlying filesystem and disk I/O characteristics. SSD storage provides better random access performance for temporary files compared to traditional spinning disks, especially for workloads involving frequent seeks and small writes.

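Text and binary modes differ mainly in newline handling and string encoding, not in raw write speed. In text mode Ruby converts newlines on Windows and transcodes data when an internal encoding is configured; binmode disables both and tags data read back as ASCII-8BIT. On Unix-like systems without transcoding the two modes should perform similarly, so measure with a benchmark like the one below before choosing a mode for performance reasons.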

require 'benchmark'

def benchmark_write_modes(data_size)
  data = 'x' * data_size
  
  Benchmark.bm(15) do |bm|
    bm.report('text mode') do
      temp = Tempfile.new('text_test')
      temp.write(data)
      temp.close!
    end
    
    bm.report('binary mode') do
      temp = Tempfile.new('binary_test')
      temp.binmode
      temp.write(data)
      temp.close!
    end
  end
end

# Test with 10MB of data
benchmark_write_modes(10 * 1024 * 1024)
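The practical effect of binmode is easiest to see on data read back from the file: binary mode tags strings as ASCII-8BIT (also known as BINARY), while text mode tags them with the default external encoding. This sketch writes a UTF-8 string and reads it back both ways:

```ruby
require 'tempfile'

text, raw = Tempfile.create('encoding_demo') do |file|
  file.write('héllo')   # 5 characters, 6 bytes in UTF-8

  file.rewind
  text = file.read      # tagged with Encoding.default_external
  file.binmode
  file.rewind
  raw = file.read       # tagged ASCII-8BIT; length counts bytes
  [text, raw]
end

p text.encoding
p raw.encoding == Encoding::BINARY  # true
p raw.bytesize                      # 6
```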

Large file processing benefits from streaming approaches that minimize memory footprint. Reading entire temporary files into memory can cause issues with large datasets, while line-by-line or chunk-based processing maintains consistent memory usage.

class MemoryEfficientProcessor
  CHUNK_SIZE = 64 * 1024  # 64KB chunks

  def process_large_tempfile(tempfile)
    tempfile.rewind
    processed_bytes = 0
    
    while chunk = tempfile.read(CHUNK_SIZE)
      break if chunk.empty?
      
      # Process chunk without loading entire file
      process_chunk(chunk)
      processed_bytes += chunk.bytesize
      
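      # Report progress every MB. Each chunk string is short-lived, so Ruby's
      # normal garbage collection reclaims it; an explicit GC.start is not needed.
      if processed_bytes % (1024 * 1024) == 0
        puts "Processed #{processed_bytes / 1024 / 1024}MB"
      end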
    end
    
    processed_bytes
  end

  private

  def process_chunk(chunk)
    # Example processing: count ASCII letters (String#count avoids a per-character regex)
    chunk.count('a-zA-Z')
  end
end

# Usage with large file
temp = Tempfile.new('large_data')
temp.write('A' * 50_000_000)  # 50MB file
temp.flush

processor = MemoryEfficientProcessor.new
bytes_processed = processor.process_large_tempfile(temp)
puts "Total processed: #{bytes_processed} bytes"

temp.close!

Buffer management affects both performance and memory usage. Ruby's default buffering behavior works well for most cases, but explicit buffer control can optimize specific scenarios like high-frequency small writes or large sequential transfers.

def compare_buffering_strategies(write_count, data_per_write)
  data = 'x' * data_per_write
  
  # Strategy 1: Default buffering
  time1 = Benchmark.realtime do
    temp1 = Tempfile.new('default_buffer')
    write_count.times { temp1.write(data) }
    temp1.close!
  end
  
  # Strategy 2: Explicit flushing
  time2 = Benchmark.realtime do
    temp2 = Tempfile.new('flush_each')
    write_count.times do
      temp2.write(data)
      temp2.flush
    end
    temp2.close!
  end
  
  # Strategy 3: Batch writing
  time3 = Benchmark.realtime do
    temp3 = Tempfile.new('batch_write')
    batch_data = data * write_count
    temp3.write(batch_data)
    temp3.close!
  end
  
  puts "Default buffering: #{time1.round(3)}s"
  puts "Flush each write: #{time2.round(3)}s"
  puts "Batch writing: #{time3.round(3)}s"
end

# Test with many small writes
compare_buffering_strategies(10_000, 100)
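These strategies differ only in when Ruby hands data to the operating system. flush moves Ruby's internal buffer to the OS, which makes the data visible to other readers of the path; fsync additionally asks the OS to write it to the storage device, which is far more expensive and is usually only needed when data must survive a crash. A short sketch of the difference:

```ruby
require 'tempfile'

visible = Tempfile.create('durability_demo') do |file|
  file.write('committed record')
  # Before this point the bytes may still sit in Ruby's buffer,
  # invisible to anyone reading file.path
  file.flush            # Ruby's buffer -> operating system
  file.fsync            # OS cache -> storage device (expensive; use sparingly)
  File.read(file.path)  # an independent read through the path
end

puts visible  # committed record
```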

File descriptor management becomes performance-critical in applications creating many temporary files. Each open Tempfile consumes a file descriptor from the system's limited pool. Proper cleanup prevents descriptor exhaustion and associated performance degradation.

class FileDescriptorTracker
  def self.current_fd_count
    if RUBY_PLATFORM =~ /linux/
      Dir['/proc/self/fd/*'].count
    elsif RUBY_PLATFORM =~ /darwin/
      # Approximate: includes lsof's header line and non-descriptor entries
      `lsof -p #{Process.pid} | wc -l`.to_i
    else
      -1  # Unknown platform
    end
  rescue
    -1
  end

  def self.monitor_fd_usage
    initial_count = current_fd_count
    
    yield
    
    final_count = current_fd_count
    if final_count > 0 && initial_count > 0
      puts "File descriptor change: #{final_count - initial_count}"
    end
  end
end

# Monitor FD usage during temp file operations
FileDescriptorTracker.monitor_fd_usage do
  temps = 100.times.map { Tempfile.new('fd_test') }
  temps.each(&:close!)
end

Reference

Core Methods

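| Method | Parameters | Returns | Description |
|---|---|---|---|
| Tempfile.new(basename = "", tmpdir = nil, mode: 0, **options) | basename (String or [prefix, suffix] Array), tmpdir (String; nil means Dir.tmpdir), mode (Integer open flags), options (passed to File.open) | Tempfile | Creates a new temporary file with a unique name |
| #close(unlink_now = false) | unlink_now (Boolean) | Unspecified | Closes the file handle; also deletes the file if unlink_now is true |
| #close! | None | Unspecified | Closes the handle and deletes the file immediately |
| #unlink (alias #delete) | None | Unspecified | Removes the file from the filesystem; the handle stays open |
| #path | None | String or nil | Full filesystem path; nil after the file has been unlinked |
| #size (alias #length) | None | Integer | Current file size in bytes (flushes buffered writes first) |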

Lifecycle Management

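| Method | Parameters | Returns | Description |
|---|---|---|---|
| #rewind | None | 0 | Sets the file position to the beginning |
| #flush | None | The underlying IO | Hands Ruby's internal write buffer to the operating system (not necessarily to disk) |
| #fsync | None | 0 (nil if unsupported) | Asks the OS to write file data and metadata to the storage device |
| #closed? | None | Boolean | Returns true if the file handle is closed |
| #binmode | None | The underlying IO | Switches to binary mode (no newline conversion, ASCII-8BIT encoding) |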

Class Methods

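| Method | Parameters | Returns | Description |
|---|---|---|---|
| Tempfile.create(basename = "", tmpdir = nil, mode: 0, **options) {block} | basename (String/Array), tmpdir (String), mode, options, block | Block result with a block; a File without one | With a block, yields a plain File and closes and deletes it when the block exits. Without a block, returns a File that is not deleted automatically |
| Tempfile.open(*args) {block} | Same as new | Block result with a block; a Tempfile without one | Not an alias for create: behaves like new, and with a block yields the Tempfile and closes it afterward without deleting it |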

Constructor Options

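| Option | Type | Default | Description |
|---|---|---|---|
| mode: | Integer | 0 | Extra open flags from File::Constants (e.g. File::APPEND), combined with File::RDWR, File::CREAT, and File::EXCL. Not a permission mode: the file is created with permissions 0600 (before umask) |
| Other keywords (e.g. encoding:, binmode:) | Varies | None | Passed through to File.open |

Prefix and suffix are set through an Array basename, and the directory through the second positional argument; they are not keyword options.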

Basename Format Options

| Format | Example | Generated Filename |
|---|---|---|
| String | 'myapp' | myapp20241130-1234-5678ab |
| Array with suffix | ['data_', '.json'] | data_20241130-1234-5678ab.json |
| Array without suffix | ['prefix_', ''] | prefix_20241130-1234-5678ab |

Common Exceptions

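| Exception | Trigger Condition | Typical Cause |
|---|---|---|
| Errno::EACCES | Permission denied | Insufficient write permissions on the temp directory |
| Errno::ENOSPC | No space left on device | Disk full during file creation, write, flush, or close |
| Errno::EMFILE | Too many open files | Exceeded the process file descriptor limit |
| Errno::ENOENT | No such file or directory | Temp directory does not exist, or path-based access after the file was unlinked |
| Errno::EROFS | Read-only filesystem | Temp directory on a read-only mount |
| IOError | closed stream | I/O on a Tempfile after close or close! |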

Cleanup Behavior

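| Scenario | Automatic Cleanup | Manual Cleanup Required |
|---|---|---|
| Normal program exit | Yes (finalizer runs at exit) | No |
| Tempfile object garbage-collected | Yes (finalizer), but timing is unpredictable | No, though close! releases the file promptly |
| Exception during processing | Only later, at garbage collection or exit | Yes, for prompt cleanup (close! in an ensure block) |
| Long-running processes | Only when the object is garbage-collected | Yes (call close! or unlink when done) |
| Thread termination | No direct effect; the file lives until GC or exit | Yes |
| Process killed (SIGKILL) or interpreter crash | No | Yes (remove leftovers externally) |
| Tempfile.create without a block | No | Yes (delete the path yourself) |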

Performance Characteristics

| Operation | Typical Performance | Memory Usage |
|---|---|---|
| File creation | O(1) plus filesystem overhead | Minimal (one file handle) |
| Sequential write | O(n) in data size | Ruby write buffer (8 KB by default) |
| Random access | O(1) plus seek time | Minimal |
| Cleanup | O(1) plus filesystem overhead | None after cleanup |

Thread Safety Matrix

| Operation | Thread Safe | Notes |
|---|---|---|
| Creating new Tempfile | Yes | Files are opened with File::EXCL and retried under a new name on collision |
| Writing to same instance | No | Requires external synchronization |
| Reading from same instance | No | File position is shared |
| Closing/unlinking | No | Race conditions possible |
| Path access | Yes | Safe to read; returns nil once the file has been unlinked |