GZip

Overview

Ruby provides GZip compression and decompression through the Zlib library, specifically the Zlib::GzipReader and Zlib::GzipWriter classes. These classes implement the RFC 1952 GZip file format, allowing Ruby applications to create and read compressed data compatible with standard GZip tools.

The GZip implementation operates on IO objects, supporting both file-based and in-memory compression operations. Ruby's GZip classes handle the complete format specification including headers, compression metadata, and checksums automatically.

require 'zlib'

# Basic file compression
Zlib::GzipWriter.wrap(File.open('data.txt.gz', 'wb')) do |gz|
  gz.write("Hello, compressed world!")
end

# Basic file decompression  
Zlib::GzipReader.wrap(File.open('data.txt.gz', 'rb')) do |gz|
  puts gz.read
end
# => "Hello, compressed world!"

The Zlib::GzipWriter class compresses data as it writes, while Zlib::GzipReader decompresses data during reading. Both classes maintain compatibility with external GZip files and provide access to compression metadata including modification times, original filenames, and comments.

Ruby's GZip implementation supports all standard compression levels from 0 (no compression) through 9 (maximum compression), with level 6 as the default. The classes also provide streaming capabilities for processing large datasets without loading entire files into memory.
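
The module-level Zlib.gzip helper (Ruby 2.4 and later) exposes the level as a keyword argument, which makes the size/speed choice easy to demonstrate. A minimal sketch:

require 'zlib'

data = "example record\n" * 1_000

fast  = Zlib.gzip(data, level: Zlib::BEST_SPEED)       # level 1
small = Zlib.gzip(data, level: Zlib::BEST_COMPRESSION) # level 9

puts fast.bytesize >= small.bytesize # => true (the gap narrows on trivially compressible input)
puts Zlib.gunzip(small) == data      # => true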

Basic Usage

File compression with Zlib::GzipWriter requires opening a binary write stream and writing data through the GZip wrapper. The writer automatically handles format headers, compression algorithms, and file checksums.

require 'zlib'

# Compress text file
File.open('large_data.txt.gz', 'wb') do |file|
  Zlib::GzipWriter.wrap(file) do |gz|
    gz.write("Line 1: Important data\n")
    gz.write("Line 2: More information\n") 
    gz.write("Line 3: Final content\n")
  end
end

Reading compressed files uses Zlib::GzipReader with similar IO patterns. The reader transparently decompresses data and provides standard IO methods including read, gets, and each_line.

# Decompress and process file
File.open('large_data.txt.gz', 'rb') do |file|
  Zlib::GzipReader.wrap(file) do |gz|
    gz.each_line do |line|
      puts "Processed: #{line.chomp}"
    end
  end
end
# => Processed: Line 1: Important data
# => Processed: Line 2: More information  
# => Processed: Line 3: Final content

String compression operates through StringIO objects, allowing in-memory compression without temporary files. This approach suits small to medium datasets and network operations.

require 'stringio'

# Compress string to bytes
def compress_string(data)
  io = StringIO.new
  Zlib::GzipWriter.wrap(io) do |gz|
    gz.write(data)
  end
  io.string
end

# Decompress bytes to string
def decompress_string(compressed_data)
  io = StringIO.new(compressed_data)
  Zlib::GzipReader.wrap(io) do |gz|
    gz.read
  end
end

original = "This text will be compressed using GZip"
compressed = compress_string(original)
decompressed = decompress_string(compressed)

puts "Original size: #{original.bytesize} bytes"
puts "Compressed size: #{compressed.bytesize} bytes"  
puts "Decompressed: #{decompressed}"
# => Original size: 39 bytes
# => Compressed size: ~59 bytes (short inputs grow; the GZip header and trailer add 18 bytes of overhead)
# => Decompressed: This text will be compressed using GZip

The GZip format includes metadata storage for original filenames, modification times, and comments. Ruby provides access to this information through reader properties and allows setting metadata during compression.

# Set metadata during compression
File.open('archive.txt.gz', 'wb') do |file|
  Zlib::GzipWriter.wrap(file) do |gz|
    gz.orig_name = "original_file.txt"
    gz.comment = "Archived on #{Time.now}"
    gz.mtime = Time.now
    gz.write("File contents with metadata")
  end
end

# Read metadata from compressed file
File.open('archive.txt.gz', 'rb') do |file|
  Zlib::GzipReader.wrap(file) do |gz|
    puts "Original name: #{gz.orig_name}"
    puts "Comment: #{gz.comment}"
    puts "Modified: #{gz.mtime}"
    puts "Content: #{gz.read}"
  end
end

Error Handling & Debugging

GZip operations raise specific exceptions for different failure conditions. The Zlib::Error hierarchy provides detailed error information for debugging compression issues.

require 'zlib'

def safe_decompress(filename)
  File.open(filename, 'rb') do |file|
    Zlib::GzipReader.wrap(file) do |gz|
      return gz.read
    end
  end
rescue Zlib::GzipFile::Error => e
  puts "GZip format error: #{e.message}"
  nil
rescue Zlib::DataError => e
  puts "Corrupted data: #{e.message}"
  nil
rescue Zlib::BufError => e
  puts "Buffer error: #{e.message}" 
  nil
rescue Errno::ENOENT
  puts "File not found: #{filename}"
  nil
end

# Test with corrupted file
File.write('broken.gz', 'not gzip data')
result = safe_decompress('broken.gz')
# => GZip format error: not in gzip format

Validating GZip files before processing prevents application crashes and provides user feedback for invalid data. Ruby's GZip implementation performs header validation automatically but requires explicit error handling.

def validate_gzip_file(filename)
  File.open(filename, 'rb') do |file|
    # Check magic number (first 2 bytes)
    magic = file.read(2)
    return false unless magic == "\x1f\x8b".b # compare in binary; a plain literal is UTF-8 and never equals ASCII-8BIT bytes
    
    file.rewind
    Zlib::GzipReader.wrap(file) do |gz|
      # Attempt to read first byte to validate format
      gz.readchar
      return true
    end
  end
rescue Zlib::Error, EOFError
  false
rescue Errno::ENOENT
  false
end

# Create test files
File.binwrite('valid.gz', Zlib.gzip('test'))               # real GZip data
File.binwrite('invalid.gz', Zlib::Deflate.deflate('test')) # raw deflate stream lacks the GZip wrapper

puts validate_gzip_file('valid.gz')   # => true
puts validate_gzip_file('invalid.gz') # => false (deflate != gzip)

Maliciously crafted archives can expand to many times their compressed size during decompression (a "gzip bomb"), so decompressing untrusted input calls for defensive programming. Size limits and chunked reads prevent denial-of-service through memory exhaustion.

def safe_decompress_with_limit(filename, max_size: 10 * 1024 * 1024)
  decompressed_size = 0
  result = String.new
  
  File.open(filename, 'rb') do |file|
    Zlib::GzipReader.wrap(file) do |gz|
      while chunk = gz.read(8192) # Read in chunks
        decompressed_size += chunk.bytesize
        
        if decompressed_size > max_size
          raise "Decompressed size exceeds limit: #{max_size} bytes"
        end
        
        result << chunk
      end
    end
  end
  
  result
rescue Zlib::Error => e
  raise "Decompression failed: #{e.message}"
end
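
A usage sketch for the limiter above (the filename and limit are illustrative):

begin
  data = safe_decompress_with_limit('upload.gz', max_size: 1 * 1024 * 1024)
  puts "Decompressed #{data.bytesize} bytes"
rescue RuntimeError => e
  puts "Rejected: #{e.message}"
end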

Performance & Memory

Compression levels significantly impact both processing time and output size. Higher compression levels require more CPU cycles but produce smaller files, creating trade-offs for different use cases.

require 'benchmark'

data = "x" * 100_000 # 100KB of repeated character

Benchmark.bm(15) do |x|
  (0..9).each do |level|
    x.report("Level #{level}:") do
      io = StringIO.new
      Zlib::GzipWriter.wrap(io, level) do |gz|
        gz.write(data)
      end
      
      compressed_size = io.string.bytesize
      ratio = (compressed_size.to_f / data.bytesize * 100).round(1)
      print " #{compressed_size} bytes (#{ratio}%)"
    end
  end
end

Streaming large files prevents memory exhaustion by processing data in chunks. Ruby's GZip classes support streaming operations that maintain constant memory usage regardless of file size.

def stream_compress_file(input_path, output_path, chunk_size: 64 * 1024)
  File.open(output_path, 'wb') do |output_file|
    Zlib::GzipWriter.wrap(output_file) do |gz|
      File.open(input_path, 'rb') do |input_file|
        while chunk = input_file.read(chunk_size)
          gz.write(chunk)
        end
      end
    end
  end
end

def stream_decompress_file(input_path, output_path, chunk_size: 64 * 1024)
  File.open(input_path, 'rb') do |input_file|
    Zlib::GzipReader.wrap(input_file) do |gz|
      File.open(output_path, 'wb') do |output_file|
        while chunk = gz.read(chunk_size)
          output_file.write(chunk)
        end
      end
    end
  end
end
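
Usage is symmetric (the paths here are illustrative):

stream_compress_file('access.log', 'access.log.gz')
stream_decompress_file('access.log.gz', 'access_restored.log')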

Chunk size selection affects compression throughput and memory usage. zlib manages its own internal buffers, but the size of the chunks your code reads and writes determines Ruby-level string allocation and per-call overhead, so it is worth tuning for the workload.

class OptimizedGzipProcessor
  def initialize(buffer_size: 32 * 1024)
    @buffer_size = buffer_size
  end
  
  def compress_data(data)
    io = StringIO.new
    Zlib::GzipWriter.wrap(io) do |gz|
      # Strings are not Enumerable; slice by byte offset instead of each_slice
      offset = 0
      while offset < data.bytesize
        gz.write(data.byteslice(offset, @buffer_size))
        offset += @buffer_size
      end
    end
    io.string
  end
  
  def process_large_dataset(file_pattern)
    Dir.glob(file_pattern).each do |filename|
      compressed_name = "#{filename}.gz"
      
      start_time = Time.now
      stream_compress_file(filename, compressed_name, chunk_size: @buffer_size)
      processing_time = Time.now - start_time
      
      original_size = File.size(filename)
      compressed_size = File.size(compressed_name)
      ratio = (compressed_size.to_f / original_size * 100).round(1)
      
      puts "#{filename}: #{original_size}#{compressed_size} (#{ratio}%) in #{processing_time.round(2)}s"
    end
  end
end

Production Patterns

Web applications commonly use GZip compression for HTTP response compression. Ruby web frameworks and Rack middleware provide integration points for transparent response compression.

# Rack middleware example
class GzipResponseMiddleware
  def initialize(app, options = {})
    @app = app
    @min_size = options[:min_size] || 1024
    @compress_types = options[:types] || %w[text/html text/css text/javascript application/json]
  end
  
  def call(env)
    status, headers, body = @app.call(env)
    headers = Rack::Utils::HeaderHash.new(headers)
    
    return [status, headers, body] unless should_compress?(env, headers, body)
    
    compressed_body = compress_response(body)
    body.close if body.respond_to?(:close) # Rack spec: close the original body once consumed
    headers['Content-Encoding'] = 'gzip'
    headers['Content-Length'] = compressed_body.bytesize.to_s
    headers.delete('ETag') # Remove ETag as content changed
    
    [status, headers, [compressed_body]]
  end
  
  private
  
  def should_compress?(env, headers, body)
    return false unless env['HTTP_ACCEPT_ENCODING']&.include?('gzip')
    return false unless @compress_types.any? { |type| headers['Content-Type']&.start_with?(type) }
    
    body_size = body.respond_to?(:bytesize) ? body.bytesize : body.sum(&:bytesize)
    body_size >= @min_size
  end
  
  def compress_response(body)
    io = StringIO.new
    Zlib::GzipWriter.wrap(io) do |gz|
      body.each { |chunk| gz.write(chunk) }
    end
    io.string
  end
end
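
Wiring the middleware into a Rack application happens in config.ru; the application constant here is hypothetical:

# config.ru
require_relative 'gzip_response_middleware'

use GzipResponseMiddleware, min_size: 512
run MyApp.new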

Log file compression for long-term storage requires handling active log rotation and maintaining searchability. Background processing with proper error handling prevents impact on application performance.

class LogCompressor
  def initialize(log_directory, retention_days: 30)
    @log_directory = log_directory
    @retention_days = retention_days
  end
  
  def compress_old_logs
    old_log_files.each do |log_file|
      compress_log_file(log_file)
      File.delete(log_file) if File.exist?("#{log_file}.gz")
    end
    
    cleanup_expired_logs
  end
  
  private
  
  def old_log_files
    pattern = File.join(@log_directory, "*.log")
    Dir.glob(pattern).select do |file|
      File.mtime(file) < Time.now - (24 * 60 * 60) # Older than 1 day
    end
  end
  
  def compress_log_file(log_file)
    compressed_file = "#{log_file}.gz"
    return if File.exist?(compressed_file)
    
    File.open(compressed_file, 'wb') do |output|
      Zlib::GzipWriter.wrap(output) do |gz|
        gz.orig_name = File.basename(log_file)
        gz.mtime = File.mtime(log_file)
        
        File.open(log_file, 'rb') do |input|
          IO.copy_stream(input, gz)
        end
      end
    end
  rescue => e
    File.delete(compressed_file) if File.exist?(compressed_file)
    raise "Failed to compress #{log_file}: #{e.message}"
  end
  
  def cleanup_expired_logs
    cutoff_time = Time.now - (@retention_days * 24 * 60 * 60)
    
    Dir.glob(File.join(@log_directory, "*.gz")).each do |gz_file|
      File.delete(gz_file) if File.mtime(gz_file) < cutoff_time
    end
  end
end

Database backup compression reduces storage costs and transfer times. Streaming compression during backup creation eliminates temporary file requirements and improves performance.

class DatabaseBackupCompressor
  def initialize(database_url, backup_location)
    @database_url = database_url
    @backup_location = backup_location
  end
  
  def create_compressed_backup
    timestamp = Time.now.strftime("%Y%m%d_%H%M%S")
    backup_file = File.join(@backup_location, "backup_#{timestamp}.sql.gz")
    
    File.open(backup_file, 'wb') do |file|
      Zlib::GzipWriter.wrap(file) do |gz|
        gz.orig_name = "backup_#{timestamp}.sql"
        gz.mtime = Time.now
        
        # Stream pg_dump output directly to GZip
        IO.popen(["pg_dump", @database_url], "rb") do |dump|
          IO.copy_stream(dump, gz)
        end
      end
    end
    
    verify_backup(backup_file)
    backup_file
  end
  
  def restore_from_backup(backup_file)
    File.open(backup_file, 'rb') do |file|
      Zlib::GzipReader.wrap(file) do |gz|
        IO.popen(["psql", @database_url], "wb") do |psql|
          IO.copy_stream(gz, psql)
        end
      end
    end
  end
  
  private
  
  def verify_backup(backup_file)
    File.open(backup_file, 'rb') do |file|
      Zlib::GzipReader.wrap(file) do |gz|
        # Read first few bytes to verify decompression works
        header = gz.read(100)
        raise "Invalid backup file" unless header&.include?("PostgreSQL")
      end
    end
  end
end

Common Pitfalls

Character encoding issues arise when compressing text data without proper encoding handling. GZip operates on bytes, requiring explicit encoding specification for text data to prevent corruption.

# Incorrect: Encoding ignored during compression
def compress_text_wrong(text)
  io = StringIO.new
  Zlib::GzipWriter.wrap(io) do |gz|
    gz.write(text) # May use wrong encoding
  end
  io.string
end

# Correct: Explicit encoding handling
def compress_text_correct(text, encoding: 'UTF-8')
  io = StringIO.new
  Zlib::GzipWriter.wrap(io) do |gz|
    gz.write(text.encode(encoding))
  end
  io.string
end

# Test with non-ASCII text
text = "Héllo Wörld! 🌍"
compressed = compress_text_correct(text)

# Decompression with encoding restoration
def decompress_text(compressed_data, encoding: 'UTF-8')
  io = StringIO.new(compressed_data)
  Zlib::GzipReader.wrap(io) do |gz|
    gz.read.force_encoding(encoding)
  end
end

decompressed = decompress_text(compressed)
puts decompressed == text # => true

Resource leaks occur when GZip streams remain unclosed, particularly on error paths. The garbage collector only releases abandoned file handles at some later finalization pass, so unclosed streams accumulate and can exhaust file descriptors.

# Problematic: Manual resource management
def risky_compression(files)
  files.each do |filename|
    gz = Zlib::GzipWriter.new(File.open("#{filename}.gz", 'wb'))
    gz.write(File.read(filename))
    # Missing gz.close - resource leak!
  end
end

# Safe: Block-based resource management
def safe_compression(files)
  files.each do |filename|
    File.open("#{filename}.gz", 'wb') do |file|
      Zlib::GzipWriter.wrap(file) do |gz|
        gz.write(File.read(filename))
        # Automatic cleanup via block
      end
    end
  end
end

# Safest: Exception-safe resource cleanup
def robust_compression(files)
  files.each do |filename|
    output_file = nil
    gz = nil
    
    begin
      output_file = File.open("#{filename}.gz", 'wb')
      gz = Zlib::GzipWriter.new(output_file)
      gz.write(File.read(filename))
    ensure
      gz&.close
      output_file&.close
    end
  end
end

Compression level misconceptions lead to inappropriate settings for specific use cases. Maximum compression (level 9) rarely provides significant benefits over moderate levels while consuming substantially more CPU resources.

require 'benchmark'

def demonstrate_compression_tradeoffs(data)
  levels = [1, 6, 9]
  results = {}
  
  levels.each do |level|
    time = Benchmark.realtime do
      io = StringIO.new
      Zlib::GzipWriter.wrap(io, level) do |gz|
        gz.write(data)
      end
      results[level] = {
        size: io.string.bytesize,
        ratio: (io.string.bytesize.to_f / data.bytesize * 100).round(1)
      }
    end
    results[level][:time] = time.round(4)
  end
  
  puts "Compression Analysis for #{data.bytesize} byte input:"
  results.each do |level, stats|
    puts "Level #{level}: #{stats[:size]} bytes (#{stats[:ratio]}%) in #{stats[:time]}s"
  end
  
  # Calculate efficiency (compression per second)
  results.each do |level, stats|
    efficiency = ((100 - stats[:ratio]) / stats[:time]).round(1)
    puts "Level #{level} efficiency: #{efficiency} compression points/second"
  end
end

# Test with different data types
text_data = File.read('/usr/share/dict/words') rescue "word " * 10000
demonstrate_compression_tradeoffs(text_data)

Memory accumulation during streaming operations occurs when developers buffer entire streams instead of processing data in chunks. This pattern defeats the memory benefits of streaming compression.

# Memory-intensive: Accumulates entire stream
def bad_streaming_decompress(filename)
  result = String.new
  
  File.open(filename, 'rb') do |file|
    Zlib::GzipReader.wrap(file) do |gz|
      # Loads entire file into memory
      result = gz.read 
    end
  end
  
  result
end

# Memory-efficient: Processes chunks
def good_streaming_decompress(filename, &block)
  File.open(filename, 'rb') do |file|
    Zlib::GzipReader.wrap(file) do |gz|
      while chunk = gz.read(64 * 1024)
        yield chunk
      end
    end
  end
end

# Example usage with memory monitoring
def process_large_gzip_file(filename)
  total_processed = 0
  
  good_streaming_decompress(filename) do |chunk|
    # Process chunk without storing
    total_processed += chunk.bytesize
    
    # Periodic progress reporting
    if total_processed % (1024 * 1024) == 0
      puts "Processed #{total_processed / 1024 / 1024}MB"
    end
  end
end

Reference

Core Classes

| Class | Purpose | Key Methods |
| --- | --- | --- |
| Zlib::GzipWriter | Compress data to GZip format | #write, #puts, #close, #flush |
| Zlib::GzipReader | Decompress GZip format data | #read, #gets, #each_line, #rewind |
| Zlib::GzipFile | Base class for GZip operations | #close, #closed?, #sync, #sync= |

GzipWriter Methods

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| #initialize(io, level=nil, strategy=nil) | IO object, compression level (0-9), strategy | GzipWriter | Creates new writer with optional compression settings |
| #write(string) | String data | Integer | Writes string to compressed stream, returns bytes written |
| #puts(*objects) | Objects to write | nil | Writes objects as lines with newline separators |
| #print(*objects) | Objects to write | nil | Writes objects to stream without separators |
| #printf(format, *objects) | Format string, objects | nil | Writes formatted output to stream |
| #flush(flush=nil) | Flush mode constant | self | Flushes pending data to underlying IO |
| #close | None | nil | Closes writer and finalizes compression |
| #orig_name= | String filename | String | Sets original filename metadata |
| #comment= | String comment | String | Sets comment metadata |
| #mtime= | Time object | Time | Sets modification time metadata |

GzipReader Methods

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| #initialize(io) | IO object | GzipReader | Creates new reader for compressed stream |
| #read(length=nil) | Bytes to read | String/nil | Reads decompressed data, nil at EOF |
| #gets(separator=$/) | Line separator | String/nil | Reads line from decompressed stream |
| #each_line(&block) | Block | Enumerator | Iterates over lines in decompressed data |
| #readlines(separator=$/) | Line separator | Array | Returns all lines as array |
| #rewind | None | 0 | Resets reader to beginning of stream |
| #pos | None | Integer | Returns current position in decompressed data |
| #eof? | None | Boolean | Returns true if at end of file |
| #orig_name | None | String/nil | Returns original filename from metadata |
| #comment | None | String/nil | Returns comment from metadata |
| #mtime | None | Time | Returns modification time from metadata |
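
The navigation methods behave like their IO counterparts but operate on decompressed positions. A short sketch using an in-memory stream:

require 'zlib'
require 'stringio'

io = StringIO.new(Zlib.gzip("alpha\nbeta\n"))

Zlib::GzipReader.wrap(io) do |gz|
  puts gz.gets              # => alpha
  puts gz.pos               # => 6 (bytes of decompressed data consumed)
  gz.rewind                 # seeks the underlying IO back to the start
  puts gz.readlines.inspect # => ["alpha\n", "beta\n"]
  puts gz.eof?              # => true
end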

Compression Levels

| Level | Speed | Ratio | Use Case |
| --- | --- | --- | --- |
| 0 | Fastest | None | Storage without compression |
| 1-3 | Fast | Low | Real-time compression, CPU-limited |
| 4-6 | Balanced | Medium | General purpose, default (6) |
| 7-9 | Slow | High | Archival, bandwidth-limited |

Exception Hierarchy

| Exception | Parent | Description |
| --- | --- | --- |
| Zlib::Error | StandardError | Base class for all Zlib errors |
| Zlib::GzipFile::Error | Zlib::Error | GZip format or operation errors |
| Zlib::GzipFile::NoFooter | Zlib::GzipFile::Error | Missing or invalid file footer |
| Zlib::GzipFile::CRCError | Zlib::GzipFile::Error | Checksum validation failure |
| Zlib::GzipFile::LengthError | Zlib::GzipFile::Error | Length validation failure |
| Zlib::DataError | Zlib::Error | Corrupted or invalid compressed data |
| Zlib::BufError | Zlib::Error | Buffer size or state errors |
| Zlib::StreamError | Zlib::Error | Stream state or operation errors |

Compression and Flush Constants

| Constant | Value | Description |
| --- | --- | --- |
| Zlib::SYNC_FLUSH | 2 | Flush mode for real-time streaming |
| Zlib::FULL_FLUSH | 3 | Complete buffer flush that resets compression state |
| Zlib::BEST_SPEED | 1 | Fastest compression |
| Zlib::BEST_COMPRESSION | 9 | Maximum compression |
| Zlib::DEFAULT_COMPRESSION | -1 | Default compression level (maps to 6) |
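
Zlib::SYNC_FLUSH is what GzipWriter#flush uses by default; calling it pushes all pending compressed bytes to the underlying IO so a consumer can decompress everything written so far without waiting for #close. A minimal in-memory sketch:

require 'zlib'
require 'stringio'

io = StringIO.new
gz = Zlib::GzipWriter.new(io)
gz.write("partial event data")
gz.flush(Zlib::SYNC_FLUSH) # emit buffered bytes without ending the stream

puts io.string.bytesize    # non-zero before gz.close
gz.close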

Convenience Methods

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| Zlib::GzipWriter.open(filename, level=nil) | Filename, compression level | GzipWriter | Opens file for writing with automatic close |
| Zlib::GzipReader.open(filename) | Filename | GzipReader | Opens file for reading with automatic close |
| Zlib::GzipWriter.wrap(io, &block) / Zlib::GzipReader.wrap(io, &block) | IO object, block | Block result | Wraps an existing IO, closing it when the block returns |
| Zlib.gzip(string, level: nil, strategy: nil) | String data, keyword options | String | Compresses string to GZip bytes |
| Zlib.gunzip(gzipped_string) | GZip bytes | String | Decompresses GZip bytes to string |
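
The one-shot helpers cover common cases without explicit IO plumbing; note the keyword argument on Zlib.gzip (Ruby 2.4 and later):

require 'zlib'

# File-based, with automatic close
Zlib::GzipWriter.open('report.txt.gz') { |gz| gz.write("quarterly numbers") }
Zlib::GzipReader.open('report.txt.gz') { |gz| puts gz.read }

# In-memory, one call each way
bytes = Zlib.gzip("in-memory payload", level: Zlib::BEST_COMPRESSION)
puts Zlib.gunzip(bytes) # => in-memory payload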