Zlib

Overview

Ruby's Zlib module provides access to the zlib compression library, implementing deflate compression and gzip format handling. The module wraps the underlying C zlib library and exposes Ruby classes for compression, decompression, and checksum calculations.

The primary classes include Zlib::Deflate for compression, Zlib::Inflate for decompression, and Zlib::GzipWriter/Zlib::GzipReader for gzip format operations. Ruby also provides convenient module methods Zlib.deflate and Zlib.inflate for simple compression tasks.

require 'zlib'

# Simple compression and decompression
data = "The quick brown fox jumps over the lazy dog" * 100
compressed = Zlib.deflate(data)
decompressed = Zlib.inflate(compressed)

puts "Original: #{data.size} bytes"
puts "Compressed: #{compressed.size} bytes"
puts "Ratio: #{(compressed.size.to_f / data.size * 100).round(1)}%"
# => Original: 4300 bytes
# => Compressed: 62 bytes  
# => Ratio: 1.4%

Zlib supports multiple compression levels from 0 (no compression) to 9 (maximum compression), with level 6 as the default. The library also provides different strategies optimized for various data types.

# Different compression levels
text = File.read('large_file.txt')

fast = Zlib.deflate(text, Zlib::BEST_SPEED)      # Level 1
default = Zlib.deflate(text)                      # Level 6
best = Zlib.deflate(text, Zlib::BEST_COMPRESSION) # Level 9

puts "Fast: #{fast.size} bytes"
puts "Default: #{default.size} bytes"  
puts "Best: #{best.size} bytes"

The module handles both raw deflate streams and complete gzip files with headers and checksums. Ruby integrates zlib compression with IO objects, enabling streaming compression for large datasets without loading entire files into memory.
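
Zlib.crc32 and Zlib.adler32 cover the checksum side mentioned above. Both accept an optional running value, which allows incremental calculation over chunks; a minimal sketch:

# CRC-32 and Adler-32 checksums
data = "The quick brown fox"

puts Zlib.crc32(data)    # CRC-32 of the whole string
puts Zlib.adler32(data)  # Adler-32 of the whole string

# Incremental calculation: feed chunks and pass the running value back in
crc = 0
["The quick ", "brown fox"].each { |chunk| crc = Zlib.crc32(chunk, crc) }
puts crc == Zlib.crc32(data)  # => true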

Basic Usage

The simplest compression operations use Zlib.deflate and Zlib.inflate for basic deflate compression. These methods handle complete data in memory and return compressed or decompressed strings.

require 'zlib'

# Basic compression
original = "Ruby provides excellent compression capabilities"
compressed = Zlib.deflate(original)
restored = Zlib.inflate(compressed)

puts restored == original  # => true
puts "Compression ratio: #{(compressed.size.to_f / original.size * 100).round(1)}%"

For file compression, use Zlib::GzipWriter and Zlib::GzipReader to create and read gzip files. These classes work with file handles and provide standard IO methods.

# Compress data to gzip file
Zlib::GzipWriter.open('data.gz') do |gz|
  gz.write("Line 1\n")
  gz.write("Line 2\n")
  gz.puts("Line 3")
end

# Read compressed file
Zlib::GzipReader.open('data.gz') do |gz|
  puts gz.read
end
# => Line 1
# => Line 2  
# => Line 3

Stream-based compression handles large datasets efficiently by processing data in chunks. The Zlib::Deflate and Zlib::Inflate classes provide streaming interfaces.

# Streaming compression
deflater = Zlib::Deflate.new
compressed_parts = []

# Compress data in chunks
['chunk1', 'chunk2', 'chunk3'].each do |chunk|
  compressed_parts << deflater.deflate(chunk)
end
compressed_parts << deflater.finish
deflater.close

# Combine compressed data
full_compressed = compressed_parts.join

# Streaming decompression: feed every part, including the finish output,
# since earlier deflate calls may buffer and return little or nothing
inflater = Zlib::Inflate.new
original_parts = []

compressed_parts.each do |part|
  original_parts << inflater.inflate(part)
end
original_parts << inflater.finish
inflater.close

puts original_parts.join  # => chunk1chunk2chunk3

The gzip format includes metadata like timestamps and original filenames. Ruby exposes this information through gzip objects.

# Writing gzip with metadata
Zlib::GzipWriter.open('archive.gz') do |gz|
  gz.orig_name = 'original_file.txt'
  gz.comment = 'Created by Ruby script'
  gz.mtime = Time.now
  gz.write(File.read('source.txt'))
end

# Reading gzip metadata
Zlib::GzipReader.open('archive.gz') do |gz|
  puts "Original name: #{gz.orig_name}"
  puts "Comment: #{gz.comment}"
  puts "Modified time: #{gz.mtime}"
  puts "Content: #{gz.read}"
end

Performance & Memory

Compression performance depends heavily on data characteristics, compression level, and available memory. Higher compression levels require more CPU time but produce smaller outputs, creating a time-space tradeoff.

require 'benchmark'
require 'zlib'

# Test data with different characteristics
random_data = Random.new.bytes(1_000_000)
text_data = ("Ruby compression test " * 50_000)
binary_data = File.binread('image.jpg')

# Benchmark different compression levels
Benchmark.bm(15) do |x|
  [1, 6, 9].each do |level|
    x.report("Level #{level}:") do
      1000.times { Zlib.deflate(text_data, level) }
    end
  end
end

# Memory-efficient streaming for large files
def compress_large_file(input_path, output_path, chunk_size = 64 * 1024)
  File.open(output_path, 'wb') do |output|
    Zlib::GzipWriter.wrap(output) do |gz|
      File.open(input_path, 'rb') do |input|
        while chunk = input.read(chunk_size)
          gz.write(chunk)
        end
      end
    end
  end
end

# Process 1GB file with constant memory usage
compress_large_file('huge_dataset.csv', 'compressed.gz')

Different compression strategies optimize for specific data patterns. The default strategy works well for most text, but specialized strategies improve compression for certain data types.

# Compare strategies for different data types
strategies = {
  'DEFAULT' => Zlib::DEFAULT_STRATEGY,
  'FILTERED' => Zlib::FILTERED,
  'HUFFMAN_ONLY' => Zlib::HUFFMAN_ONLY,
  'RLE' => Zlib::RLE,
  'FIXED' => Zlib::FIXED
}

test_data = {
  'text' => 'The quick brown fox ' * 10000,
  'repetitive' => 'AAAAAAAAAA' * 10000,
  'random' => Random.new.bytes(100000)
}

test_data.each do |data_type, data|
  puts "\n#{data_type.upcase} DATA (#{data.size} bytes):"
  
  strategies.each do |name, strategy|
    deflater = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, 
                                Zlib::MAX_WBITS, 
                                Zlib::DEF_MEM_LEVEL, 
                                strategy)
    compressed = deflater.deflate(data, Zlib::FINISH)
    ratio = (compressed.size.to_f / data.size * 100).round(1)
    puts "  #{name}: #{compressed.size} bytes (#{ratio}%)"
  end
end

Memory usage patterns differ between streaming and batch compression. Streaming compression maintains constant memory usage regardless of input size, while batch methods load entire datasets.

# Memory-conscious compression for large datasets
class StreamingCompressor
  def initialize(output_io, level = Zlib::DEFAULT_COMPRESSION)
    @deflater = Zlib::Deflate.new(level)
    @output = output_io
    @bytes_processed = 0
  end
  
  def compress_chunk(data)
    compressed = @deflater.deflate(data)
    @output.write(compressed) unless compressed.empty?
    @bytes_processed += data.size
  end
  
  def finish
    final_chunk = @deflater.finish
    @output.write(final_chunk)
    @deflater.close
    @bytes_processed
  end
end

# Process massive file with predictable memory usage
File.open('output.deflate', 'wb') do |output|
  compressor = StreamingCompressor.new(output, 6)
  
  File.open('massive_input.txt', 'rb') do |input|
    while chunk = input.read(32_768)  # 32KB chunks
      compressor.compress_chunk(chunk)
    end
  end
  
  total_bytes = compressor.finish
  puts "Processed #{total_bytes} bytes"
end

Error Handling & Debugging

Zlib operations fail for various reasons including corrupted data, incorrect formats, insufficient memory, and incomplete streams. Ruby raises specific exception types that enable targeted error handling.

require 'zlib'

# Handle compression errors
def safe_compress(data, level = 6)
  Zlib.deflate(data, level)
rescue Zlib::MemError => e
  puts "Insufficient memory for compression: #{e.message}"
  nil
rescue Zlib::Error => e
  puts "General compression error: #{e.message}"
  nil
end

# Handle decompression errors with detailed diagnostics
def safe_decompress(compressed_data)
  Zlib.inflate(compressed_data)
rescue Zlib::DataError => e
  puts "Corrupted or invalid compressed data: #{e.message}"
  puts "Data length: #{compressed_data.size} bytes"
  puts "First 20 bytes: #{compressed_data[0, 20].inspect}"
  nil
rescue Zlib::BufError => e
  puts "Buffer error - incomplete data stream: #{e.message}"
  nil
rescue Zlib::MemError => e
  puts "Memory allocation failed: #{e.message}"
  nil
end

# Test error handling
corrupted_data = "This is not compressed data"
result = safe_decompress(corrupted_data)
# => Corrupted or invalid compressed data: incorrect header check
# => Data length: 28 bytes
# => First 20 bytes: "This is not compress"

Streaming operations require careful error handling since partial processing may succeed before failures occur. Proper cleanup prevents resource leaks.

# Robust streaming decompression with cleanup
def decompress_stream(input_io, output_io)
  inflater = Zlib::Inflate.new
  bytes_processed = 0
  
  begin
    loop do
      chunk = input_io.read(8192)
      break if chunk.nil? || chunk.empty?
      
      decompressed = inflater.inflate(chunk)
      output_io.write(decompressed) unless decompressed.empty?
      bytes_processed += decompressed.size
    end
    
    # Process any remaining data
    final_chunk = inflater.finish
    output_io.write(final_chunk) unless final_chunk.empty?
    bytes_processed += final_chunk.size
    
  rescue Zlib::Error => e
    puts "Decompression failed after #{bytes_processed} bytes: #{e.message}"
    raise
  ensure
    inflater.close if inflater && !inflater.closed?
  end
  
  bytes_processed
end

# Usage with error handling
begin
  File.open('compressed.deflate', 'rb') do |input|
    File.open('output.txt', 'wb') do |output|
      bytes = decompress_stream(input, output)
      puts "Successfully decompressed #{bytes} bytes"
    end
  end
rescue => e
  puts "Operation failed: #{e.message}"
  File.delete('output.txt') if File.exist?('output.txt')
end

Gzip files contain checksums that detect corruption. Ruby validates these automatically but provides access to checksum information for custom validation.

# Validate gzip integrity with custom checking
def validate_gzip_file(filepath)
  errors = []
  
  begin
    Zlib::GzipReader.open(filepath) do |gz|
      content = gz.read

      # Reading to EOF forces verification of the CRC32 and length
      # fields stored in the gzip footer
      unless gz.eof?
        errors << "File not read to completion"
      end
      
      # Verify content was read completely
      if content.empty? && File.size(filepath) > 10  # Basic gzip header is ~10 bytes
        errors << "No content read from non-empty file"
      end
      
      puts "File validation #{errors.empty? ? 'passed' : 'failed'}"
      puts "Content size: #{content.size} bytes"
      puts "Compression ratio: #{((File.size(filepath).to_f / content.size) * 100).round(1)}%"
    end
    
  rescue Zlib::GzipFile::Error => e
    errors << "Gzip format error: #{e.message}"
  rescue Zlib::DataError => e
    errors << "Data corruption detected: #{e.message}"
  rescue => e
    errors << "Unexpected error: #{e.message}"
  end
  
  errors.each { |error| puts "ERROR: #{error}" }
  errors.empty?
end

# Test with various file conditions
['valid.gz', 'corrupted.gz', 'truncated.gz'].each do |file|
  puts "\nValidating #{file}:"
  validate_gzip_file(file)
end
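
For validation beyond what gzip performs automatically, a running CRC-32 can be computed over the decompressed stream and compared against a checksum you track yourself. A sketch, where expected_crc is assumed to come from your own manifest:

# Compute a running CRC-32 while decompressing, without holding
# the whole payload in memory
def streamed_crc32(gzip_path)
  crc = 0
  Zlib::GzipReader.open(gzip_path) do |gz|
    while chunk = gz.read(16_384)
      crc = Zlib.crc32(chunk, crc)
    end
  end
  crc
end

# expected_crc comes from an external manifest (hypothetical)
# puts streamed_crc32('archive.gz') == expected_crc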

Production Patterns

Web applications frequently use zlib for HTTP compression, response compression, and data storage optimization. Ruby frameworks integrate zlib compression for automatic content encoding.

# HTTP response compression middleware
class GzipMiddleware
  def initialize(app, options = {})
    @app = app
    @min_size = options[:min_size] || 1024
    @compression_level = options[:level] || 6
  end
  
  def call(env)
    status, headers, body = @app.call(env)
    
    # Check if client accepts gzip
    accept_encoding = env['HTTP_ACCEPT_ENCODING'] || ''
    return [status, headers, body] unless accept_encoding.include?('gzip')
    
    # Collect response body
    body_content = []
    body.each { |part| body_content << part }
    response_string = body_content.join
    
    # Compress if the response is large enough. Content-Encoding: gzip
    # requires the gzip container, so use Zlib.gzip rather than
    # Zlib.deflate, which emits the incompatible zlib format
    if response_string.size >= @min_size
      compressed = Zlib.gzip(response_string, level: @compression_level)
      headers['Content-Encoding'] = 'gzip'
      headers['Content-Length'] = compressed.size.to_s
      return [status, headers, [compressed]]
    end
    
    [status, headers, body]
  end
end

# File storage with automatic compression
class CompressedStorage
  def initialize(base_path)
    @base_path = base_path
    Dir.mkdir(@base_path) unless Dir.exist?(@base_path)
  end
  
  def store(key, data)
    filepath = File.join(@base_path, "#{key}.gz")
    
    Zlib::GzipWriter.open(filepath) do |gz|
      gz.orig_name = key
      gz.mtime = Time.now
      gz.write(data)
    end
    
    {
      key: key,
      compressed_size: File.size(filepath),
      original_size: data.size,
      ratio: (File.size(filepath).to_f / data.size * 100).round(2)
    }
  end
  
  def retrieve(key)
    filepath = File.join(@base_path, "#{key}.gz")
    return nil unless File.exist?(filepath)
    
    Zlib::GzipReader.open(filepath) do |gz|
      {
        data: gz.read,
        metadata: {
          original_name: gz.orig_name,
          modified_time: gz.mtime,
          compressed_size: File.size(filepath)
        }
      }
    end
  end
end

# Usage in production environment
storage = CompressedStorage.new('/app/compressed_data')

# Store large JSON responses
api_response = fetch_large_api_data
stats = storage.store('api_cache_20240830', api_response.to_json)
puts "Stored with #{stats[:ratio]}% compression ratio"

# Retrieve and use cached data
cached = storage.retrieve('api_cache_20240830')
if cached && cached[:metadata][:modified_time] > Time.now - 3600  # fresh within the last hour (1.hour.ago needs ActiveSupport)
  data = JSON.parse(cached[:data])
  puts "Using cached data from #{cached[:metadata][:modified_time]}"
end

Database storage benefits from compression for large text fields and binary data. Many applications compress content before database insertion.

# Database model with automatic compression
class CompressedDocument
  attr_accessor :id, :title, :content, :compressed_size, :original_size
  
  def self.create(title, content)
    doc = new
    doc.title = title
    doc.content = content
    doc.save
    doc
  end
  
  def save
    compressed_content = Zlib.deflate(@content, 9)  # Maximum compression
    @compressed_size = compressed_content.size
    @original_size = @content.size
    
    # Simulate database save; a real implementation would write
    # compressed_content to a binary column
    @id = rand(10000)
    self.class.stored_documents[@id] = compressed_content

    puts "Saving document '#{@title}'"
    puts "Original size: #{@original_size} bytes"
    puts "Compressed size: #{@compressed_size} bytes"
    puts "Space savings: #{100 - (@compressed_size.to_f / @original_size * 100).round(1)}%"

    true
  end
  
  def self.find(id)
    # Simulate database retrieval and decompression
    # In real implementation, fetch compressed_content from database
    compressed_content = stored_documents[id]
    return nil unless compressed_content
    
    doc = new
    doc.id = id
    doc.content = Zlib.inflate(compressed_content)
    doc
  end
  
  # `private` does not apply to singleton methods, so use
  # private_class_method to hide the simulated store
  def self.stored_documents
    @stored_documents ||= {}
  end
  private_class_method :stored_documents
end

# Log compression for monitoring
large_content = File.read('large_report.txt')
doc = CompressedDocument.create('Monthly Report', large_content)
# => Saving document 'Monthly Report'
# => Original size: 125430 bytes
# => Compressed size: 28934 bytes  
# => Space savings: 76.9%

Common Pitfalls

Compression level selection creates a common tradeoff between speed and size. Many developers default to maximum compression without considering the performance impact on high-traffic applications.

require 'benchmark'

# Demonstrate compression level impact
large_text = File.read('sample_data.txt') * 100  # ~5MB of text

puts "Compression level comparison for #{large_text.size} bytes:"
Benchmark.bm(12) do |x|
  (1..9).each do |level|
    x.report("Level #{level}:") do
      compressed = Zlib.deflate(large_text, level)
      ratio = (compressed.size.to_f / large_text.size * 100).round(1)
      puts "  Size: #{compressed.size} bytes (#{ratio}%)"
    end
  end
end

# Good: Choose level based on use case
def choose_compression_level(use_case)
  case use_case
  when :real_time_response
    1  # Fast compression for web responses
  when :daily_backup
    6  # Balanced for scheduled tasks
  when :archival_storage
    9  # Maximum compression for long-term storage
  else
    Zlib::DEFAULT_COMPRESSION
  end
end

Mixing different zlib formats causes decompression failures. Raw deflate streams, zlib format (deflate + header/checksum), and gzip format are incompatible despite using the same compression algorithm.

# Common mistake: Format confusion
require 'stringio'

text = "Sample data for compression testing"

# Zlib.deflate and Zlib::Deflate.deflate are equivalent: both produce
# the zlib format (a deflate stream wrapped in a two-byte header
# plus an Adler-32 checksum)
zlib_format = Zlib.deflate(text)

# A raw deflate stream (no header, no checksum) requires negative wbits
raw_deflater = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS)
raw_deflate = raw_deflater.deflate(text, Zlib::FINISH)
raw_deflater.close

# Create gzip format
io = StringIO.new
Zlib::GzipWriter.wrap(io) do |gz|
  gz.write(text)
end
gzip_format = io.string

# Attempting the wrong decompression fails
begin
  # Wrong: try to inflate gzip data as zlib format
  Zlib.inflate(gzip_format)
rescue Zlib::DataError => e
  puts "Failed to decode gzip as zlib: #{e.message}"
end

# Correct: match the decompression method to the format
puts "Zlib format: #{Zlib.inflate(zlib_format)}"

raw_inflater = Zlib::Inflate.new(-Zlib::MAX_WBITS)
puts "Raw deflate: #{raw_inflater.inflate(raw_deflate)}"
raw_inflater.close

gzip_reader = Zlib::GzipReader.new(StringIO.new(gzip_format))
puts "Gzip data: #{gzip_reader.read}"
gzip_reader.close
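
Ruby 2.5 added Zlib.gzip and Zlib.gunzip, which produce and consume the gzip container in a single call and avoid the StringIO plumbing:

puts Zlib.gunzip(Zlib.gzip(text))  # => Sample data for compression testing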

Memory management becomes critical with large datasets. Forgetting to close deflate/inflate objects causes memory leaks in long-running applications.

# Memory leak example - DON'T DO THIS
def compress_multiple_files_badly(file_paths)
  file_paths.each do |path|
    deflater = Zlib::Deflate.new(6)  # Creates new deflater each time
    
    File.open(path, 'rb') do |input|
      File.open("#{path}.deflate", 'wb') do |output|
        while chunk = input.read(8192)
          output.write(deflater.deflate(chunk))
        end
        output.write(deflater.finish)
      end
    end
    # deflater is never closed - native zlib memory stays allocated
    # until GC finalizes it, a leak in practice for long-running processes
  end
end

# Correct approach with proper cleanup
def compress_multiple_files_correctly(file_paths)
  file_paths.each do |path|
    deflater = nil
    
    begin
      deflater = Zlib::Deflate.new(6)
      
      File.open(path, 'rb') do |input|
        File.open("#{path}.deflate", 'wb') do |output|
          while chunk = input.read(8192)
            compressed = deflater.deflate(chunk)
            output.write(compressed) unless compressed.empty?
          end
          output.write(deflater.finish)
        end
      end
      
    ensure
      deflater&.close  # Always clean up
    end
  end
end

# Even better: Use blocks for automatic cleanup
def compress_files_with_blocks(file_paths)
  file_paths.each do |path|
    File.open("#{path}.deflate", 'wb') do |output|
      Zlib::GzipWriter.wrap(output) do |gz|  # Automatic cleanup
        File.open(path, 'rb') do |input|
          IO.copy_stream(input, gz)  # Efficient copying
        end
      end
    end
  end
end

String encoding issues arise when compressing text data. Zlib operates on bytes, but Ruby strings have encoding information that affects compression results.

# Encoding affects compression
text_utf8 = "Hello, 世界! 🌍"
text_ascii = text_utf8.encode('ASCII', undef: :replace, invalid: :replace)

puts "UTF-8 encoding: #{text_utf8.encoding}"
puts "UTF-8 bytes: #{text_utf8.bytesize}"
compressed_utf8 = Zlib.deflate(text_utf8)
puts "UTF-8 compressed: #{compressed_utf8.size} bytes"

puts "\nASCII encoding: #{text_ascii.encoding}"  
puts "ASCII bytes: #{text_ascii.bytesize}"
compressed_ascii = Zlib.deflate(text_ascii)
puts "ASCII compressed: #{compressed_ascii.size} bytes"

# Decompressed strings come back as binary (ASCII-8BIT), not the
# original encoding; restore it explicitly with force_encoding
decompressed_utf8 = Zlib.inflate(compressed_utf8)
puts "Decompressed encoding: #{decompressed_utf8.encoding}"
# => Decompressed encoding: ASCII-8BIT
puts decompressed_utf8.force_encoding('UTF-8') == text_utf8  # => true

# Best practice: Handle encoding explicitly
def compress_text_safely(text, target_encoding = 'UTF-8')
  # Ensure consistent encoding
  normalized = text.encode(target_encoding)
  compressed = Zlib.deflate(normalized)
  
  {
    compressed: compressed,
    original_encoding: target_encoding,
    original_size: normalized.bytesize,
    compressed_size: compressed.size
  }
end

result = compress_text_safely("Mixed encoding text: café naïve résumé")
puts "\nSafe compression result: #{result}"

Reference

Core Module Methods

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| Zlib.deflate(data, level = DEFAULT_COMPRESSION) | data (String), level (Integer) | String | Compress data into zlib format |
| Zlib.inflate(data) | data (String) | String | Decompress zlib-format data |
| Zlib.crc32(data, initial = 0) | data (String), initial (Integer) | Integer | Calculate CRC-32 checksum |
| Zlib.adler32(data, initial = 1) | data (String), initial (Integer) | Integer | Calculate Adler-32 checksum |

Compression Levels

| Constant | Value | Description | Best For |
| --- | --- | --- | --- |
| NO_COMPRESSION | 0 | No compression, store only | Testing, small files |
| BEST_SPEED | 1 | Fastest compression | Real-time applications |
| DEFAULT_COMPRESSION | -1 (maps to level 6) | Balanced speed/size | General purpose |
| BEST_COMPRESSION | 9 | Maximum compression | Archival storage |

Compression Strategies

| Strategy | Description | Optimal For |
| --- | --- | --- |
| DEFAULT_STRATEGY | Standard deflate algorithm | Mixed content, text |
| FILTERED | Filtered data strategy | Images, binary data with patterns |
| HUFFMAN_ONLY | Huffman coding only | Pre-compressed data |
| RLE | Run-length encoding | Data with long runs |
| FIXED | Fixed Huffman codes | Small data, consistent patterns |

Deflate/Inflate Classes

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| Deflate.new(level, wbits, memlevel, strategy) | all optional Integers | Deflate | Create deflate object |
| #deflate(data, flush = NO_FLUSH) | data (String), flush (Integer) | String | Compress data chunk |
| #finish | none | String | Complete compression, return final data |
| #close | none | nil | Close deflate object, free memory |
| Inflate.new(wbits) | wbits (Integer, optional) | Inflate | Create inflate object |
| #inflate(data) | data (String) | String | Decompress data chunk |
| #finish | none | String | Complete decompression |
| #close | none | nil | Close inflate object |

Gzip Classes

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| GzipWriter.open(filename, level, strategy) | filename (String), level/strategy (Integer, optional) | GzipWriter | Open gzip file for writing |
| GzipReader.open(filename) | filename (String) | GzipReader | Open gzip file for reading |
| GzipWriter.new(io, level, strategy) | io (IO), level (Integer), strategy (Integer) | GzipWriter | Create gzip writer |
| GzipReader.new(io) | io (IO) | GzipReader | Create gzip reader |
| #orig_name | none | String | Get/set original filename |
| #comment | none | String | Get/set file comment |
| #mtime | none | Time | Get/set modification time |
| #level | none | Integer | Get compression level used |

Exception Hierarchy

| Exception | Parent | Raised When |
| --- | --- | --- |
| Zlib::Error | StandardError | General zlib errors |
| Zlib::StreamEnd | Zlib::Error | End of stream reached (Z_STREAM_END) |
| Zlib::NeedDict | Zlib::Error | Dictionary required for decompression |
| Zlib::DataError | Zlib::Error | Invalid or corrupted data |
| Zlib::StreamError | Zlib::Error | Invalid stream state or parameters |
| Zlib::MemError | Zlib::Error | Insufficient memory |
| Zlib::BufError | Zlib::Error | Buffer error, incomplete data |
| Zlib::VersionError | Zlib::Error | Version compatibility issue |
| Zlib::GzipFile::Error | Zlib::Error | Gzip file format errors |

Flush Constants

| Constant | Value | Usage |
| --- | --- | --- |
| NO_FLUSH | 0 | Default, continue compression |
| PARTIAL_FLUSH | 1 | Partial flush for chunk boundary |
| SYNC_FLUSH | 2 | Flush pending output, align to byte boundary |
| FULL_FLUSH | 3 | Full flush, reset compression state |
| FINISH | 4 | Complete compression stream |
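
SYNC_FLUSH matters when compressed chunks must be decodable as they arrive, such as over a socket. A minimal sketch of that flush mode:

require 'zlib'

deflater = Zlib::Deflate.new
inflater = Zlib::Inflate.new

# SYNC_FLUSH emits all pending output aligned to a byte boundary,
# so the receiver can decompress each chunk immediately
chunk = deflater.deflate("first message\n", Zlib::SYNC_FLUSH)
print inflater.inflate(chunk)   # => first message

chunk = deflater.deflate("second message\n", Zlib::SYNC_FLUSH)
print inflater.inflate(chunk)   # => second message

deflater.close
inflater.close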

Window Bits (wbits) Values

| Value | Format | Description |
| --- | --- | --- |
| 8 to 15 | zlib | Deflate with zlib header and Adler-32 checksum |
| -8 to -15 | raw | Raw deflate stream, no header |
| 24 to 31 | gzip | Deflate with gzip header and CRC-32 checksum |
| 40 to 47 | auto | Automatic zlib/gzip detection (inflate only) |
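
The wbits value passed to Inflate.new selects the expected container; a sketch covering three rows of the table:

require 'zlib'

data = "wbits demo " * 50

# MAX_WBITS (15): zlib format, which Zlib.deflate produces
puts Zlib::Inflate.new(Zlib::MAX_WBITS).inflate(Zlib.deflate(data)) == data

# Negative wbits: raw deflate stream without header or checksum
raw = Zlib::Deflate.new(6, -Zlib::MAX_WBITS).deflate(data, Zlib::FINISH)
puts Zlib::Inflate.new(-Zlib::MAX_WBITS).inflate(raw) == data

# MAX_WBITS + 32: auto-detects zlib or gzip headers on input
puts Zlib::Inflate.new(Zlib::MAX_WBITS + 32).inflate(Zlib.gzip(data)) == data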