Overview
Ruby's Zlib module provides access to the zlib compression library, implementing deflate compression and gzip format handling. The module wraps the underlying C zlib library and exposes Ruby classes for compression, decompression, and checksum calculations.
The primary classes include Zlib::Deflate for compression, Zlib::Inflate for decompression, and Zlib::GzipWriter/Zlib::GzipReader for gzip format operations. Ruby also provides the convenience module methods Zlib.deflate and Zlib.inflate for simple compression tasks.
require 'zlib'
# Simple compression and decompression
data = "The quick brown fox jumps over the lazy dog" * 100
compressed = Zlib.deflate(data)
decompressed = Zlib.inflate(compressed)
puts "Original: #{data.size} bytes"
puts "Compressed: #{compressed.size} bytes"
puts "Ratio: #{(compressed.size.to_f / data.size * 100).round(1)}%"
# => Original: 4300 bytes
# => Compressed: 62 bytes
# => Ratio: 1.4%
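The checksum helpers mentioned above, Zlib.crc32 and Zlib.adler32, compute checksums directly and accept a running value for incremental use. A minimal sketch, continuing with the data string from the example above (printed values depend on the input):
# Checksums over the same data
puts "CRC32: #{Zlib.crc32(data)}"
puts "Adler-32: #{Zlib.adler32(data)}"
# Both accept a running value, so checksums can be built chunk by chunk
running = Zlib.crc32('first half')
running = Zlib.crc32('second half', running)
puts running == Zlib.crc32('first halfsecond half') # => true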
Zlib supports multiple compression levels from 0 (no compression) to 9 (maximum compression), with level 6 as the default. The library also provides different strategies optimized for various data types.
# Different compression levels
text = File.read('large_file.txt')
fast = Zlib.deflate(text, Zlib::BEST_SPEED) # Level 1
default = Zlib.deflate(text) # Level 6
best = Zlib.deflate(text, Zlib::BEST_COMPRESSION) # Level 9
puts "Fast: #{fast.size} bytes"
puts "Default: #{default.size} bytes"
puts "Best: #{best.size} bytes"
The module handles both raw deflate streams and complete gzip files with headers and checksums. Ruby integrates zlib compression with IO objects, enabling streaming compression for large datasets without loading entire files into memory.
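As a quick sketch of that IO integration (assuming an existing app.log file as input), a file can be streamed into a gzip archive and read back line by line without ever holding the whole file in memory:
require 'zlib'
# Stream-compress without loading the file into memory
Zlib::GzipWriter.open('app.log.gz') do |gz|
  File.open('app.log', 'rb') do |input|
    IO.copy_stream(input, gz) # copies in chunks
  end
end
# Stream-decompress one line at a time
Zlib::GzipReader.open('app.log.gz') do |gz|
  gz.each_line { |line| puts line }
end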
Basic Usage
The simplest compression operations use Zlib.deflate and Zlib.inflate for basic deflate compression. These methods handle complete data in memory and return compressed or decompressed strings.
require 'zlib'
# Basic compression
original = "Ruby provides excellent compression capabilities"
compressed = Zlib.deflate(original)
restored = Zlib.inflate(compressed)
puts restored == original # => true
puts "Compression ratio: #{(compressed.size.to_f / original.size * 100).round(1)}%"
For file compression, use Zlib::GzipWriter and Zlib::GzipReader to create and read gzip files. These classes work with file handles and provide standard IO methods.
# Compress data to gzip file
Zlib::GzipWriter.open('data.gz') do |gz|
gz.write("Line 1\n")
gz.write("Line 2\n")
gz.puts("Line 3")
end
# Read compressed file
Zlib::GzipReader.open('data.gz') do |gz|
puts gz.read
end
# => Line 1
# => Line 2
# => Line 3
Stream-based compression handles large datasets efficiently by processing data in chunks. The Zlib::Deflate and Zlib::Inflate classes provide streaming interfaces.
# Streaming compression
deflater = Zlib::Deflate.new
compressed_parts = []
# Compress data in chunks
['chunk1', 'chunk2', 'chunk3'].each do |chunk|
compressed_parts << deflater.deflate(chunk)
end
compressed_parts << deflater.finish
# Combine compressed data
full_compressed = compressed_parts.join
# Streaming decompression
inflater = Zlib::Inflate.new
original_parts = []
compressed_parts.each do |part|
original_parts << inflater.inflate(part)
end
original_parts << inflater.finish
puts original_parts.join # => chunk1chunk2chunk3
The gzip format includes metadata like timestamps and original filenames. Ruby exposes this information through gzip objects.
# Writing gzip with metadata
Zlib::GzipWriter.open('archive.gz') do |gz|
gz.orig_name = 'original_file.txt'
gz.comment = 'Created by Ruby script'
gz.mtime = Time.now
gz.write(File.read('source.txt'))
end
# Reading gzip metadata
Zlib::GzipReader.open('archive.gz') do |gz|
puts "Original name: #{gz.orig_name}"
puts "Comment: #{gz.comment}"
puts "Modified time: #{gz.mtime}"
puts "Content: #{gz.read}"
end
Performance & Memory
Compression performance depends heavily on data characteristics, compression level, and available memory. Higher compression levels require more CPU time but produce smaller outputs, creating a time-space tradeoff.
require 'benchmark'
require 'zlib'
# Test data with different characteristics
random_data = Random.new.bytes(1_000_000)
text_data = ("Ruby compression test " * 50_000)
binary_data = File.binread('image.jpg')
# Benchmark different compression levels
Benchmark.bm(15) do |x|
[1, 6, 9].each do |level|
x.report("Level #{level}:") do
1000.times { Zlib.deflate(text_data, level) }
end
end
end
# Memory-efficient streaming for large files
def compress_large_file(input_path, output_path, chunk_size = 64 * 1024)
File.open(output_path, 'wb') do |output|
Zlib::GzipWriter.wrap(output) do |gz|
File.open(input_path, 'rb') do |input|
while chunk = input.read(chunk_size)
gz.write(chunk)
end
end
end
end
end
# Process 1GB file with constant memory usage
compress_large_file('huge_dataset.csv', 'compressed.gz')
Different compression strategies optimize for specific data patterns. The default strategy works well for most text, but specialized strategies improve compression for certain data types.
# Compare strategies for different data types
strategies = {
'DEFAULT' => Zlib::DEFAULT_STRATEGY,
'FILTERED' => Zlib::FILTERED,
'HUFFMAN_ONLY' => Zlib::HUFFMAN_ONLY,
'RLE' => Zlib::RLE,
'FIXED' => Zlib::FIXED
}
test_data = {
'text' => 'The quick brown fox ' * 10000,
'repetitive' => 'AAAAAAAAAA' * 10000,
'random' => Random.new.bytes(100000)
}
test_data.each do |data_type, data|
puts "\n#{data_type.upcase} DATA (#{data.size} bytes):"
strategies.each do |name, strategy|
deflater = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION,
Zlib::MAX_WBITS,
Zlib::DEF_MEM_LEVEL,
strategy)
compressed = deflater.deflate(data, Zlib::FINISH)
deflater.close
ratio = (compressed.size.to_f / data.size * 100).round(1)
puts " #{name}: #{compressed.size} bytes (#{ratio}%)"
end
end
Memory usage patterns differ between streaming and batch compression. Streaming compression maintains constant memory usage regardless of input size, while batch methods load entire datasets.
# Memory-conscious compression for large datasets
class StreamingCompressor
def initialize(output_io, level = Zlib::DEFAULT_COMPRESSION)
@deflater = Zlib::Deflate.new(level)
@output = output_io
@bytes_processed = 0
end
def compress_chunk(data)
compressed = @deflater.deflate(data)
@output.write(compressed) unless compressed.empty?
@bytes_processed += data.size
end
def finish
final_chunk = @deflater.finish
@output.write(final_chunk)
@deflater.close
@bytes_processed
end
end
# Process massive file with predictable memory usage
File.open('output.deflate', 'wb') do |output|
compressor = StreamingCompressor.new(output, 6)
File.open('massive_input.txt', 'rb') do |input|
while chunk = input.read(32_768) # 32KB chunks
compressor.compress_chunk(chunk)
end
end
total_bytes = compressor.finish
puts "Processed #{total_bytes} bytes"
end
Error Handling & Debugging
Zlib operations fail for various reasons including corrupted data, incorrect formats, insufficient memory, and incomplete streams. Ruby raises specific exception types that enable targeted error handling.
require 'zlib'
# Handle compression errors
def safe_compress(data, level = 6)
Zlib.deflate(data, level)
rescue Zlib::MemError => e
puts "Insufficient memory for compression: #{e.message}"
nil
rescue Zlib::Error => e
puts "General compression error: #{e.message}"
nil
end
# Handle decompression errors with detailed diagnostics
def safe_decompress(compressed_data)
Zlib.inflate(compressed_data)
rescue Zlib::DataError => e
puts "Corrupted or invalid compressed data: #{e.message}"
puts "Data length: #{compressed_data.size} bytes"
puts "First 20 bytes: #{compressed_data[0, 20].inspect}"
nil
rescue Zlib::BufError => e
puts "Buffer error - incomplete data stream: #{e.message}"
nil
rescue Zlib::MemError => e
puts "Memory allocation failed: #{e.message}"
nil
end
# Test error handling
corrupted_data = "This is not compressed data"
result = safe_decompress(corrupted_data)
# => Corrupted or invalid compressed data: incorrect header check
# => Data length: 27 bytes
# => First 20 bytes: "This is not compress"
Streaming operations require careful error handling since partial processing may succeed before failures occur. Proper cleanup prevents resource leaks.
# Robust streaming decompression with cleanup
def decompress_stream(input_io, output_io)
inflater = Zlib::Inflate.new
bytes_processed = 0
begin
loop do
chunk = input_io.read(8192)
break if chunk.nil? || chunk.empty?
decompressed = inflater.inflate(chunk)
output_io.write(decompressed) unless decompressed.empty?
bytes_processed += decompressed.size
end
# Process any remaining data
final_chunk = inflater.finish
output_io.write(final_chunk) unless final_chunk.empty?
bytes_processed += final_chunk.size
rescue Zlib::Error => e
puts "Decompression failed after #{bytes_processed} bytes: #{e.message}"
raise
ensure
inflater.close if inflater && !inflater.closed?
end
bytes_processed
end
# Usage with error handling
begin
File.open('compressed.deflate', 'rb') do |input|
File.open('output.txt', 'wb') do |output|
bytes = decompress_stream(input, output)
puts "Successfully decompressed #{bytes} bytes"
end
end
rescue => e
puts "Operation failed: #{e.message}"
File.delete('output.txt') if File.exist?('output.txt')
end
Gzip files contain checksums that detect corruption. Ruby validates these automatically but provides access to checksum information for custom validation.
# Validate gzip integrity with custom checking
def validate_gzip_file(filepath)
errors = []
begin
Zlib::GzipReader.open(filepath) do |gz|
content = gz.read
# Reading to the end of the stream makes zlib verify the CRC-32 and length
# stored in the gzip trailer, so silent corruption surfaces as an exception here
# Verify content was read completely
if content.empty? && File.size(filepath) > 10 # Basic gzip header is ~10 bytes
errors << "No content read from non-empty file"
end
puts "File validation #{errors.empty? ? 'passed' : 'failed'}"
puts "Content size: #{content.size} bytes"
puts "Compression ratio: #{((File.size(filepath).to_f / content.size) * 100).round(1)}%"
end
rescue Zlib::GzipFile::Error => e
errors << "Gzip format error: #{e.message}"
rescue Zlib::DataError => e
errors << "Data corruption detected: #{e.message}"
rescue => e
errors << "Unexpected error: #{e.message}"
end
errors.each { |error| puts "ERROR: #{error}" }
errors.empty?
end
# Test with various file conditions
['valid.gz', 'corrupted.gz', 'truncated.gz'].each do |file|
puts "\nValidating #{file}:"
validate_gzip_file(file)
end
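The custom checksum access mentioned earlier usually means recomputing a CRC-32 over the decompressed content and comparing it to an independently stored value. A hedged sketch, reusing valid.gz from the list above; expected_crc stands in for a value a real system would keep in a manifest or database:
def crc32_of_gzip_content(filepath)
  crc = 0
  Zlib::GzipReader.open(filepath) do |gz|
    while chunk = gz.read(16_384)
      crc = Zlib.crc32(chunk, crc) # update the running checksum per chunk
    end
  end
  crc
end
expected_crc = 0 # placeholder; substitute the value recorded when the file was created
actual_crc = crc32_of_gzip_content('valid.gz')
puts actual_crc == expected_crc ? 'Checksum matches' : 'Checksum mismatch'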
Production Patterns
Web applications frequently use zlib for HTTP response compression and data storage optimization. Ruby frameworks integrate zlib for automatic content encoding.
# HTTP response compression middleware
class GzipMiddleware
def initialize(app, options = {})
@app = app
@min_size = options[:min_size] || 1024
@compression_level = options[:level] || 6
end
def call(env)
status, headers, body = @app.call(env)
# Check if client accepts gzip
accept_encoding = env['HTTP_ACCEPT_ENCODING'] || ''
return [status, headers, body] unless accept_encoding.include?('gzip')
# Collect response body
body_content = []
body.each { |part| body_content << part }
response_string = body_content.join
# Compress if response is large enough
if response_string.size >= @min_size
compressed = Zlib.gzip(response_string, level: @compression_level) # gzip framing; Zlib.deflate would emit zlib format, not gzip
headers['Content-Encoding'] = 'gzip'
headers['Content-Length'] = compressed.bytesize.to_s
return [status, headers, [compressed]]
end
[status, headers, body]
end
end
# File storage with automatic compression
class CompressedStorage
def initialize(base_path)
@base_path = base_path
Dir.mkdir(@base_path) unless Dir.exist?(@base_path)
end
def store(key, data)
filepath = File.join(@base_path, "#{key}.gz")
Zlib::GzipWriter.open(filepath) do |gz|
gz.orig_name = key
gz.mtime = Time.now
gz.write(data)
end
{
key: key,
compressed_size: File.size(filepath),
original_size: data.size,
ratio: (File.size(filepath).to_f / data.size * 100).round(2)
}
end
def retrieve(key)
filepath = File.join(@base_path, "#{key}.gz")
return nil unless File.exist?(filepath)
Zlib::GzipReader.open(filepath) do |gz|
{
data: gz.read,
metadata: {
original_name: gz.orig_name,
modified_time: gz.mtime,
compressed_size: File.size(filepath)
}
}
end
end
end
# Usage in production environment
storage = CompressedStorage.new('/app/compressed_data')
# Store large JSON responses
api_response = fetch_large_api_data
stats = storage.store('api_cache_20240830', api_response.to_json)
puts "Stored with #{stats[:ratio]}% compression ratio"
# Retrieve and use cached data
cached = storage.retrieve('api_cache_20240830')
if cached && cached[:metadata][:modified_time] > Time.now - 3600 # cached within the last hour
data = JSON.parse(cached[:data])
puts "Using cached data from #{cached[:metadata][:modified_time]}"
end
Database storage benefits from compression for large text fields and binary data. Many applications compress content before database insertion.
# Database model with automatic compression
class CompressedDocument
attr_accessor :id, :title, :content, :compressed_size, :original_size
def self.create(title, content)
doc = new
doc.title = title
doc.content = content
doc.save
doc
end
def save
compressed_content = Zlib.deflate(@content, 9) # Maximum compression
@compressed_size = compressed_content.size
@original_size = @content.size
# Simulate database save
puts "Saving document '#{@title}'"
puts "Original size: #{@original_size} bytes"
puts "Compressed size: #{@compressed_size} bytes"
puts "Space savings: #{100 - (@compressed_size.to_f / @original_size * 100).round(1)}%"
@id = rand(10000)
self.class.stored_documents[@id] = compressed_content
true
end
def self.find(id)
# Simulate database retrieval and decompression
# In real implementation, fetch compressed_content from database
compressed_content = stored_documents[id]
return nil unless compressed_content
doc = new
doc.id = id
doc.content = Zlib.inflate(compressed_content)
doc
end
def self.stored_documents
@stored_documents ||= {}
end
end
# Log compression for monitoring
large_content = File.read('large_report.txt')
doc = CompressedDocument.create('Monthly Report', large_content)
# => Saving document 'Monthly Report'
# => Original size: 125430 bytes
# => Compressed size: 28934 bytes
# => Space savings: 76.9%
Common Pitfalls
Compression level selection creates a common tradeoff between speed and size. Many developers default to maximum compression without considering the performance impact on high-traffic applications.
require 'benchmark'
# Demonstrate compression level impact
large_text = File.read('sample_data.txt') * 100 # ~5MB of text
puts "Compression level comparison for #{large_text.size} bytes:"
Benchmark.bm(12) do |x|
(1..9).each do |level|
x.report("Level #{level}:") do
compressed = Zlib.deflate(large_text, level)
ratio = (compressed.size.to_f / large_text.size * 100).round(1)
puts " Size: #{compressed.size} bytes (#{ratio}%)"
end
end
end
# Good: Choose level based on use case
def choose_compression_level(use_case)
case use_case
when :real_time_response
1 # Fast compression for web responses
when :daily_backup
6 # Balanced for scheduled tasks
when :archival_storage
9 # Maximum compression for long-term storage
else
Zlib::DEFAULT_COMPRESSION
end
end
Mixing different zlib formats causes decompression failures. Raw deflate streams, zlib format (deflate + header/checksum), and gzip format are incompatible despite using the same compression algorithm.
# Common mistake: Format confusion
text = "Sample data for compression testing"
# Zlib.deflate and Zlib::Deflate.deflate are the same method: both produce the
# zlib format (a deflate stream wrapped in a 2-byte header plus Adler-32 checksum)
zlib_format = Zlib.deflate(text)
# Raw deflate (no header or checksum) requires negative window bits
raw_deflater = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS)
raw_deflate = raw_deflater.deflate(text, Zlib::FINISH)
raw_deflater.close
# Gzip format adds its own header, trailer, and CRC-32 checksum (Zlib.gzip needs Ruby 2.5+)
gzip_format = Zlib.gzip(text)
# Attempting the wrong decompression fails
begin
  # Wrong: Try to inflate gzip data as zlib-format data
  Zlib.inflate(gzip_format)
rescue Zlib::DataError => e
  puts "Failed to decode gzip as zlib: #{e.message}"
end
# Correct: Use the decompression that matches the format
puts "Zlib format: #{Zlib.inflate(zlib_format)}"
puts "Gzip data: #{Zlib.gunzip(gzip_format)}"
raw_inflater = Zlib::Inflate.new(-Zlib::MAX_WBITS)
puts "Raw deflate: #{raw_inflater.inflate(raw_deflate)}"
raw_inflater.close
Memory management becomes critical with large datasets. Deflate and inflate objects that are never closed keep their native zlib buffers allocated until garbage collection finalizes them, which can balloon memory usage in long-running applications.
# Memory leak example - DON'T DO THIS
def compress_multiple_files_badly(file_paths)
file_paths.each do |path|
deflater = Zlib::Deflate.new(6) # Creates new deflater each time
File.open(path, 'rb') do |input|
File.open("#{path}.deflate", 'wb') do |output|
while chunk = input.read(8192)
output.write(deflater.deflate(chunk))
end
output.write(deflater.finish)
end
end
# deflater is never closed - its native buffers stay allocated until GC finalizes it
end
end
# Correct approach with proper cleanup
def compress_multiple_files_correctly(file_paths)
file_paths.each do |path|
deflater = nil
begin
deflater = Zlib::Deflate.new(6)
File.open(path, 'rb') do |input|
File.open("#{path}.deflate", 'wb') do |output|
while chunk = input.read(8192)
compressed = deflater.deflate(chunk)
output.write(compressed) unless compressed.empty?
end
output.write(deflater.finish)
end
end
ensure
deflater&.close # Always clean up
end
end
end
# Even better: Use blocks for automatic cleanup
def compress_files_with_blocks(file_paths)
file_paths.each do |path|
File.open("#{path}.deflate", 'wb') do |output|
Zlib::GzipWriter.wrap(output) do |gz| # Automatic cleanup
File.open(path, 'rb') do |input|
IO.copy_stream(input, gz) # Efficient copying
end
end
end
end
end
String encoding issues arise when compressing text data. Zlib operates on bytes, so a string's encoding determines exactly which bytes are compressed, and decompression returns binary data rather than a string in the original encoding.
# Encoding affects compression
text_utf8 = "Hello, 世界! 🌍"
text_ascii = text_utf8.encode('ASCII', undef: :replace, invalid: :replace)
puts "UTF-8 encoding: #{text_utf8.encoding}"
puts "UTF-8 bytes: #{text_utf8.bytesize}"
compressed_utf8 = Zlib.deflate(text_utf8)
puts "UTF-8 compressed: #{compressed_utf8.size} bytes"
puts "\nASCII encoding: #{text_ascii.encoding}"
puts "ASCII bytes: #{text_ascii.bytesize}"
compressed_ascii = Zlib.deflate(text_ascii)
puts "ASCII compressed: #{compressed_ascii.size} bytes"
# Decompressed strings come back as binary (ASCII-8BIT), not the original encoding
decompressed_utf8 = Zlib.inflate(compressed_utf8)
puts "Decompressed encoding: #{decompressed_utf8.encoding}" # => ASCII-8BIT
puts decompressed_utf8.force_encoding('UTF-8') == text_utf8 # => true
# Best practice: Handle encoding explicitly
def compress_text_safely(text, target_encoding = 'UTF-8')
# Ensure consistent encoding
normalized = text.encode(target_encoding)
compressed = Zlib.deflate(normalized)
{
compressed: compressed,
original_encoding: target_encoding,
original_size: normalized.bytesize,
compressed_size: compressed.size
}
end
result = compress_text_safely("Mixed encoding text: café naïve résumé")
puts "\nSafe compression result: #{result}"
Reference
Core Module Methods
Method | Parameters | Returns | Description |
---|---|---|---|
Zlib.deflate(data, level=DEFAULT) | data (String), level (Integer) | String | Compress data using deflate algorithm |
Zlib.inflate(data) | data (String) | String | Decompress deflate-compressed data |
Zlib.crc32(data, initial=0) | data (String), initial (Integer) | Integer | Calculate CRC32 checksum |
Zlib.adler32(data, initial=1) | data (String), initial (Integer) | Integer | Calculate Adler-32 checksum |
Compression Levels
Constant | Value | Description | Best For |
---|---|---|---|
NO_COMPRESSION | 0 | No compression, store only | Testing, small files |
BEST_SPEED | 1 | Fastest compression | Real-time applications |
DEFAULT_COMPRESSION | -1 (zlib's default, effectively level 6) | Balanced speed/size | General purpose |
BEST_COMPRESSION | 9 | Maximum compression | Archival storage |
Compression Strategies
Strategy | Description | Optimal For |
---|---|---|
DEFAULT_STRATEGY | Standard deflate algorithm | Mixed content, text |
FILTERED | Filtered data strategy | Images, binary data with patterns |
HUFFMAN_ONLY | Huffman coding only | Pre-compressed data |
RLE | Run-length encoding | Data with long runs |
FIXED | Fixed Huffman codes | Small data, consistent patterns |
Deflate/Inflate Classes
Method | Parameters | Returns | Description |
---|---|---|---|
Deflate.new(level, wbits, memlevel, strategy) | All optional integers | Deflate | Create deflate object |
#deflate(data, flush=NO_FLUSH) | data (String), flush (Integer) | String | Compress data chunk |
#finish | None | String | Complete compression, return final data |
#close | None | nil | Close deflate object, free memory |
Inflate.new(wbits) | wbits (Integer, optional) | Inflate | Create inflate object |
#inflate(data) | data (String) | String | Decompress data chunk |
#finish | None | String | Complete decompression |
#close | None | nil | Close inflate object |
Gzip Classes
Method | Parameters | Returns | Description |
---|---|---|---|
GzipWriter.open(filename, level=nil, strategy=nil) | filename (String), optional level/strategy (Integer) | GzipWriter | Open gzip file for writing |
GzipReader.open(filename) | filename (String) | GzipReader | Open gzip file for reading |
GzipWriter.new(io, level, strategy) | io (IO), level (Integer), strategy (Integer) | GzipWriter | Create gzip writer around an IO |
GzipReader.new(io) | io (IO) | GzipReader | Create gzip reader around an IO |
#orig_name | None | String | Get/set original filename |
#comment | None | String | Get/set file comment |
#mtime | None | Time | Get/set modification time |
#level | None | Integer | Get compression level used |
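The constructors above wrap an IO that is already open. A small self-contained sketch using StringIO, which also shows the finish-versus-close distinction: finish writes the gzip trailer but leaves the wrapped IO open, while close closes both:
require 'zlib'
require 'stringio'
io = StringIO.new
writer = Zlib::GzipWriter.new(io, Zlib::BEST_SPEED)
writer.write('payload')
writer.finish # gzip trailer written, io itself stays open
gzipped = io.string
reader = Zlib::GzipReader.new(StringIO.new(gzipped))
puts reader.read # => payload
reader.close # closes the reader and the wrapped IO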
Exception Hierarchy
Exception | Parent | Raised When |
---|---|---|
Zlib::Error | StandardError | General zlib errors |
Zlib::StreamEnd | Zlib::Error | End of the compressed stream was reached |
Zlib::NeedDict | Zlib::Error | Dictionary required for decompression |
Zlib::DataError | Zlib::Error | Invalid or corrupted data |
Zlib::StreamError | Zlib::Error | Invalid stream state |
Zlib::MemError | Zlib::Error | Insufficient memory |
Zlib::BufError | Zlib::Error | Buffer error, incomplete data |
Zlib::VersionError | Zlib::Error | Version compatibility issue |
Zlib::GzipFile::Error | Zlib::Error | Gzip file format errors |
Flush Constants
Constant | Value | Usage |
---|---|---|
NO_FLUSH | 0 | Default, continue compression |
PARTIAL_FLUSH | 1 | Partial flush for chunk boundary |
SYNC_FLUSH | 2 | Synchronous flush, align to byte boundary |
FULL_FLUSH | 3 | Full flush, reset compression state |
FINISH | 4 | Complete compression stream |
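A brief sketch of why SYNC_FLUSH matters for streaming protocols: each flushed packet is byte-aligned and fully decodable on arrival, so the receiver never waits for FINISH. The two-message loop is purely illustrative:
require 'zlib'
deflater = Zlib::Deflate.new
inflater = Zlib::Inflate.new
['first message ', "second message\n"].each do |msg|
  packet = deflater.deflate(msg, Zlib::SYNC_FLUSH) # flush to a byte boundary
  print inflater.inflate(packet) # decodable immediately
end
inflater.inflate(deflater.finish) # terminate both streams cleanly
deflater.close
inflater.close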
Window Bits (wbits) Values
Value | Format | Description |
---|---|---|
8-15 | Zlib | Deflate with zlib header/checksum |
-8 to -15 | Raw | Raw deflate stream, no header |
24-31 | Gzip | Deflate with gzip header/checksum |
40-47 | Auto | Automatic format detection, zlib or gzip (decompression only) |
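The window-bit offsets map directly onto the Deflate.new and Inflate.new arguments; a hedged sketch covering the raw, gzip, and auto-detect cases from the table:
require 'zlib'
data = 'window bits example ' * 100
# Raw deflate: negative wbits drop the zlib header and checksum
raw_deflater = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS)
raw = raw_deflater.deflate(data, Zlib::FINISH)
raw_deflater.close
raw_inflater = Zlib::Inflate.new(-Zlib::MAX_WBITS)
puts raw_inflater.inflate(raw) == data # => true
raw_inflater.close
# Gzip framing: MAX_WBITS + 16 (31) writes a gzip header and trailer
gz_deflater = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, Zlib::MAX_WBITS + 16)
gz = gz_deflater.deflate(data, Zlib::FINISH)
gz_deflater.close
# Auto-detection: MAX_WBITS + 32 (47) accepts either zlib or gzip input
auto_inflater = Zlib::Inflate.new(Zlib::MAX_WBITS + 32)
puts auto_inflater.inflate(gz) == data # => true
auto_inflater.close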