Overview
Ruby provides GZip compression and decompression through the Zlib library, specifically the Zlib::GzipReader and Zlib::GzipWriter classes. These classes implement the RFC 1952 GZip file format, allowing Ruby applications to create and read compressed data compatible with standard GZip tools.
The GZip implementation operates on IO objects, supporting both file-based and in-memory compression operations. Ruby's GZip classes handle the complete format specification including headers, compression metadata, and checksums automatically.
require 'zlib'
# Basic file compression
Zlib::GzipWriter.wrap(File.open('data.txt.gz', 'wb')) do |gz|
  gz.write("Hello, compressed world!")
end

# Basic file decompression
Zlib::GzipReader.wrap(File.open('data.txt.gz', 'rb')) do |gz|
  puts gz.read
end
# => Hello, compressed world!
The Zlib::GzipWriter class compresses data as it writes, while Zlib::GzipReader decompresses data during reading. Both classes maintain compatibility with external GZip files and provide access to compression metadata including modification times, original filenames, and comments.
Ruby's GZip implementation supports all standard compression levels from 0 (no compression) through 9 (maximum compression), with level 6 as the default. The classes also provide streaming capabilities for processing large datasets without loading entire files into memory.
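A compression level can be passed when creating the writer. A minimal sketch comparing a few levels (the printed sizes will vary by zlib version):

require 'zlib'
require 'stringio'

data = "repeatable " * 1_000

# Pass the level as the second argument to wrap (or to GzipWriter.new)
[1, 6, 9].each do |level|
  io = StringIO.new
  Zlib::GzipWriter.wrap(io, level) { |gz| gz.write(data) }
  puts "Level #{level}: #{io.string.bytesize} bytes"
end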
Basic Usage
File compression with Zlib::GzipWriter requires opening a binary write stream and writing data through the GZip wrapper. The writer automatically handles format headers, compression, and file checksums.
require 'zlib'
# Compress text file
File.open('large_data.txt.gz', 'wb') do |file|
  Zlib::GzipWriter.wrap(file) do |gz|
    gz.write("Line 1: Important data\n")
    gz.write("Line 2: More information\n")
    gz.write("Line 3: Final content\n")
  end
end
Reading compressed files uses Zlib::GzipReader with similar IO patterns. The reader transparently decompresses data and provides standard IO methods including read, gets, and each_line.
# Decompress and process file
File.open('large_data.txt.gz', 'rb') do |file|
  Zlib::GzipReader.wrap(file) do |gz|
    gz.each_line do |line|
      puts "Processed: #{line.chomp}"
    end
  end
end
# => Processed: Line 1: Important data
# => Processed: Line 2: More information
# => Processed: Line 3: Final content
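For file paths, Zlib::GzipWriter.open and Zlib::GzipReader.open combine the File.open and wrap steps shown above and close the stream automatically. A brief sketch (the filename is illustrative):

# Write via the open shortcut
Zlib::GzipWriter.open('notes.txt.gz') do |gz|
  gz.write("Written via the open shortcut\n")
end

# Read it back
Zlib::GzipReader.open('notes.txt.gz') do |gz|
  puts gz.read
end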
String compression operates through StringIO objects, allowing in-memory compression without temporary files. This approach suits small to medium datasets and network operations.
require 'stringio'
# Compress string to bytes
def compress_string(data)
  io = StringIO.new
  Zlib::GzipWriter.wrap(io) do |gz|
    gz.write(data)
  end
  io.string
end

# Decompress bytes to string
def decompress_string(compressed_data)
  io = StringIO.new(compressed_data)
  Zlib::GzipReader.wrap(io) do |gz|
    gz.read
  end
end
original = "This text will be compressed using GZip"
compressed = compress_string(original)
decompressed = decompress_string(compressed)
puts "Original size: #{original.bytesize} bytes"
puts "Compressed size: #{compressed.bytesize} bytes"
puts "Decompressed: #{decompressed}"
# => Original size: 39 bytes
# => Compressed size: 56 bytes (exact size may vary by zlib version)
# => Decompressed: This text will be compressed using GZip
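Ruby 2.5 and later also provide module-level shortcuts, Zlib.gzip and Zlib.gunzip, which perform the same in-memory round trip without an explicit StringIO:

shortcut = Zlib.gzip("This text will be compressed using GZip")
puts Zlib.gunzip(shortcut)
# => This text will be compressed using GZip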
The GZip format includes metadata storage for original filenames, modification times, and comments. Ruby provides access to this information through reader properties and allows setting metadata during compression.
# Set metadata during compression
File.open('archive.txt.gz', 'wb') do |file|
  Zlib::GzipWriter.wrap(file) do |gz|
    gz.orig_name = "original_file.txt"
    gz.comment = "Archived on #{Time.now}"
    gz.mtime = Time.now
    gz.write("File contents with metadata")
  end
end

# Read metadata from compressed file
File.open('archive.txt.gz', 'rb') do |file|
  Zlib::GzipReader.wrap(file) do |gz|
    puts "Original name: #{gz.orig_name}"
    puts "Comment: #{gz.comment}"
    puts "Modified: #{gz.mtime}"
    puts "Content: #{gz.read}"
  end
end
Error Handling & Debugging
GZip operations raise specific exceptions for different failure conditions. The Zlib::Error hierarchy provides detailed error information for debugging compression issues.
require 'zlib'
def safe_decompress(filename)
  File.open(filename, 'rb') do |file|
    Zlib::GzipReader.wrap(file) do |gz|
      return gz.read
    end
  end
rescue Zlib::GzipFile::Error => e
  puts "GZip format error: #{e.message}"
  nil
rescue Zlib::DataError => e
  puts "Corrupted data: #{e.message}"
  nil
rescue Zlib::BufError => e
  puts "Buffer error: #{e.message}"
  nil
rescue Errno::ENOENT
  puts "File not found: #{filename}"
  nil
end
# Test with corrupted file
File.write('broken.gz', 'not gzip data')
result = safe_decompress('broken.gz')
# => GZip format error: not in gzip format
Validating GZip files before processing prevents application crashes and provides user feedback for invalid data. Ruby's GZip implementation performs header validation automatically but requires explicit error handling.
def validate_gzip_file(filename)
  File.open(filename, 'rb') do |file|
    # Check magic number (first 2 bytes); compare as binary strings,
    # otherwise the UTF-8 literal never equals the binary file data
    magic = file.read(2)
    return false if magic != "\x1f\x8b".b
    file.rewind
    Zlib::GzipReader.wrap(file) do |gz|
      # Attempt to read first byte to validate format
      gz.readchar
      return true
    end
  end
rescue Zlib::Error, EOFError
  false
rescue Errno::ENOENT
  false
end
# Create test files
File.binwrite('valid.gz', Zlib.gzip('test'))
File.binwrite('invalid.gz', Zlib::Deflate.deflate('test'))

puts validate_gzip_file('valid.gz')   # => true
puts validate_gzip_file('invalid.gz') # => false (raw deflate != gzip)
Memory exhaustion during decompression of maliciously crafted files requires defensive programming. Implementing size limits and streaming approaches prevents denial-of-service attacks.
def safe_decompress_with_limit(filename, max_size: 10 * 1024 * 1024)
  decompressed_size = 0
  result = String.new
  File.open(filename, 'rb') do |file|
    Zlib::GzipReader.wrap(file) do |gz|
      while chunk = gz.read(8192) # Read in chunks
        decompressed_size += chunk.bytesize
        if decompressed_size > max_size
          raise "Decompressed size exceeds limit: #{max_size} bytes"
        end
        result << chunk
      end
    end
  end
  result
rescue Zlib::Error => e
  raise "Decompression failed: #{e.message}"
end
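Hypothetical usage of the helper above, assuming an existing data.gz (the filename and limit are illustrative):

begin
  content = safe_decompress_with_limit('data.gz', max_size: 1024 * 1024)
  puts "Read #{content.bytesize} bytes"
rescue RuntimeError => e
  warn e.message
end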
Performance & Memory
Compression levels significantly impact both processing time and output size. Higher compression levels require more CPU cycles but produce smaller files, creating trade-offs for different use cases.
require 'benchmark'
data = "x" * 100_000 # 100KB of repeated character
Benchmark.bm(15) do |x|
(0..9).each do |level|
x.report("Level #{level}:") do
io = StringIO.new
Zlib::GzipWriter.wrap(io, level) do |gz|
gz.write(data)
end
compressed_size = io.string.bytesize
ratio = (compressed_size.to_f / data.bytesize * 100).round(1)
print " #{compressed_size} bytes (#{ratio}%)"
end
end
end
Streaming large files prevents memory exhaustion by processing data in chunks. Ruby's GZip classes support streaming operations that maintain constant memory usage regardless of file size.
def stream_compress_file(input_path, output_path, chunk_size: 64 * 1024)
  File.open(output_path, 'wb') do |output_file|
    Zlib::GzipWriter.wrap(output_file) do |gz|
      File.open(input_path, 'rb') do |input_file|
        while chunk = input_file.read(chunk_size)
          gz.write(chunk)
        end
      end
    end
  end
end

def stream_decompress_file(input_path, output_path, chunk_size: 64 * 1024)
  File.open(input_path, 'rb') do |input_file|
    Zlib::GzipReader.wrap(input_file) do |gz|
      File.open(output_path, 'wb') do |output_file|
        while chunk = gz.read(chunk_size)
          output_file.write(chunk)
        end
      end
    end
  end
end
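A round-trip sketch using these helpers; the file names are placeholders:

stream_compress_file('input.log', 'input.log.gz')
stream_decompress_file('input.log.gz', 'restored.log')

# The restored file should match the original byte-for-byte
puts File.size('input.log') == File.size('restored.log') # => true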
Buffer management affects compression performance and memory usage. Ruby's GZip classes buffer internally; the main tunable in application code is the chunk size used for reads and writes, which trades memory use against per-call overhead.
class OptimizedGzipProcessor
  def initialize(buffer_size: 32 * 1024)
    @buffer_size = buffer_size
  end

  def compress_data(data)
    io = StringIO.new
    Zlib::GzipWriter.wrap(io) do |gz|
      # Write the string in buffer-sized chunks (String has no each_slice)
      (0...data.bytesize).step(@buffer_size) do |offset|
        gz.write(data.byteslice(offset, @buffer_size))
      end
    end
    io.string
  end

  def process_large_dataset(file_pattern)
    Dir.glob(file_pattern).each do |filename|
      compressed_name = "#{filename}.gz"
      start_time = Time.now
      stream_compress_file(filename, compressed_name, chunk_size: @buffer_size)
      processing_time = Time.now - start_time
      original_size = File.size(filename)
      compressed_size = File.size(compressed_name)
      ratio = (compressed_size.to_f / original_size * 100).round(1)
      puts "#{filename}: #{original_size} → #{compressed_size} (#{ratio}%) in #{processing_time.round(2)}s"
    end
  end
end
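Hypothetical usage; the glob pattern is illustrative, and stream_compress_file from the previous example must be defined in the same script:

processor = OptimizedGzipProcessor.new(buffer_size: 64 * 1024)
processor.process_large_dataset('logs/*.log')
# Prints one line per file: name, original → compressed sizes, ratio, and time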
Production Patterns
Web applications commonly apply GZip to HTTP responses. Ruby web frameworks and Rack middleware provide integration points for transparent response compression.
# Rack middleware example
class GzipResponseMiddleware
  def initialize(app, options = {})
    @app = app
    @min_size = options[:min_size] || 1024
    @compress_types = options[:types] || %w[text/html text/css text/javascript application/json]
  end

  def call(env)
    status, headers, body = @app.call(env)
    headers = Rack::Utils::HeaderHash.new(headers)
    return [status, headers, body] unless should_compress?(env, headers, body)

    compressed_body = compress_response(body)
    body.close if body.respond_to?(:close) # release the original body per the Rack spec
    headers['Content-Encoding'] = 'gzip'
    headers['Content-Length'] = compressed_body.bytesize.to_s
    headers.delete('ETag') # Remove ETag as content changed
    [status, headers, [compressed_body]]
  end

  private

  def should_compress?(env, headers, body)
    return false unless env['HTTP_ACCEPT_ENCODING']&.include?('gzip')
    return false unless @compress_types.any? { |type| headers['Content-Type']&.start_with?(type) }
    # Summing chunk sizes iterates the body; this assumes an Array-like,
    # re-iterable body rather than a one-shot streaming body
    body_size = body.respond_to?(:bytesize) ? body.bytesize : body.sum(&:bytesize)
    body_size >= @min_size
  end

  def compress_response(body)
    io = StringIO.new
    Zlib::GzipWriter.wrap(io) do |gz|
      body.each { |chunk| gz.write(chunk) }
    end
    io.string
  end
end
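A sketch of wiring the middleware into a Rack application via config.ru, assuming Rack 2.x (Rack::Utils::HeaderHash was removed in Rack 3) and that the class lives in a local file:

# config.ru
require 'rack'
require_relative 'gzip_response_middleware' # hypothetical file containing the class

use GzipResponseMiddleware, min_size: 512, types: %w[text/html application/json]

run ->(env) { [200, { 'Content-Type' => 'text/html' }, ['<p>Hello</p>' * 200]] }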
Log file compression for long-term storage requires handling active log rotation and maintaining searchability. Background processing with proper error handling prevents impact on application performance.
class LogCompressor
  def initialize(log_directory, retention_days: 30)
    @log_directory = log_directory
    @retention_days = retention_days
  end

  def compress_old_logs
    old_log_files.each do |log_file|
      compress_log_file(log_file)
      File.delete(log_file) if File.exist?("#{log_file}.gz")
    end
    cleanup_expired_logs
  end

  private

  def old_log_files
    pattern = File.join(@log_directory, "*.log")
    Dir.glob(pattern).select do |file|
      File.mtime(file) < Time.now - (24 * 60 * 60) # Older than 1 day
    end
  end

  def compress_log_file(log_file)
    compressed_file = "#{log_file}.gz"
    return if File.exist?(compressed_file)

    File.open(compressed_file, 'wb') do |output|
      Zlib::GzipWriter.wrap(output) do |gz|
        gz.orig_name = File.basename(log_file)
        gz.mtime = File.mtime(log_file)
        File.open(log_file, 'rb') do |input|
          IO.copy_stream(input, gz)
        end
      end
    end
  rescue => e
    File.delete(compressed_file) if File.exist?(compressed_file)
    raise "Failed to compress #{log_file}: #{e.message}"
  end

  def cleanup_expired_logs
    cutoff_time = Time.now - (@retention_days * 24 * 60 * 60)
    Dir.glob(File.join(@log_directory, "*.gz")).each do |gz_file|
      File.delete(gz_file) if File.mtime(gz_file) < cutoff_time
    end
  end
end
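Hypothetical invocation, for example from a scheduled job; the directory is a placeholder:

compressor = LogCompressor.new('/var/log/myapp', retention_days: 30)
compressor.compress_old_logs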
Database backup compression reduces storage costs and transfer times. Streaming compression during backup creation eliminates temporary file requirements and improves performance.
class DatabaseBackupCompressor
  def initialize(database_url, backup_location)
    @database_url = database_url
    @backup_location = backup_location
  end

  def create_compressed_backup
    timestamp = Time.now.strftime("%Y%m%d_%H%M%S")
    backup_file = File.join(@backup_location, "backup_#{timestamp}.sql.gz")

    File.open(backup_file, 'wb') do |file|
      Zlib::GzipWriter.wrap(file) do |gz|
        gz.orig_name = "backup_#{timestamp}.sql"
        gz.mtime = Time.now
        # Stream pg_dump output directly to GZip
        IO.popen(["pg_dump", @database_url], "rb") do |dump|
          IO.copy_stream(dump, gz)
        end
      end
    end

    verify_backup(backup_file)
    backup_file
  end

  def restore_from_backup(backup_file)
    File.open(backup_file, 'rb') do |file|
      Zlib::GzipReader.wrap(file) do |gz|
        IO.popen(["psql", @database_url], "wb") do |psql|
          IO.copy_stream(gz, psql)
        end
      end
    end
  end

  private

  def verify_backup(backup_file)
    File.open(backup_file, 'rb') do |file|
      Zlib::GzipReader.wrap(file) do |gz|
        # Read the first bytes to verify decompression works
        header = gz.read(100)
        raise "Invalid backup file" unless header&.include?("PostgreSQL")
      end
    end
  end
end
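A sketch of a backup run; the connection URL and backup directory are placeholders:

backup = DatabaseBackupCompressor.new(
  'postgres://user:pass@localhost/mydb',
  '/var/backups/postgres'
)
file = backup.create_compressed_backup
puts "Backup written to #{file}"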
Common Pitfalls
Character encoding issues arise when compressing text data without proper encoding handling. GZip operates on bytes, requiring explicit encoding specification for text data to prevent corruption.
# Incorrect: Encoding ignored during compression
def compress_text_wrong(text)
  io = StringIO.new
  Zlib::GzipWriter.wrap(io) do |gz|
    gz.write(text) # Writes bytes in whatever encoding the string carries
  end
  io.string
end

# Correct: Explicit encoding handling
def compress_text_correct(text, encoding: 'UTF-8')
  io = StringIO.new
  Zlib::GzipWriter.wrap(io) do |gz|
    gz.write(text.encode(encoding))
  end
  io.string
end
# Test with non-ASCII text
text = "Héllo Wörld! 🌍"
compressed = compress_text_correct(text)
# Decompression with encoding restoration
def decompress_text(compressed_data, encoding: 'UTF-8')
  io = StringIO.new(compressed_data)
  Zlib::GzipReader.wrap(io) do |gz|
    gz.read.force_encoding(encoding)
  end
end
decompressed = decompress_text(compressed)
puts decompressed == text # => true
Resource leaks occur when GZip streams remain unclosed, particularly in error conditions. An unclosed GzipWriter never writes the GZip footer, leaving a truncated file, and relying on garbage collection to release file handles risks descriptor exhaustion.
# Problematic: Manual resource management
def risky_compression(files)
  files.each do |filename|
    gz = Zlib::GzipWriter.new(File.open("#{filename}.gz", 'wb'))
    gz.write(File.read(filename))
    # Missing gz.close - resource leak!
  end
end

# Safe: Block-based resource management
def safe_compression(files)
  files.each do |filename|
    File.open("#{filename}.gz", 'wb') do |file|
      Zlib::GzipWriter.wrap(file) do |gz|
        gz.write(File.read(filename))
        # Automatic cleanup via block
      end
    end
  end
end

# Safest: Exception-safe resource cleanup
def robust_compression(files)
  files.each do |filename|
    output_file = nil
    gz = nil
    begin
      output_file = File.open("#{filename}.gz", 'wb')
      gz = Zlib::GzipWriter.new(output_file)
      gz.write(File.read(filename))
    ensure
      gz&.close
      output_file&.close
    end
  end
end
Compression level misconceptions lead to inappropriate settings for specific use cases. Maximum compression (level 9) rarely provides significant benefits over moderate levels while consuming substantially more CPU resources.
require 'benchmark'
def demonstrate_compression_tradeoffs(data)
  levels = [1, 6, 9]
  results = {}

  levels.each do |level|
    time = Benchmark.realtime do
      io = StringIO.new
      Zlib::GzipWriter.wrap(io, level) do |gz|
        gz.write(data)
      end
      results[level] = {
        size: io.string.bytesize,
        ratio: (io.string.bytesize.to_f / data.bytesize * 100).round(1)
      }
    end
    results[level][:time] = time.round(4)
  end

  puts "Compression Analysis for #{data.bytesize} byte input:"
  results.each do |level, stats|
    puts "Level #{level}: #{stats[:size]} bytes (#{stats[:ratio]}%) in #{stats[:time]}s"
  end

  # Calculate efficiency (compression per second)
  results.each do |level, stats|
    efficiency = ((100 - stats[:ratio]) / stats[:time]).round(1)
    puts "Level #{level} efficiency: #{efficiency} compression points/second"
  end
end
# Test with different data types
text_data = File.read('/usr/share/dict/words') rescue "word " * 10000
demonstrate_compression_tradeoffs(text_data)
Memory accumulation during streaming operations occurs when developers buffer entire streams instead of processing data in chunks. This pattern defeats the memory benefits of streaming compression.
# Memory-intensive: Accumulates entire stream
def bad_streaming_decompress(filename)
  result = String.new
  File.open(filename, 'rb') do |file|
    Zlib::GzipReader.wrap(file) do |gz|
      # Loads entire file into memory
      result = gz.read
    end
  end
  result
end

# Memory-efficient: Processes chunks
def good_streaming_decompress(filename)
  File.open(filename, 'rb') do |file|
    Zlib::GzipReader.wrap(file) do |gz|
      while chunk = gz.read(64 * 1024)
        yield chunk
      end
    end
  end
end

# Example usage with memory monitoring
def process_large_gzip_file(filename)
  total_processed = 0
  good_streaming_decompress(filename) do |chunk|
    # Process chunk without storing
    total_processed += chunk.bytesize
    # Periodic progress reporting
    if total_processed % (1024 * 1024) == 0
      puts "Processed #{total_processed / 1024 / 1024}MB"
    end
  end
end
Reference
Core Classes
Class | Purpose | Key Methods |
---|---|---|
Zlib::GzipWriter | Compress data to GZip format | #write, #puts, #close, #flush |
Zlib::GzipReader | Decompress GZip format data | #read, #gets, #each_line, #rewind |
Zlib::GzipFile | Base class for GZip operations | #close, #closed?, #sync, #sync= |
GzipWriter Methods
Method | Parameters | Returns | Description |
---|---|---|---|
#initialize(io, level=nil, strategy=nil) | IO object, compression level (0-9), strategy | GzipWriter | Creates new writer with optional compression settings |
#write(string) | String data | Integer | Writes string to compressed stream, returns bytes written |
#puts(*objects) | Objects to write | nil | Writes objects as lines with newline separators |
#print(*objects) | Objects to write | nil | Writes objects to stream without separators |
#printf(format, *objects) | Format string, objects | nil | Writes formatted output to stream |
#flush(flush=nil) | Flush mode constant | self | Flushes pending data to underlying IO |
#close | None | nil | Closes writer and finalizes compression |
#orig_name= | String filename | String | Sets original filename metadata |
#comment= | String comment | String | Sets comment metadata |
#mtime= | Time object | Time | Sets modification time metadata |
GzipReader Methods
Method | Parameters | Returns | Description |
---|---|---|---|
#initialize(io) | IO object | GzipReader | Creates new reader for compressed stream |
#read(length=nil) | Bytes to read | String/nil | Reads decompressed data, nil at EOF |
#gets(separator=$/) | Line separator | String/nil | Reads line from decompressed stream |
#each_line(&block) | Block | Enumerator | Iterates over lines in decompressed data |
#readlines(separator=$/) | Line separator | Array | Returns all lines as array |
#rewind | None | 0 | Resets reader to beginning of stream |
#pos | None | Integer | Returns current position in decompressed data |
#eof? | None | Boolean | Returns true at end of decompressed data |
#orig_name | None | String/nil | Returns original filename from metadata |
#comment | None | String/nil | Returns comment from metadata |
#mtime | None | Time | Returns modification time from metadata |
Compression Levels
Level | Speed | Ratio | Use Case |
---|---|---|---|
0 | Fastest | None | Storage without compression |
1-3 | Fast | Low | Real-time compression, CPU-limited |
4-6 | Balanced | Medium | General purpose, default (6) |
7-9 | Slow | High | Archival, bandwidth-limited |
Exception Hierarchy
Exception | Parent | Description |
---|---|---|
Zlib::Error | StandardError | Base class for all Zlib errors |
Zlib::GzipFile::Error | Zlib::Error | GZip format or operation errors |
Zlib::GzipFile::NoFooter | Zlib::GzipFile::Error | Missing or invalid file footer |
Zlib::GzipFile::CRCError | Zlib::GzipFile::Error | Checksum validation failure |
Zlib::GzipFile::LengthError | Zlib::GzipFile::Error | Length validation failure |
Zlib::DataError | Zlib::Error | Corrupted or invalid compressed data |
Zlib::BufError | Zlib::Error | Buffer size or state errors |
Zlib::StreamError | Zlib::Error | Stream state or operation errors |
File Format Constants
Constant | Value | Description |
---|---|---|
Zlib::SYNC_FLUSH | 2 | Flush mode for real-time streaming |
Zlib::FULL_FLUSH | 3 | Complete buffer flush |
Zlib::BEST_SPEED | 1 | Fastest compression |
Zlib::BEST_COMPRESSION | 9 | Maximum compression |
Zlib::DEFAULT_COMPRESSION | -1 | Default compression level (maps to 6) |
Convenience Methods
Method | Parameters | Returns | Description |
---|---|---|---|
Zlib::GzipWriter.open(filename, level=nil) | Filename, compression level | GzipWriter | Opens file for writing; closes automatically with a block |
Zlib::GzipReader.open(filename) | Filename | GzipReader | Opens file for reading; closes automatically with a block |
Zlib::GzipWriter.wrap(io, level=nil) | IO object, compression level | GzipWriter | Wraps an IO for compressed writing; with a block, closes the wrapper on exit |
Zlib::GzipReader.wrap(io) | IO object | GzipReader | Wraps an IO for decompressed reading; with a block, closes the wrapper on exit |
Zlib.gzip(string, level: nil) | String data, optional level keyword | String | Compresses string to GZip bytes |
Zlib.gunzip(gzipped_string) | GZip bytes | String | Decompresses GZip bytes to string |
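A quick round trip combining the convenience methods with the level constants; the filename is illustrative:

require 'zlib'

data = File.binread('report.csv')
File.binwrite('report.csv.gz', Zlib.gzip(data, level: Zlib::BEST_COMPRESSION))
puts Zlib.gunzip(File.binread('report.csv.gz')) == data # => true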