Overview
String compression transforms input data into a smaller representation while preserving the ability to reconstruct the original string. The process identifies redundancy in data and replaces repeated patterns with shorter codes or references. Compression algorithms balance three competing factors: compression ratio, processing speed, and memory requirements.
Two fundamental categories define compression approaches. Lossless compression guarantees perfect reconstruction of the original string, making it suitable for text, source code, and data where accuracy matters. Lossy compression sacrifices some information for higher compression ratios, primarily used for media files where minor quality degradation is acceptable. String compression typically refers to lossless methods since text data requires exact reproduction.
The effectiveness of compression depends on input characteristics. Highly repetitive data compresses well, while random or already-compressed data may resist further reduction. Some compression attempts produce output larger than the input due to metadata overhead.
require 'zlib'
original = "AAAAAABBBBBCCCCCDDDDD"
compressed = Zlib::Deflate.deflate(original)
puts "Original size: #{original.bytesize} bytes"
puts "Compressed size: #{compressed.bytesize} bytes"
puts "Ratio: #{((1 - compressed.bytesize.to_f / original.bytesize) * 100).round(2)}%"
# Original size: 21 bytes
# Compressed size: 19 bytes
# Ratio: 9.52%
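When the input carries little redundancy — random bytes or data that has already been compressed — the same API demonstrates the overhead problem directly. A second deflate pass finds nothing to shorten and adds its own header and checksum bytes:

```ruby
require 'zlib'

# Compressing already-compressed data: the second pass sees high-entropy
# input, stores it essentially verbatim, and adds stream overhead.
original = "The quick brown fox jumps over the lazy dog. " * 50
once = Zlib::Deflate.deflate(original)
twice = Zlib::Deflate.deflate(once)

puts "Original: #{original.bytesize} bytes"
puts "Compressed once: #{once.bytesize} bytes"
puts "Compressed twice: #{twice.bytesize} bytes"
puts "Second pass expanded the data" if twice.bytesize > once.bytesize
```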
Compression finds application across software development domains. Web servers compress HTTP responses to reduce bandwidth consumption. Databases compress stored records to maximize capacity. Version control systems store file deltas in compressed formats. Log aggregation systems compress historical data for cost-effective retention.
Key Principles
Entropy quantifies the information content in data. High-entropy data contains less redundancy and compresses poorly, while low-entropy data with repeated patterns achieves better compression. Claude Shannon's information theory established that no lossless compression algorithm can compress all possible inputs, as this would violate the pigeonhole principle.
Compression ratio measures effectiveness as the size of the compressed output relative to the original. A ratio of 0.5 indicates the compressed version occupies half the original size. The formula compression_ratio = compressed_size / original_size produces values below 1 for successful compression, while values exceeding 1 indicate expansion rather than compression.
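The formula translates directly into a one-line helper (the helper name here is illustrative, not a library method):

```ruby
require 'zlib'

# compression_ratio = compressed_size / original_size
def compression_ratio(original, compressed)
  compressed.bytesize.to_f / original.bytesize
end

text = "abc" * 1000
ratio = compression_ratio(text, Zlib::Deflate.deflate(text))
puts format("ratio: %.4f (%s)", ratio, ratio < 1 ? "compressed" : "expanded")
```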
Statistical methods form the foundation of many compression algorithms. These approaches analyze symbol frequency in the input and assign shorter codes to frequently occurring symbols. Huffman coding creates an optimal prefix-free code based on symbol frequencies, ensuring no code is a prefix of another code, which eliminates ambiguity during decoding.
Dictionary-based compression maintains a mapping between patterns and codes. LZ77 and LZ78 algorithms build dictionaries dynamically during compression, identifying repeated sequences and replacing subsequent occurrences with references to earlier positions. This approach works effectively on text with recurring phrases or structural patterns.
Run-length encoding (RLE) represents the simplest compression technique. It replaces consecutive identical symbols with a count-symbol pair. The string "AAAAA" becomes "5A" in RLE notation. This method excels with data containing long runs of identical values but expands data with few repetitions.
def calculate_entropy(string)
  frequencies = string.chars.each_with_object(Hash.new(0)) { |char, hash| hash[char] += 1 }
  length = string.length.to_f
  entropy = frequencies.values.reduce(0) do |sum, freq|
    probability = freq / length
    sum - (probability * Math.log2(probability))
  end
  entropy.round(4)
end
high_entropy = "a1b2c3d4e5f6g7h8"
low_entropy = "aaaaaabbbbbbcccccc"
puts "High entropy: #{calculate_entropy(high_entropy)} bits per symbol"
puts "Low entropy: #{calculate_entropy(low_entropy)} bits per symbol"
# High entropy: 4.0 bits per symbol
# Low entropy: 1.585 bits per symbol
The compression-decompression cycle must maintain symmetry. Compression encodes data using an algorithm and optional dictionary or model. Decompression reverses the process, applying the inverse algorithm with the same dictionary or model. Asymmetric algorithms may optimize for either compression or decompression speed depending on the use case.
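A minimal round trip illustrates the symmetry requirement. One practical wrinkle: Zlib::Inflate.inflate returns a binary (ASCII-8BIT) string, so the original encoding must be restored before comparing text:

```ruby
require 'zlib'

input = "symmetry check: \u00e9\u00fc\u00f1 " * 10
compressed = Zlib::Deflate.deflate(input)
restored = Zlib::Inflate.inflate(compressed)

# Inflate returns binary-encoded bytes; force the original encoding
# before comparing strings rather than raw bytes.
restored.force_encoding(input.encoding)
puts input == restored # true
```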
Adaptive algorithms modify their behavior based on processed data. Static algorithms use fixed tables or models established before processing begins. Adaptive methods typically achieve better compression ratios at the cost of increased complexity and processing requirements.
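Zlib exposes one static-model mechanism directly: a preset dictionary shared by compressor and decompressor. The sketch below assumes the dictionary is agreed on out of band; the rescue follows the documented Zlib::NeedDict protocol, where decompression pauses until the dictionary is supplied:

```ruby
require 'zlib'

# Assumed shared vocabulary; both sides must use the identical dictionary.
DICT = "username email status active member"

def compress_with_dict(data)
  deflater = Zlib::Deflate.new
  deflater.set_dictionary(DICT)
  deflater.deflate(data, Zlib::FINISH)
end

def decompress_with_dict(data)
  inflater = Zlib::Inflate.new
  inflater.inflate(data)
rescue Zlib::NeedDict
  # The stream signals it was built against a preset dictionary;
  # supply it, then flush with an empty string to continue.
  inflater.set_dictionary(DICT)
  inflater.inflate("")
end

payload = "username: alice, status: active"
compressed = compress_with_dict(payload)
puts decompress_with_dict(compressed) == payload # true
```

A preset dictionary helps most on short payloads that share vocabulary with the dictionary, since the sliding window starts pre-seeded instead of empty.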
Ruby Implementation
Ruby provides built-in compression through the Zlib library, which implements the DEFLATE algorithm combining LZ77 and Huffman coding. The library offers both streaming and single-shot compression interfaces suitable for different scenarios.
The Zlib::Deflate class handles compression operations. The deflate method compresses a complete string in one operation, suitable for small to medium-sized data where memory consumption is not a concern. The method accepts an optional compression level from 0 (no compression) to 9 (maximum compression), defaulting to Zlib::DEFAULT_COMPRESSION.
require 'zlib'
class CompressionService
  def self.compress(data, level: Zlib::DEFAULT_COMPRESSION)
    Zlib::Deflate.deflate(data, level)
  end

  def self.decompress(data)
    Zlib::Inflate.inflate(data)
  end

  def self.compress_with_stats(data, level: Zlib::DEFAULT_COMPRESSION)
    compressed = compress(data, level: level)
    {
      original_size: data.bytesize,
      compressed_size: compressed.bytesize,
      ratio: (compressed.bytesize.to_f / data.bytesize).round(4),
      savings: ((1 - compressed.bytesize.to_f / data.bytesize) * 100).round(2),
      data: compressed
    }
  end
end
text = File.read('large_log.txt')
result = CompressionService.compress_with_stats(text, level: 9)
puts "Savings: #{result[:savings]}%"
Streaming compression handles large datasets that exceed available memory. The Zlib::Deflate instance processes data in chunks, maintaining internal state between calls. This approach enables compression of files, network streams, or generated data without loading everything into memory simultaneously.
require 'zlib'
def compress_file_streaming(input_path, output_path)
  File.open(output_path, 'wb') do |output|
    deflater = Zlib::Deflate.new
    File.open(input_path, 'rb') do |input|
      while chunk = input.read(8192)
        compressed = deflater.deflate(chunk, Zlib::NO_FLUSH)
        output.write(compressed) unless compressed.empty?
      end
    end
    # Flush remaining data
    output.write(deflater.finish)
    deflater.close
  end
end

def decompress_file_streaming(input_path, output_path)
  File.open(output_path, 'wb') do |output|
    inflater = Zlib::Inflate.new
    File.open(input_path, 'rb') do |input|
      while chunk = input.read(8192)
        decompressed = inflater.inflate(chunk)
        output.write(decompressed) unless decompressed.empty?
      end
    end
    inflater.close
  end
end
Custom compression algorithms implement domain-specific requirements. Run-length encoding provides a simple example demonstrating the core concepts of pattern recognition and encoding.
class RunLengthEncoder
  def self.encode(string)
    return "" if string.empty?
    result = []
    current_char = string[0]
    count = 1
    string.chars.each_with_index do |char, index|
      next if index.zero?
      if char == current_char
        count += 1
      else
        result << "#{count}#{current_char}"
        current_char = char
        count = 1
      end
    end
    result << "#{count}#{current_char}"
    result.join
  end

  def self.decode(encoded)
    encoded.scan(/(\d+)(.)/).map { |count, char| char * count.to_i }.join
  end

  def self.should_compress?(string)
    encoded = encode(string)
    encoded.bytesize < string.bytesize
  end
end
data = "AAAAAABBBBBCCCCCDDDDD"
encoded = RunLengthEncoder.encode(data)
puts "Original: #{data}"
puts "Encoded: #{encoded}"
puts "Decoded: #{RunLengthEncoder.decode(encoded)}"
# Original: AAAAAABBBBBCCCCCDDDDD
# Encoded: 6A5B5C5D
# Decoded: AAAAAABBBBBCCCCCDDDDD
The StringIO class enables in-memory compression when working with string data rather than files. This approach avoids filesystem overhead and integrates cleanly with existing string-processing code.
require 'zlib'
require 'stringio'
def compress_string_io(data)
  io = StringIO.new
  writer = Zlib::GzipWriter.new(io)
  writer.write(data)
  writer.close
  io.string
end

def decompress_string_io(compressed)
  io = StringIO.new(compressed)
  reader = Zlib::GzipReader.new(io)
  decompressed = reader.read
  reader.close
  decompressed
end
original = "Sample text " * 100
compressed = compress_string_io(original)
restored = decompress_string_io(compressed)
puts original == restored # true
puts "Size reduction: #{((1 - compressed.bytesize.to_f / original.bytesize) * 100).round(2)}%"
Practical Examples
Text file compression demonstrates typical use cases where repeated words and structural patterns enable significant size reduction. Configuration files, source code, and documentation benefit from compression during storage and transmission.
require 'zlib'
require 'digest'

class TextFileCompressor
  attr_reader :stats

  def initialize
    @stats = {}
  end

  def compress_file(input_path, output_path, level: 9)
    original_size = File.size(input_path)
    # GzipWriter.open takes the compression level as a positional argument
    Zlib::GzipWriter.open(output_path, level) do |gz|
      File.open(input_path, 'rb') do |file|
        while chunk = file.read(16384)
          gz.write(chunk)
        end
      end
    end
    compressed_size = File.size(output_path)
    @stats = {
      original_size: original_size,
      compressed_size: compressed_size,
      ratio: (compressed_size.to_f / original_size).round(4),
      reduction_percent: ((1 - compressed_size.to_f / original_size) * 100).round(2)
    }
  end

  def decompress_file(input_path, output_path)
    Zlib::GzipReader.open(input_path) do |gz|
      File.open(output_path, 'wb') do |file|
        while chunk = gz.read(16384)
          file.write(chunk)
        end
      end
    end
  end

  def verify_integrity(original_path, decompressed_path)
    original_hash = Digest::SHA256.file(original_path).hexdigest
    decompressed_hash = Digest::SHA256.file(decompressed_path).hexdigest
    original_hash == decompressed_hash
  end
end
compressor = TextFileCompressor.new
compressor.compress_file('application.log', 'application.log.gz', level: 9)
puts "Compression: #{compressor.stats[:reduction_percent]}% size reduction"
compressor.decompress_file('application.log.gz', 'application_restored.log')
puts "Integrity: #{compressor.verify_integrity('application.log', 'application_restored.log')}"
JSON compression addresses the verbosity inherent in JSON format. The structured nature with repeated keys and hierarchical nesting provides opportunities for compression, particularly with arrays of similar objects.
require 'zlib'
require 'json'
class JsonCompressor
  def self.compress(hash, level: 9)
    json_string = JSON.generate(hash)
    Zlib::Deflate.deflate(json_string, level)
  end

  def self.decompress(compressed_data)
    json_string = Zlib::Inflate.inflate(compressed_data)
    JSON.parse(json_string)
  end

  def self.compare_formats(data)
    json_string = JSON.generate(data)
    compressed = compress(data)
    {
      json_size: json_string.bytesize,
      compressed_size: compressed.bytesize,
      reduction: ((1 - compressed.bytesize.to_f / json_string.bytesize) * 100).round(2)
    }
  end
end

# Large dataset with repetitive structure
users = 1000.times.map do |i|
  {
    id: i,
    username: "user_#{i}",
    email: "user#{i}@example.com",
    status: "active",
    role: "member",
    created_at: "2024-01-01T00:00:00Z"
  }
end
stats = JsonCompressor.compare_formats(users)
puts "JSON size: #{stats[:json_size]} bytes"
puts "Compressed size: #{stats[:compressed_size]} bytes"
puts "Reduction: #{stats[:reduction]}%"
HTTP response compression reduces bandwidth and improves response times for web applications. Rack middleware handles compression transparently for supported clients.
require 'zlib'
require 'stringio'
require 'rack'

class CompressionMiddleware
  COMPRESSIBLE_TYPES = [
    'text/html', 'text/plain', 'text/css', 'text/javascript',
    'application/json', 'application/xml'
  ].freeze
  MIN_SIZE = 1024 # Don't compress responses smaller than 1KB

  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    return [status, headers, body] unless should_compress?(env, headers, body)
    compressed_body = compress_body(body)
    headers['Content-Encoding'] = 'gzip'
    headers['Content-Length'] = compressed_body.bytesize.to_s
    headers.delete('Content-MD5')
    [status, headers, [compressed_body]]
  end

  private

  def should_compress?(env, headers, body)
    return false unless env['HTTP_ACCEPT_ENCODING']&.include?('gzip')
    return false if headers['Content-Encoding']
    content_type = headers['Content-Type']&.split(';')&.first
    return false unless COMPRESSIBLE_TYPES.include?(content_type)
    body_size = body.respond_to?(:bytesize) ? body.bytesize : body.sum(&:bytesize)
    body_size >= MIN_SIZE
  end

  def compress_body(body)
    io = StringIO.new
    gz = Zlib::GzipWriter.new(io)
    if body.respond_to?(:each)
      body.each { |chunk| gz.write(chunk) }
    else
      gz.write(body)
    end
    gz.close
    io.string
  end
end
Database column compression reduces storage requirements for text-heavy fields. This pattern applies to log entries, comments, descriptions, or any large text data stored in databases.
require 'zlib'
require 'base64'
module CompressibleAttribute
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def compresses(*attributes)
      attributes.each do |attribute|
        # Raw storage for the compressed, Base64-encoded value
        storage = "#{attribute}_compressed"
        attr_accessor storage

        define_method(attribute) do
          compressed = send(storage)
          return nil if compressed.nil?
          decoded = Base64.strict_decode64(compressed)
          Zlib::Inflate.inflate(decoded)
        end

        define_method("#{attribute}=") do |value|
          return send("#{storage}=", nil) if value.nil?
          compressed = Zlib::Deflate.deflate(value, 9)
          send("#{storage}=", Base64.strict_encode64(compressed))
        end
      end
    end
  end
end

class Article
  include CompressibleAttribute
  compresses :content
end
article = Article.new
article.content = "Long article content " * 500
puts "Stored size: #{article.content_compressed.bytesize} bytes"
puts "Original size: #{article.content.bytesize} bytes"
Common Patterns
Run-length encoding suits data with long sequences of identical values. Image formats use RLE for simple graphics, and fax machines employ it for document transmission. The algorithm scans the input sequentially, counting consecutive occurrences of each symbol.
class AdvancedRLE
  def self.encode(string)
    return "" if string.empty?
    result = []
    chars = string.chars
    i = 0
    while i < chars.length
      char = chars[i]
      count = 1
      # Count consecutive occurrences
      while i + count < chars.length && chars[i + count] == char
        count += 1
      end
      # Use count prefix only if beneficial
      if count > 2
        result << "#{count}#{char}"
      elsif count == 2
        result << char << char
      else
        result << char
      end
      i += count
    end
    result.join
  end

  def self.encode_binary(string)
    # More efficient for binary/compressed data
    result = []
    i = 0
    while i < string.length
      char = string[i]
      count = 1
      while i + count < string.length &&
            string[i + count] == char &&
            count < 255
        count += 1
      end
      result << [count, char.ord].pack('CC')
      i += count
    end
    result.join
  end
end
Huffman coding assigns variable-length codes based on symbol frequency. Frequent symbols receive shorter codes, while rare symbols get longer codes. This statistical approach achieves optimal prefix-free encoding for a given set of symbol probabilities.
class HuffmanNode
  attr_accessor :char, :freq, :left, :right

  def initialize(char, freq, left = nil, right = nil)
    @char = char
    @freq = freq
    @left = left
    @right = right
  end

  def leaf?
    @left.nil? && @right.nil?
  end
end

class HuffmanEncoder
  def initialize(string)
    @string = string
    @codes = {}
    build_tree
    generate_codes(@root, "")
  end

  def encode
    @string.chars.map { |char| @codes[char] }.join
  end

  def code_table
    @codes
  end

  private

  def build_tree
    frequencies = @string.chars.each_with_object(Hash.new(0)) { |c, h| h[c] += 1 }
    nodes = frequencies.map { |char, freq| HuffmanNode.new(char, freq) }
    while nodes.length > 1
      nodes.sort_by!(&:freq)
      left = nodes.shift
      right = nodes.shift
      parent = HuffmanNode.new(nil, left.freq + right.freq, left, right)
      nodes << parent
    end
    @root = nodes.first
  end

  def generate_codes(node, code)
    return if node.nil?
    if node.leaf?
      @codes[node.char] = code.empty? ? "0" : code
      return
    end
    generate_codes(node.left, code + "0")
    generate_codes(node.right, code + "1")
  end
end
text = "BCAADDDCCACACAC"
encoder = HuffmanEncoder.new(text)
encoded = encoder.encode
puts "Original: #{text} (#{text.length} chars, #{text.length * 8} bits)"
puts "Code table: #{encoder.code_table}"
puts "Encoded: #{encoded} (#{encoded.length} bits)"
puts "Compression: #{((1 - encoded.length.to_f / (text.length * 8)) * 100).round(2)}%"
Dictionary-based compression maintains a mapping between patterns and shorter references. LZ77 uses a sliding window to find repeated sequences, encoding them as (distance, length) pairs. This approach works effectively on text with recurring phrases.
class SimpleLZ77
  WINDOW_SIZE = 4096
  LOOKAHEAD_SIZE = 18

  def self.encode(string)
    result = []
    pos = 0
    while pos < string.length
      match = find_longest_match(string, pos)
      if match[:length] > 2
        result << {
          type: :reference,
          distance: match[:distance],
          length: match[:length],
          next_char: string[pos + match[:length]]
        }
        pos += match[:length] + 1
      else
        result << { type: :literal, char: string[pos] }
        pos += 1
      end
    end
    result
  end

  def self.decode(encoded)
    output = ""
    encoded.each do |token|
      if token[:type] == :literal
        output << token[:char]
      else
        start = output.length - token[:distance]
        token[:length].times do |i|
          output << output[start + i]
        end
        output << token[:next_char] if token[:next_char]
      end
    end
    output
  end

  def self.find_longest_match(string, pos)
    window_start = [0, pos - WINDOW_SIZE].max
    best_match = { distance: 0, length: 0 }
    (window_start...pos).each do |i|
      length = 0
      while length < LOOKAHEAD_SIZE &&
            pos + length < string.length &&
            string[i + length] == string[pos + length]
        length += 1
      end
      if length > best_match[:length]
        best_match = { distance: pos - i, length: length }
      end
    end
    best_match
  end

  # A bare `private` has no effect on singleton methods; this does
  private_class_method :find_longest_match
end
text = "ABCABCABCABCXYZ"
encoded = SimpleLZ77.encode(text)
decoded = SimpleLZ77.decode(encoded)
puts "Original: #{text}"
puts "Tokens: #{encoded.length}"
puts "Decoded: #{decoded}"
puts "Match: #{text == decoded}"
Delta encoding stores differences between consecutive values rather than absolute values. This pattern suits time-series data, incremental backups, or any sequence where values change gradually.
require 'zlib'

class DeltaEncoder
  def self.encode(values)
    return [] if values.empty?
    result = [values.first]
    (1...values.length).each do |i|
      result << values[i] - values[i - 1]
    end
    result
  end

  def self.decode(deltas)
    return [] if deltas.empty?
    result = [deltas.first]
    (1...deltas.length).each do |i|
      result << result.last + deltas[i]
    end
    result
  end

  def self.encode_with_compression(values)
    deltas = encode(values)
    compressed = Zlib::Deflate.deflate(deltas.pack('l*'))
    {
      original_size: values.pack('l*').bytesize,
      delta_size: deltas.pack('l*').bytesize,
      compressed_size: compressed.bytesize,
      data: compressed
    }
  end
end
# Time series data with small incremental changes
measurements = [100, 102, 103, 105, 104, 106, 108, 107]
result = DeltaEncoder.encode_with_compression(measurements)
puts "Original size: #{result[:original_size]} bytes"
puts "Delta size: #{result[:delta_size]} bytes"
puts "Compressed size: #{result[:compressed_size]} bytes"
Performance Considerations
Compression algorithms exhibit varying time complexity characteristics. Run-length encoding operates in O(n) time with a single pass through the input. Huffman coding requires O(n log n) time for building the frequency table and tree. LZ77 variants demonstrate O(n × w) complexity where w represents the window size, though optimized implementations reduce this through efficient search structures.
Memory requirements differ significantly between approaches. Streaming compression processes data in chunks, maintaining constant memory usage regardless of input size. In-memory compression loads the entire dataset, requiring memory proportional to input size plus overhead for compression structures. The choice depends on available resources and data characteristics.
require 'zlib'
require 'benchmark'
def benchmark_compression_levels(data)
  results = {}
  (0..9).each do |level|
    result = Benchmark.measure do
      100.times { Zlib::Deflate.deflate(data, level) }
    end
    compressed = Zlib::Deflate.deflate(data, level)
    results[level] = {
      time: result.real / 100,
      size: compressed.bytesize,
      ratio: (compressed.bytesize.to_f / data.bytesize).round(4)
    }
  end
  results
end
test_data = "Sample text with patterns " * 1000
results = benchmark_compression_levels(test_data)
results.each do |level, stats|
  puts "Level #{level}: #{(stats[:time] * 1000).round(2)}ms, " \
       "#{stats[:size]} bytes, #{stats[:ratio]} ratio"
end
end
Trade-offs exist between compression ratio and processing speed. Higher compression levels examine more patterns and use larger search windows, increasing both time and memory requirements. Level 1 compression completes quickly with moderate compression, while level 9 achieves maximum compression at the cost of significantly increased processing time.
Decompression typically runs faster than compression since the algorithm follows predetermined instructions rather than searching for patterns. Applications that compress once but decompress frequently should prioritize compression ratio over compression speed, accepting longer initial processing for improved storage and transmission efficiency.
require 'zlib'
require 'benchmark'
class CompressionBenchmark
  def self.compare_operations(data)
    compressed = Zlib::Deflate.deflate(data, 9)
    compression_time = Benchmark.measure do
      1000.times { Zlib::Deflate.deflate(data, 9) }
    end
    decompression_time = Benchmark.measure do
      1000.times { Zlib::Inflate.inflate(compressed) }
    end
    {
      compression_ms: (compression_time.real / 1000 * 1000).round(2),
      decompression_ms: (decompression_time.real / 1000 * 1000).round(2),
      # Compression time divided by decompression time: how many
      # times faster decompression runs
      ratio: (compression_time.real / decompression_time.real).round(2)
    }
  end
end
data = File.read('large_file.txt')
results = CompressionBenchmark.compare_operations(data)
puts "Compression: #{results[:compression_ms]}ms per operation"
puts "Decompression: #{results[:decompression_ms]}ms per operation"
puts "Decompression is #{results[:ratio]}x faster"
Chunk size affects streaming compression performance. Smaller chunks increase overhead from repeated function calls and, when each chunk is flushed or compressed independently, reduce compression effectiveness because the algorithm sees less context at once. Larger chunks improve compression ratios and reduce overhead but require more memory. Typical chunk sizes range from 8KB to 64KB depending on available memory and performance requirements.
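A rough way to observe the flushing cost is to feed a single deflate stream the same small chunks twice: once buffered, once flushed at every chunk boundary (the chunk sizes and sample data here are arbitrary choices for illustration):

```ruby
require 'zlib'

DATA = "log line with repeated structure\n" * 10_000

# Compress DATA through one stream in fixed-size chunks, using the
# given flush mode at each boundary; returns total compressed bytes.
def streamed_size(chunk_size, flush_mode)
  deflater = Zlib::Deflate.new
  total = 0
  offset = 0
  while offset < DATA.bytesize
    total += deflater.deflate(DATA.byteslice(offset, chunk_size), flush_mode).bytesize
    offset += chunk_size
  end
  total += deflater.finish.bytesize
  deflater.close
  total
end

# Buffered chunks share context; per-chunk flushes pay at every boundary.
puts "1 KB chunks, buffered: #{streamed_size(1024, Zlib::NO_FLUSH)} bytes"
puts "1 KB chunks, flushed:  #{streamed_size(1024, Zlib::SYNC_FLUSH)} bytes"
puts "64 KB chunks, flushed: #{streamed_size(65_536, Zlib::SYNC_FLUSH)} bytes"
```

With Zlib::NO_FLUSH the stream keeps its full window regardless of chunk size, so the penalty appears only when boundaries force a flush, as in interactive or record-at-a-time protocols.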
Pre-compression analysis determines whether compression provides benefit. Small files may expand due to metadata overhead. Already-compressed data or random data resist further compression. Checking file extensions or computing entropy helps avoid wasted processing on incompressible data.
require 'zlib'

class CompressionAdvisor
  INCOMPRESSIBLE_EXTENSIONS = %w[.gz .zip .jpg .png .mp4 .mp3 .pdf].freeze
  MIN_ENTROPY_THRESHOLD = 7.0
  MIN_SIZE_THRESHOLD = 1024

  def self.should_compress?(file_path)
    return false if File.size(file_path) < MIN_SIZE_THRESHOLD
    return false if INCOMPRESSIBLE_EXTENSIONS.any? { |ext| file_path.end_with?(ext) }
    sample = File.read(file_path, 8192)
    entropy = calculate_entropy(sample)
    entropy < MIN_ENTROPY_THRESHOLD
  end

  def self.calculate_entropy(data)
    frequencies = data.bytes.each_with_object(Hash.new(0)) { |byte, hash| hash[byte] += 1 }
    length = data.bytesize.to_f
    frequencies.values.reduce(0) do |sum, freq|
      probability = freq / length
      sum - (probability * Math.log2(probability))
    end
  end

  def self.estimate_compression(file_path)
    sample_size = [File.size(file_path), 65536].min
    sample = File.read(file_path, sample_size)
    compressed = Zlib::Deflate.deflate(sample)
    estimated_ratio = compressed.bytesize.to_f / sample.bytesize
    {
      should_compress: should_compress?(file_path),
      estimated_ratio: estimated_ratio.round(4),
      estimated_savings: ((1 - estimated_ratio) * 100).round(2)
    }
  end
end
Reference
Compression Algorithm Comparison
| Algorithm | Time Complexity | Space Complexity | Best Use Case | Compression Ratio |
|---|---|---|---|---|
| Run-Length Encoding | O(n) | O(1) | Images, binary data with runs | Low to Medium |
| Huffman Coding | O(n log n) | O(n) | Text with variable symbol frequency | Medium |
| LZ77 | O(n × w) | O(w) | Text with repeated patterns | Medium to High |
| DEFLATE | O(n × w) | O(w) | General purpose compression | High |
| Delta Encoding | O(n) | O(1) | Time series, incremental data | Medium |
Zlib Compression Levels
| Level | Speed | Ratio | Use Case |
|---|---|---|---|
| 0 | Fastest | None | No compression, store only |
| 1 | Very Fast | Low | Real-time streaming |
| 2-3 | Fast | Medium-Low | Network transmission |
| 4-6 | Moderate | Medium | Default, balanced performance |
| 7-8 | Slow | High | Storage optimization |
| 9 | Slowest | Highest | Archival, maximum compression |
Ruby Zlib Methods
| Method | Purpose | Returns |
|---|---|---|
| Zlib::Deflate.deflate | Compress string | Compressed string |
| Zlib::Inflate.inflate | Decompress string | Original string |
| Zlib::GzipWriter.new | Create gzip writer | Writer instance |
| Zlib::GzipReader.new | Create gzip reader | Reader instance |
| Zlib.crc32 | Calculate CRC32 checksum | Integer checksum |
| Zlib.adler32 | Calculate Adler32 checksum | Integer checksum |
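The checksum methods combine naturally with compression for integrity checking: store a CRC-32 of the original alongside the compressed payload and verify it after decompression. A sketch with illustrative helper names:

```ruby
require 'zlib'

# Helper names are illustrative, not library methods.
def pack_with_checksum(data)
  [Zlib.crc32(data), Zlib::Deflate.deflate(data)]
end

def unpack_with_checksum(checksum, compressed)
  restored = Zlib::Inflate.inflate(compressed)
  raise "checksum mismatch" unless Zlib.crc32(restored) == checksum
  restored
end

checksum, compressed = pack_with_checksum("payload " * 100)
puts unpack_with_checksum(checksum, compressed).bytesize # 800
```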
Common Compression Scenarios
| Scenario | Recommended Approach | Rationale |
|---|---|---|
| Small text files | DEFLATE level 6 | Balanced speed and ratio |
| Large log files | Streaming DEFLATE level 9 | Maximum compression for storage |
| JSON API responses | DEFLATE level 4-6 | Fast compression for transmission |
| Database text columns | DEFLATE level 9 + Base64 | Maximum storage efficiency |
| Real-time streams | DEFLATE level 1-3 | Minimize latency |
| Archival storage | DEFLATE level 9 | Optimize for space |
Performance Benchmarks
| Data Type | Original Size | Compressed Size | Ratio | Time |
|---|---|---|---|---|
| Plain text | 100 KB | 35 KB | 0.35 | 15 ms |
| JSON data | 100 KB | 25 KB | 0.25 | 18 ms |
| HTML | 100 KB | 20 KB | 0.20 | 16 ms |
| CSV | 100 KB | 30 KB | 0.30 | 17 ms |
| Source code | 100 KB | 28 KB | 0.28 | 19 ms |
Entropy Thresholds
| Entropy Range | Compressibility | Example Data |
|---|---|---|
| 0-2 bits | Excellent | Highly repetitive text |
| 2-4 bits | Good | Natural language text |
| 4-6 bits | Moderate | Mixed content |
| 6-7 bits | Poor | Compressed data |
| 7-8 bits | None | Random or encrypted data |
Error Handling Patterns
| Error Type | Cause | Handling Strategy |
|---|---|---|
| Zlib::DataError | Corrupted compressed data | Validate checksums before decompression |
| Zlib::BufError | Insufficient buffer | Increase buffer size or use streaming |
| Zlib::StreamError | Invalid compression state | Reset deflater/inflater and retry |
| Errno::ENOSPC | Insufficient disk space | Check available space before compression |
| ArgumentError | Invalid compression level | Validate level is 0-9 |
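These strategies reduce to a small guard around decompression in practice. The corrupted byte below deliberately breaks the zlib header check, so Zlib::DataError is raised deterministically:

```ruby
require 'zlib'

def safe_decompress(data)
  Zlib::Inflate.inflate(data)
rescue Zlib::DataError
  # Corrupted or truncated stream: surface nil and let the caller decide.
  nil
end

valid = Zlib::Deflate.deflate("hello")
corrupt = valid.dup
corrupt.setbyte(1, 0x00) # invalidate the zlib header check byte

puts safe_decompress(valid)           # hello
puts safe_decompress(corrupt).inspect # nil
```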