CrackedRuby

Overview

String compression transforms input data into a smaller representation while preserving the ability to reconstruct the original string. The process identifies redundancy in data and replaces repeated patterns with shorter codes or references. Compression algorithms balance three competing factors: compression ratio, processing speed, and memory requirements.

Two fundamental categories define compression approaches. Lossless compression guarantees perfect reconstruction of the original string, making it suitable for text, source code, and data where accuracy matters. Lossy compression sacrifices some information for higher compression ratios, primarily used for media files where minor quality degradation is acceptable. String compression typically refers to lossless methods since text data requires exact reproduction.

The effectiveness of compression depends on input characteristics. Highly repetitive data compresses well, while random or already-compressed data may resist further reduction. Some compression attempts produce output larger than the input due to metadata overhead.

require 'zlib'

original = "AAAAAABBBBBCCCCCDDDDD"
compressed = Zlib::Deflate.deflate(original)

puts "Original size: #{original.bytesize} bytes"
puts "Compressed size: #{compressed.bytesize} bytes"
puts "Ratio: #{((1 - compressed.bytesize.to_f / original.bytesize) * 100).round(2)}%"
# Original size: 21 bytes
# Compressed size: 19 bytes
# Ratio: 9.52%
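The overhead case runs the other way: feeding high-entropy input to DEFLATE typically yields output larger than the input, as this small sketch shows.

```ruby
require 'zlib'
require 'securerandom'

# Random bytes carry near-maximal entropy, so DEFLATE finds no
# redundancy and its header/checksum overhead dominates.
random = SecureRandom.bytes(64)
compressed = Zlib::Deflate.deflate(random)

puts "Original size: #{random.bytesize} bytes"
puts "Compressed size: #{compressed.bytesize} bytes"  # larger than the input
```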

Compression finds application across software development domains. Web servers compress HTTP responses to reduce bandwidth consumption. Databases compress stored records to maximize capacity. Version control systems store file deltas in compressed formats. Log aggregation systems compress historical data for cost-effective retention.

Key Principles

Entropy quantifies the information content in data. High-entropy data contains less redundancy and compresses poorly, while low-entropy data with repeated patterns achieves better compression. Claude Shannon's information theory established that no lossless compression algorithm can compress all possible inputs, as this would violate the pigeonhole principle.

Compression ratio measures effectiveness as the size relationship between compressed and original data. A ratio of 0.5 indicates the compressed version occupies half the original size. The formula compression_ratio = compressed_size / original_size produces values between 0 and 1 for successful compression, though values exceeding 1 indicate expansion rather than compression.
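The formula can be computed directly (the exact figure varies with the input and the zlib build):

```ruby
require 'zlib'

original = "compression ratio example " * 40
compressed = Zlib::Deflate.deflate(original)

# compression_ratio = compressed_size / original_size
ratio = compressed.bytesize.to_f / original.bytesize
puts format("ratio: %.3f", ratio)  # below 1.0 indicates reduction
```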

Statistical methods form the foundation of many compression algorithms. These approaches analyze symbol frequency in the input and assign shorter codes to frequently occurring symbols. Huffman coding creates an optimal prefix-free code based on symbol frequencies, ensuring no code is a prefix of another code, which eliminates ambiguity during decoding.

Dictionary-based compression maintains a mapping between patterns and codes. LZ77 and LZ78 algorithms build dictionaries dynamically during compression, identifying repeated sequences and replacing subsequent occurrences with references to earlier positions. This approach works effectively on text with recurring phrases or structural patterns.

Run-length encoding (RLE) represents the simplest compression technique. It replaces consecutive identical symbols with a count-symbol pair. The string "AAAAA" becomes "5A" in RLE notation. This method excels with data containing long runs of identical values but expands data with few repetitions.
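The count-symbol transformation can be sketched in a few lines; this minimal version prefixes every run with a count, even runs of one.

```ruby
# Minimal run-length encoder matching the "AAAAA" -> "5A" notation.
def rle_encode(string)
  string.chars
        .chunk_while { |a, b| a == b }   # group consecutive identical chars
        .map { |run| "#{run.length}#{run.first}" }
        .join
end

puts rle_encode("AAAAA")   # 5A
puts rle_encode("ABBBBC")  # 1A4B1C -- expansion when runs are short
```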

def calculate_entropy(string)
  frequencies = string.chars.each_with_object(Hash.new(0)) { |char, hash| hash[char] += 1 }
  length = string.length.to_f
  
  entropy = frequencies.values.reduce(0) do |sum, freq|
    probability = freq / length
    sum - (probability * Math.log2(probability))
  end
  
  entropy.round(4)
end

high_entropy = "a1b2c3d4e5f6g7h8"
low_entropy = "aaaaaabbbbbbcccccc"

puts "High entropy: #{calculate_entropy(high_entropy)} bits per symbol"
puts "Low entropy: #{calculate_entropy(low_entropy)} bits per symbol"
# High entropy: 4.0 bits per symbol
# Low entropy: 1.585 bits per symbol

The compression-decompression cycle must maintain symmetry. Compression encodes data using an algorithm and optional dictionary or model. Decompression reverses the process, applying the inverse algorithm with the same dictionary or model. Asymmetric algorithms may optimize for either compression or decompression speed depending on the use case.
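A minimal round-trip check with Zlib illustrates the symmetry requirement:

```ruby
require 'zlib'

data = "the inverse algorithm must reproduce the input exactly"
round_trip = Zlib::Inflate.inflate(Zlib::Deflate.deflate(data))

puts data == round_trip  # true
```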

Adaptive algorithms modify their behavior based on processed data. Static algorithms use fixed tables or models established before processing begins. Adaptive methods typically achieve better compression ratios at the cost of increased complexity and processing requirements.

Ruby Implementation

Ruby provides built-in compression through the Zlib library, which implements the DEFLATE algorithm combining LZ77 and Huffman coding. The library offers both streaming and single-shot compression interfaces suitable for different scenarios.

The Zlib::Deflate class handles compression operations. The deflate method compresses a complete string in one operation, suitable for small to medium-sized data where memory consumption is not a concern. The method accepts an optional compression level from 0 (no compression) to 9 (maximum compression), defaulting to Zlib::DEFAULT_COMPRESSION.

require 'zlib'

class CompressionService
  def self.compress(data, level: Zlib::DEFAULT_COMPRESSION)
    Zlib::Deflate.deflate(data, level)
  end
  
  def self.decompress(data)
    Zlib::Inflate.inflate(data)
  end
  
  def self.compress_with_stats(data, level: Zlib::DEFAULT_COMPRESSION)
    compressed = compress(data, level: level)
    {
      original_size: data.bytesize,
      compressed_size: compressed.bytesize,
      ratio: (compressed.bytesize.to_f / data.bytesize).round(4),
      savings: ((1 - compressed.bytesize.to_f / data.bytesize) * 100).round(2),
      data: compressed
    }
  end
end

text = File.read('large_log.txt')
result = CompressionService.compress_with_stats(text, level: 9)
puts "Savings: #{result[:savings]}%"

Streaming compression handles large datasets that exceed available memory. The Zlib::Deflate instance processes data in chunks, maintaining internal state between calls. This approach enables compression of files, network streams, or generated data without loading everything into memory simultaneously.

require 'zlib'

def compress_file_streaming(input_path, output_path)
  File.open(output_path, 'wb') do |output|
    deflater = Zlib::Deflate.new
    
    File.open(input_path, 'rb') do |input|
      while chunk = input.read(8192)
        compressed = deflater.deflate(chunk, Zlib::NO_FLUSH)
        output.write(compressed) unless compressed.empty?
      end
    end
    
    # Flush remaining data
    output.write(deflater.finish)
    deflater.close
  end
end

def decompress_file_streaming(input_path, output_path)
  File.open(output_path, 'wb') do |output|
    inflater = Zlib::Inflate.new
    
    File.open(input_path, 'rb') do |input|
      while chunk = input.read(8192)
        decompressed = inflater.inflate(chunk)
        output.write(decompressed) unless decompressed.empty?
      end
    end
    
    inflater.close
  end
end

Custom compression algorithms implement domain-specific requirements. Run-length encoding provides a simple example demonstrating the core concepts of pattern recognition and encoding.

class RunLengthEncoder
  def self.encode(string)
    return "" if string.empty?
    
    result = []
    current_char = string[0]
    count = 1
    
    string.chars.each_with_index do |char, index|
      next if index.zero?
      
      if char == current_char
        count += 1
      else
        result << "#{count}#{current_char}"
        current_char = char
        count = 1
      end
    end
    
    result << "#{count}#{current_char}"
    result.join
  end
  
  def self.decode(encoded)
    encoded.scan(/(\d+)(.)/).map { |count, char| char * count.to_i }.join
  end
  
  def self.should_compress?(string)
    encoded = encode(string)
    encoded.bytesize < string.bytesize
  end
end

data = "AAAAAABBBBBCCCCCDDDDD"
encoded = RunLengthEncoder.encode(data)
puts "Original: #{data}"
puts "Encoded: #{encoded}"
puts "Decoded: #{RunLengthEncoder.decode(encoded)}"
# Original: AAAAAABBBBBCCCCCDDDDD
# Encoded: 6A5B5C5D
# Decoded: AAAAAABBBBBCCCCCDDDDD

The StringIO class enables in-memory compression when working with string data rather than files. This approach avoids filesystem overhead and integrates cleanly with existing string-processing code.

require 'zlib'
require 'stringio'

def compress_string_io(data)
  io = StringIO.new
  writer = Zlib::GzipWriter.new(io)
  writer.write(data)
  writer.close
  io.string
end

def decompress_string_io(compressed)
  io = StringIO.new(compressed)
  reader = Zlib::GzipReader.new(io)
  decompressed = reader.read
  reader.close
  decompressed
end

original = "Sample text " * 100
compressed = compress_string_io(original)
restored = decompress_string_io(compressed)

puts original == restored  # true
puts "Size reduction: #{((1 - compressed.bytesize.to_f / original.bytesize) * 100).round(2)}%"

Practical Examples

Text file compression demonstrates typical use cases where repeated words and structural patterns enable significant size reduction. Configuration files, source code, and documentation benefit from compression during storage and transmission.

require 'zlib'
require 'digest'

class TextFileCompressor
  attr_reader :stats
  
  def initialize
    @stats = {}
  end
  
  def compress_file(input_path, output_path, level: 9)
    original_size = File.size(input_path)
    
    Zlib::GzipWriter.open(output_path, level: level) do |gz|
      File.open(input_path, 'rb') do |file|
        while chunk = file.read(16384)
          gz.write(chunk)
        end
      end
    end
    
    compressed_size = File.size(output_path)
    
    @stats = {
      original_size: original_size,
      compressed_size: compressed_size,
      ratio: (compressed_size.to_f / original_size).round(4),
      reduction_percent: ((1 - compressed_size.to_f / original_size) * 100).round(2)
    }
  end
  
  def decompress_file(input_path, output_path)
    Zlib::GzipReader.open(input_path) do |gz|
      File.open(output_path, 'wb') do |file|
        while chunk = gz.read(16384)
          file.write(chunk)
        end
      end
    end
  end
  
  def verify_integrity(original_path, decompressed_path)
    original_hash = Digest::SHA256.file(original_path).hexdigest
    decompressed_hash = Digest::SHA256.file(decompressed_path).hexdigest
    original_hash == decompressed_hash
  end
end

compressor = TextFileCompressor.new
compressor.compress_file('application.log', 'application.log.gz', level: 9)
puts "Compression: #{compressor.stats[:reduction_percent]}% size reduction"

compressor.decompress_file('application.log.gz', 'application_restored.log')
puts "Integrity: #{compressor.verify_integrity('application.log', 'application_restored.log')}"

JSON compression addresses the verbosity inherent in JSON format. The structured nature with repeated keys and hierarchical nesting provides opportunities for compression, particularly with arrays of similar objects.

require 'zlib'
require 'json'

class JsonCompressor
  def self.compress(hash, level: 9)
    json_string = JSON.generate(hash)
    Zlib::Deflate.deflate(json_string, level)
  end
  
  def self.decompress(compressed_data)
    json_string = Zlib::Inflate.inflate(compressed_data)
    JSON.parse(json_string)
  end
  
  def self.compare_formats(data)
    json_string = JSON.generate(data)
    compressed = compress(data)
    
    {
      json_size: json_string.bytesize,
      compressed_size: compressed.bytesize,
      reduction: ((1 - compressed.bytesize.to_f / json_string.bytesize) * 100).round(2)
    }
  end
end

# Large dataset with repetitive structure
users = 1000.times.map do |i|
  {
    id: i,
    username: "user_#{i}",
    email: "user#{i}@example.com",
    status: "active",
    role: "member",
    created_at: "2024-01-01T00:00:00Z"
  }
end

stats = JsonCompressor.compare_formats(users)
puts "JSON size: #{stats[:json_size]} bytes"
puts "Compressed size: #{stats[:compressed_size]} bytes"
puts "Reduction: #{stats[:reduction]}%"

HTTP response compression reduces bandwidth and improves response times for web applications. Rack middleware handles compression transparently for supported clients.

require 'zlib'
require 'rack'

class CompressionMiddleware
  COMPRESSIBLE_TYPES = [
    'text/html', 'text/plain', 'text/css', 'text/javascript',
    'application/json', 'application/xml'
  ].freeze
  
  MIN_SIZE = 1024  # Don't compress responses smaller than 1KB
  
  def initialize(app)
    @app = app
  end
  
  def call(env)
    status, headers, body = @app.call(env)
    
    return [status, headers, body] unless should_compress?(env, headers, body)
    
    compressed_body = compress_body(body)
    headers['Content-Encoding'] = 'gzip'
    headers['Content-Length'] = compressed_body.bytesize.to_s
    headers.delete('Content-MD5')
    
    [status, headers, [compressed_body]]
  end
  
  private
  
  def should_compress?(env, headers, body)
    return false unless env['HTTP_ACCEPT_ENCODING']&.include?('gzip')
    return false if headers['Content-Encoding']
    
    content_type = headers['Content-Type']&.split(';')&.first
    return false unless COMPRESSIBLE_TYPES.include?(content_type)
    
    body_size = body.respond_to?(:bytesize) ? body.bytesize : body.join.bytesize
    body_size >= MIN_SIZE
  end
  
  def compress_body(body)
    io = StringIO.new
    gz = Zlib::GzipWriter.new(io)
    
    if body.respond_to?(:each)
      body.each { |chunk| gz.write(chunk) }
    else
      gz.write(body)
    end
    
    gz.close
    io.string
  end
end

Database column compression reduces storage requirements for text-heavy fields. This pattern applies to log entries, comments, descriptions, or any large text data stored in databases.

require 'zlib'
require 'base64'

module CompressibleAttribute
  def self.included(base)
    base.extend(ClassMethods)
  end
  
  module ClassMethods
    def compresses(*attributes)
      attributes.each do |attribute|
        attr_reader "#{attribute}_compressed"
        
        define_method(attribute) do
          compressed = instance_variable_get("@#{attribute}_compressed")
          return nil if compressed.nil?
          
          decoded = Base64.strict_decode64(compressed)
          Zlib::Inflate.inflate(decoded)
        end
        
        define_method("#{attribute}=") do |value|
          if value.nil?
            instance_variable_set("@#{attribute}_compressed", nil)
          else
            compressed = Zlib::Deflate.deflate(value, 9)
            encoded = Base64.strict_encode64(compressed)
            instance_variable_set("@#{attribute}_compressed", encoded)
          end
        end
      end
    end
  end
end

class Article
  include CompressibleAttribute
  
  compresses :content
end

article = Article.new
article.content = "Long article content " * 500
puts "Stored size: #{article.content_compressed.bytesize} bytes"
puts "Original size: #{article.content.bytesize} bytes"

Common Patterns

Run-length encoding suits data with long sequences of identical values. Image formats use RLE for simple graphics, and fax machines employ it for document transmission. The algorithm scans the input sequentially, counting consecutive occurrences of each symbol.

class AdvancedRLE
  def self.encode(string)
    return "" if string.empty?
    
    result = []
    chars = string.chars
    i = 0
    
    while i < chars.length
      char = chars[i]
      count = 1
      
      # Count consecutive occurrences
      while i + count < chars.length && chars[i + count] == char
        count += 1
      end
      
      # Use count prefix only if beneficial
      if count > 2
        result << "#{count}#{char}"
      elsif count == 2
        result << char << char
      else
        result << char
      end
      
      i += count
    end
    
    result.join
  end
  
  def self.encode_binary(string)
    # More efficient for binary/compressed data
    result = []
    i = 0
    
    while i < string.length
      char = string[i]
      count = 1
      
      while i + count < string.length && 
            string[i + count] == char && 
            count < 255
        count += 1
      end
      
      result << [count, char.ord].pack('CC')
      i += count
    end
    
    result.join
  end
end

Huffman coding assigns variable-length codes based on symbol frequency. Frequent symbols receive shorter codes, while rare symbols get longer codes. This statistical approach achieves optimal prefix-free encoding for a given set of symbol probabilities.

class HuffmanNode
  attr_accessor :char, :freq, :left, :right
  
  def initialize(char, freq, left = nil, right = nil)
    @char = char
    @freq = freq
    @left = left
    @right = right
  end
  
  def leaf?
    @left.nil? && @right.nil?
  end
end

class HuffmanEncoder
  def initialize(string)
    @string = string
    @codes = {}
    build_tree
    generate_codes(@root, "")
  end
  
  def encode
    @string.chars.map { |char| @codes[char] }.join
  end
  
  def code_table
    @codes
  end
  
  private
  
  def build_tree
    frequencies = @string.chars.each_with_object(Hash.new(0)) { |c, h| h[c] += 1 }
    nodes = frequencies.map { |char, freq| HuffmanNode.new(char, freq) }
    
    while nodes.length > 1
      nodes.sort_by!(&:freq)
      left = nodes.shift
      right = nodes.shift
      parent = HuffmanNode.new(nil, left.freq + right.freq, left, right)
      nodes << parent
    end
    
    @root = nodes.first
  end
  
  def generate_codes(node, code)
    return if node.nil?
    
    if node.leaf?
      @codes[node.char] = code.empty? ? "0" : code
      return
    end
    
    generate_codes(node.left, code + "0")
    generate_codes(node.right, code + "1")
  end
end

text = "BCAADDDCCACACAC"
encoder = HuffmanEncoder.new(text)
encoded = encoder.encode

puts "Original: #{text} (#{text.length} chars, #{text.length * 8} bits)"
puts "Code table: #{encoder.code_table}"
puts "Encoded: #{encoded} (#{encoded.length} bits)"
puts "Compression: #{((1 - encoded.length.to_f / (text.length * 8)) * 100).round(2)}%"

Dictionary-based compression maintains a mapping between patterns and shorter references. LZ77 uses a sliding window to find repeated sequences, encoding them as (distance, length) pairs. This approach works effectively on text with recurring phrases.

class SimpleLZ77
  WINDOW_SIZE = 4096
  LOOKAHEAD_SIZE = 18
  
  def self.encode(string)
    result = []
    pos = 0
    
    while pos < string.length
      match = find_longest_match(string, pos)
      
      if match[:length] > 2
        result << {
          type: :reference,
          distance: match[:distance],
          length: match[:length],
          next_char: string[pos + match[:length]]
        }
        pos += match[:length] + 1
      else
        result << { type: :literal, char: string[pos] }
        pos += 1
      end
    end
    
    result
  end
  
  def self.decode(encoded)
    output = ""
    
    encoded.each do |token|
      if token[:type] == :literal
        output << token[:char]
      else
        start = output.length - token[:distance]
        token[:length].times do |i|
          output << output[start + i]
        end
        output << token[:next_char] if token[:next_char]
      end
    end
    
    output
  end
  
  def self.find_longest_match(string, pos)
    window_start = [0, pos - WINDOW_SIZE].max
    best_match = { distance: 0, length: 0 }
    
    (window_start...pos).each do |i|
      length = 0
      
      while length < LOOKAHEAD_SIZE &&
            pos + length < string.length &&
            string[i + length] == string[pos + length]
        length += 1
      end
      
      if length > best_match[:length]
        best_match = { distance: pos - i, length: length }
      end
    end
    
    best_match
  end
  
  private_class_method :find_longest_match
end

text = "ABCABCABCABCXYZ"
encoded = SimpleLZ77.encode(text)
decoded = SimpleLZ77.decode(encoded)

puts "Original: #{text}"
puts "Tokens: #{encoded.length}"
puts "Decoded: #{decoded}"
puts "Match: #{text == decoded}"

Delta encoding stores differences between consecutive values rather than absolute values. This pattern suits time-series data, incremental backups, or any sequence where values change gradually.

require 'zlib'

class DeltaEncoder
  def self.encode(values)
    return [] if values.empty?
    
    result = [values.first]
    
    (1...values.length).each do |i|
      delta = values[i] - values[i - 1]
      result << delta
    end
    
    result
  end
  
  def self.decode(deltas)
    return [] if deltas.empty?
    
    result = [deltas.first]
    
    (1...deltas.length).each do |i|
      result << result.last + deltas[i]
    end
    
    result
  end
  
  def self.encode_with_compression(values)
    deltas = encode(values)
    compressed = Zlib::Deflate.deflate(deltas.pack('l*'))
    
    {
      original_size: values.pack('l*').bytesize,
      delta_size: deltas.pack('l*').bytesize,
      compressed_size: compressed.bytesize,
      data: compressed
    }
  end
end

# Time series data with small incremental changes
measurements = [100, 102, 103, 105, 104, 106, 108, 107]
result = DeltaEncoder.encode_with_compression(measurements)

puts "Original size: #{result[:original_size]} bytes"
puts "Delta size: #{result[:delta_size]} bytes"
puts "Compressed size: #{result[:compressed_size]} bytes"

Performance Considerations

Compression algorithms exhibit varying time complexity characteristics. Run-length encoding operates in O(n) time with a single pass through the input. Huffman coding requires O(n log n) time for building the frequency table and tree. LZ77 variants demonstrate O(n × w) complexity where w represents the window size, though optimized implementations reduce this through efficient search structures.

Memory requirements differ significantly between approaches. Streaming compression processes data in chunks, maintaining constant memory usage regardless of input size. In-memory compression loads the entire dataset, requiring memory proportional to input size plus overhead for compression structures. The choice depends on available resources and data characteristics.

require 'zlib'
require 'benchmark'

def benchmark_compression_levels(data)
  results = {}
  
  (0..9).each do |level|
    result = Benchmark.measure do
      100.times { Zlib::Deflate.deflate(data, level) }
    end
    
    compressed = Zlib::Deflate.deflate(data, level)
    
    results[level] = {
      time: result.real / 100,
      size: compressed.bytesize,
      ratio: (compressed.bytesize.to_f / data.bytesize).round(4)
    }
  end
  
  results
end

test_data = "Sample text with patterns " * 1000
results = benchmark_compression_levels(test_data)

results.each do |level, stats|
  puts "Level #{level}: #{(stats[:time] * 1000).round(2)}ms, " +
       "#{stats[:size]} bytes, #{stats[:ratio]} ratio"
end

Trade-offs exist between compression ratio and processing speed. Higher compression levels examine more patterns and use larger search windows, increasing both time and memory requirements. Level 1 compression completes quickly with moderate compression, while level 9 achieves maximum compression at the cost of significantly increased processing time.

Decompression typically runs faster than compression since the algorithm follows predetermined instructions rather than searching for patterns. Applications that compress once but decompress frequently should prioritize compression ratio over compression speed, accepting longer initial processing for improved storage and transmission efficiency.

require 'zlib'
require 'benchmark'

class CompressionBenchmark
  def self.compare_operations(data)
    compressed = Zlib::Deflate.deflate(data, 9)
    
    compression_time = Benchmark.measure do
      1000.times { Zlib::Deflate.deflate(data, 9) }
    end
    
    decompression_time = Benchmark.measure do
      1000.times { Zlib::Inflate.inflate(compressed) }
    end
    
    {
      compression_ms: (compression_time.real / 1000 * 1000).round(2),
      decompression_ms: (decompression_time.real / 1000 * 1000).round(2),
      ratio: (compression_time.real / decompression_time.real).round(2)
    }
  end
end

data = File.read('large_file.txt')
results = CompressionBenchmark.compare_operations(data)

puts "Compression: #{results[:compression_ms]}ms per operation"
puts "Decompression: #{results[:decompression_ms]}ms per operation"
puts "Decompression is #{results[:ratio]}x faster"

Chunk size affects streaming compression performance. Smaller chunks increase overhead from repeated function calls and reduce compression effectiveness since the algorithm sees less context. Larger chunks improve compression ratios and reduce overhead but require more memory. Typical chunk sizes range from 8KB to 64KB depending on available memory and performance requirements.
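The flush penalty for small chunks can be measured directly. This sketch (the data and chunk sizes are illustrative) compresses the same input with different chunk sizes, flushing after every chunk as a network stream would:

```ruby
require 'zlib'

data = "log line with mild repetition\n" * 20_000

sizes = [512, 8192, 65536].map do |chunk_size|
  deflater = Zlib::Deflate.new
  output = +""
  offset = 0
  
  while offset < data.bytesize
    # SYNC_FLUSH forces each chunk out immediately, adding per-chunk overhead
    output << deflater.deflate(data.byteslice(offset, chunk_size), Zlib::SYNC_FLUSH)
    offset += chunk_size
  end
  
  output << deflater.finish
  deflater.close
  puts "#{chunk_size}-byte chunks: #{output.bytesize} bytes"
  output.bytesize
end
```

Smaller chunks mean more flushes, so compressed output grows as chunk size shrinks.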

Pre-compression analysis determines whether compression provides benefit. Small files may expand due to metadata overhead. Already-compressed data or random data resist further compression. Checking file extensions or computing entropy helps avoid wasted processing on incompressible data.

require 'zlib'

class CompressionAdvisor
  INCOMPRESSIBLE_EXTENSIONS = %w[.gz .zip .jpg .png .mp4 .mp3 .pdf].freeze
  MIN_ENTROPY_THRESHOLD = 7.0
  MIN_SIZE_THRESHOLD = 1024
  
  def self.should_compress?(file_path)
    return false if File.size(file_path) < MIN_SIZE_THRESHOLD
    return false if INCOMPRESSIBLE_EXTENSIONS.any? { |ext| file_path.end_with?(ext) }
    
    sample = File.read(file_path, 8192)
    entropy = calculate_entropy(sample)
    
    entropy < MIN_ENTROPY_THRESHOLD
  end
  
  def self.calculate_entropy(data)
    frequencies = data.bytes.each_with_object(Hash.new(0)) { |byte, hash| hash[byte] += 1 }
    length = data.bytesize.to_f
    
    frequencies.values.reduce(0) do |sum, freq|
      probability = freq / length
      sum - (probability * Math.log2(probability))
    end
  end
  
  def self.estimate_compression(file_path)
    sample_size = [File.size(file_path), 65536].min
    sample = File.read(file_path, sample_size)
    
    compressed = Zlib::Deflate.deflate(sample)
    estimated_ratio = compressed.bytesize.to_f / sample.bytesize
    
    {
      should_compress: should_compress?(file_path),
      estimated_ratio: estimated_ratio.round(4),
      estimated_savings: ((1 - estimated_ratio) * 100).round(2)
    }
  end
end

Reference

Compression Algorithm Comparison

Algorithm            Time Complexity  Space Complexity  Best Use Case                        Compression Ratio
Run-Length Encoding  O(n)             O(1)              Images, binary data with runs        Low to Medium
Huffman Coding       O(n log n)       O(n)              Text with variable symbol frequency  Medium
LZ77                 O(n × w)         O(w)              Text with repeated patterns          Medium to High
DEFLATE              O(n × w)         O(w)              General purpose compression          High
Delta Encoding       O(n)             O(1)              Time series, incremental data        Medium

Zlib Compression Levels

Level  Speed      Ratio       Use Case
0      Fastest    None        No compression, store only
1      Very Fast  Low         Real-time streaming
2-3    Fast       Medium-Low  Network transmission
4-6    Moderate   Medium      Default, balanced performance
7-8    Slow       High        Storage optimization
9      Slowest    Highest     Archival, maximum compression

Ruby Zlib Methods

Method                 Purpose                     Returns
Zlib::Deflate.deflate  Compress string             Compressed string
Zlib::Inflate.inflate  Decompress string           Original string
Zlib::GzipWriter.new   Create gzip writer          Writer instance
Zlib::GzipReader.new   Create gzip reader          Reader instance
Zlib.crc32             Calculate CRC32 checksum    Integer checksum
Zlib.adler32           Calculate Adler32 checksum  Integer checksum

Common Compression Scenarios

Scenario               Recommended Approach       Rationale
Small text files       DEFLATE level 6            Balanced speed and ratio
Large log files        Streaming DEFLATE level 9  Maximum compression for storage
JSON API responses     DEFLATE level 4-6          Fast compression for transmission
Database text columns  DEFLATE level 9 + Base64   Maximum storage efficiency
Real-time streams      DEFLATE level 1-3          Minimize latency
Archival storage       DEFLATE level 9            Optimize for space

Performance Benchmarks

Data Type    Original Size  Compressed Size  Ratio  Time
Plain text   100 KB         35 KB            0.35   15 ms
JSON data    100 KB         25 KB            0.25   18 ms
HTML         100 KB         20 KB            0.20   16 ms
CSV          100 KB         30 KB            0.30   17 ms
Source code  100 KB         28 KB            0.28   19 ms

Entropy Thresholds

Entropy Range  Compressibility  Example Data
0-2 bits       Excellent        Highly repetitive text
2-4 bits       Good             Natural language text
4-6 bits       Moderate         Mixed content
6-7 bits       Poor             Compressed data
7-8 bits       None             Random or encrypted data

Error Handling Patterns

Error Type         Cause                      Handling Strategy
Zlib::DataError    Corrupted compressed data  Validate checksums before decompression
Zlib::BufError     Insufficient buffer        Increase buffer size or use streaming
Zlib::StreamError  Invalid compression state  Reset deflater/inflater and retry
Errno::ENOSPC      Insufficient disk space    Check available space before compression
ArgumentError      Invalid compression level  Validate level is 0-9