Overview
Ruby handles tar archives primarily through the archive-tar-minitar gem (maintained today under the name `minitar`, with `Archive::Tar::Minitar` kept as a deprecated alias), which provides a pure-Ruby implementation for reading and writing POSIX tar archives. The library offers both high-level convenience methods and low-level streaming interfaces, so archives can be processed without loading them entirely into memory.
The core classes include `Archive::Tar::Minitar` for high-level operations, `Archive::Tar::Minitar::Reader` for streaming reads, and `Archive::Tar::Minitar::Writer` for streaming writes. These classes handle standard tar format compliance, including file metadata preservation, directory structures, and symbolic links.
```ruby
require 'archive/tar/minitar'
require 'zlib'

# Create a compressed tar archive
File.open('archive.tar.gz', 'wb') do |file|
  Zlib::GzipWriter.wrap(file) do |gzip|
    Archive::Tar::Minitar.pack(['file1.txt', 'dir/'], gzip)
  end
end
```
The library integrates with Ruby's compression libraries: `zlib` for gzip compression and `bzip2-ffi` for bzip2 compression. File permissions, timestamps, and ownership information are preserved during archive operations when the underlying filesystem supports these attributes.
```ruby
# Extract with preserved metadata
File.open('archive.tar.gz', 'rb') do |file|
  Zlib::GzipReader.wrap(file) do |gzip|
    Archive::Tar::Minitar.unpack(gzip, 'extracted/')
  end
end
```
The minitar implementation reads both POSIX (ustar) and common GNU tar archives, detecting the format automatically during reading operations. Archive entries can be files, directories, symbolic links, or special device files, and each entry keeps its original attributes and content.
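A quick way to see what the reader reports for each entry is to list names and type flags without extracting anything — a minimal sketch, assuming an existing `archive.tar` in the current directory:

```ruby
require 'archive/tar/minitar'

# List each entry's name and type flag
File.open('archive.tar', 'rb') do |file|
  Archive::Tar::Minitar::Reader.open(file) do |reader|
    reader.each do |entry|
      kind = entry.directory? ? 'dir' : 'file'
      puts format('%-5s %-6s %s', kind, entry.typeflag.inspect, entry.name)
    end
  end
end
```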
Basic Usage
Creating tar archives requires specifying source files or directories as an array of paths. The `Archive::Tar::Minitar.pack` method recursively includes directory contents and maintains the relative path structure within the archive.
```ruby
require 'archive/tar/minitar'

# Create an uncompressed tar archive
File.open('backup.tar', 'wb') do |file|
  Archive::Tar::Minitar.pack(['app/', 'config.yml', 'README.md'], file)
end
```
The pack method accepts file and directory paths; it does not expand glob patterns itself, so resolve patterns with `Dir.glob` before packing, as the next example does. How symbolic links inside packed directories are treated has varied between minitar releases, so verify the behavior of the version you use.
```ruby
# Create an archive from multiple source types
sources = [
  'src/',      # Directory (recursive)
  'main.rb',   # Single file
  'docs/*.md'  # Glob pattern (expanded below)
]

File.open('project.tar', 'wb') do |tar_file|
  expanded_sources = sources.flat_map { |src| Dir.glob(src) }.uniq
  Archive::Tar::Minitar.pack(expanded_sources, tar_file)
end
```
Extracting archives uses the `Archive::Tar::Minitar.unpack` method, which reads the tar stream and recreates the file structure in the specified destination directory. The method preserves file permissions and timestamps when possible.
```ruby
# Extract a tar archive to a specific directory
File.open('backup.tar', 'rb') do |file|
  Archive::Tar::Minitar.unpack(file, 'restore/')
end
```
Combining with compression requires wrapping the file stream with appropriate compression classes. The pattern remains consistent across different compression formats.
```ruby
# Create a gzip-compressed tar
File.open('archive.tar.gz', 'wb') do |file|
  Zlib::GzipWriter.wrap(file) do |gzip|
    Archive::Tar::Minitar.pack(['data/'], gzip)
  end
end

# Extract a gzip-compressed tar
File.open('archive.tar.gz', 'rb') do |file|
  Zlib::GzipReader.wrap(file) do |gzip|
    Archive::Tar::Minitar.unpack(gzip, 'extracted/')
  end
end
```
The library guards entry names during extraction: recent minitar releases skip entries with absolute paths or `..` components, preventing files from landing outside the destination directory. Exact normalization behavior has changed across versions (notably after the directory-traversal fixes), so confirm what your installed release does before relying on it.
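To keep archive-internal names relative regardless of how sources were discovered, one pattern is to change into the base directory before packing — a minimal sketch, assuming a `project/` directory exists:

```ruby
require 'archive/tar/minitar'

# Resolve the output path first, then pack from inside the base
# directory so every entry name is stored relative to it
output = File.expand_path('project.tar')
Dir.chdir('project') do
  File.open(output, 'wb') do |file|
    Archive::Tar::Minitar.pack('.', file)
  end
end
```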
Advanced Usage
Streaming operations provide memory-efficient processing for large archives through the `Reader` and `Writer` classes, which handle archive entries individually instead of loading the entire archive into memory.
```ruby
require 'archive/tar/minitar'

# Stream-based archive creation
File.open('streaming.tar', 'wb') do |file|
  Archive::Tar::Minitar::Writer.open(file) do |writer|
    # Add files individually with custom metadata; add_file_simple
    # takes an options hash with :mode and :size
    Dir.glob('source/**/*').each do |path|
      next if File.directory?(path)

      stat = File.stat(path)
      writer.add_file_simple(path, mode: stat.mode, size: stat.size) do |entry|
        File.open(path, 'rb') { |src| entry.write(src.read) }
      end
    end

    # Add in-memory content
    content = "Generated at #{Time.now}"
    writer.add_file_simple('timestamp.txt', mode: 0o644, size: content.bytesize) do |entry|
      entry.write(content)
    end
  end
end
```
Custom filtering during extraction allows selective restoration and path manipulation. The reader provides access to individual entry metadata before extraction decisions.
```ruby
require 'fileutils'

# Selective extraction with custom filtering
File.open('archive.tar', 'rb') do |file|
  Archive::Tar::Minitar::Reader.open(file) do |reader|
    reader.each do |entry|
      # Skip hidden files and certain extensions
      next if entry.name.start_with?('.')
      next if entry.name.end_with?('.tmp', '.log')

      # Modify the extraction path
      extract_path = entry.name.sub(/\Aold_prefix\//, 'new_prefix/')

      if entry.directory?
        FileUtils.mkdir_p(extract_path)
      else
        FileUtils.mkdir_p(File.dirname(extract_path))
        File.open(extract_path, 'wb') do |output|
          output.write(entry.read)
        end
        File.chmod(entry.mode, extract_path) if entry.mode
      end
    end
  end
end
```
Archive inspection and metadata extraction enable analysis without full extraction. Each entry exposes comprehensive information about the archived content.
```ruby
# Archive analysis and content inspection
def analyze_tar(tar_path)
  entries = []
  total_size = 0

  File.open(tar_path, 'rb') do |file|
    Archive::Tar::Minitar::Reader.open(file) do |reader|
      reader.each do |entry|
        entries << {
          name: entry.name,
          size: entry.size,
          mode: entry.mode,
          mtime: entry.mtime,
          type: case entry.typeflag
                when '0', "\0" then :file
                when '5' then :directory
                when '2' then :symlink
                else :other
                end
        }
        total_size += entry.size
      end
    end
  end

  { entries: entries, total_size: total_size, count: entries.length }
end
```
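Calling the helper and summarizing its result might look like this (`analyze_tar` and the archive name come from the examples above):

```ruby
report = analyze_tar('backup.tar')
puts "#{report[:count]} entries, #{report[:total_size]} bytes total"
report[:entries].first(5).each do |e|
  puts "  #{e[:type]}: #{e[:name]} (#{e[:size]} bytes)"
end
```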
Multi-volume archive handling requires coordinating multiple tar files for archives exceeding size limits. The approach involves splitting at file boundaries to maintain archive integrity.
```ruby
# Create multi-volume archives with size limits
class MultiVolumeCreator
  def initialize(base_name, volume_size_mb = 100)
    @base_name = base_name
    @volume_size = volume_size_mb * 1024 * 1024
    @current_volume = 1
    @current_size = 0
    @writer = nil
    @io = nil
  end

  def add_file(path)
    file_size = File.size(path)

    # Start a new volume if the current one would exceed the limit
    if @writer && (@current_size + file_size > @volume_size)
      close
      @current_volume += 1
      @current_size = 0
    end

    # Open a new volume if needed
    unless @writer
      volume_path = "#{@base_name}.vol#{@current_volume}.tar"
      @io = File.open(volume_path, 'wb')
      @writer = Archive::Tar::Minitar::Writer.open(@io)
    end

    @writer.add_file_simple(path, mode: File.stat(path).mode, size: file_size) do |entry|
      File.open(path, 'rb') { |src| entry.write(src.read) }
    end
    @current_size += file_size
  end

  def close
    # Close the writer and the underlying file so the volume is flushed
    @writer&.close
    @io&.close
    @writer = nil
    @io = nil
  end
end
```
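Driving the class could look like the following; `data/**/*` is an illustrative glob, not part of the class:

```ruby
creator = MultiVolumeCreator.new('backup', 50) # 50 MB volumes
Dir.glob('data/**/*').each do |path|
  creator.add_file(path) if File.file?(path)
end
creator.close
```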
Error Handling & Debugging
Tar operations encounter various error conditions including missing files, permission issues, corrupted archives, and filesystem limitations. Proper error handling requires catching specific exceptions and providing meaningful recovery paths.
```ruby
require 'archive/tar/minitar'

def safe_create_archive(sources, output_path)
  File.open(output_path, 'wb') do |file|
    Archive::Tar::Minitar.pack(sources, file)
  end
rescue Errno::ENOENT => e
  # Handle missing source files; fall back to the raw message if the
  # path cannot be parsed out of it
  missing_file = e.message[/No such file or directory.*- (.+)\z/, 1] || e.message
  raise "Source file not found: #{missing_file}"
rescue Errno::EACCES => e
  # Handle permission errors
  raise "Permission denied: #{e.message}"
rescue SystemCallError => e
  # Handle other filesystem errors
  raise "Filesystem error: #{e.message}"
rescue StandardError => e
  # Clean up a partial archive on unexpected errors
  File.unlink(output_path) if File.exist?(output_path)
  raise "Archive creation failed: #{e.message}"
end
```
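Usage is a single call; the method re-raises with a friendlier message on failure:

```ruby
begin
  safe_create_archive(['app/', 'config.yml'], 'backup.tar')
rescue RuntimeError => e
  warn e.message
end
```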
Archive validation before extraction prevents security issues and corrupted data problems. Validation includes format verification, path traversal detection, and size limit enforcement.
```ruby
require 'fileutils'

def validate_and_extract(archive_path, extract_to, max_size: 100_000_000)
  total_size = 0
  entries = []
  base = File.expand_path(extract_to)

  # First pass: validate the archive structure
  File.open(archive_path, 'rb') do |file|
    Archive::Tar::Minitar::Reader.open(file) do |reader|
      reader.each do |entry|
        # Check for directory traversal attacks; the trailing separator
        # prevents 'extracted_evil' from passing as a prefix of 'extracted'
        normalized = File.expand_path(entry.name, base)
        unless normalized == base || normalized.start_with?(base + File::SEPARATOR)
          raise "Security violation: path traversal detected in #{entry.name}"
        end

        # Check size limits
        total_size += entry.size
        raise "Archive too large: exceeds #{max_size} bytes" if total_size > max_size

        entries << entry.name
      end
    end
  end

  # Second pass: extract the validated archive
  File.open(archive_path, 'rb') do |file|
    Archive::Tar::Minitar.unpack(file, extract_to)
  end

  entries
rescue Archive::Tar::Minitar::InvalidTarStream => e
  raise "Corrupted archive: #{e.message}"
rescue StandardError => e
  # Clean up a partial extraction
  FileUtils.rm_rf(extract_to) if File.exist?(extract_to)
  raise "Extraction failed: #{e.message}"
end
```
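A call site might cap uploads at 50 MB; the entry names come back on success:

```ruby
names = validate_and_extract('upload.tar', 'incoming/', max_size: 50_000_000)
puts "Extracted #{names.length} entries"
```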
Streaming error recovery allows processing to continue when individual entries fail, collecting errors for later analysis while preserving successful operations.
```ruby
require 'fileutils'

class RobustExtractor
  def initialize(error_handler: :collect)
    @errors = []
    @extracted_files = []
    @error_handler = error_handler
  end

  def extract_with_recovery(archive_path, extract_to)
    File.open(archive_path, 'rb') do |file|
      Archive::Tar::Minitar::Reader.open(file) do |reader|
        reader.each do |entry|
          begin
            extract_entry(entry, extract_to)
            @extracted_files << entry.name
          rescue StandardError => e
            @errors << { file: entry.name, error: e.message }

            case @error_handler
            when :raise_first
              raise "Failed to extract #{entry.name}: #{e.message}"
            when :warn
              warn "Warning: failed to extract #{entry.name}: #{e.message}"
            when :collect
              next # Continue processing; errors are collected
            end
          end
        end
      end
    end

    { extracted: @extracted_files, errors: @errors }
  end

  private

  def extract_entry(entry, base_path)
    full_path = File.join(base_path, entry.name)

    if entry.directory?
      FileUtils.mkdir_p(full_path)
    else
      FileUtils.mkdir_p(File.dirname(full_path))
      File.open(full_path, 'wb') do |output|
        output.write(entry.read)
      end
      File.chmod(entry.mode, full_path) if entry.mode
    end
  end
end
```
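Typical usage collects errors by default and reports them afterwards:

```ruby
extractor = RobustExtractor.new(error_handler: :collect)
result = extractor.extract_with_recovery('archive.tar', 'output/')
puts "Extracted #{result[:extracted].length} files, #{result[:errors].length} failures"
result[:errors].each { |err| warn "  #{err[:file]}: #{err[:error]}" }
```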
Performance & Memory
Memory usage optimization requires streaming approaches for large archives, avoiding loading entire contents into memory simultaneously. Note that `Minitar.pack` already streams file contents internally; the streaming classes matter most when you need control over chunk sizes or want to generate entry data on the fly. Benchmarking both approaches shows what the difference costs in practice.
```ruby
require 'benchmark'
require 'archive/tar/minitar'

# Convenience method vs. explicit streaming comparison
def benchmark_approaches(large_files)
  puts "Creating archive with #{large_files.length} files"

  # Convenience method (pack streams file contents internally)
  convenience = Benchmark.measure do
    File.open('convenience.tar', 'wb') do |file|
      Archive::Tar::Minitar.pack(large_files, file)
    end
  end

  # Explicit streaming with a controlled chunk size
  streaming = Benchmark.measure do
    File.open('streaming.tar', 'wb') do |file|
      Archive::Tar::Minitar::Writer.open(file) do |writer|
        large_files.each do |path|
          next unless File.file?(path)

          File.open(path, 'rb') do |input|
            writer.add_file_simple(path, mode: File.stat(path).mode, size: File.size(path)) do |entry|
              while (chunk = input.read(8192))
                entry.write(chunk)
              end
            end
          end
        end
      end
    end
  end

  puts "Convenience method: #{convenience}"
  puts "Explicit streaming: #{streaming}"
end
```
Compression level optimization balances file size reduction with processing time. Different compression algorithms provide varying trade-offs between compression ratio and speed.
```ruby
# Compare compression methods and levels
def compression_benchmark(source_files)
  results = {}

  # Uncompressed baseline
  results[:uncompressed] = benchmark_compression('baseline.tar') do |file|
    Archive::Tar::Minitar.pack(source_files, file)
  end

  # Gzip compression levels
  (1..9).each do |level|
    results[:"gzip_#{level}"] = benchmark_compression("gzip_#{level}.tar.gz") do |file|
      Zlib::GzipWriter.wrap(file, level) do |gzip|
        Archive::Tar::Minitar.pack(source_files, gzip)
      end
    end
  end

  # Bzip2 compression (requires the bzip2-ffi gem, which exposes
  # Writer.open rather than a wrap method)
  if defined?(Bzip2::FFI)
    results[:bzip2] = benchmark_compression('bzip2.tar.bz2') do |file|
      Bzip2::FFI::Writer.open(file) do |bzip2|
        Archive::Tar::Minitar.pack(source_files, bzip2)
      end
    end
  end

  results
end
```
```ruby
def benchmark_compression(output_path)
  start_time = Time.now
  File.open(output_path, 'wb') do |file|
    yield file
  end
  end_time = Time.now

  { time: end_time - start_time, size: File.size(output_path), path: output_path }
ensure
  # Remove the benchmark artifact; the measured size was captured above
  File.unlink(output_path) if File.exist?(output_path)
end
```
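Printing the collected results as a small table makes the trade-offs visible; the glob below is an illustrative placeholder:

```ruby
results = compression_benchmark(Dir.glob('data/**/*'))
results.each do |name, r|
  puts format('%-14s %10d bytes  %6.2fs', name, r[:size], r[:time])
end
```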
Parallel processing for multiple archives can improve throughput when creating or extracting multiple independent archives simultaneously.
```ruby
require 'parallel'

# Parallel archive creation
def create_archives_parallel(source_groups, max_threads: 4)
  Parallel.each(source_groups, in_threads: max_threads) do |name, sources|
    output_path = "#{name}.tar.gz"
    File.open(output_path, 'wb') do |file|
      Zlib::GzipWriter.wrap(file) do |gzip|
        Archive::Tar::Minitar.pack(sources, gzip)
      end
    end
    puts "Created #{output_path} (#{File.size(output_path)} bytes)"
  end
end

# Usage
archive_groups = {
  'logs'    => Dir.glob('logs/**/*.log'),
  'docs'    => Dir.glob('docs/**/*.{md,txt}'),
  'configs' => Dir.glob('config/**/*.{yml,json}')
}
create_archives_parallel(archive_groups)
```
Common Pitfalls
Path handling inconsistencies between operating systems cause archive portability issues. Windows path separators, case sensitivity differences, and path length limitations affect cross-platform archive compatibility.
```ruby
require 'pathname'

# Problematic: platform-specific paths baked into the archive
def problematic_archive_creation(file)
  # Backslash separators are Windows-specific and are not valid
  # Dir.glob separators either
  sources = Dir.glob('C:\\Users\\*\\Documents\\*.txt')
  Archive::Tar::Minitar.pack(sources, file) # Creates a non-portable archive
end

# Correct: cross-platform path normalization
def portable_archive_creation(base_dir, patterns)
  sources = patterns.flat_map { |pattern| Dir.glob(File.join(base_dir, pattern)) }

  # Normalize to relative paths with forward slashes
  normalized_sources = sources.map do |path|
    relative_path = Pathname.new(path).relative_path_from(Pathname.new(base_dir))
    relative_path.to_s.tr('\\', '/')
  end

  # Pack from inside base_dir so the relative paths resolve correctly
  output = File.expand_path('portable.tar')
  Dir.chdir(base_dir) do
    File.open(output, 'wb') do |file|
      Archive::Tar::Minitar.pack(normalized_sources, file)
    end
  end
end
```
Permission preservation failures occur when extracting archives across different filesystems or when running with insufficient privileges. The extraction process silently ignores permission errors in many cases.
```ruby
# Detect and handle permission preservation issues
def extract_with_permission_tracking(archive_path, extract_to)
  permission_failures = []

  File.open(archive_path, 'rb') do |file|
    Archive::Tar::Minitar::Reader.open(file) do |reader|
      reader.each do |entry|
        extract_path = File.join(extract_to, entry.name)

        if entry.directory?
          FileUtils.mkdir_p(extract_path)
        else
          FileUtils.mkdir_p(File.dirname(extract_path))
          File.open(extract_path, 'wb') { |f| f.write(entry.read) }
        end

        # Attempt permission restoration with error tracking
        if entry.mode
          begin
            File.chmod(entry.mode, extract_path)
          rescue Errno::EPERM, Errno::ENOTSUP => e
            permission_failures << {
              path: extract_path,
              intended_mode: entry.mode.to_s(8),
              error: e.message
            }
          end
        end
      end
    end
  end

  unless permission_failures.empty?
    warn "Permission restoration failed for #{permission_failures.length} files"
    permission_failures.each do |failure|
      warn "  #{failure[:path]}: #{failure[:error]}"
    end
  end

  permission_failures
end
```
Archive corruption from incomplete writes happens when archive creation is interrupted or when insufficient disk space prevents complete file writing. These conditions require careful error handling and validation.
```ruby
require 'tmpdir'

# Robust archive creation with corruption prevention
def create_archive_safely(sources, output_path, temp_dir: Dir.tmpdir)
  temp_path = File.join(temp_dir, "#{File.basename(output_path)}.tmp")

  # Create the archive in a temporary location first
  File.open(temp_path, 'wb') do |file|
    Archive::Tar::Minitar.pack(sources, file)
    file.fsync # Force the write to disk
  end

  # Verify archive integrity before moving to the final location
  verify_archive_integrity(temp_path)

  # Move to the final location (atomic when temp_dir is on the
  # same filesystem as output_path)
  FileUtils.mv(temp_path, output_path)
rescue StandardError => e
  # Clean up the temporary file on any error
  File.unlink(temp_path) if File.exist?(temp_path)
  raise "Archive creation failed: #{e.message}"
end
```
```ruby
def verify_archive_integrity(archive_path)
  entry_count = 0

  File.open(archive_path, 'rb') do |file|
    Archive::Tar::Minitar::Reader.open(file) do |reader|
      reader.each do |entry|
        entry_count += 1
        # Read the entry content to verify it is accessible
        entry.read if entry.file?
      end
    end
  end

  raise 'Archive appears empty or corrupted' if entry_count == 0
  entry_count
end
```
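The two helpers combine into a single safe call; the sources here are illustrative:

```ruby
create_archive_safely(['app/', 'config.yml'], 'release.tar')
count = verify_archive_integrity('release.tar')
puts "Archive verified: #{count} entries"
```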
Large file handling requires special consideration for files exceeding memory capacity or tar format limitations. The traditional tar format has size limits that affect very large files.
```ruby
# Handle large files and size limitations
def handle_large_files(file_paths, output_path)
  large_files = []
  total_size = 0

  file_paths.each do |path|
    next unless File.file?(path)

    size = File.size(path)
    total_size += size

    # Traditional (ustar) tar format limit: 8 GiB per file
    large_files << { path: path, size: size } if size > 8 * 1024 * 1024 * 1024
  end

  unless large_files.empty?
    warn 'Warning: files exceeding traditional tar limits:'
    large_files.each do |file_info|
      warn "  #{file_info[:path]}: #{file_info[:size]} bytes"
    end
    warn 'Consider using GNU tar format or splitting large files'
  end

  # Proceed with archive creation, streaming large files in chunks
  File.open(output_path, 'wb') do |file|
    Archive::Tar::Minitar::Writer.open(file) do |writer|
      file_paths.each do |path|
        next unless File.file?(path)

        File.open(path, 'rb') do |input|
          writer.add_file_simple(path, mode: File.stat(path).mode, size: File.size(path)) do |entry|
            while (chunk = input.read(65_536)) # 64 KB chunks
              entry.write(chunk)
            end
          end
        end
      end
    end
  end

  { total_files: file_paths.length, total_size: total_size, large_files: large_files }
end
```
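Invoking the helper returns a summary hash that can drive follow-up decisions; the glob is a placeholder:

```ruby
report = handle_large_files(Dir.glob('data/**/*'), 'data.tar')
puts "Packed #{report[:total_files]} paths (#{report[:total_size]} bytes)"
warn "#{report[:large_files].length} files exceeded the ustar limit" if report[:large_files].any?
```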
Reference
Core Classes and Methods
| Class | Purpose | Key Methods |
|---|---|---|
| `Archive::Tar::Minitar` | High-level archive operations | `pack`, `unpack` |
| `Archive::Tar::Minitar::Reader` | Streaming archive reading | `open`, `each`, `rewind` |
| `Archive::Tar::Minitar::Writer` | Streaming archive writing | `open`, `add_file_simple`, `add_file` |
| `Archive::Tar::Minitar::PosixHeader` | Entry metadata handling | `name`, `mode`, `size`, `mtime` |
High-Level Methods
| Method | Parameters | Returns | Description |
|---|---|---|---|
| `Minitar.pack(sources, dest)` | `sources` (Array or String), `dest` (IO) | `nil` | Creates a tar archive from a file list |
| `Minitar.unpack(src, dest, **opts)` | `src` (IO), `dest` (String), `opts` (Hash) | `nil` | Extracts a tar archive to a directory |
Streaming Reader Methods
| Method | Parameters | Returns | Description |
|---|---|---|---|
| `Reader.open(io)` | `io` (IO object) | `Reader` | Opens a tar stream for reading |
| `Reader#each` | Block | `self` | Iterates over archive entries |
| `Reader#rewind` | None | `self` | Resets the stream position to the beginning |
| `Reader#close` | None | `nil` | Closes the tar stream |
Streaming Writer Methods
| Method | Parameters | Returns | Description |
|---|---|---|---|
| `Writer.open(io)` | `io` (IO object) | `Writer` | Opens a tar stream for writing |
| `Writer#add_file_simple(name, opts)` | `name` (String), `opts` (Hash: `:mode`, `:size`, `:mtime`) | `nil` | Adds a file with basic metadata |
| `Writer#add_file(name, opts)` | `name` (String), `opts` (Hash: `:mode`, `:mtime`) | `nil` | Adds a file with full metadata |
| `Writer#close` | None | `nil` | Finalizes and closes the tar stream |
Entry Properties
| Property | Type | Description |
|---|---|---|
| `name` | String | File path within the archive |
| `mode` | Integer | File permissions (octal) |
| `size` | Integer | File size in bytes |
| `mtime` | Integer | Modification time (Unix epoch seconds) |
| `typeflag` | String | Entry type indicator |
| `linkname` | String | Link target for symbolic links |
| `uid` | Integer | User ID of the file owner |
| `gid` | Integer | Group ID of the file owner |
| `uname` | String | Username of the file owner |
| `gname` | String | Group name of the file owner |
Type Flag Constants
| Flag | Value | Type |
|---|---|---|
| Normal file | `'0'` or `"\0"` | Regular file |
| Hard link | `'1'` | Hard link to another file |
| Symbolic link | `'2'` | Symbolic link |
| Character device | `'3'` | Character special device |
| Block device | `'4'` | Block special device |
| Directory | `'5'` | Directory |
| FIFO | `'6'` | Named pipe (FIFO) |
| Reserved | `'7'` | Reserved for future use |
Common Options
| Option | Default | Description |
|---|---|---|
| `:fsync` | `false` | Force filesystem sync after writes |
| `:data_buffer` | `nil` | Custom buffer for data operations |
| `:verbose` | `false` | Enable verbose output during operations |
Exception Hierarchy
| Exception | Parent | Description |
|---|---|---|
| `Archive::Tar::Minitar::Error` | `StandardError` | Base exception class |
| `Archive::Tar::Minitar::InvalidTarStream` | `Error` | Corrupted or invalid tar data |
| `Archive::Tar::Minitar::UnexpectedEOF` | `Error` | Premature end of archive |
| `Archive::Tar::Minitar::NonSeekableStream` | `Error` | Stream does not support seeking |
Compression Integration
| Library | Compression | Reader Class | Writer Class |
|---|---|---|---|
| zlib | Gzip | `Zlib::GzipReader` | `Zlib::GzipWriter` |
| bzip2-ffi | Bzip2 | `Bzip2::FFI::Reader` | `Bzip2::FFI::Writer` |
| ruby-xz | XZ (LZMA) | `XZ::StreamReader` | `XZ::StreamWriter` |