OpenURI

Overview

OpenURI extends URI.open to handle HTTP, HTTPS, and FTP URIs transparently. Kernel#open offered the same behavior in Ruby 2.x, but opening URIs through it was deprecated in Ruby 2.7 and removed in Ruby 3.0, so new code should call URI.open. The library converts these URIs into readable IO objects, making remote resource access as simple as file operations. OpenURI wraps the underlying Net::HTTP and Net::FTP libraries, providing a unified interface for different protocol types.

The library adds several key capabilities to URI handling. When opening HTTP or HTTPS URIs, OpenURI returns an IO object containing the response body: a StringIO for small bodies, or a Tempfile once the body exceeds roughly 10 KB. Either object is extended with OpenURI::Meta, which exposes the headers, status information, and the base URI for handling redirects. For FTP URIs, OpenURI provides direct access to remote files through the same interface.

require 'open-uri'

# HTTP request returning StringIO with response body
response = URI.open('https://api.github.com/users/octocat')
puts response.read
# => {"login":"octocat","id":1,"node_id":"MDQ6VXNlcjE=", ...}

# Access response metadata
puts response.status
# => ["200", "OK"]
puts response.content_type
# => "application/json"
puts response.charset
# => "utf-8"
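
The return type and metadata behavior described above can be checked without internet access. The following is a minimal sketch, assuming a throwaway localhost server; `with_local_server` is a hypothetical helper written for this example and is not part of OpenURI.

```ruby
require 'open-uri'
require 'socket'

# Hypothetical helper: answer every connection on an ephemeral localhost
# port with a canned 200 response, then shut the server down.
def with_local_server(body)
  server = TCPServer.new('127.0.0.1', 0)
  thread = Thread.new do
    loop do
      client = server.accept
      # Discard the request line and headers
      while (line = client.gets) && line != "\r\n"; end
      client.write("HTTP/1.1 200 OK\r\n" \
                   "Content-Type: text/plain; charset=utf-8\r\n" \
                   "Content-Length: #{body.bytesize}\r\n" \
                   "Connection: close\r\n\r\n")
      client.write(body)
      client.close
    end
  end
  yield "http://127.0.0.1:#{server.addr[1]}/"
ensure
  thread&.kill
  server&.close
end

# Small body: the response object is a StringIO extended with metadata methods
small = with_local_server('hello') do |url|
  URI.open(url) { |io| [io.class, io.content_type, io.charset, io.meta['content-length']] }
end

# Body larger than the in-memory threshold: the response object is a Tempfile
large = with_local_server('x' * 20_000) do |url|
  URI.open(url) { |io| io.class }
end

puts small.inspect
puts large
```

Code that only calls IO methods such as read or each_line works with either class, so the distinction mostly matters when checking types or holding on to the object after the block ends.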

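OpenURI follows redirects automatically. Rather than enforcing a small fixed limit, it detects redirect loops and refuses unsafe transitions such as HTTPS to HTTP; recent open-uri releases also accept a max_redirects option (default 64) to cap the length of a redirect chain. The library exposes the final URI after redirects through the base_uri method, which matters when resolving relative links in HTML or when an API redirects to a different endpoint.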

# Handling redirected responses
response = URI.open('https://github.com/ruby/ruby')
puts response.base_uri
# => #<URI::HTTPS https://github.com/ruby/ruby>

# Original URI vs final URI after redirect
original_uri = URI('https://git.io/ruby')
response = original_uri.open
puts "Original: #{original_uri}"
puts "Final: #{response.base_uri}"

The library integrates seamlessly with existing Ruby IO operations. Response objects respond to standard IO methods like read, readline, each_line, and rewind, making them compatible with any code expecting IO input. This design allows treating remote resources identically to local files in many contexts.

# Processing response line by line
URI.open('https://raw.githubusercontent.com/ruby/ruby/master/README.md') do |response|
  response.each_line.with_index do |line, index|
    puts "Line #{index + 1}: #{line.chomp}"
    break if index >= 5  # First 6 lines only
  end
end
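
Because the response object behaves like any other IO, helper code can be written once and used for remote and local sources alike. A minimal sketch, assuming a hypothetical `first_lines` helper; a StringIO stands in for a small OpenURI response so the example needs no network.

```ruby
require 'stringio'

# Hypothetical helper: return the first `limit` lines of any IO-like object.
def first_lines(io, limit)
  io.each_line.first(limit).map(&:chomp)
end

# Stand-in for a small remote response (OpenURI returns a StringIO for those)
remote_like = StringIO.new("# Title\n\nFirst paragraph\nSecond paragraph\n")
preview = first_lines(remote_like, 2)
puts preview.inspect

# rewind makes the same object readable again from the start
remote_like.rewind
```

The same helper accepts the object yielded by URI.open or File.open, for example `URI.open(url) { |io| first_lines(io, 5) }`.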

Basic Usage

OpenURI supports two primary access patterns: direct URI opening and block-based resource management. The URI.open method accepts a URI string or URI object and returns an IO-like object containing the response. The library handles protocol detection automatically based on the URI scheme.

require 'open-uri'

# Direct access pattern
response = URI.open('https://httpbin.org/get')
content = response.read
headers = response.meta
response.close

# Block pattern with automatic cleanup
URI.open('https://httpbin.org/json') do |response|
  data = response.read
  puts "Content-Type: #{response.content_type}"
  puts "Status: #{response.status.join(' ')}"
  # Response automatically closed at block end
end

Request customization occurs through options passed to the open method. OpenURI supports HTTP headers, authentication, redirect control, and timeout configuration. Everything goes in one hash: String keys are sent as HTTP request headers, while Symbol keys are OpenURI options, and an unrecognized Symbol key raises ArgumentError. User-Agent strings, Accept headers, and custom application headers all use the String-key form.

# Custom headers and options
options = {
  'User-Agent' => 'Ruby OpenURI Client/1.0',
  'Accept' => 'application/json',
  'Authorization' => 'Bearer token123',
  read_timeout: 30,
  redirect: false
}

response = URI.open('https://api.example.com/data', options)
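
To make the String-key versus Symbol-key split concrete, here is a small sketch that sorts a mixed hash the way OpenURI interprets it. `classify_open_uri_options` is a hypothetical helper; `OpenURI::Options` is the library's own table of recognized Symbol options, and `retries` is a made-up key used to show an unsupported option.

```ruby
require 'open-uri'

# Hypothetical helper: String keys become request headers, Symbol keys must
# appear in OpenURI::Options, and anything else would be rejected.
def classify_open_uri_options(options)
  headers  = options.select { |key, _| key.is_a?(String) }
  settings = options.select { |key, _| key.is_a?(Symbol) && OpenURI::Options.key?(key) }
  unknown  = options.keys - headers.keys - settings.keys
  { headers: headers, settings: settings, unknown: unknown }
end

result = classify_open_uri_options(
  'User-Agent' => 'Ruby OpenURI Client/1.0',
  read_timeout: 30,
  redirect: false,
  retries: 3 # not an OpenURI option
)

puts result.inspect
```

Running a check like this before calling URI.open is optional; it mainly helps when options are assembled from configuration files and a typo would otherwise surface as an ArgumentError at request time.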

Authentication mechanisms vary by requirement. For HTTP Basic Authentication, pass the http_basic_authentication option. OpenURI rejects credentials embedded in the URI (user:password@host) with an ArgumentError, which also keeps passwords out of URLs that end up in logs.

# URI-embedded credentials are rejected before any request is sent:
# URI.open('https://user:password@secure.example.com/api/data')
# raises ArgumentError (userinfo not supported)

# Option-based authentication
auth_options = {
  http_basic_authentication: ['username', 'password']
}
response = URI.open('https://secure.example.com/api/data', auth_options)
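
The following sketch shows both sides of that rule: the error raised for credentials inside the URL, and a small builder for the option form. `basic_auth_options` is a hypothetical helper, and the address 127.0.0.1:9 is only a placeholder, since the check fails before any connection is attempted.

```ruby
require 'open-uri'

# Hypothetical helper: build the Basic Auth option from separately stored
# credentials (for example values read from ENV).
def basic_auth_options(user, password)
  { http_basic_authentication: [user, password] }
end

# Credentials inside the URL are refused up front
embedded_error =
  begin
    URI.open('http://user:secret@127.0.0.1:9/')
    nil
  rescue ArgumentError => e
    e.message
  end

options = basic_auth_options('username', 'password')

puts embedded_error
puts options.inspect
```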

OpenURI handles SSL/TLS connections transparently for HTTPS URIs. Certificate validation occurs automatically, and applications can adjust it through two options: ssl_verify_mode selects the verification mode, and ssl_ca_cert points to a custom CA certificate file or directory. Newer open-uri releases also accept ssl_min_version and ssl_max_version. Client certificates (ssl_cert, ssl_key) are not OpenURI options; they require using Net::HTTP directly.

require 'openssl'

# SSL configuration options
ssl_options = {
  ssl_verify_mode: OpenSSL::SSL::VERIFY_PEER,
  ssl_ca_cert: '/path/to/ca-bundle.crt'
}

response = URI.open('https://secure-api.example.com/data', ssl_options)
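
When a client certificate is required, one option is to drop down to Net::HTTP, which exposes cert and key attributes. The sketch below first shows that OpenURI rejects an ssl_cert option, then configures a Net::HTTP object without connecting. `client_cert_connection` is a hypothetical helper, and the host passed to it is a placeholder.

```ruby
require 'open-uri'
require 'net/http'
require 'openssl'

# OpenURI validates Symbol options before doing any network work
rejected_message =
  begin
    URI.open('https://127.0.0.1:9/', ssl_cert: 'placeholder')
    nil
  rescue ArgumentError => e
    e.message
  end

# Hypothetical helper: prepare (but do not start) a Net::HTTP connection
# that presents a client certificate.
def client_cert_connection(host, cert:, key:, port: 443)
  http = Net::HTTP.new(host, port)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  http.cert = cert # OpenSSL::X509::Certificate
  http.key = key   # OpenSSL::PKey instance matching the certificate
  http
end

puts rejected_message
```

The returned object can then be used with `http.start { |conn| conn.get('/path') }` in place of URI.open.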

Response metadata access provides detailed information about the HTTP transaction. The meta method returns a hash of all response headers keyed by lowercase header name, while methods like content_type, charset, and last_modified offer convenient access to common header values. There is no content_length method; read the header from meta['content-length'] instead. Status information includes both the numeric code and the reason phrase, each as a String.

URI.open('https://httpbin.org/response-headers?X-Custom=value') do |response|
  # All headers as hash
  headers = response.meta
  puts headers['content-type']
  
  # Convenience methods (Content-Length has none, so read it from meta)
  puts "Type: #{response.content_type}"
  puts "Length: #{response.meta['content-length']}"
  puts "Modified: #{response.last_modified}"
  puts "Status: #{response.status[0]} #{response.status[1]}"
  
  # Custom headers
  puts "Custom: #{headers['x-custom']}"
end

Error Handling & Debugging

OpenURI raises specific exception types for different failure conditions. Network timeouts, HTTP error responses, SSL certificate problems, and redirect loops each produce distinct exception classes. Understanding these exception patterns enables precise error handling and appropriate recovery strategies.

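The main exceptions are OpenURI::HTTPError for HTTP status errors and its subclass OpenURI::HTTPRedirect, which is raised when a redirect response arrives while redirect: false is set. Network-level problems surface as exceptions from the underlying libraries: timeouts raise Net::OpenTimeout or Net::ReadTimeout (both subclasses of Timeout::Error), SSL issues raise OpenSSL::SSL::SSLError, DNS failures raise SocketError, and refused or reset connections raise Errno errors (SystemCallError subclasses).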

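require 'open-uri'
require 'net/http'  # Defines Net::ReadTimeout and Net::OpenTimeout
require 'openssl'   # Defines OpenSSL::SSL::SSLError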

def fetch_with_error_handling(uri)
  URI.open(uri, read_timeout: 10)
rescue Net::ReadTimeout => e
  puts "Request timed out after 10 seconds: #{e.message}"
  nil
rescue Net::OpenTimeout => e
  puts "Connection timeout: #{e.message}"
  nil
rescue OpenURI::HTTPError => e
  puts "HTTP error #{e.io.status[0]}: #{e.io.status[1]}"
  puts "Response body: #{e.io.read}" if e.io.respond_to?(:read)
  nil
rescue OpenSSL::SSL::SSLError => e
  puts "SSL certificate error: #{e.message}"
  nil
rescue SocketError => e
  puts "Network error (DNS/connection): #{e.message}"
  nil
end

# Usage with comprehensive error handling
response = fetch_with_error_handling('https://nonexistent-domain.invalid/api')
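
The relationships between these exception classes determine the order of rescue clauses. The sketch below checks the relevant ancestry and maps an exception to a coarse category; `classify_failure` is a hypothetical helper, and the category names are arbitrary.

```ruby
require 'open-uri'
require 'net/http'
require 'timeout'

# Both Net timeouts descend from Timeout::Error, so one clause covers them
timeouts_share_parent = [Net::OpenTimeout, Net::ReadTimeout].all? { |klass| klass < Timeout::Error }

# HTTPRedirect is a subclass of HTTPError, so it must be matched first
redirect_is_http_error = OpenURI::HTTPRedirect < OpenURI::HTTPError

# Hypothetical helper: map an exception to a coarse failure category.
def classify_failure(error)
  case error
  when OpenURI::HTTPRedirect        then :redirect
  when OpenURI::HTTPError           then :http_status
  when Timeout::Error               then :timeout
  when SocketError, SystemCallError then :network
  else :unknown
  end
end

puts timeouts_share_parent
puts redirect_is_http_error
```

Connection failures such as Errno::ECONNREFUSED are SystemCallError subclasses, so a handler that rescues only SocketError does not catch them.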

HTTP error responses require special handling because OpenURI raises OpenURI::HTTPError for every 4xx and 5xx status code, and there is no option to turn this off. The exception object exposes the response through its io method, allowing access to error response bodies and headers. This pattern enables processing API error messages and implementing retry logic based on specific error codes.

require 'json'

def handle_api_errors(uri)
  URI.open(uri)
rescue OpenURI::HTTPError => e
  status_code = e.io.status[0].to_i
  error_body = e.io.read
  
  case status_code
  when 400
    puts "Bad request: #{error_body}"
    # Parse error details for debugging
    begin
      error_details = JSON.parse(error_body)
      puts "Validation errors: #{error_details['errors']}"
    rescue JSON::ParserError
      puts "Non-JSON error response"
    end
  when 401
    puts "Authentication required"
    # Trigger credential refresh
  when 403
    puts "Access denied: #{error_body}"
  when 404
    puts "Resource not found"
  when 429
    puts "Rate limited"
    # Implement backoff strategy
  when 500..599
    puts "Server error #{status_code}: #{error_body}"
    # Log for monitoring systems
  else
    puts "Unexpected HTTP error #{status_code}"
  end
end
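
To see what the exception carries, the following sketch raises a real OpenURI::HTTPError against a throwaway localhost server and inspects it. `with_failing_server` is a hypothetical helper that always answers 503; the JSON body and Retry-After value are made up for the example.

```ruby
require 'open-uri'
require 'socket'

# Hypothetical helper: answer every request with a canned 503 response.
def with_failing_server
  server = TCPServer.new('127.0.0.1', 0)
  body = '{"error":"maintenance"}'
  thread = Thread.new do
    loop do
      client = server.accept
      # Discard the request line and headers
      while (line = client.gets) && line != "\r\n"; end
      client.write("HTTP/1.1 503 Service Unavailable\r\n" \
                   "Content-Type: application/json\r\n" \
                   "Retry-After: 120\r\n" \
                   "Content-Length: #{body.bytesize}\r\n" \
                   "Connection: close\r\n\r\n#{body}")
      client.close
    end
  end
  yield "http://127.0.0.1:#{server.addr[1]}/"
ensure
  thread&.kill
  server&.close
end

failure = with_failing_server do |url|
  begin
    URI.open(url)
    nil
  rescue OpenURI::HTTPError => e
    {
      message: e.message,                     # status line text
      status: e.io.status,                    # [code, reason] as Strings
      retry_after: e.io.meta['retry-after'],  # any response header
      body: e.io.read                         # error response body
    }
  end
end

puts failure.inspect
```

Headers such as Retry-After are a natural input for the backoff logic mentioned in the 429 and 5xx branches.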

Redirect handling needs care when a server redirects in a loop or through long chains. OpenURI follows redirects automatically and raises a RuntimeError when it detects a loop. OpenURI::HTTPRedirect is raised only when automatic following is disabled with redirect: false, which is the starting point for custom redirect logic.

def handle_redirects_manually(uri, max_redirects = 5)
  current_uri = uri
  redirect_count = 0
  
  loop do
    begin
      response = URI.open(current_uri, redirect: false)
      return response  # Success, no redirect
    rescue OpenURI::HTTPRedirect => e
      redirect_count += 1
      
      if redirect_count > max_redirects
        raise "Too many redirects (#{redirect_count}): #{current_uri}"
      end
      
      # Extract redirect location
      location = e.io.meta['location']
      if location.nil?
        raise "Redirect without Location header"
      end
      
      # Resolve relative redirects
      current_uri = URI.join(current_uri.to_s, location).to_s
      puts "Redirect #{redirect_count}: #{current_uri}"
    end
  end
end
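
A manual redirect loop also takes over decisions that automatic mode makes on its own, such as refusing a downgrade from HTTPS to HTTP. The sketch below isolates the "where do we go next" step. `next_location` is a hypothetical helper, and its scheme rule is a simplified approximation rather than OpenURI's exact check.

```ruby
require 'uri'

# Hypothetical helper: resolve a Location header against the current URL and
# reject scheme changes other than an upgrade from http to https.
def next_location(current_url, location)
  current = URI.parse(current_url)
  target = URI.join(current_url, location)

  same_scheme = current.scheme == target.scheme
  upgrade = current.scheme == 'http' && target.scheme == 'https'
  unless same_scheme || upgrade
    raise ArgumentError, "redirection forbidden: #{current} -> #{target}"
  end

  target.to_s
end

puts next_location('https://example.com/a/b', '../c')
puts next_location('https://example.com/a/b', '/login?x=1')
puts next_location('http://example.com/old', 'https://example.com/new')
```

Keeping this step in its own method makes the redirect policy testable without issuing any requests.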

Debugging network issues requires examining request and response details. OpenURI doesn't provide built-in logging, but you can implement request tracing by wrapping URI.open calls with debugging output. This approach helps identify network problems, header issues, and response processing errors.

def debug_request(uri, options = {})
  puts "=== REQUEST DEBUG ==="
  puts "URI: #{uri}"
  puts "Options: #{options.inspect}"
  
  start_time = Time.now
  
  begin
    response = URI.open(uri, options)
    end_time = Time.now
    
    puts "=== RESPONSE DEBUG ==="
    puts "Status: #{response.status.join(' ')}"
    puts "Headers: #{response.meta.to_h}"
    puts "Content-Type: #{response.content_type}"
    puts "Content-Length: #{response.meta['content-length']}"
    puts "Base URI: #{response.base_uri}"
    puts "Request duration: #{((end_time - start_time) * 1000).round(2)}ms"
    
    response
  rescue => e
    end_time = Time.now
    puts "=== ERROR DEBUG ==="
    puts "Exception: #{e.class}"
    puts "Message: #{e.message}"
    puts "Request duration: #{((end_time - start_time) * 1000).round(2)}ms"
    raise
  end
end

# Usage for debugging problematic requests
response = debug_request('https://httpbin.org/delay/2', read_timeout: 5)

Production Patterns

Production OpenURI usage requires robust error handling, connection management, and monitoring integration. Applications should implement retry logic with exponential backoff for transient network failures, maintain connection pools where possible, and provide comprehensive logging for debugging production issues.

Connection reuse becomes important when making multiple requests to the same host. While OpenURI doesn't expose connection pooling directly, you can implement session management using Net::HTTP directly for high-volume scenarios, falling back to OpenURI for simple cases.

require 'open-uri'
require 'net/http'
require 'logger'

class ProductionHttpClient
  def initialize(logger: Logger.new($stdout))
    @logger = logger
    @retry_attempts = 3
    @base_timeout = 30
  end
  
  def fetch_with_retries(uri, options = {})
    attempts = 0
    
    begin
      attempts += 1
      @logger.info("Fetching #{uri} (attempt #{attempts})")
      
      start_time = Time.now
      response = URI.open(uri, default_options.merge(options))
      duration = Time.now - start_time
      
      @logger.info("Successfully fetched #{uri} in #{duration.round(3)}s")
      response
      
    rescue Net::ReadTimeout, Net::OpenTimeout => e
      if attempts < @retry_attempts
        backoff_time = @base_timeout * (2 ** (attempts - 1))
        @logger.warn("Timeout on attempt #{attempts}, retrying in #{backoff_time}s: #{e.message}")
        sleep(backoff_time)
        retry
      else
        @logger.error("Failed to fetch #{uri} after #{attempts} attempts: #{e.message}")
        raise
      end
      
    rescue OpenURI::HTTPError => e
      status_code = e.io.status[0].to_i
      
      # Retry on 5xx errors, but not 4xx
      if (500..599).include?(status_code) && attempts < @retry_attempts
        backoff_time = @base_timeout * (2 ** (attempts - 1))
        @logger.warn("HTTP #{status_code} on attempt #{attempts}, retrying in #{backoff_time}s")
        sleep(backoff_time)
        retry
      else
        @logger.error("HTTP error #{status_code} for #{uri}: #{e.io.read}")
        raise
      end
    end
  end
  
  private
  
  def default_options
    {
      'User-Agent' => 'MyApp/1.0 (Production)',
      read_timeout: @base_timeout,
      open_timeout: 15,
      ssl_verify_mode: OpenSSL::SSL::VERIFY_PEER
    }
  end
end

# Usage in production code
client = ProductionHttpClient.new
response = client.fetch_with_retries('https://api.external-service.com/data')
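
The class above reuses its 30-second read timeout as the backoff base, so the first retry waits 30 seconds. A shorter, separately configured delay may suit many services better. The following is a minimal sketch of exponential backoff as a standalone block helper; `with_retries` and its parameters are hypothetical, and the `sleeper` argument exists only so the delays can be observed without actually sleeping.

```ruby
require 'timeout'

# Hypothetical helper: run the block, retrying the listed exception classes
# with delays of base_delay, 2 * base_delay, 4 * base_delay, and so on.
def with_retries(max_attempts: 3, base_delay: 0.5, retry_on: [Timeout::Error],
                 sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue *retry_on
    raise if attempts >= max_attempts # give up and re-raise the last error
    sleeper.call(base_delay * (2**(attempts - 1)))
    retry
  end
end

# Simulated flaky operation: fails twice, then succeeds
delays = []
result = with_retries(sleeper: ->(seconds) { delays << seconds }) do |attempt|
  raise Timeout::Error, 'simulated timeout' if attempt < 3
  :ok
end

puts result
puts delays.inspect
```

With OpenURI the block body would be something like `URI.open(uri, read_timeout: 10, &:read)`, and `retry_on` could include OpenURI::HTTPError when 5xx responses should be retried.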

Monitoring integration requires tracking request metrics, error rates, and response times. Production applications should emit metrics to monitoring systems and implement health checks that verify external service availability. Circuit breaker patterns help prevent cascade failures when external services become unavailable.

class MonitoredHttpClient
  def initialize(metrics_client, circuit_breaker)
    @metrics = metrics_client
    @circuit_breaker = circuit_breaker
  end
  
  def fetch_with_monitoring(uri, options = {})
    return nil if @circuit_breaker.open?
    
    start_time = Time.now
    
    begin
      response = URI.open(uri, options)
      duration = Time.now - start_time
      
      # Record success metrics
      @metrics.increment('http_requests_total', tags: ['status:success'])
      @metrics.histogram('http_request_duration', duration * 1000)
      @circuit_breaker.record_success
      
      response
      
    rescue => e
      duration = Time.now - start_time
      
      # Record error metrics
      error_type = e.class.name.downcase
      @metrics.increment('http_requests_total', tags: ["status:error", "error:#{error_type}"])
      @metrics.histogram('http_request_duration', duration * 1000)
      @circuit_breaker.record_failure
      
      raise
    end
  end
  
  def health_check(endpoints)
    results = {}
    
    endpoints.each do |name, uri|
      start_time = Time.now  # Time each endpoint separately
      begin
        response = URI.open(uri, read_timeout: 5, open_timeout: 5)
        results[name] = {
          status: 'healthy',
          response_code: response.status[0],
          response_time: Time.now - start_time
        }
        response.close
      rescue => e
        results[name] = {
          status: 'unhealthy',
          error: e.message
        }
      end
    end
    
    results
  end
end
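
MonitoredHttpClient expects a circuit breaker that responds to open?, record_success, and record_failure, but does not define one. The following is a minimal sketch of such an object, assuming a simple consecutive-failure threshold and a fixed cool-down. `SimpleCircuitBreaker` and its parameters are hypothetical, and the injectable `clock` exists only to make the behavior testable.

```ruby
# Hypothetical circuit breaker: opens after `failure_threshold` consecutive
# failures and allows a trial request once `reset_after` seconds have passed.
class SimpleCircuitBreaker
  def initialize(failure_threshold: 3, reset_after: 30,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @failure_threshold = failure_threshold
    @reset_after = reset_after
    @clock = clock
    @failures = 0
    @opened_at = nil
  end

  # True while requests should be skipped
  def open?
    return false if @opened_at.nil?
    return true if @clock.call - @opened_at < @reset_after

    # Cool-down elapsed: let one trial request through (half-open state);
    # a single further failure reopens the breaker.
    @opened_at = nil
    @failures = @failure_threshold - 1
    false
  end

  def record_success
    @failures = 0
    @opened_at = nil
  end

  def record_failure
    @failures += 1
    @opened_at = @clock.call if @failures >= @failure_threshold
  end
end

breaker = SimpleCircuitBreaker.new(failure_threshold: 2, reset_after: 10)
breaker.record_failure
breaker.record_failure
puts breaker.open?
```

This version is not thread-safe; a multi-threaded application would need to guard the counters with a Mutex.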

Caching strategies reduce external API calls and improve application performance. Implement HTTP caching by respecting cache-control headers and ETags from responses. For applications with high request volumes, consider implementing response caching with appropriate invalidation strategies.

require 'digest'

class CachingHttpClient
  def initialize(cache_store, default_ttl: 300)
    @cache = cache_store
    @default_ttl = default_ttl
  end
  
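  def fetch_with_cache(uri, options = {})
    cache_key = generate_cache_key(uri, options)
    cached_response = @cache.read(cache_key)

    # Serve straight from the cache while the entry is still fresh
    if cached_response && Time.now < cached_response[:expires_at]
      return cached_response[:body]
    end

    # Revalidate a stale entry with a conditional request if we have an ETag.
    # Work on a copy so the caller's options hash is not modified.
    request_options = options.dup
    if cached_response && cached_response[:etag]
      request_options['If-None-Match'] = cached_response[:etag]
    end

    begin
      response = URI.open(uri, request_options)

      # Cache new response
      cache_data = {
        body: response.read,
        etag: response.meta['etag'],
        expires_at: Time.now + determine_ttl(response)
      }
      response.close

      @cache.write(cache_key, cache_data)
      cache_data[:body]

    rescue OpenURI::HTTPError => e
      status_code = e.io.status[0].to_i

      if cached_response && status_code == 304
        # OpenURI raises HTTPError for 304 Not Modified, so handle it here
        # and extend the lifetime of the cached entry
        refreshed = cached_response.merge(expires_at: Time.now + determine_ttl(e.io))
        @cache.write(cache_key, refreshed)
        cached_response[:body]
      elsif cached_response && (500..599).include?(status_code)
        # Return cached version on server errors if available
        cached_response[:body]
      else
        raise
      end
    end
  end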
  
  private
  
  def generate_cache_key(uri, options)
    content = "#{uri}#{options.to_s}"
    Digest::SHA256.hexdigest(content)
  end
  
  def determine_ttl(response)
    cache_control = response.meta['cache-control']
    if cache_control && match = cache_control.match(/max-age=(\d+)/)
      match[1].to_i
    else
      @default_ttl
    end
  end
end
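
The determine_ttl method only looks at max-age. Respecting Cache-Control more fully also means honoring directives that forbid reuse. The sketch below is one possible parser under simplifying assumptions: `cache_ttl` is a hypothetical helper, it treats no-store and no-cache alike as a lifetime of zero, and it ignores other directives.

```ruby
# Hypothetical helper: derive a cache lifetime in seconds from a
# Cache-Control header value, falling back to a default.
def cache_ttl(cache_control, default_ttl: 300)
  return default_ttl if cache_control.nil?

  directives = cache_control.downcase.split(',').map(&:strip)

  # Treat both directives as "do not reuse without contacting the server"
  return 0 if directives.include?('no-store') || directives.include?('no-cache')

  max_age = directives.find { |directive| directive.start_with?('max-age=') }
  max_age ? max_age.split('=', 2).last.to_i : default_ttl
end

puts cache_ttl('public, max-age=60')
puts cache_ttl('no-store')
puts cache_ttl(nil)
```

A lifetime of zero makes every lookup stale, so a client that revalidates stale entries with an ETag sends a conditional request each time.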

Reference

Core Methods

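| Method | Parameters | Returns | Description |
|---|---|---|---|
| URI.open(uri, *rest, &block) | uri (String/URI), options (Hash) | StringIO or Tempfile (the block's value when a block is given) | Opens the URI and returns an IO object with the response |
| Kernel#open(uri, *rest, &block) | uri (String), options (Hash) | StringIO or Tempfile | Opened URIs in Ruby 2.x only (deprecated in 2.7); from Ruby 3.0 it no longer handles URIs, so use URI.open |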

Response Object Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| #read(length=nil) | length (Integer, optional) | String | Reads response body content |
| #readline | None | String | Reads a single line from the response |
| #each_line(&block) | Block | self (Enumerator without a block) | Iterates over response lines |
| #rewind | None | Integer (0) | Resets read position to the beginning |
| #close | None | nil | Closes the response object |

Response Metadata Methods

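| Method | Parameters | Returns | Description |
|---|---|---|---|
| #meta | None | Hash | All response headers, keyed by lowercase header name |
| #content_type | None | String | Media type from Content-Type, without parameters |
| #charset | None | String or nil | charset parameter from Content-Type |
| #meta['content-length'] | None | String or nil | Content-Length header (there is no content_length method) |
| #last_modified | None | Time or nil | Last-Modified header as Time object |
| #status | None | Array | HTTP status code and reason phrase, both Strings |
| #base_uri | None | URI | Final URI after redirects |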

Request Options

| Option | Type | Default | Description |
|---|---|---|---|
| 'User-Agent' | String | Ruby (set by Net::HTTP) | User-Agent header string |
| 'Accept' | String | */* | Accept header for content types |
| 'Referer' | String | None | Referer header value |
| read_timeout | Numeric | 60 (Net::HTTP default) | Read timeout in seconds |
| open_timeout | Numeric | 60 (Net::HTTP default) | Connection timeout in seconds |
| redirect | Boolean | true | Follow redirects automatically |
| ssl_verify_mode | Integer | VERIFY_PEER | SSL certificate verification mode |
Authentication Options

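| Option | Type | Description |
|---|---|---|
| http_basic_authentication | Array | [username, password] for HTTP Basic Auth |
| ssl_ca_cert | String | Path to CA certificate file or directory |
| ssl_cert | OpenSSL::X509::Certificate | Not an OpenURI option (raises ArgumentError); set cert on Net::HTTP instead |
| ssl_key | OpenSSL::PKey | Not an OpenURI option (raises ArgumentError); set key on Net::HTTP instead |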

Exception Classes

| Exception | Parent Class | Description |
|---|---|---|
| OpenURI::HTTPError | StandardError | HTTP 4xx/5xx response codes |
| OpenURI::HTTPRedirect | OpenURI::HTTPError | Redirect response received while redirect: false is set |
| RuntimeError | StandardError | Redirect loop detected or redirection forbidden |
| Net::ReadTimeout | Timeout::Error | Read operation timeout |
| Net::OpenTimeout | Timeout::Error | Connection establishment timeout |
| OpenSSL::SSL::SSLError | OpenSSL::OpenSSLError | SSL/TLS connection errors |
| SocketError | StandardError | Name resolution and socket errors |

SSL Options

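| Option | Type | Default | Description |
|---|---|---|---|
| ssl_verify_mode | Integer | VERIFY_PEER | Certificate verification level |
| ssl_ca_cert | String | System default | CA certificate file or directory |
| ssl_min_version, ssl_max_version | Symbol | OpenSSL default | Minimum and maximum TLS version (newer open-uri releases only) |

Verification depth, a fixed SSL version, and cipher suites are not OpenURI options; configure them on Net::HTTP (verify_depth, ssl_version, ciphers) when needed.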

HTTP Status Code Patterns

| Status Range | OpenURI Behavior | Exception Class |
|---|---|---|
| 1xx (Informational) | Transparent handling | None |
| 2xx (Success) | Normal response | None |
| 3xx (Redirect) | 301, 302, 303, and 307 followed automatically | HTTPRedirect when redirect: false; HTTPError for other 3xx codes such as 304 |
| 4xx (Client Error) | Raise exception | HTTPError |
| 5xx (Server Error) | Raise exception | HTTPError |

Protocol Support

| Protocol | URI Scheme | Implementation | Features |
|---|---|---|---|
| HTTP | http:// | Net::HTTP | Full HTTP/1.1 support |
| HTTPS | https:// | Net::HTTP + OpenSSL | SSL/TLS encryption |
| FTP | ftp:// | Net::FTP | File transfer protocol |