Overview
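Ruby supports RSS and Atom feeds through the rss library, which handles parsing, generation, and manipulation of syndication feeds. The library shipped with the standard library through Ruby 2.7 and has been a bundled gem since Ruby 3.0, so projects that use Bundler need to list it in the Gemfile. It supports RSS 0.9x, 1.0, 2.0, and Atom 1.0, detects the format of a parsed feed, and can convert between formats.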
The RSS module serves as the primary interface. It contains parser classes for the different feed formats and maker classes for feed generation. The parser detects the feed format automatically and returns an object whose class matches that format.
require 'rss'
require 'open-uri'
# Parse a feed from URL
feed = RSS::Parser.parse(URI.open('https://example.com/feed.xml'))
puts feed.channel.title
puts feed.items.first.title
The library handles XML namespace resolution and character encoding detection, and its accessor methods follow the element names of each format. RSS feeds expose plain values such as item.title, while Atom feeds wrap most values in element objects that are read through .content or .href.
# Parse from string content
xml_content = File.read('feed.xml')
feed = RSS::Parser.parse(xml_content)
# RSS items expose plain string accessors
feed.items.each do |item|
  puts item.title
  puts item.link
  puts item.description
end
The library includes validation, format conversion, and extension support for common RSS modules such as Dublin Core and Content. With validation enabled, a structurally invalid feed raises RSS::InvalidRSSError; parsing again with validation disabled often recovers the usable parts of the feed.
# Parse with validation
begin
  feed = RSS::Parser.parse(xml_content, validate: true)
rescue RSS::InvalidRSSError => e
  puts "Feed validation failed: #{e.message}"
  # Parse without validation for error recovery
  feed = RSS::Parser.parse(xml_content, validate: false)
end
Basic Usage
Feed parsing begins with the RSS::Parser.parse method, which accepts URLs, file paths, or string content. The parser detects RSS and Atom formats automatically and returns a feed object of the matching class.
require 'rss'
require 'open-uri'
# Parse from URL
feed = RSS::Parser.parse(URI.open('https://feeds.example.com/news.xml'))
# Parse from file
feed = RSS::Parser.parse(File.read('local_feed.xml'))
# Parse from string
xml_string = '<rss version="2.0">...</rss>'
feed = RSS::Parser.parse(xml_string)
Feed objects provide structured access to metadata and items. RSS feeds expose channel information through the channel property, while Atom feeds provide direct access to feed-level properties. Item iteration works consistently across formats.
# Access feed metadata (RSS feed)
puts feed.channel.title
puts feed.channel.description
puts feed.channel.link
puts feed.channel.language
# Atom feeds expose metadata directly on the feed object;
# text values are element objects, so read them through .content
puts feed.title.content
puts feed.subtitle.content
puts feed.link.href
Item processing involves iterating through the items collection. Each item provides a title, content, links, and metadata, but the accessor names differ by format: RSS items use pubDate and description, while Atom entries use updated and summary and wrap their values in element objects.
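atom = feed.is_a?(RSS::Atom::Feed)
feed.items.each do |item|
  if atom
    puts "Title: #{item.title.content}"
    puts "Link: #{item.link.href}"
    puts "Date: #{item.updated.content}"
    puts "Summary: #{item.summary&.content}"
  else # RSS 0.9x / 2.0
    puts "Title: #{item.title}"
    puts "Link: #{item.link}"
    puts "Date: #{item.pubDate}"
    puts "Summary: #{item.description}"
  end
  puts "---"
end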
Feed generation uses maker classes that provide programmatic feed construction. RSS::Maker.make takes a target version and a block; inside the block, attribute setters define the feed properties and new_item adds items.
require 'rss'
# Create RSS 2.0 feed
feed = RSS::Maker.make("2.0") do |maker|
  maker.channel.title = "My Blog"
  maker.channel.description = "Latest blog posts"
  maker.channel.link = "https://myblog.example.com"
  maker.channel.language = "en"
  # Add items
  maker.items.new_item do |item|
    item.title = "First Post"
    item.link = "https://myblog.example.com/post/1"
    item.description = "This is my first blog post"
    item.pubDate = Time.now
  end
end
puts feed.to_s
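Format conversion goes through the maker interface. Parse an existing feed, then rebuild it with RSS::Maker.make in the target format, copying fields across. Atom requires an id, an author, and an updated timestamp that an RSS feed may not carry, so the conversion has to supply them.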
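# Convert RSS to Atom
rss_feed = RSS::Parser.parse(rss_content)
atom_feed = RSS::Maker.make("atom") do |maker|
  maker.channel.title = rss_feed.channel.title
  maker.channel.description = rss_feed.channel.description
  maker.channel.link = rss_feed.channel.link
  # Atom requires an id (about), an author, and an updated timestamp
  maker.channel.about = rss_feed.channel.link
  maker.channel.author = rss_feed.channel.managingEditor || "Unknown"
  maker.channel.updated = rss_feed.channel.lastBuildDate || Time.now
  rss_feed.items.each do |rss_item|
    maker.items.new_item do |item|
      item.title = rss_item.title
      item.link = rss_item.link
      item.description = rss_item.description
      item.updated = rss_item.pubDate || Time.now
    end
  end
end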
Error Handling & Debugging
RSS parsing encounters various error conditions including malformed XML, invalid feed structures, encoding issues, and network failures. Ruby's RSS library provides specific exception classes for different error types and validation levels.
require 'rss'
require 'open-uri'
def parse_feed_safely(source)
  RSS::Parser.parse(source, validate: true)
rescue RSS::InvalidRSSError => e
  puts "Invalid RSS structure: #{e.message}"
  # Attempt parsing without validation
  begin
    feed = RSS::Parser.parse(source, validate: false)
    puts "Parsed with validation disabled"
    feed
  rescue RSS::Error => e
    puts "RSS parsing failed completely: #{e.message}"
    nil
  end
rescue OpenURI::HTTPError => e
  puts "HTTP error fetching feed: #{e.message}"
  nil
rescue SocketError => e
  puts "Network error: #{e.message}"
  nil
rescue StandardError => e
  puts "Unexpected error: #{e.message}"
  nil
end
Encoding problems occur frequently with international feeds. Ruby handles encoding detection automatically, but manual encoding specification helps with problematic feeds that declare incorrect encodings or contain mixed encodings.
def handle_encoding_issues(xml_content)
  # Try parsing with the declared or detected encoding
  begin
    return RSS::Parser.parse(xml_content)
  rescue RSS::NotWellFormedError => e
    puts "Encoding issue detected: #{e.message}"
  end

  # Reinterpret the bytes as UTF-8 (dup avoids mutating the caller's string)
  begin
    utf8_content = xml_content.dup.force_encoding('UTF-8')
    return RSS::Parser.parse(utf8_content)
  rescue RSS::Error
    # Try common problematic encodings
    ['ISO-8859-1', 'Windows-1252'].each do |encoding|
      begin
        converted = xml_content.encode('UTF-8', encoding, invalid: :replace, undef: :replace)
        return RSS::Parser.parse(converted)
      rescue RSS::Error
        next
      end
    end
  end

  raise "Unable to parse feed with any encoding"
end
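The fallback order matters because force_encoding only relabels bytes, while encode transcodes them. A minimal stdlib-only sketch of the difference, using a made-up four-byte ISO-8859-1 sample rather than a real feed:

```ruby
# "café" encoded as ISO-8859-1: the final byte 0xE9 is not valid UTF-8 on its own.
latin1_bytes = "caf\xE9".b

# force_encoding changes the encoding label only; the bytes stay the same.
relabeled = latin1_bytes.dup.force_encoding('UTF-8')
puts relabeled.valid_encoding?

# encode transcodes the bytes from the stated source encoding into UTF-8.
converted = latin1_bytes.encode('UTF-8', 'ISO-8859-1')
puts converted
puts converted.valid_encoding?
```

The relabeled string reports an invalid encoding, while the transcoded one is valid UTF-8. Checking valid_encoding? before parsing is one way to decide whether a transcoding attempt is needed at all.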
XML parsing errors result from malformed markup, unclosed tags, or invalid characters. The library raises RSS::NotWellFormedError in these cases. Its message includes the underlying XML parser's description of the problem; how precisely it locates the error depends on the parser backend.
def debug_xml_structure(xml_content)
  RSS::Parser.parse(xml_content, validate: true)
rescue RSS::NotWellFormedError => e
  # The message format depends on the XML parser backend; this pattern only
  # matches backends that report "line N, column M"
  if e.message =~ /line (\d+), column (\d+)/
    line_num = $1.to_i
    column_num = $2.to_i
    lines = xml_content.split("\n")
    problematic_line = lines[line_num - 1]
    puts "XML Error at line #{line_num}, column #{column_num}:"
    puts problematic_line
    puts " " * (column_num - 1) + "^"
    puts "Context:"
    # Show surrounding lines
    start_line = [0, line_num - 3].max
    end_line = [lines.length - 1, line_num + 2].min
    (start_line..end_line).each do |i|
      marker = i == line_num - 1 ? ">>> " : "    "
      puts "#{marker}#{i + 1}: #{lines[i]}"
    end
  end
  raise
end
Validation debugging involves checking the feed structure against the RSS and Atom specifications. When strict parsing fails, the library raises a specific exception for the first violation it finds, identifying the missing, duplicated, or unexpected element.
def validate_feed_structure(xml_content)
  errors = []
  begin
    RSS::Parser.parse(xml_content, validate: true)
    puts "Feed validation successful"
    return true
  rescue RSS::MissingTagError => e
    errors << "Missing required tag: #{e.tag}"
  rescue RSS::TooMuchTagError => e
    errors << "Too many instances of tag: #{e.tag}"
  rescue RSS::MissingAttributeError => e
    errors << "Missing required attribute: #{e.attribute} in tag #{e.tag}"
  rescue RSS::UnknownTagError => e
    errors << "Unknown tag: #{e.tag}"
  rescue RSS::InvalidRSSError => e
    errors << "Invalid RSS structure: #{e.message}"
  end

  puts "Validation errors found:"
  errors.each { |error| puts " - #{error}" }

  # Check if parseable without validation
  begin
    RSS::Parser.parse(xml_content, validate: false)
    puts "Feed is parseable but not strictly valid"
  rescue RSS::Error => e
    puts "Feed is completely unparseable: #{e.message}"
  end
  false
end
Production Patterns
RSS feed processing in production environments requires robust error handling, caching strategies, performance optimization, and monitoring. Production systems handle feed updates, content extraction, and integration with web applications and background job systems.
require 'rss'
require 'open-uri'

class FeedProcessor
  attr_reader :url, :last_updated, :etag, :last_modified

  def initialize(url)
    @url = url
    @last_updated = nil
    @etag = nil
    @last_modified = nil
  end

  def fetch_updates
    headers = {}
    headers['If-None-Match'] = @etag if @etag
    headers['If-Modified-Since'] = @last_modified if @last_modified

    # The block form closes the response when done
    URI.open(@url, headers) do |response|
      # Update cache headers
      @etag = response.meta['etag']
      @last_modified = response.meta['last-modified']
      @last_updated = Time.now
      parse_and_process(response.read)
    end
  rescue OpenURI::HTTPError => e
    # e.io.status is [code, reason], e.g. ["304", "Not Modified"]
    case e.io.status.first
    when '304'
      puts "Feed not modified since last fetch"
      :not_modified
    when '404'
      puts "Feed not found: #{@url}"
      :not_found
    else
      puts "HTTP error: #{e.message}"
      :error
    end
  end

  private

  def parse_and_process(content)
    feed = RSS::Parser.parse(content, validate: false)
    process_items(feed.items)
  rescue RSS::Error => e
    puts "Feed parsing error: #{e.message}"
    :parse_error
  end

  # extract_content, extract_date, extract_guid, and store_item are
  # application-specific helpers defined elsewhere
  def process_items(items)
    items.each do |item|
      # Extract and store item data
      item_data = {
        title: item.title,
        link: item.link,
        content: extract_content(item),
        published_at: extract_date(item),
        guid: extract_guid(item)
      }
      store_item(item_data)
    end
  end
end
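FeedProcessor calls extract_content, extract_date, and extract_guid without defining them. One possible shape for those helpers is sketched below. The fallback order is an assumption, not something the library prescribes, and FakeItem and FakeGuid are hypothetical stand-ins for a parsed RSS 2.0 item so that the sketch runs without a feed.

```ruby
# Hypothetical stand-ins for a parsed RSS 2.0 item and its guid element.
FakeItem = Struct.new(:title, :link, :description, :content_encoded, :guid, :pubDate,
                      keyword_init: true)
FakeGuid = Struct.new(:content)

# Prefer full content (content:encoded) and fall back to the description.
def extract_content(item)
  full = item.respond_to?(:content_encoded) ? item.content_encoded : nil
  (full.nil? || full.empty?) ? item.description : full
end

# RSS items expose pubDate as a Time, or nil when the element is absent.
def extract_date(item)
  item.pubDate
end

# guid is an element object on RSS items; fall back to the link when it is missing.
def extract_guid(item)
  guid = item.guid
  value = guid.respond_to?(:content) ? guid.content : guid
  (value.nil? || value.to_s.empty?) ? item.link : value
end

with_guid = FakeItem.new(title: 'A', link: 'https://example.com/a',
                         description: 'short', content_encoded: '<p>full</p>',
                         guid: FakeGuid.new('urn:example:a'),
                         pubDate: Time.utc(2024, 1, 1))
without_guid = FakeItem.new(title: 'B', link: 'https://example.com/b',
                            description: 'only summary')

puts extract_content(with_guid)
puts extract_guid(without_guid)
```

Falling back to the link keeps the guid usable as a deduplication key for feeds that omit the guid element.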
Background job integration handles feed processing asynchronously to avoid blocking web requests. Jobs manage feed fetching, parsing, content extraction, and database updates with proper error handling and retry logic.
class FeedUpdateJob
  include Sidekiq::Worker
  sidekiq_options retry: 3, dead: false

  def perform(feed_id)
    feed = Feed.find(feed_id)
    processor = FeedProcessor.new(feed.url)
    result = processor.fetch_updates
    case result
    when :not_modified
      feed.touch(:last_checked_at)
    when :not_found
      feed.increment!(:not_found_count)
      disable_feed_if_needed(feed)
    when :error, :parse_error
      feed.increment!(:error_count)
      schedule_retry_if_needed(feed)
    else
      # Record the check as well, so the feed is not immediately due again
      feed.update!(
        last_checked_at: Time.current,
        last_successful_fetch_at: Time.current,
        error_count: 0,
        not_found_count: 0
      )
    end
  rescue StandardError => e
    Rails.logger.error "Feed update failed for feed #{feed_id}: #{e.message}"
    raise
  end

  private

  def disable_feed_if_needed(feed)
    if feed.not_found_count >= 5
      feed.update!(active: false)
      NotificationMailer.feed_disabled(feed).deliver_now
    end
  end

  def schedule_retry_if_needed(feed)
    if feed.error_count < 10
      delay = [feed.error_count * 30, 3600].min
      FeedUpdateJob.perform_in(delay.seconds, feed.id)
    end
  end
end
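The retry delay in schedule_retry_if_needed grows linearly with the error count and is capped at one hour. A small sketch of that arithmetic follows; retry_delay_seconds and its step and cap keyword parameters are hypothetical names introduced here for illustration.

```ruby
# Same formula as schedule_retry_if_needed: 30 seconds per recorded error, capped at 3600.
def retry_delay_seconds(error_count, step: 30, cap: 3600)
  [error_count * step, cap].min
end

[1, 5, 9, 200].each do |count|
  puts "error_count=#{count} -> #{retry_delay_seconds(count)}s"
end
```

Because the job only reschedules while error_count is below 10, the longest delay it actually schedules is 270 seconds. The one-hour cap only comes into play if the limit or the step is raised.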
Rails integration involves creating models for feeds and items with proper associations, validations, and callback handling. ActiveRecord provides persistence while background jobs handle the actual feed processing.
class Feed < ApplicationRecord
  has_many :items, dependent: :destroy

  validates :url, presence: true, uniqueness: true
  validates :title, presence: true

  scope :active, -> { where(active: true) }
  # Include feeds that have never been checked (last_checked_at is NULL)
  scope :due_for_update, -> { where(last_checked_at: nil).or(where('last_checked_at < ?', 1.hour.ago)) }

  # after_commit ensures the row is visible to the worker before the job runs
  after_commit :schedule_initial_fetch, on: :create

  def fetch_updates!
    FeedUpdateJob.perform_async(id)
  end

  def self.schedule_updates
    active.due_for_update.find_each(&:fetch_updates!)
  end

  private

  def schedule_initial_fetch
    FeedUpdateJob.perform_async(id)
  end
end

class Item < ApplicationRecord
  belongs_to :feed

  validates :title, presence: true
  validates :guid, uniqueness: { scope: :feed_id }

  scope :recent, -> { order(published_at: :desc) }
  scope :published_since, ->(date) { where('published_at > ?', date) }

  before_create :extract_content_preview

  private

  def extract_content_preview
    if content.present?
      self.preview = ActionController::Base.helpers.strip_tags(content).truncate(200)
    end
  end
end
Monitoring and alerting track feed health, processing performance, and error rates. Production systems need visibility into feed update frequency, parsing success rates, and content quality metrics.
require 'sidekiq/api' # provides Sidekiq::Queue and Sidekiq::RetrySet

class FeedMonitor
  def self.health_check
    active_count = Feed.active.count
    active_items = Item.joins(:feed).where(feeds: { active: true }).count
    stats = {
      total_feeds: Feed.count,
      active_feeds: active_count,
      feeds_due_for_update: Feed.due_for_update.count,
      feeds_with_recent_errors: Feed.where('error_count > 0').count,
      # Guard against division by zero when no feeds are active
      average_items_per_feed: active_count.zero? ? 0.0 : active_items.to_f / active_count,
      last_successful_update: Feed.maximum(:last_successful_fetch_at)
    }
    # Check for concerning metrics
    alerts = []
    alerts << "Many feeds due for update" if stats[:feeds_due_for_update] > stats[:active_feeds] * 0.5
    alerts << "High error rate" if stats[:feeds_with_recent_errors] > stats[:active_feeds] * 0.1
    # last_successful_update is nil until the first successful fetch
    last_update = stats[:last_successful_update]
    alerts << "No recent updates" if last_update.nil? || last_update < 2.hours.ago
    { stats: stats, alerts: alerts }
  end

  def self.performance_metrics
    {
      # average_processing_time is an application-defined method, not part of Sidekiq
      avg_processing_time: FeedUpdateJob.average_processing_time,
      job_queue_size: Sidekiq::Queue.new('default').size,
      failed_jobs_count: Sidekiq::RetrySet.new.size,
      items_processed_today: Item.where('created_at > ?', 1.day.ago).count
    }
  end
end
Reference
Core Classes and Modules
Class/Module | Purpose | Key Methods |
---|---|---|
RSS | Main module containing all RSS functionality | ::Parser, ::Maker |
RSS::Parser | Feed parsing functionality | ::parse(source, validate: true) |
RSS::Maker | Feed generation functionality | ::make(version, &block) |
RSS::Rss | RSS format feed objects | #channel, #items, #version |
RSS::Atom::Feed | Atom format feed objects | #title, #entries, #updated |
Parser Methods
Method | Parameters | Returns | Description |
---|---|---|---|
RSS::Parser.parse(source, validate: true) | source (String/URI), validate (Boolean) | Feed object | Parse RSS/Atom feed from source |
RSS::Parser.parse(source, do_validate) | source (String/URI), do_validate (Boolean, positional) | Feed object | Legacy positional form, e.g. RSS::Parser.parse(source, false) |
Feed Object Properties
Property | RSS Access | Atom Access | Returns | Description |
---|---|---|---|---|
Title | feed.channel.title | feed.title.content | String | Feed title |
Description | feed.channel.description | feed.subtitle.content | String | Feed description |
Link | feed.channel.link | feed.link.href | String | Feed homepage URL |
Language | feed.channel.language | feed.lang | String | Feed language code |
Copyright | feed.channel.copyright | feed.rights.content | String | Copyright information |
Items | feed.items | feed.entries | Array | Collection of feed items |
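The two access columns can be folded into one helper that dispatches on the feed class. This is a minimal sketch, not part of the library: feed_metadata and both sample documents are made up for illustration, and validation is turned off through the positional second argument so the sketch tolerates minimal feeds.

```ruby
require 'rss'

# Hypothetical helper: feed-level metadata as a plain hash for RSS or Atom feeds.
def feed_metadata(feed)
  case feed
  when RSS::Atom::Feed
    {
      title: feed.title&.content,
      description: feed.subtitle&.content,
      link: feed.link&.href
    }
  else
    channel = feed.channel
    { title: channel.title, description: channel.description, link: channel.link }
  end
end

RSS_SAMPLE = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>RSS Example</title>
      <link>https://example.com/</link>
      <description>An RSS feed</description>
    </channel>
  </rss>
XML

ATOM_SAMPLE = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <feed xmlns="http://www.w3.org/2005/Atom">
    <title>Atom Example</title>
    <subtitle>An Atom feed</subtitle>
    <id>urn:example:feed</id>
    <updated>2024-01-01T12:00:00Z</updated>
    <link href="https://example.com/"/>
    <author><name>Alice</name></author>
  </feed>
XML

rss_meta = feed_metadata(RSS::Parser.parse(RSS_SAMPLE, false))
atom_meta = feed_metadata(RSS::Parser.parse(ATOM_SAMPLE, false))
puts rss_meta[:title]
puts atom_meta[:title]
```

The safe-navigation calls cover optional Atom elements such as subtitle, which return nil when absent.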
Item Object Properties
Property | RSS Access | Atom Access | Returns | Description |
---|---|---|---|---|
title | item.title | entry.title.content | String | Item title |
link | item.link | entry.link.href | String | Item URL |
description | item.description | entry.summary.content | String | Item summary/description |
content | item.content_encoded | entry.content.content | String | Full item content |
pubDate | item.pubDate | entry.published.content | Time | Publication date |
guid | item.guid.content | entry.id.content | String | Unique identifier |
author | item.author | entry.author.name.content | String | Item author |
category | item.category | entry.category.term | String | Item category/tag |
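The same mapping applies per item. The sketch below converts every item of either format into one hash shape, using the accessors from the table. normalized_items and the two sample feeds are hypothetical, and validation is again disabled through the positional second argument.

```ruby
require 'rss'

# Hypothetical helper: maps each item of an RSS 2.0 or Atom feed to one hash shape.
def normalized_items(feed)
  if feed.is_a?(RSS::Atom::Feed)
    feed.entries.map do |entry|
      {
        title: entry.title&.content,
        link: entry.link&.href,
        summary: entry.summary&.content,
        guid: entry.id&.content,
        # published is optional in Atom; updated is the required timestamp
        published_at: (entry.published || entry.updated)&.content
      }
    end
  else
    feed.items.map do |item|
      {
        title: item.title,
        link: item.link,
        summary: item.description,
        guid: item.guid&.content,
        published_at: item.pubDate
      }
    end
  end
end

RSS_ITEMS_SAMPLE = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>RSS Example</title>
      <link>https://example.com/</link>
      <description>An RSS feed</description>
      <item>
        <title>RSS item</title>
        <link>https://example.com/rss/1</link>
        <description>From RSS</description>
        <guid>urn:example:rss-1</guid>
        <pubDate>Mon, 01 Jan 2024 12:00:00 +0000</pubDate>
      </item>
    </channel>
  </rss>
XML

ATOM_ITEMS_SAMPLE = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <feed xmlns="http://www.w3.org/2005/Atom">
    <title>Atom Example</title>
    <id>urn:example:feed</id>
    <updated>2024-01-01T12:00:00Z</updated>
    <author><name>Alice</name></author>
    <entry>
      <title>Atom entry</title>
      <id>urn:example:atom-1</id>
      <link href="https://example.com/atom/1"/>
      <updated>2024-01-01T12:00:00Z</updated>
      <summary>From Atom</summary>
    </entry>
  </feed>
XML

rss_items = normalized_items(RSS::Parser.parse(RSS_ITEMS_SAMPLE, false))
atom_items = normalized_items(RSS::Parser.parse(ATOM_ITEMS_SAMPLE, false))
(rss_items + atom_items).each { |item| puts "#{item[:title]} (#{item[:guid]})" }
```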
Maker Interface
Method | Parameters | Returns | Description |
---|---|---|---|
RSS::Maker.make(version, &block) | version (String), block | Feed object | Create new feed of specified version |
maker.channel.title = value | value (String) | String | Set feed title |
maker.channel.description = value | value (String) | String | Set feed description |
maker.channel.link = value | value (String) | String | Set feed link |
maker.items.new_item(&block) | block | Item object | Add new item to feed |
Supported Feed Versions
Version String | Feed Format | Description |
---|---|---|
"0.91" | RSS 0.91 | Early RSS format |
"0.92" | RSS 0.92 | Enhanced RSS 0.91 |
"1.0" | RSS 1.0 | RDF-based RSS |
"2.0" | RSS 2.0 | Most common RSS format |
"atom" | Atom 1.0 | IETF Atom Syndication Format |
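Because the version string is the only format-specific argument, one maker block can produce several formats. The sketch below builds the same made-up feed as RSS 2.0 and as Atom. build_feed is a hypothetical helper, and the about, author, and updated values are set unconditionally on the assumption that the RSS 2.0 maker ignores fields it does not need. Other version strings, such as "1.0", may require further fields.

```ruby
require 'rss'

# Hypothetical helper: builds the same example feed for a given version string.
def build_feed(version)
  RSS::Maker.make(version) do |maker|
    maker.channel.title = 'Example Feed'
    maker.channel.link = 'https://example.com/'
    maker.channel.description = 'Same content, different formats'
    # Required by Atom (id, author, updated); set for every version.
    maker.channel.about = 'https://example.com/feed'
    maker.channel.author = 'Example Author'
    maker.channel.updated = Time.utc(2024, 1, 1, 12, 0, 0)

    maker.items.new_item do |item|
      item.title = 'First Post'
      item.link = 'https://example.com/posts/1'
      item.description = 'Hello'
      item.updated = Time.utc(2024, 1, 1, 12, 0, 0)
    end
  end
end

feeds = %w[2.0 atom].to_h { |version| [version, build_feed(version)] }
feeds.each { |version, feed| puts "#{version}: #{feed.class}" }
```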
Exception Hierarchy
Exception | Parent | Description |
---|---|---|
RSS::Error | StandardError | Base RSS exception |
RSS::InvalidRSSError | RSS::Error | Invalid feed structure |
RSS::NotWellFormedError | RSS::Error | Malformed XML |
RSS::MissingTagError | RSS::InvalidRSSError | Required tag missing |
RSS::TooMuchTagError | RSS::InvalidRSSError | Too many tag instances |
RSS::MissingAttributeError | RSS::InvalidRSSError | Required attribute missing |
RSS::UnknownTagError | RSS::InvalidRSSError | Unrecognized tag found |
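The hierarchy determines which rescue clause catches what. The sketch below parses one structurally invalid feed and one malformed document, both made up for illustration, and reports the exception class for each. error_for is a hypothetical helper, and the final call uses the positional second argument to turn validation off.

```ruby
require 'rss'

# Well-formed XML, but the channel lacks the <link> and <description> RSS 2.0 requires.
INVALID_FEED = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>Incomplete</title>
    </channel>
  </rss>
XML

# Mismatched tags: not well-formed XML at all.
MALFORMED_FEED = '<rss version="2.0"><channel><title>Broken</channel></rss>'

# Hypothetical helper: returns the RSS::Error raised while parsing, or nil.
def error_for(xml)
  RSS::Parser.parse(xml) # validation is on by default
  nil
rescue RSS::Error => e
  e
end

invalid_error = error_for(INVALID_FEED)
malformed_error = error_for(MALFORMED_FEED)
puts invalid_error.class
puts malformed_error.class

# With validation off, the structurally invalid feed still parses.
lenient = RSS::Parser.parse(INVALID_FEED, false)
puts lenient.channel.title
```

Rescuing RSS::InvalidRSSError alone does not cover malformed XML, so a handler that should catch both needs RSS::Error or a separate RSS::NotWellFormedError clause.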
Common Validation Options
Option | Type | Default | Description |
---|---|---|---|
validate | Boolean | true | Enable strict RSS/Atom validation |
do_validate | Boolean | true | Name of the legacy positional parameter for validation |
ignore_unknown_element | Boolean | true | Skip unknown XML elements |
compatible | Boolean | false | Enable compatibility mode for malformed feeds |
Content Extraction Patterns
Pattern | Accessor | Notes |
---|---|---|
Plain text extraction | item.description | Standard RSS description field |
HTML content | item.content_encoded | Full HTML content from RSS |
Atom content | entry.content.content | Atom entry content |
Summary text | entry.summary.content | Atom entry summary |
CDATA handling | Automatic | XML CDATA sections parsed automatically |
Date Handling
Format | RSS Field | Atom Field | Ruby Conversion |
---|---|---|---|
RFC 822 | pubDate | N/A | Time.parse(date_string) |
ISO 8601 | N/A | published, updated | Time.iso8601(date_string) |
Custom parsing | Various | Various | DateTime.strptime(date, format) |
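The library usually hands back Time objects already, but dates that arrive as raw strings (for example from extension elements) need explicit parsing. The sketch below uses only the standard time library. parse_feed_date is a hypothetical helper that tries the strict formats first and returns nil when nothing matches.

```ruby
require 'time'

# Hypothetical helper: returns a Time for the date formats feeds commonly use, or nil.
def parse_feed_date(value)
  return value if value.is_a?(Time)

  text = value.to_s.strip
  return nil if text.empty?

  # Try the strict formats first, then the lenient general-purpose parser.
  %i[rfc822 iso8601 parse].each do |parser|
    begin
      return Time.public_send(parser, text)
    rescue ArgumentError
      next
    end
  end
  nil
end

rss_date  = parse_feed_date('Mon, 01 Jan 2024 12:00:00 +0000') # RFC 822 (RSS pubDate)
atom_date = parse_feed_date('2024-01-01T12:00:00Z')            # ISO 8601 (Atom updated)
custom    = Time.strptime('2024-01-01 12:00 +0000', '%Y-%m-%d %H:%M %z')

puts rss_date.utc.iso8601
puts rss_date == atom_date
puts custom == atom_date
```

All three values denote the same instant, so comparisons and sorting work regardless of which format the feed used.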
Feed Detection Patterns
# Detect feed format from content
def detect_feed_format(content)
  case content
  when /<rss/i
    'RSS'
  # [^>]* keeps the match inside the opening tag, even across line breaks
  when /<feed\b[^>]*xmlns[^>]*atom/i
    'Atom'
  when /<rdf:RDF/i
    'RSS 1.0'
  else
    'Unknown'
  end
end
Performance Considerations
Aspect | Recommendation | Impact |
---|---|---|
Validation | Disable for production parsing | 2-3x faster parsing |
Encoding | Specify encoding when known | Reduces encoding detection overhead |
Memory usage | Process items iteratively | Reduces memory footprint for large feeds |
Network timeouts | Set reasonable timeouts | Prevents hanging requests |
Caching | Cache parsed feed objects | Reduces parsing overhead |
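The gain from disabling validation depends on feed size and structure, so it is worth measuring on representative input before relying on a particular figure. The sketch below uses the standard Benchmark module. The sample feed and iteration count are arbitrary, and validation is switched through the positional second argument.

```ruby
require 'benchmark'
require 'rss'

BENCH_FEED = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>Benchmark Feed</title>
      <link>https://example.com/</link>
      <description>Sample feed for timing</description>
      <item>
        <title>Item</title>
        <link>https://example.com/1</link>
        <description>Body</description>
      </item>
    </channel>
  </rss>
XML

ITERATIONS = 200

strict_seconds = Benchmark.realtime do
  ITERATIONS.times { RSS::Parser.parse(BENCH_FEED, true) }
end

lenient_seconds = Benchmark.realtime do
  ITERATIONS.times { RSS::Parser.parse(BENCH_FEED, false) }
end

puts format('validation on:  %.4fs', strict_seconds)
puts format('validation off: %.4fs', lenient_seconds)
```

Substituting a real feed for the sample gives numbers that reflect the actual workload.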