Web forms produce inconsistent data, and importing customer records from external systems requires normalization. This drill teaches you to clean messy CSV data by stripping whitespace, normalizing phone formats, standardizing emails, validating required fields, removing duplicates, and handling malformed rows. You'll learn data validation patterns essential for preventing SQL errors and data corruption before expensive processing begins.
String#strip removes leading/trailing whitespace
String#downcase converts to lowercase
String#gsub(/\D/, '') removes all non-digit characters
Use a hash to track seen emails: seen_emails[email] = true
Validate email with email.include?('@') && email.include?('.') (a loose check, not full address validation)
Split company name, capitalize each word, rejoin: .split.map(&:capitalize).join(' ')
Track errors in an array and write them to a file at the end
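The field-level hints above can be combined into small helpers. This is a minimal sketch, not the drill's reference solution: the method names (normalize_email, valid_email?, normalize_phone, normalize_company) are hypothetical, and the rules are only the ones the hints state.

```ruby
# Hypothetical helpers; the names and exact rules are assumptions for illustration.

# Strip surrounding whitespace and lowercase the address.
def normalize_email(raw)
  raw.to_s.strip.downcase
end

# Loose check from the hint: the address must contain both '@' and '.'.
def valid_email?(email)
  email.include?('@') && email.include?('.')
end

# Keep digits only, dropping spaces, dashes, and parentheses.
def normalize_phone(raw)
  raw.to_s.gsub(/\D/, '')
end

# Capitalize each whitespace-separated word and rejoin with single spaces.
def normalize_company(raw)
  raw.to_s.split.map(&:capitalize).join(' ')
end

puts normalize_email("  Jane.Doe@Example.COM ")
puts normalize_phone("(555) 123-4567")
puts normalize_company("  acme   widget CO ")
```

Calling to_s first lets the helpers accept nil for a missing cell. Note that String#capitalize also lowercases the rest of each word, so an all-caps name such as "IBM" would come out as "Ibm"; whether that is acceptable depends on the data.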
clean_csv('dirty_data.csv')
Processed 2 rows: 2 cleaned, 0 errors
clean_csv('dirty_data.csv')
Processed 3 rows: 1 cleaned, 2 errors
Errors found - see error_report.txt
clean_csv('dirty_data.csv')
Processed 3 rows: 2 cleaned, 0 errors
1 duplicate removed
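The counts in the summary lines above could come from a single pass over the rows. The sketch below is one possible shape, under stated assumptions: clean_rows is a hypothetical name, the columns 'name' and 'email' and the rule that both are required are invented for illustration, and rows are taken as already parsed into hash-like objects.

```ruby
# Hypothetical sketch: the column names ('name', 'email') and the rule that
# both are required are assumptions, not the drill's reference solution.
def clean_rows(rows)
  seen_emails = {}  # email => true once a row with that address has been kept
  cleaned = []
  errors = []       # collected here so they can be written out once at the end
  duplicates = 0

  rows.each_with_index do |row, index|
    name  = row['name'].to_s.strip
    email = row['email'].to_s.strip.downcase

    if name.empty? || email.empty?
      errors << "Row #{index + 1}: missing required field"
    elsif !(email.include?('@') && email.include?('.'))
      errors << "Row #{index + 1}: invalid email #{email.inspect}"
    elsif seen_emails[email]
      duplicates += 1  # same normalized email as an earlier kept row
    else
      seen_emails[email] = true
      cleaned << { 'name' => name, 'email' => email }
    end
  end

  { cleaned: cleaned, errors: errors, duplicates: duplicates }
end

rows = [
  { 'name' => ' Ann ', 'email' => 'ANN@example.com' },
  { 'name' => 'Ann',   'email' => 'ann@example.com ' },
  { 'name' => '',      'email' => 'bob@example.com' }
]
result = clean_rows(rows)
puts "Processed #{rows.size} rows: #{result[:cleaned].size} cleaned, #{result[:errors].size} errors"
```

Emails are normalized before the seen_emails lookup, so "ANN@example.com" and "ann@example.com " count as the same address. In a file-based version, the rows could come from CSV.foreach(path, headers: true), and the errors array could be written to the report file with File.write after the loop finishes.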