Challenge

Problem

Web forms produce inconsistent data, and customer records imported from external systems need normalization before they are stored. This drill teaches you to clean messy CSV data: strip whitespace, normalize phone formats, standardize emails, validate required fields, remove duplicates, and handle malformed rows. These validation patterns keep bad values out of the database, which helps prevent SQL errors (such as failed inserts) and silent data corruption. Note: This drill parses CSV manually with string methods rather than Ruby's CSV library.
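Because the drill avoids Ruby's CSV library, each row has to be split with String methods. The sketch below assumes no field contains a quoted comma (such as "Acme, Inc."); the helper name `parse_line` and the "exactly 4 fields" malformed-row rule are illustrative choices, not part of the drill's provided files.

```ruby
# Hypothetical helper: split one CSV line using String methods only.
# Assumption: fields never contain quoted commas.
EXPECTED_FIELDS = 4 # Name, Email, Phone, Company

# Returns an array of stripped fields, or nil when the row is malformed
# (wrong number of fields).
def parse_line(line)
  # The -1 limit keeps trailing empty fields, so "Bob,bob@x.com,," still
  # yields 4 fields instead of 2.
  fields = line.chomp.split(',', -1).map(&:strip)
  fields.size == EXPECTED_FIELDS ? fields : nil
end

p parse_line("  Jane Doe , JANE@Example.com ,(555) 123-4567, acme corp\n")
# => ["Jane Doe", "JANE@Example.com", "(555) 123-4567", "acme corp"]
p parse_line("only,three,fields")
# => nil
```

Without the `-1` limit, `String#split` drops trailing empty strings, so a row with a blank phone and company would be misreported as malformed.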

Difficulty: Intermediate

Instructions

  1. Read a user-submitted CSV file whose header row is Name, Email, Phone, Company
  2. Clean each field:
    • Strip leading/trailing whitespace
    • Normalize emails: lowercase and trim
    • Normalize phone: remove formatting, keep digits only, format as XXX-XXX-XXXX
    • Capitalize each word of the company name
  3. Validate required fields (Name and Email cannot be empty)
  4. Validate email format (must contain both @ and .)
  5. Remove duplicate rows (by email)
  6. Print a summary: X rows processed, Y cleaned, Z errors
  7. Report how many duplicates were removed and whether errors were found
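A minimal end-to-end sketch of the steps above, not the drill's reference solution. It assumes `clean_csv(path)` takes a file path (as the test inputs suggest), that invalid rows are rejected before the duplicate check, that blank lines are not counted as rows, and that the plural wording for counts other than those shown in the test cases is a guess. The helpers `normalize_phone` and `valid_email?` are hypothetical names.

```ruby
# Hypothetical sketch of clean_csv; output wording beyond the test cases
# shown on this page is an assumption.
def normalize_phone(raw)
  digits = raw.gsub(/\D/, '')
  return raw if digits.length != 10 # assumption: leave odd lengths as-is
  "#{digits[0, 3]}-#{digits[3, 3]}-#{digits[6, 4]}"
end

def valid_email?(email)
  email.include?('@') && email.include?('.')
end

def clean_csv(path)
  lines = File.readlines(path, chomp: true)
  lines.shift                          # drop the header row
  lines.reject! { |l| l.strip.empty? } # assumption: blank lines are not rows

  cleaned = []
  errors = []
  seen_emails = {}
  duplicates = 0

  lines.each_with_index do |line, i|
    fields = line.split(',', -1).map(&:strip)
    if fields.size != 4
      errors << "Row #{i + 1}: malformed (#{fields.size} fields)"
      next
    end

    name, email, phone, company = fields
    email = email.downcase
    if name.empty? || email.empty?
      errors << "Row #{i + 1}: missing name or email"
      next
    end
    unless valid_email?(email)
      errors << "Row #{i + 1}: invalid email #{email}"
      next
    end
    if seen_emails[email] # same normalized email already kept
      duplicates += 1
      next
    end
    seen_emails[email] = true

    cleaned << {
      name: name,
      email: email,
      phone: normalize_phone(phone),
      company: company.split.map(&:capitalize).join(' ')
    }
  end

  puts "Processed #{lines.size} rows: #{cleaned.size} cleaned, #{errors.size} errors"
  puts "#{duplicates} duplicate#{'s' unless duplicates == 1} removed"
  puts 'Errors found' unless errors.empty?
  cleaned
end
```

Validating before deduplicating means an invalid row cannot "claim" an email and cause a later, valid row with the same address to be dropped; the drill's hidden tests may expect a different order.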

Hints

Hint 1

String#strip removes leading/trailing whitespace

Hint 2

String#downcase converts to lowercase

Hint 3

String#gsub(/\D/, '') removes all non-digit characters
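Combined with the XXX-XXX-XXXX rule from step 2, the digit stripping might look like this. It is a sketch: the name `format_phone` is hypothetical, and returning `nil` for anything other than exactly 10 digits is an assumption so the caller can decide whether to flag the row.

```ruby
# Hypothetical helper: keep digits only, then format as XXX-XXX-XXXX.
# Assumption: any digit count other than 10 returns nil instead of guessing.
def format_phone(raw)
  digits = raw.gsub(/\D/, '') # "(555) 123-4567" -> "5551234567"
  return nil unless digits.length == 10
  # Capture groups split the 10 digits into 3-3-4 blocks.
  digits.sub(/\A(\d{3})(\d{3})(\d{4})\z/, '\1-\2-\3')
end

p format_phone('(555) 123-4567') # => "555-123-4567"
p format_phone('555.123.4567')   # => "555-123-4567"
p format_phone('123-45')         # => nil
```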

Hint 4

Use a hash to track seen emails: seen_emails[email] = true
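A sketch of the hash-based duplicate check, assuming rows are hashes whose `:email` has already been lowercased and stripped (otherwise "ANN@x.com" and "ann@x.com" would not match). The first row for each email is kept; `dedupe_by_email` is a hypothetical name.

```ruby
# Hypothetical helper: keep the first row per email, count the rest.
def dedupe_by_email(rows)
  seen_emails = {}
  kept = []
  removed = 0
  rows.each do |row|
    if seen_emails[row[:email]]
      removed += 1
    else
      seen_emails[row[:email]] = true
      kept << row
    end
  end
  [kept, removed]
end

rows = [
  { name: 'Ann', email: 'ann@example.com' },
  { name: 'Annie', email: 'ann@example.com' },
  { name: 'Bob', email: 'bob@example.com' }
]
kept, removed = dedupe_by_email(rows)
p kept.map { |r| r[:name] } # => ["Ann", "Bob"]
p removed                   # => 1
```

`rows.uniq { |r| r[:email] }` also keeps first occurrences; the removed count is then `rows.size` minus the result's size. The explicit hash is handy when duplicates must be reported as they are found.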

Hint 5

Validate email with email.include?('@') && email.include?('.')
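The hint's check is deliberately loose: it accepts strings like "@." or "a.b@c". The sketch below contrasts it with a hypothetical stricter variant (one `@`, a non-empty local part, a dot inside the domain). The stricter rule is an illustration only; the drill's tests are written against the simple include? rule, so switching could change the error count.

```ruby
# The drill's rule: valid if the string contains both '@' and '.'.
def valid_email?(email)
  email.include?('@') && email.include?('.')
end

# Hypothetical stricter variant (not required by the drill).
def stricter_email?(email)
  local, domain, extra = email.split('@', -1)
  return false if extra || local.nil? || domain.nil? || local.empty?
  domain.include?('.') && !domain.start_with?('.') && !domain.end_with?('.')
end

%w[jane@example.com jane.example.com @. a.b@c].each do |e|
  puts "#{e}: drill=#{valid_email?(e)} stricter=#{stricter_email?(e)}"
end
# jane@example.com: drill=true stricter=true
# jane.example.com: drill=false stricter=false
# @.: drill=true stricter=false
# a.b@c: drill=true stricter=false
```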

Hint 6

Split the company name and capitalize each word: .split.map(&:capitalize).join(' ')
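The hint's one-liner wrapped in a hypothetical helper, `clean_company`. Two side effects are worth knowing: `split` with no argument splits on runs of whitespace, so extra spaces collapse, and `String#capitalize` lowercases the rest of each word, so acronyms such as "IBM" become "Ibm".

```ruby
# Hypothetical helper: title-case a company name word by word.
def clean_company(raw)
  raw.split.map(&:capitalize).join(' ')
end

p clean_company('  acme   corp ') # => "Acme Corp"
p clean_company('BIG data inc')   # => "Big Data Inc"
p clean_company('IBM')            # => "Ibm" (capitalize lowercases the rest)
```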

Ruby 3.4

Test Cases

1. Correct row counts

Input:
clean_csv('dirty_data.csv')
Expected Output:
Processed 5 rows: 2 cleaned, 2 errors
1 duplicate removed
Errors found

2. Detects missing name error

Input:
clean_csv('dirty_data.csv')
puts 'Validation OK'
Expected Output:
Processed 5 rows: 2 cleaned, 2 errors
1 duplicate removed
Errors found
Validation OK

3. Detects invalid email format

Input:
clean_csv('dirty_data.csv')
puts 'Email check OK'
Expected Output:
Processed 5 rows: 2 cleaned, 2 errors
1 duplicate removed
Errors found
Email check OK
+ 2 hidden test cases