Challenge

Problem

Marketing teams extract contact information from customer inquiries, privacy compliance requires finding personal data before sharing documents, and sales teams build contact lists from exported records. This drill teaches you to use regular expressions to extract email addresses, URLs, and phone numbers from unstructured text files. You'll learn practical regex patterns for real-world data extraction tasks.

Difficulty: Beginner

Instructions

  1. Read the provided text file
  2. Extract all email addresses using a regex pattern
  3. Extract all URLs (http and https protocols)
  4. Extract all US phone numbers (various formats: (555) 123-4567, 555-123-4567, 555.123.4567)
  5. Output results grouped by type, with a blank line between groups and each item indented by two spaces:
    'Emails found: X'
    '  user@example.com'
    'URLs found: X'
    '  https://example.com'
    'Phone numbers found: X'
    '  (555) 123-4567'
  6. Remove duplicates and sort alphabetically within each group before printing; X is the number of unique items in the group
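A minimal sketch of steps 5 and 6, assuming the matches have already been collected into a hash keyed by group label. The format_report helper and the sample data are hypothetical; the layout (count line, two-space indent, blank line between groups) and counting only unique items follow the expected output in the test cases.

```ruby
# Hypothetical helper: builds the grouped report for steps 5 and 6.
# Each group is deduplicated and sorted before it is counted and printed.
def format_report(groups)
  sections = groups.map do |label, items|
    unique = items.uniq.sort
    lines = ["#{label} found: #{unique.size}"]
    lines.concat(unique.map { |item| "  #{item}" }) # two-space indent per item
    lines.join("\n")
  end
  sections.join("\n\n") # blank line between groups
end

# Illustrative matches only; not the contents of contacts.txt
groups = {
  "Emails" => ["support@example.com", "sales@example.com", "support@example.com"],
  "URLs" => ["https://example.com"],
  "Phone numbers" => ["555-987-6543", "(555) 123-4567"]
}

report = format_report(groups)
puts report
```

Building a string instead of calling puts line by line keeps the formatting easy to check in isolation; the real method would fill groups from the three scan results.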

Hints

Hint 1

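String#scan with a regex returns all matches as an array of strings, as long as the pattern has no capture groups (with groups, it returns the captured parts of each match instead)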
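A short illustration of Hint 1 on made-up text, showing how capture groups change what scan returns and how a non-capturing group (?:...) avoids that.

```ruby
text = "Call 555-123-4567 or 555-987-6543 today."

# No capture groups: scan returns each full match as a string
numbers = text.scan(/\d{3}-\d{3}-\d{4}/)
p numbers # => ["555-123-4567", "555-987-6543"]

# Capture groups: scan returns an array of the captured parts per match
parts = text.scan(/(\d{3})-(\d{3})-(\d{4})/)
p parts # => [["555", "123", "4567"], ["555", "987", "6543"]]

# Non-capturing group (?:...): full matches come back again
grouped = text.scan(/(?:\d{3}-){2}\d{4}/)
p grouped # => ["555-123-4567", "555-987-6543"]
```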

Hint 2

Email regex: /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/i (case insensitive)
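A quick check of the Hint 2 pattern on made-up text. EMAIL_RE and STRICT_RE are illustrative names; the second constant drops the i flag to show why it matters, since the character classes list only uppercase letters.

```ruby
EMAIL_RE = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/i

text = "Write to Sales@Example.com or support@example.com."

# The i flag lets [A-Z] match lowercase letters as well
emails = text.scan(EMAIL_RE)
p emails # => ["Sales@Example.com", "support@example.com"]

# Same pattern without the i flag: only uppercase letters are accepted
STRICT_RE = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/
p text.scan(STRICT_RE) # => []
```

Note that the sentence-ending period after the second address is not captured: the final \b stops the match right after "com".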

Hint 3

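URL regex: /https?:\/\/[^\s]+/ matches http:// or https:// followed by everything up to the next whitespace character, so trailing punctuation (such as a sentence-ending period) is captured too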
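A sketch of the Hint 3 pattern on made-up text, plus one possible cleanup step. The cleanup assumes that URLs in the input never legitimately end in ".", ",", ";", ":", "!", "?", ")" or "]"; that is an assumption about the data, not part of the drill's specification.

```ruby
URL_RE = /https?:\/\/[^\s]+/

text = "See https://example.com, or visit https://company.org."

# [^\s]+ runs to the next whitespace, so punctuation after a URL is kept
raw = text.scan(URL_RE)
p raw # => ["https://example.com,", "https://company.org."]

# Assumption: strip a trailing run of punctuation from each match
cleaned = raw.map { |url| url.sub(/[.,;:!?)\]]+\z/, "") }
p cleaned # => ["https://example.com", "https://company.org"]
```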

Hint 4

Phone regex: /\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/ handles all three formats from step 4, since the parentheses and each separator (space, period, or hyphen) are optional
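The Hint 4 pattern applied to made-up text containing the three formats from step 4. The last two lines show a side effect of making each piece optional: the pattern is permissive and also accepts inputs that are not in the listed formats.

```ruby
PHONE_RE = /\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/

text = "Office: (555) 123-4567, cell 555-987-6543, fax 555.111.2222"
phones = text.scan(PHONE_RE)
p phones # => ["(555) 123-4567", "555-987-6543", "555.111.2222"]

# Caveat: each parenthesis is optional on its own, so unbalanced forms match
p "(555 123-4567".match?(PHONE_RE) # => true
# Caveat: separators are optional, so ten bare digits match as well
p "5551234567".match?(PHONE_RE)    # => true
```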

Hint 5

Use .uniq to remove duplicates and .sort to alphabetize
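Array#uniq keeps the first occurrence of each element, and Array#sort orders strings with String#<=>, which compares character codes. So "alphabetical" here really means character-code order: uppercase letters sort before lowercase ones, and "(" sorts before digits, which is consistent with (555) 123-4567 appearing first in the expected output. The lists below are illustrative values taken from that expected output.

```ruby
emails = ["support@example.com", "sales@example.com",
          "support@example.com", "john.doe@company.org"]

# Drop the repeated address, then sort the remaining strings
unique_sorted = emails.uniq.sort
p unique_sorted
# => ["john.doe@company.org", "sales@example.com", "support@example.com"]

# "(" (code 40) < "-" (45) < "." (46) < "5" (53) decides the phone order
phones = ["555.111.2222", "555-987-6543", "(555) 123-4567"]
sorted_phones = phones.sort
p sorted_phones
# => ["(555) 123-4567", "555-987-6543", "555.111.2222"]
```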

Hint 6

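\b in a regex matches a word boundary: a zero-width position between a word character and a non-word character, or at the start or end of the string next to a word character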
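A small made-up example of Hint 6. It is relevant because the phone pattern in Hint 4 has no \b anchors, so it can match inside a longer run of digits in the same way the first scan below does.

```ruby
text = "order12345 ref 555"

# Without \b, \d{3} can match inside a longer run of digits
without_boundary = text.scan(/\d{3}/)
p without_boundary # => ["123", "555"]

# With \b on both sides, only a standalone three-digit token matches
with_boundary = text.scan(/\b\d{3}\b/)
p with_boundary # => ["555"]
```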

Ruby 3.4

Test Cases

1. Correct email count

Input:
extract_contact_info('contacts.txt')
Expected Output:
Emails found: 3
  john.doe@company.org
  sales@example.com
  support@example.com

URLs found: 2
  https://company.org
  https://example.com

Phone numbers found: 3
  (555) 123-4567
  555-987-6543
  555.111.2222

2. Emails sorted alphabetically

Input:
extract_contact_info('contacts.txt')
puts '---'
puts 'verified'
Expected Output:
Emails found: 3
  john.doe@company.org
  sales@example.com
  support@example.com

URLs found: 2
  https://company.org
  https://example.com

Phone numbers found: 3
  (555) 123-4567
  555-987-6543
  555.111.2222
---
verified

3. Duplicates removed

Input:
extract_contact_info('contacts.txt')
puts 'Duplicates handled correctly'
Expected Output:
Emails found: 3
  john.doe@company.org
  sales@example.com
  support@example.com

URLs found: 2
  https://company.org
  https://example.com

Phone numbers found: 3
  (555) 123-4567
  555-987-6543
  555.111.2222
Duplicates handled correctly
+ 2 hidden test cases