Challenge

Problem

Pulling contact details out of unstructured text is a common task: marketing teams extract contact information from customer inquiries, privacy reviews must find personal data before documents are shared, and sales teams build contact lists from exported records. In this drill you'll use regular expressions to extract email addresses, URLs, and phone numbers from a text file.

Difficulty: Beginner

Instructions

  1. Read the provided text file
  2. Extract all email addresses using a regex pattern
  3. Extract all URLs (http:// and https://)
  4. Extract all US phone numbers (various formats: (555) 123-4567, 555-123-4567, 555.123.4567)
  5. Output results grouped by type:
    'Emails found: 2'
    '  user@example.com'
    '  contact@company.org'
    'URLs found: 1'
    '  https://example.com'
    'Phone numbers found: 1'
    '  (555) 123-4567'
  6. Remove duplicates and sort alphabetically within each group
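
The report format from step 5 can be sketched as a small formatting helper. This is a minimal sketch, not the reference solution: `format_group` is a hypothetical name, the input arrays are assumed to be already deduplicated and sorted, and the two-space indent and the blank line between groups are inferred from the expected output of the test cases further down.

```ruby
# Hypothetical helper: formats one group of the report.
# Assumes `items` is already deduplicated and sorted.
def format_group(label, items)
  header = "#{label} found: #{items.size}"
  ([header] + items.map { |item| "  #{item}" }).join("\n")
end

# Groups are separated by a blank line; an empty group prints only its header.
report = [
  format_group("Emails", ["sales@example.com", "support@example.com"]),
  format_group("URLs", ["https://example.com"]),
  format_group("Phone numbers", [])
].join("\n\n")

puts report
```

Keeping the formatting separate from the extraction makes it easy to check the empty-group case (a count of 0 and no item lines) on its own.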

Hints

Hint 1

String#scan with a regex returns all matches as an array of strings (or an array of arrays if the pattern contains capture groups)
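
A short illustration of how `String#scan` behaves, using made-up sample strings that are not part of the challenge data:

```ruby
text = "Order 12 shipped, order 7 pending, order 12 returned"

# No capture groups: each element is the full matched string.
numbers = text.scan(/\d+/)
p numbers  # prints ["12", "7", "12"]

# With capture groups: each element is an array of the captured parts.
pairs = "a=1, b=2".scan(/(\w)=(\d)/)
p pairs    # prints [["a", "1"], ["b", "2"]]
```

The patterns in the hints below contain no capture groups, so `scan` returns plain strings for them.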

Hint 2

Email regex: /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/i (case insensitive)
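
As a rough check of this pattern, the sketch below runs it against an invented sample sentence. The constant name `EMAIL_PATTERN` and the sample text are assumptions made for illustration.

```ruby
EMAIL_PATTERN = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/i

text = "Write to John@Example.com or sales@example.com. Not an address: user@localhost"
emails = text.scan(EMAIL_PATTERN)
p emails  # prints ["John@Example.com", "sales@example.com"]
```

The trailing period after the second address is left out because the pattern must end on a top-level domain of at least two letters, and `user@localhost` is skipped because it has no dot after the `@`. Matches keep their original casing, so `.uniq` treats `John@Example.com` and `john@example.com` as different strings.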

Hint 3

URL regex: /https?:\/\/[^\s]+/ matches http:// or https:// followed by everything up to the next whitespace character
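
Because this pattern consumes every non-whitespace character, punctuation that directly follows a URL becomes part of the match. The sketch below shows that on an invented sentence. The trimming step is an optional assumption, not something the hint or the visible test cases call for.

```ruby
URL_PATTERN = /https?:\/\/[^\s]+/

text = "Docs at https://example.com/guide?page=2 and http://company.org."
urls = text.scan(URL_PATTERN)
p urls     # prints ["https://example.com/guide?page=2", "http://company.org."]

# Optional cleanup (an assumption, not part of the hint): strip trailing punctuation.
trimmed = urls.map { |url| url.sub(/[.,;:!?]+\z/, "") }
p trimmed  # prints ["https://example.com/guide?page=2", "http://company.org"]
```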

Hint 4

Phone regex: /\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/ handles multiple formats
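
The three formats listed in the instructions can be tried against this pattern as follows. The constant name `PHONE_PATTERN` and the sample sentence are made up for the example.

```ruby
PHONE_PATTERN = /\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/

text = "Call (555) 123-4567 or 555-987-6543; fax 555.111.2222."
phones = text.scan(PHONE_PATTERN)
p phones  # prints ["(555) 123-4567", "555-987-6543", "555.111.2222"]

# The separators are optional, so a bare 10-digit run also matches.
bare = "Order 1234567890".scan(PHONE_PATTERN)
p bare    # prints ["1234567890"]
```

The second call shows how permissive the pattern is: any ten consecutive digits count as a phone number, which may or may not be what a given data set needs.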

Hint 5

Use .uniq to remove duplicates and .sort to alphabetize
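
A small sketch of the cleanup step on invented sample data. Ruby's default string sort compares byte values, so `(` sorts before digits and `-` sorts before `.`, which is the order used in the "Various phone formats" test case.

```ruby
phones = ["555.111.2222", "(555) 123-4567", "555-987-6543", "555-987-6543"]

# uniq drops the repeated entry; sort orders the rest by byte value.
cleaned = phones.uniq.sort
p cleaned  # prints ["(555) 123-4567", "555-987-6543", "555.111.2222"]
```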

Hint 6

\b in regex means word boundary
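
To see what the word boundary changes, compare a pattern with and without it on a made-up string:

```ruby
text = "cat concat cats"

# Without \b the pattern also matches inside longer words.
loose = text.scan(/cat/)
p loose.size  # prints 3

# With \b on both sides only the standalone word matches.
strict = text.scan(/\bcat\b/)
p strict      # prints ["cat"]
```

Note that `\b` means word boundary only inside a regex literal. In a double-quoted Ruby string, `"\b"` is a backspace character.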

Provided Files (Read-only)

1. Basic extraction - mixed content

Input:
extract_contact_info('contacts.txt')
Expected Output:
Emails found: 1
  john@example.com

URLs found: 1
  https://example.com

Phone numbers found: 1
  555-123-4567

2. Multiple emails and URLs

Input:
extract_contact_info('contacts.txt')
Expected Output:
Emails found: 2
  sales@example.com
  support@example.com

URLs found: 2
  https://company.org
  https://example.com

Phone numbers found: 0

3. Various phone formats

Input:
extract_contact_info('contacts.txt')
Expected Output:
Emails found: 0

URLs found: 0

Phone numbers found: 3
  (555) 123-4567
  555-987-6543
  555.111.2222
+ 2 hidden test cases