Challenge

Problem

DevOps teams analyze application performance by parsing web server logs to identify slow endpoints. This drill teaches you to extract structured data from semi-structured log files using regular expressions with named groups. You'll parse Heroku router logs containing timestamps, HTTP methods, paths, service times, and status codes, then output the data to CSV for analysis in Excel or other tools.

Difficulty: Intermediate

Instructions

  1. Read the provided server log file line by line
  2. Parse each log line using regex to extract:
     • Timestamp (e.g., '2012-05-23T20:52:11+00:00')
     • HTTP method (GET, POST, etc.)
     • Request path (e.g., '/users')
     • Service time in ms (e.g., '20ms' -> 20)
     • Status code (e.g., 200, 404, 500)
  3. Skip lines that don't match the expected format
  4. Output to CSV with headers: Timestamp,Method,Path,Service_Time_ms,Status
  5. Sort by service time (slowest first)
  6. Print summary: 'Parsed X requests, average service time: Y ms' (use 'request' when X is 1, as in the single-request test case)
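
The sorting, CSV, and summary steps above can be sketched as follows. This is a minimal illustration, not the reference solution: the `rows` array and its hash keys are hypothetical stand-ins for whatever the parsing step produces, and the use of integer division for the average is an assumption, since the drill does not say whether the average is truncated or rounded.

```ruby
require "csv"

# Hypothetical parsed rows; in the drill these would come from the regex step.
rows = [
  { timestamp: "2012-05-23T20:52:11+00:00", method: "GET",  path: "/users",  service_ms: 20,  status: 200 },
  { timestamp: "2012-05-23T20:52:12+00:00", method: "POST", path: "/orders", service_ms: 150, status: 500 },
  { timestamp: "2012-05-23T20:52:13+00:00", method: "GET",  path: "/health", service_ms: 5,   status: 200 }
]

# Slowest first: negate the key to get a descending sort.
sorted = rows.sort_by { |row| -row[:service_ms] }

# Build the CSV text with the header row required by the instructions.
csv_text = CSV.generate do |csv|
  csv << %w[Timestamp Method Path Service_Time_ms Status]
  sorted.each do |row|
    csv << row.values_at(:timestamp, :method, :path, :service_ms, :status)
  end
end

# Integer division truncates; this is an assumption about the expected average.
average = rows.empty? ? 0 : rows.sum { |row| row[:service_ms] } / rows.size
noun = rows.size == 1 ? "request" : "requests"
summary = "Parsed #{rows.size} #{noun}, average service time: #{average} ms"

puts csv_text
puts summary
```

Using the `csv` library rather than joining fields with commas means a path that happens to contain a comma or quote is still escaped correctly.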


Hints

Hint 1

Use regex with named groups: /(?<name>pattern)/ for cleaner extraction
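
As a small illustration of named groups, the sketch below pulls the method and path out of one line. The sample line is hypothetical; the actual layout of the drill's log file may differ, and the pattern would need adjusting to match it.

```ruby
# Hypothetical sample line; the real log layout is an assumption.
line = "2012-05-23T20:52:11+00:00 heroku[router]: GET /users service=20ms status=200"

# Each (?<name>...) group is accessible by name on the resulting MatchData.
pattern = /(?<method>[A-Z]+) (?<path>\/\S*)/
match = pattern.match(line)

http_method = match[:method]  # "GET"
path        = match[:path]    # "/users"

puts "#{http_method} #{path}"
```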

Hint 2

Match the timestamp at line start: /^(?<timestamp>\S+)/

Hint 3

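Extract method with word characters: /(?<method>\w+)/ (use it as part of the full pattern; on its own it matches the first run of word characters on the line)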

Hint 4

Parse service time: /service=(?<service>\d+)ms/ then convert to integer
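
A captured group is always a String, so the conversion is a separate step. A minimal sketch, using a made-up log fragment:

```ruby
# Made-up fragment of a log line, for illustration only.
fragment = "service=20ms status=200"

match = fragment.match(/service=(?<service>\d+)ms/)

# The capture is the String "20"; to_i turns it into the Integer 20.
service_ms = match[:service].to_i

puts service_ms
```

Because the group only matches digits, `to_i` cannot silently turn junk into 0 here; `Integer(match[:service])` would work equally well.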

Hint 5

File.readlines reads all lines into an array
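
For example, the sketch below writes a throwaway file (its contents are invented) and reads it back. The `chomp: true` option is not required by the drill; it is shown here because it strips the trailing newline from each line.

```ruby
require "tempfile"

# Write a temporary file so the example is self-contained.
log = Tempfile.new("server.log")
log.write("first line\nsecond line\n")
log.close

# Returns an Array of Strings, one per line, without trailing newlines.
lines = File.readlines(log.path, chomp: true)
log.unlink

puts lines.inspect
```

For very large logs, `File.foreach` yields one line at a time instead of loading the whole file into memory.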

Hint 6

Use match[:group_name] to access named capture groups

Hint 7

Skip non-matching lines with 'next unless match'
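
Putting the hints together, a parsing loop might look like the sketch below. The sample lines and the combined `log_pattern` are assumptions about the log layout, not the drill's actual data, and the in-memory `lines` array stands in for the result of reading the file.

```ruby
# Hypothetical lines standing in for the contents of the log file.
lines = [
  "2012-05-23T20:52:11+00:00 heroku[router]: GET /users service=20ms status=200",
  "this line is not a router entry",
  "2012-05-23T20:52:12+00:00 heroku[router]: POST /orders service=150ms status=500"
]

# One pattern with a named group per field; the layout it expects is an assumption.
log_pattern = /^(?<timestamp>\S+) .*?(?<method>[A-Z]+) (?<path>\/\S*) .*?service=(?<service>\d+)ms .*?status=(?<status>\d+)/

requests = []
lines.each do |line|
  match = log_pattern.match(line)
  next unless match  # match is nil when the line does not fit the pattern

  requests << {
    timestamp: match[:timestamp],
    method: match[:method],
    path: match[:path],
    service_ms: match[:service].to_i,
    status: match[:status].to_i
  }
end

puts "Kept #{requests.size} of #{lines.size} lines"
```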

Provided Files (Read-only)

1. Basic log parsing - 3 requests

Input:
parse_logs('server.log')
Expected Output:
Parsed 3 requests, average service time: 72 ms

2. Single request

Input:
parse_logs('server.log')
Expected Output:
Parsed 1 request, average service time: 5 ms

3. Mixed status codes and methods

Input:
parse_logs('server.log')
Expected Output:
Parsed 4 requests, average service time: 52 ms
+ 2 hidden test cases