M55 - CLI Intensive 2: Text Processing Mastery
Build fast, reliable pipeline habits for filtering, extracting, and counting text without losing track of what each stage does.
Fast Pipelines Still Need To Be Explainable
This section is about working faster with text, but not by turning pipelines into mystery spells.
Good text fluency means you can:
- filter the relevant lines
- extract the field you need
- count or group the result
- explain every stage afterward
If you cannot explain the pipeline, it is not mastered yet.
Setup: A Small Log File
Create a file named access.log with sample data you can reuse across the drills:
72.14.201.33 - - [24/Oct/2024:10:00:00] "GET /index.html HTTP/1.1" 200 4500
192.168.1.5 - - [24/Oct/2024:10:05:01] "GET /admin.php HTTP/1.1" 403 200
88.22.11.90 - - [24/Oct/2024:10:06:12] "POST /login HTTP/1.1" 401 50
72.14.201.33 - - [24/Oct/2024:10:06:15] "POST /login HTTP/1.1" 401 50
10.0.0.9 - - [24/Oct/2024:10:07:00] "GET /style.css HTTP/1.1" 200 800
72.14.201.33 - - [24/Oct/2024:10:08:22] "POST /login HTTP/1.1" 401 50
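In Bash, one way to create the file without opening an editor is a here-document. This is only a sketch: it assumes you want access.log in the current directory and are happy to overwrite any existing file of that name.

```shell
# Write the six sample lines to access.log in the current directory.
# Quoting the 'EOF' delimiter stops the shell from expanding anything inside.
cat > access.log <<'EOF'
72.14.201.33 - - [24/Oct/2024:10:00:00] "GET /index.html HTTP/1.1" 200 4500
192.168.1.5 - - [24/Oct/2024:10:05:01] "GET /admin.php HTTP/1.1" 403 200
88.22.11.90 - - [24/Oct/2024:10:06:12] "POST /login HTTP/1.1" 401 50
72.14.201.33 - - [24/Oct/2024:10:06:15] "POST /login HTTP/1.1" 401 50
10.0.0.9 - - [24/Oct/2024:10:07:00] "GET /style.css HTTP/1.1" 200 800
72.14.201.33 - - [24/Oct/2024:10:08:22] "POST /login HTTP/1.1" 401 50
EOF

# Sanity check: the file should contain six lines.
wc -l < access.log
```

If the line count is not six, fix the file before starting the drills, since every later count depends on it.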
Drill A: Filter and Count
Objective: Show only the failed login attempts and count them.
- open the file
- keep only the 401 lines
- count them with a command, not by eye
PowerShell:
Get-Content access.log | Select-String "401" | Measure-Object -Line
Bash:
grep "401" access.log | wc -l
Drill B: Extract the Attacker Addresses
Objective: Keep only the IP addresses from the failed login lines and save them.
- filter for 401
- extract the first field
- write the result to threats.txt
PowerShell:
Get-Content access.log | Select-String "401" | ForEach-Object { ($_ -split ' ')[0] } | Out-File threats.txt
Bash:
grep "401" access.log | awk '{print $1}' > threats.txt
Drill C: Count Unique Sources
Objective: Count how many times each source address appears.
- read threats.txt
- group repeated values
- display counts
PowerShell:
Get-Content threats.txt | Group-Object | Select-Object Count, Name
Bash:
sort threats.txt | uniq -c
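An optional extension of the sort and uniq -c command is to rank the addresses so the busiest source comes first. The sketch below writes its own stand-in threats.txt, assuming the three addresses Drill B yields from the sample log, so that it runs in isolation.

```shell
# Stand-in for the threats.txt that Drill B produces from the sample log.
printf '%s\n' 88.22.11.90 72.14.201.33 72.14.201.33 > threats.txt

# sort      : puts identical addresses next to each other
#             (uniq only merges adjacent duplicate lines)
# uniq -c   : prefixes each distinct address with its count
# sort -rn  : orders by that count, highest first
sort threats.txt | uniq -c | sort -rn
```

Skipping the first sort is a common mistake: uniq would then count 72.14.201.33 correctly here only because its two lines happen to be adjacent.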
Build Pipelines in Stages First
When learning, run the filter stage on its own first, then add the extraction stage, then the counting stage, checking the output each time. Once you trust each part, combine them into a shorter one-liner.
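The staged approach can be sketched in Bash by growing one pipeline a stage at a time. The block recreates the sample log so it runs on its own; the commands are the same ones used in the drills, chained without the intermediate threats.txt file.

```shell
# Recreate the sample log so this sketch runs on its own.
cat > access.log <<'EOF'
72.14.201.33 - - [24/Oct/2024:10:00:00] "GET /index.html HTTP/1.1" 200 4500
192.168.1.5 - - [24/Oct/2024:10:05:01] "GET /admin.php HTTP/1.1" 403 200
88.22.11.90 - - [24/Oct/2024:10:06:12] "POST /login HTTP/1.1" 401 50
72.14.201.33 - - [24/Oct/2024:10:06:15] "POST /login HTTP/1.1" 401 50
10.0.0.9 - - [24/Oct/2024:10:07:00] "GET /style.css HTTP/1.1" 200 800
72.14.201.33 - - [24/Oct/2024:10:08:22] "POST /login HTTP/1.1" 401 50
EOF

# Stage 1: filter only. Expect the three failed-login lines.
grep "401" access.log

# Stage 2: filter, then extract the first field. Expect three addresses.
grep "401" access.log | awk '{print $1}'

# Stage 3: filter, extract, then count per address.
grep "401" access.log | awk '{print $1}' | sort | uniq -c
```

Each stage is the previous command plus one more step, so if the output looks wrong you know which step introduced the problem.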
Mastery Check
You are ready to move on when you can:
- build the pipeline without guesswork
- explain each stage clearly
- spot when a wrong field or wrong filter would change the answer
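To see how a loose filter can change the answer, consider the sketch below. It uses a separate two-line file, sample.log, whose second line is hypothetical and not part of the drill data: a successful request that happens to return 4010 bytes.

```shell
# Two-line sample. The second line is hypothetical: status 200, size 4010.
cat > sample.log <<'EOF'
88.22.11.90 - - [24/Oct/2024:10:06:12] "POST /login HTTP/1.1" 401 50
10.0.0.9 - - [24/Oct/2024:10:09:00] "GET /report.pdf HTTP/1.1" 200 4010
EOF

# Loose filter: "401" matches anywhere in the line, including inside 4010.
grep -c "401" sample.log

# Stricter filter: compare only the second-to-last field, the status code.
awk '$(NF-1) == 401' sample.log | wc -l
```

The loose filter counts both lines, while the field-based filter counts only the real failed login. On the six-line drill file the two approaches agree, which is exactly why this kind of error is easy to miss.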
The next section uses this same clarity for one-line operational commands.