M55 - CLI Intensive 2: Text Processing Mastery

Build fast, reliable pipeline habits for filtering, extracting, and counting text without losing track of what each stage does.

35 min ADVANCED BOTH Curriculum-reviewed

What you should be able to do after this

  • Build fast, reliable pipeline habits for filtering, extracting, and counting text without losing track of what each stage does.

Fast Pipelines Still Need To Be Explainable

This section is about working faster with text, but not by turning pipelines into mystery spells.

Good text fluency means you can:

  • filter the relevant lines
  • extract the field you need
  • count or group the result
  • explain every stage afterward

If you cannot explain the pipeline, it is not mastered yet.


Setup: A Small Log File

Create a file named access.log with sample data you can reuse across the drills:

72.14.201.33 - - [24/Oct/2024:10:00:00] "GET /index.html HTTP/1.1" 200 4500
192.168.1.5 - - [24/Oct/2024:10:05:01] "GET /admin.php HTTP/1.1" 403 200
88.22.11.90 - - [24/Oct/2024:10:06:12] "POST /login HTTP/1.1" 401 50
72.14.201.33 - - [24/Oct/2024:10:06:15] "POST /login HTTP/1.1" 401 50
10.0.0.9 - - [24/Oct/2024:10:07:00] "GET /style.css HTTP/1.1" 200 800
72.14.201.33 - - [24/Oct/2024:10:08:22] "POST /login HTTP/1.1" 401 50
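
On Linux, one way to create the file is a quoted here-document, which writes the lines exactly as typed. This is a minimal sketch, assuming a POSIX shell and a scratch directory where overwriting access.log is acceptable:

```shell
# Write the six sample lines to access.log.
# Quoting 'EOF' stops the shell from expanding anything inside the block.
cat > access.log <<'EOF'
72.14.201.33 - - [24/Oct/2024:10:00:00] "GET /index.html HTTP/1.1" 200 4500
192.168.1.5 - - [24/Oct/2024:10:05:01] "GET /admin.php HTTP/1.1" 403 200
88.22.11.90 - - [24/Oct/2024:10:06:12] "POST /login HTTP/1.1" 401 50
72.14.201.33 - - [24/Oct/2024:10:06:15] "POST /login HTTP/1.1" 401 50
10.0.0.9 - - [24/Oct/2024:10:07:00] "GET /style.css HTTP/1.1" 200 800
72.14.201.33 - - [24/Oct/2024:10:08:22] "POST /login HTTP/1.1" 401 50
EOF

# Sanity check: the file should report six lines.
wc -l < access.log
```

Checking the line count right away catches a truncated paste before it skews the drill answers.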

Drill A: Filter and Count

Objective: Show only the failed login attempts and count them.

  1. open the file
  2. keep only the 401 lines
  3. count them with a command, not by eye

One possible Windows solution

Get-Content access.log | Select-String "401" | Measure-Object -Line

One possible Linux solution

grep "401" access.log | wc -l


Drill B: Extract the Attacker Addresses

Objective: Keep only the IP addresses from the failed login lines and save them.

  1. filter for 401
  2. extract the first field
  3. write the result to threats.txt

One possible Windows solution

Get-Content access.log | Select-String "401" | ForEach-Object { ($_.Line -split ' ')[0] } | Out-File threats.txt

One possible Linux solution

grep "401" access.log | awk '{print $1}' > threats.txt


Drill C: Count Unique Sources

Objective: Count how many times each source address appears.

  1. read threats.txt
  2. group repeated values
  3. display counts

One possible Windows solution

Get-Content threats.txt | Group-Object | Select-Object Count, Name

One possible Linux solution

sort threats.txt | uniq -c
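
With a longer list, it can help to rank the counts so the busiest source comes first. The sketch below assumes a POSIX shell. The printf line only recreates the threats.txt that Drill B produces from the sample log, so the block runs on its own:

```shell
# Recreate threats.txt as Drill B would produce it from the sample log.
printf '%s\n' 88.22.11.90 72.14.201.33 72.14.201.33 > threats.txt

# sort places identical lines next to each other, uniq -c counts each run,
# and the final sort -rn orders the result by count, highest first.
sort threats.txt | uniq -c | sort -rn
```

The first sort is not optional: uniq only merges adjacent duplicates, so unsorted input can report the same address more than once.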

Build Pipelines in Stages First

When learning, run each stage on its own and check its output before adding the next: first the filter, then the extraction, then the count. Once you trust each part, you can combine them into a shorter one-liner.
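
As a rough illustration of that habit on Linux, the sketch below saves each stage to its own file, then checks that the combined one-liner gives the same answer. It assumes a POSIX shell. The file names excerpt.log, stage1.txt, stage2.txt, stage3.txt, and combined.txt are made up for this example, and excerpt.log holds only four lines of the sample log so that access.log is left untouched:

```shell
# Hypothetical file: a four-line excerpt of the sample log.
cat > excerpt.log <<'EOF'
72.14.201.33 - - [24/Oct/2024:10:00:00] "GET /index.html HTTP/1.1" 200 4500
88.22.11.90 - - [24/Oct/2024:10:06:12] "POST /login HTTP/1.1" 401 50
72.14.201.33 - - [24/Oct/2024:10:06:15] "POST /login HTTP/1.1" 401 50
72.14.201.33 - - [24/Oct/2024:10:08:22] "POST /login HTTP/1.1" 401 50
EOF

# Stage 1: filter. Read the output and confirm only 401 lines remain.
grep "401" excerpt.log > stage1.txt
cat stage1.txt

# Stage 2: extract. Confirm each line is now a bare IP address.
awk '{print $1}' stage1.txt > stage2.txt
cat stage2.txt

# Stage 3: count. Confirm the counts add up to the line count of stage 2.
sort stage2.txt | uniq -c > stage3.txt
cat stage3.txt

# Combined one-liner, written only after each stage has been checked.
grep "401" excerpt.log | awk '{print $1}' | sort | uniq -c > combined.txt

# The staged and combined results should be identical.
cmp -s stage3.txt combined.txt && echo "one-liner matches the staged result"
```

The intermediate files are throwaway scaffolding. Their value is that a surprising count can be traced back to the exact stage that introduced it.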


Mastery Check

You are ready to move on when you can:

  • build the pipeline without guesswork
  • explain each stage clearly
  • spot when a wrong field or wrong filter would change the answer
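
The last point is easiest to see with a line where 401 appears somewhere other than the status field. The sketch below assumes a POSIX shell with grep and awk. The file tricky.log and its second line are invented for illustration, and the status code is assumed to be the eighth whitespace-separated field, as it is in the sample log:

```shell
# Two lines: one real 401 response, and one hypothetical 200 response
# whose byte count happens to be 401.
cat > tricky.log <<'EOF'
88.22.11.90 - - [24/Oct/2024:10:06:12] "POST /login HTTP/1.1" 401 50
10.0.0.9 - - [24/Oct/2024:10:09:00] "GET /logo.png HTTP/1.1" 200 401
EOF

# Loose filter: counts every line that contains 401 anywhere.
loose=$(grep -c "401" tricky.log)

# Field-aware filter: counts only lines whose eighth field (status) is 401.
strict=$(awk '$8 == 401 {n++} END {print n+0}' tricky.log)

echo "loose=$loose strict=$strict"
# → loose=2 strict=1
```

The sample log never triggers this difference, which is why the plain text match works in the drills. On real logs, comparing the specific field is usually the safer default.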

The next section uses this same clarity for one-line operational commands.