M53 - Troubleshooting Lab: 5 Break Scenarios
Apply structured troubleshooting to realistic OS failures by choosing the first checks, narrowing scope, and proposing a safe next action.
The Point of This Lab
This lab is not about memorizing one magic command per scenario.
It is about practicing three habits:
- choose the first checks well
- narrow the problem before changing things
- make the next action safe and explainable
For each case, focus on:
- what the symptom really tells you
- what scope you should test first
- which command or observation gives the most useful evidence next
Scenario 1: Name Fails, Network Might Not
Problem: Users cannot reach wiki.corporate.local.
Observed symptom: Browser reports a name-resolution error.
Good first questions:
- is general network connectivity working?
- is only the name lookup failing?
Strong first checks:
- ping 8.8.8.8
- nslookup wiki.corporate.local or dig wiki.corporate.local
Likely lesson: If raw connectivity works but name lookup fails, the problem is probably DNS rather than the target server itself.
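The reasoning in this scenario can be sketched as a small decision function. This is only an illustration: the name classify_symptom and its labels are hypothetical, and the wiring in the comments assumes a Linux-style ping that accepts -c and an nslookup that exits non-zero when the lookup fails, which is worth verifying on your system.

```shell
# classify_symptom IP_STATUS NAME_STATUS
# Each argument is the exit status of one check (0 = the check succeeded).
# The function name and the labels it prints are hypothetical.
classify_symptom() {
    ip_status=$1
    name_status=$2
    if [ "$ip_status" -ne 0 ]; then
        # Raw IP connectivity failed: fix the network path before blaming DNS.
        echo "connectivity"
    elif [ "$name_status" -ne 0 ]; then
        # The network works but the name does not resolve.
        echo "dns"
    else
        # Both checks pass: look at the target server or the client instead.
        echo "other"
    fi
}

# Example wiring (assumed flags, not run here):
#   ping -c 2 8.8.8.8 >/dev/null 2>&1; ip_status=$?
#   nslookup wiki.corporate.local >/dev/null 2>&1; name_status=$?
#   classify_symptom "$ip_status" "$name_status"
```

A failed ping does not always mean the network is down, because some networks block ICMP. Treat the "connectivity" result as a prompt to test another way, not as proof.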
Scenario 2: Connection Refused on One Port
Problem: A database host is reachable, but connections to port 3306 are refused.
Good first questions:
- is the service listening?
- is the service running?
- is the refusal local to the service or caused by a filter in front of it?
Strong first checks:
- ss -tulpn | grep 3306 on Linux, or a Windows equivalent for listening ports
- systemctl status mysql or Get-Service
- service logs if the process is stopped or failing
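Likely lesson: A refused connection usually means something answered and rejected it, so the path to the host is largely working. Look first for a stopped service or an application that is not listening on that port, then for a filter in front of it, before suspecting a general network outage.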
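A plain grep for 3306 can also match unrelated numbers in the output. As a minimal sketch, the helper below (a hypothetical name, listeners_for_port) reads the output of ss -tln and prints only the local addresses that are actually listening on the given port. It assumes TCP-only output, where the state is the first column and the local address is the fourth. With -u added, as in ss -tulpn, a Netid column comes first and the field numbers shift.

```shell
# listeners_for_port PORT
# Reads `ss -tln` style output on stdin and prints each local address
# listening on PORT. Returns 1 if nothing is listening on it.
# Assumes column 1 is the state and column 4 is Local Address:Port.
listeners_for_port() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            # The port is whatever follows the last colon (works for [::]:3306 too).
            n = split($4, parts, ":")
            if (parts[n] == port) { print $4; found = 1 }
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example wiring (not run here):
#   ss -tln | listeners_for_port 3306
```

The printed address matters as much as the exit status. A listener on 127.0.0.1:3306 only accepts local connections, so remote clients are still refused even though the service is running.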
Scenario 3: System Feels Slow, Not Dead
Problem: A web server responds, but it is extremely slow.
Good first questions:
- is the bottleneck CPU, memory, or disk?
- is a background job competing with the main workload?
Strong first checks:
- top or htop
- disk or I/O observation tools
- recent backup, compression, or maintenance activity
Likely lesson: Slow response often needs resource inspection before service restarts or config changes.
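One quick way to turn "feels slow" into a number is to compare the 1-minute load average with the CPU count. The sketch below is a rough heuristic, not a diagnosis. The function name load_pressure and its labels are hypothetical, and the wiring in the comments assumes Linux, where /proc/loadavg and nproc are available.

```shell
# load_pressure LOAD1 CORES
# Prints "saturated" when the 1-minute load average exceeds the CPU count,
# otherwise "ok". Name and labels are hypothetical.
load_pressure() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        # Adding 0 forces a numeric comparison, so 10.0 > 4 compares as numbers.
        if (load + 0 > cores + 0) print "saturated"
        else print "ok"
    }'
}

# Example wiring on Linux (not run here):
#   read -r load1 rest < /proc/loadavg
#   load_pressure "$load1" "$(nproc)"
```

On Linux the load average also counts tasks waiting on disk, so "saturated" does not say whether CPU or I/O is the bottleneck. It only says that top and the I/O tools are worth opening before anything is restarted.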
Scenario 4: The Fix Did Not Survive Reboot
Problem: A change seemed to work yesterday, but after reboot the system is broken again.
Good first questions:
- was the fix made in a persistent location?
- does startup overwrite or regenerate that state?
- is the service reading the same config you changed?
Strong first checks:
- confirm the service is running
- inspect the expected configuration file
- check whether automation, container recreation, or policy management rewrote the change
Likely lesson: Some fixes disappear because they were made in the wrong layer or the wrong place, not because the change itself was bad.
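The three questions above map onto a simple comparison, provided a copy of the configuration file was saved right after the fix. The sketch below assumes that habit. The function name config_state, its labels, and the example paths are all hypothetical.

```shell
# config_state LIVE_FILE SAVED_COPY
# Compares the file the service should read with a copy saved after the fix.
# Name and labels are hypothetical.
config_state() {
    live=$1
    saved=$2
    if [ ! -e "$live" ]; then
        # The file is gone: the fix lived somewhere non-persistent
        # (for example a temporary filesystem or a recreated container layer).
        echo "missing"
    elif cmp -s "$live" "$saved"; then
        # The fix survived: check whether the service reads a different file.
        echo "unchanged"
    else
        # The file exists but differs: something rewrote it at or after startup.
        echo "rewritten"
    fi
}

# Example wiring with placeholder paths (not run here):
#   config_state /etc/myapp/app.conf /root/app.conf.after-fix
```

Each result points at a different layer, which keeps the next action narrow. There is no reason to reapply the fix blindly when the output already shows where it was lost.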
Scenario 5: “No Space Left” Even Though Space Exists
Problem: An application cannot write temporary files, but df -h still shows free space.
Good first questions:
- is the block space full?
- is the filesystem out of inodes instead?
Strong first checks:
- df -h
- df -i
Likely lesson: Storage problems are not only about gigabytes. Filesystem metadata limits matter too.
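On a host with many filesystems, the inode check is easier to read when only the exhausted mounts are shown. As a minimal sketch, the hypothetical helper full_inode_mounts filters df -i output by inode usage. It assumes the GNU df column layout on Linux (IUse% in column 5, mount point in column 6) and mount points without spaces.

```shell
# full_inode_mounts LIMIT
# Reads `df -i` output on stdin and prints mount points whose inode usage
# is at or above LIMIT percent. Assumes the GNU df column layout.
full_inode_mounts() {
    awk -v limit="$1" '
        NR > 1 {                      # skip the header line
            pct = $5
            sub(/%/, "", pct)         # "100%" -> "100"
            # Filesystems without inode counts report "-" and are skipped.
            if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0) print $6
        }
    '
}

# Example wiring (not run here):
#   df -i | full_inode_mounts 95
```

If a mount appears here while df -h still shows free space, the usual cause is a very large number of small files, and the next step is to find which directory holds them.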
How To Use This Lab Well
For each scenario, write down:
- the first two checks you would run
- what each result would help you confirm or exclude
- the safest next action after those checks
If you can explain your sequence clearly, you are building real troubleshooting ability.
What You Just Practiced
- distinguishing symptoms from causes
- selecting high-value first checks
- reducing scope before making changes
- proposing a next action that tests a hypothesis instead of guessing
This is the right mindset to carry into the intensive CLI and capstone sections that follow.