Practice: use drills for recall and labs for real operating judgment.

LAB-HW-02 - Kernel Rings & dmesg

Read the Kernel Ring Buffer using dmesg to diagnose hardware failures, driver issues, and critical system boot errors.

HW System Lifecycle & Hardware


25 min · Advanced · Linux · Curriculum-reviewed
Success criteria
  • Read the Kernel Ring Buffer using dmesg to diagnose hardware failures, driver issues, and critical system boot errors.
  • Repeat the workflow without copy-paste or step-by-step prompting.
Safety notes
  • Use read-only discovery commands unless the lab explicitly tells you to change hardware or firmware state.

Part A: The Field Guide


🎯 What & Why

When an application like Apache crashes, it politely leaves a log file in /var/log/apache2/error.log. But what if the physical Hard Drive itself starts to die? The hard drive cannot write a log file claiming it is broken, because it’s broken!

When a hardware component throws sparks, or a low-level driver violently crashes into RAM, the Linux Kernel catches the explosion. The Kernel immediately prints a frantic warning directly into a special, high-speed chunk of memory called the Ring Buffer.

To read the Kernel’s frantic warning messages, we use the dmesg tool (often expanded as “diagnostic messages”).


🧠 Mental Model: The Ring Buffer

Why is it called a “Ring”?

Imagine a circular conveyor belt that can hold exactly 1,000 text messages. When the Kernel boots, it places Message #1 on the belt, then Message #2, and so on. After enough uptime, the Kernel places Message #1,001 on the belt. Because the belt is full, Message #1,001 physically pushes Message #1 off the end of the belt into oblivion.

The buffer loops around forever, ensuring it never fills up your RAM while always retaining the most recent hardware events. (The real buffer is sized in bytes rather than a message count, but the principle is identical.)
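The conveyor-belt model above can be sketched in a few lines of bash. This is a toy illustration of the wrap-around behavior, not how the kernel actually implements its buffer:

```shell
#!/usr/bin/env bash
# Toy ring buffer with a capacity of 3 messages. The write index wraps
# around with modulo, so msg4 overwrites msg1 and msg5 overwrites msg2.
capacity=3
buf=()
i=0
for msg in msg1 msg2 msg3 msg4 msg5; do
  buf[i % capacity]=$msg
  i=$((i + 1))
done
echo "surviving slots: ${buf[*]}"
```

After the loop, only the three most recent messages survive; the two oldest have been silently overwritten, exactly like old events vanishing from dmesg on a long-running server.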


📖 Command Reference

Reading the Buffer

Basic dmesg

$ # Dump the ENTIRE contents of the ring buffer to the screen at once
$ sudo dmesg

$ # This is usually thousands of lines. Pipe it into 'less' so you can scroll!
$ sudo dmesg | less

Note: On many modern secured systems (kernel.dmesg_restrict=1), standard users cannot read dmesg because it might leak sensitive kernel memory addresses. You must use sudo.

Formatting the Output

By default, dmesg prints a timestamp that looks like [14253.151240]. This is the number of seconds since the computer booted, which is nearly impossible for a human to map to wall-clock time.

Human Readable Logging

$ # Convert the boot-seconds into actual Human Timestamps (-T)
$ sudo dmesg -T | less

$ # Filter the noise: Only show me errors (-l err) and warnings (-l warn)!
$ sudo dmesg -T -l err,warn
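If you ever need to convert a raw seconds-since-boot timestamp by hand (for example, on a box whose dmesg lacks -T), you can derive the boot time from /proc/uptime. A minimal sketch, assuming a Linux system with GNU date and using the example timestamp 14253 from above:

```shell
# Boot time (epoch seconds) = current time minus seconds of uptime.
boot_epoch=$(( $(date +%s) - $(cut -d. -f1 /proc/uptime) ))
# Add the dmesg timestamp (14253 seconds after boot) and render it.
date -d "@$(( boot_epoch + 14253 ))"
```

This is exactly the arithmetic that dmesg -T performs for you on every line.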

Tailing Live Events

If you plug a USB device into the server, the kernel will instantly scream a flurry of driver messages into the ring buffer. You want to watch this happen live.

Live Following

$ # Follow (-w) the kernel ring buffer in real-time, waiting for new events
$ sudo dmesg -T -w


🌍 Real Scenarios

Scenario 1: The Out-Of-Memory Assassin (OOM Killer) Your database crashes randomly at 2:00 AM. There are no application logs. Nothing. It just vanished. You run sudo dmesg -T | grep -i "killed". You see a terrifying red kernel message: Out of memory: Killed process 3512 (postgres). Aha! The Kernel realized the RAM was 100% full. To save the operating system from a fatal crash, the Kernel invoked a built-in routine called the “OOM Killer” to instantly murder the process using the most RAM. The database didn’t crash; the Kernel assassinated it.
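You can try the grep pattern from this scenario on a fabricated log line (the message format mirrors real OOM-killer output, but the timestamp and PID here are invented for illustration):

```shell
# grep -i matches "Killed" regardless of case; -c counts matching lines.
line="[Mon Jan 15 02:00:03 2024] Out of memory: Killed process 3512 (postgres)"
echo "$line" | grep -ic "killed"
```

On a real server you would pipe sudo dmesg -T into the same grep instead of echoing a sample line.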

Scenario 2: The Dying Hard Drive A server feels incredibly sluggish. Basic ls commands take 5 seconds. You run sudo dmesg -T -l err. You see hundreds of lines screaming: blk_update_request: I/O error, dev sda, sector 125192. The physical spinning magnetic disk is grinding to death. The Kernel is frantically warning you that it is repeatedly failing to perform Input/Output (I/O) ops. You immediately begin an emergency backup.


⚠️ Gotchas & Pitfalls

  1. Wait, where do the old messages go? Because the Ring Buffer is a circle, messages from 3 weeks ago are permanently deleted from RAM. However, the system silently writes backups of the buffer to the hard drive inside /var/log/syslog or /var/log/kern.log. If you need ancient hardware history, search those files, not dmesg.
  2. Buffer Clears on Reboot The Ring Buffer lives only in volatile RAM. If the server loses power and reboots, dmesg starts completely fresh at [0.000000]. If you want to know why a server spontaneously rebooted, dmesg won’t help you (the panic happened before the reboot). You must look in /var/log/.
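To dig through pre-reboot history, you grep the on-disk copies instead of dmesg. A minimal sketch using a fabricated excerpt so it runs anywhere; on a real machine you would point the same grep at /var/log/kern.log (Debian/Ubuntu) or /var/log/messages (RHEL), or use journalctl -k -b -1 on systemd machines with a persistent journal:

```shell
# Same search you'd run as: grep 'I/O error' /var/log/kern.log
excerpt='Mar  3 09:12:41 host kernel: blk_update_request: I/O error, dev sda, sector 125192'
printf '%s\n' "$excerpt" | grep -c 'I/O error'
```

A nonzero count from the real log files means the disk was already complaining before the reboot wiped the ring buffer.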

Part B: The Drill Deck

Terminal Required: Open a Linux terminal to inspect the kernel’s secret diary. You will need sudo privileges.


Guided (G): step by step - type exactly this and compare the result

Exercise G1: The Data Hose

  1. Let’s dump the raw buffer: sudo dmesg
  2. Hundreds or thousands of lines will fly past. You are looking at the exact boot sequence of the Kernel initializing the hardware.
  3. Identify the timestamp format on the far left: [ 1.458923]. It is seconds-since-boot.
  4. Run the command again, but pipe it into a pager so we can read it: sudo dmesg | less
  5. Press Space to page down. Press G (capital G) to jump to the very bottom to see the most recent events. Press q to quit.

Exercise G2: Human Timestamps

  1. Run the command using the Human Readable Time (-T) flag.
  2. sudo dmesg -T | less
  3. Notice the timestamp is now [Mon Jan 15 14:02:11 2024]. Much better. (Press q to quit).

Exercise G3: The Surgeon’s Grep

Even with -T, there is too much noise. The Kernel logs every tiny success. Let’s filter for failures.

  1. Tell dmesg to only show you explicit errors natively using Level filtering (-l): sudo dmesg -T -l err
  2. If your list is empty, congratulations! Your hardware is flawless.
  3. If you have errors, read them carefully. You might see warnings about missing firmware or ACPI BIOS bugs (which are common and usually harmless).
Solo (S): task described, hints available - figure it out

Exercise S1: Hunting the Assassin

Let’s check if the OOM (Out Of Memory) killer has ever been invoked on this system.

  1. Pipe the dmesg output into grep.
  2. Do a case-insensitive search (-i) for the word killed. sudo dmesg -T | grep -i "killed"
  3. If it returns nothing, your server has never suffered memory starvation!

Exercise S2: Live Monitoring

If you are using a bare-metal machine (a physical laptop or desktop) right now, try this. (If you are in a remote VM, you can just practice the command).

  1. Tell dmesg to Follow (-w) the log, waiting for new events indefinitely.
  2. sudo dmesg -T -w
  3. While it is waiting, take a physical USB Flash Drive and plug it into the computer.
  4. Watch the terminal! The kernel will instantly print 10 lines of text recognizing the USB insertion, loading the usb-storage driver, and assigning it a device name like sdb.
  5. Pull the USB drive out. Watch the kernel instantly print the USB disconnect event!
  6. Press Ctrl + C to stop following.
Mission (M): real scenario - no hints, combine multiple skills

Mission M1: Boot Sequence Analysis

When a Linux system boots, it must first detect the physical Network Interface Cards (NICs) before it can assign IP addresses to them. The driver handling the ethernet link state is classically named e1000 or igb, and interface names typically start with eth or the modern enp.

Your Mission: Prove exactly what time your Kernel activated the network hardware during boot.

  1. Formulate a human-readable dmesg command.
  2. Pipe it into grep.
  3. Search for the interface name of your main network connection (you found this in lab-net-01, but for this test, just grep case-insensitively for "Link is Up" or "promiscuous").
  4. Find the exact kernel timestamp when the network link physically came up.
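One possible solution shape, shown against a fabricated log line since interface names and driver strings vary from machine to machine:

```shell
# On a live machine you would run:
#   sudo dmesg -T | grep -Ei 'link is up|promiscuous'
# Here the same case-insensitive extended grep runs on a sample line.
line="[Mon Jan 15 13:58:02 2024] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex"
echo "$line" | grep -Eic 'link is up|promiscuous'
```

The human-readable timestamp at the start of the matching line is your answer: the exact moment the kernel brought the link up during boot.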