saahityaedams

08 Jul 2025

Troubleshooting Linux Notes

One thing I’ve noticed over the last couple of years of working is that folks generally struggle with how to approach troubleshooting issues on their Linux boxes. Often you have obvious signals about what went wrong, for instance someone mentioning on Slack that they are running some software with a new config. Sometimes you need to gather signals yourself with certain Linux tools. This post describes, at a high level, the tools I use for basic troubleshooting: htop (mostly), df, du, and journalctl.

I start by running this great tool, htop [1]. At first glance I’m just looking at the CPU utilization percentages for the different cores, the memory usage, and the swap usage. The good thing about htop is that anomalies stick out painfully. If you see all the cores at 100% utilization, you have too many processes consuming too much CPU. If you see high memory usage and swap usage, you have one or more processes using too much RAM, and the OS is swapping memory out to disk and back.
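When htop isn't installed, the same memory and swap signals can be pulled from /proc/meminfo. The sketch below is a rough approximation, not htop's exact calculation: it treats "used" as MemTotal minus MemAvailable, and the function name mem_summary is made up for illustration. It assumes a Linux box with awk available.

```shell
# Print used-memory and used-swap percentages from /proc/meminfo-style input.
# Reads from stdin so it can also be fed a saved copy of the file.
mem_summary() {
    awk '
        $1 == "MemTotal:"     { mt = $2 }
        $1 == "MemAvailable:" { ma = $2 }
        $1 == "SwapTotal:"    { st = $2 }
        $1 == "SwapFree:"     { sf = $2 }
        END {
            if (mt > 0) printf "mem used: %.0f%%\n", 100 * (mt - ma) / mt
            if (st > 0) printf "swap used: %.0f%%\n", 100 * (st - sf) / st
            else print "swap: none configured"
        }'
}

# Linux-specific: only run against the live file if it exists.
if [ -r /proc/meminfo ]; then
    mem_summary < /proc/meminfo
fi
```

A high memory percentage together with a non-trivial swap percentage is the same pattern that sticks out in htop's meters.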

I also look at the load average in htop, which indicates how many processes were, on average, running or waiting to run over the last 1, 5, and 15 minutes (on Linux the count also includes processes in uninterruptible sleep, typically blocked on disk I/O). If you have 8 cores and a 15-minute load average of 20, then on average about 20 processes were running or waiting to run over that window. That is a terrible scenario; ideally this shouldn’t be more than 8.
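To make the comparison against the core count concrete, here is a minimal sketch that reads the three load averages from /proc/loadavg and divides the 15-minute value by the core count reported by nproc. The helper name load_per_core is made up for illustration, and the script assumes a Linux box with awk and nproc available.

```shell
# Print LOAD divided by CORES with two decimals.
# Usage: load_per_core LOAD CORES
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Linux-specific: the first three fields of /proc/loadavg are the
# 1, 5 and 15 minute load averages.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r one five fifteen _rest < /proc/loadavg
    cores=$(nproc)
    echo "cores: $cores"
    echo "load averages (1/5/15 min): $one $five $fifteen"
    echo "15-min load per core: $(load_per_core "$fifteen" "$cores")"
fi
```

For the 8-core example, load_per_core 20 8 prints 2.50. Anything that stays well above 1.00 per core means work is queuing up.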

htop also lets you sort processes by CPU, memory utilization, etc. (F6), filter by process name (F4), and kill processes (F9), all of which I find useful.

Sometimes software on Linux boxes stops working due to insufficient disk space. I find it useful to run df -h to confirm this by checking how full the primary volume is. I then go into the directories I suspect could be the culprits (like log or upload folders) and run du -h -d 1 . | sort -hr to dig further into which folder is taking up too much space (add -a to du to list individual files as well).
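The two steps above can be wrapped into small helpers. This is only a sketch: the function names volume_usage and biggest_dirs are made up for illustration, and it assumes du supports -d and sort supports -h, as the GNU versions found on most Linux distributions do.

```shell
# Show how full the filesystem holding a path is (defaults to /).
# Usage: volume_usage [PATH]
volume_usage() {
    df -h "${1:-/}"
}

# List a directory and its immediate subdirectories by size, biggest first.
# Usage: biggest_dirs [DIR] [COUNT]
biggest_dirs() {
    dir=${1:-.}
    count=${2:-10}
    # Errors from unreadable directories are hidden to keep the output clean.
    du -h -d 1 "$dir" 2>/dev/null | sort -hr | head -n "$count"
}
```

For example, volume_usage / followed by biggest_dirs /var/log 5 shows how full the root volume is and then the five largest entries under /var/log. The first line of the du output is the total for the directory itself.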

I use this less frequently, but I’ve run different variations of sudo journalctl (with help from ChatGPT) to troubleshoot harder, more system-level issues on Linux boxes.
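The post doesn't record which journalctl invocations were used, so the ones below are illustrative guesses at common starting points rather than the exact commands. The function names are made up, and the unit name passed to unit_logs is whatever service you are investigating. This assumes a systemd-based distribution.

```shell
# Messages of priority "err" and worse from the current boot.
boot_errors() {
    sudo journalctl -b -p err --no-pager
}

# Logs for one systemd unit from the last hour.
# Usage: unit_logs UNIT   (e.g. a hypothetical myapp.service)
unit_logs() {
    sudo journalctl -u "$1" --since "1 hour ago" --no-pager
}

# Kernel messages from the current boot, where OOM-killer and
# disk error reports tend to show up.
kernel_log() {
    sudo journalctl -k -b --no-pager
}
```

Starting with boot_errors and then narrowing down with unit_logs for the suspicious service keeps the output manageable.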

TODO: Add details about dstat


  1. Good resource on htop: https://peteris.rocks/blog/htop/