One of the most confusing Linux errors I've debugged: a production server reporting "No space left on device" while df -h clearly showed 50GB free. I lost an hour to it the first time. Here's what was actually going on.
I turned scenarios like this into an interactive practice tool at scenar.site - you debug simulated servers by talking to an AI interviewer. More at the end.
The Setup
I was on-call for a logging pipeline. Rsyslog kept crashing, and the logs were full of this:
rsyslog[8421]: cannot create '/var/log/syslog.1': No space left on device
systemd[1]: rsyslog.service: Main process exited, code=exited, status=1/FAILURE
First instinct: the disk is full. Easy fix, right?
The Investigation
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       100G   45G   50G  48% /
/dev/sda2        20G  8.0G   11G  42% /var/log
Wait, what? 50GB free on root, 11GB free on /var/log. The disk isn't full. But the error clearly said "No space left on device". So what's going on?
This is the moment where a lot of people (including past me) start doing random things: restarting services, clearing caches, rebooting the machine. None of it works.
The Key Insight
A Linux filesystem tracks two resources, not one:
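- Disk space (what df -h shows): how many bytes are used
- Inodes: how many files can exist. On ext4 the inode count is fixed when the filesystem is created, so it can run out while plenty of bytes remain.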
Every file, directory, and symlink on the filesystem consumes one inode (hard links share their target's inode). When you run out of inodes, you can't create new files even if you have terabytes of free space. The kernel returns ENOSPC, the same error code it uses when the disk is actually full, and userspace renders it as "No space left on device". That's where the confusion comes from.
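Both counters are reported per filesystem, side by side. A minimal sketch that prints them for the filesystem containing a path, assuming GNU coreutils stat (the -f format sequences below are GNU-specific); fs_usage is a hypothetical helper name, not a standard tool:

```shell
# fs_usage (hypothetical helper): print byte and inode usage for the
# filesystem that contains PATH. Requires GNU coreutils `stat -f`:
#   %S fundamental block size   %b total blocks
#   %a blocks available to non-root users (df's Avail column)
#   %c total inodes             %d free inodes
fs_usage() (
  path=${1:-/}
  out=$(stat -f -c '%S %b %a %c %d' -- "$path") || exit 1
  set -- $out   # intentional word splitting into $1..$5
  bsize=$1 blocks=$2 bavail=$3 inodes=$4 ifree=$5
  echo "bytes total=$((bsize * blocks)) avail=$((bsize * bavail))"
  echo "inodes total=$inodes free=$ifree"
)
```

Running fs_usage /var/log on the server in this story would have shown plenty of available bytes next to an inode free count of 2.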
The Actual Diagnosis
$ df -i
Filesystem      Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1      6553600 6553598       2  100% /
/dev/sda2      1310720 1310718       2  100% /var/log
There it is. 100% of inodes used, with only 2 free on each filesystem. For practical purposes, neither filesystem can create another file.
Now: where are all these inodes going? Inode exhaustion almost always means "a lot of small files in one place". Time to find them:
$ find /var/log -type f | wc -l
1310715
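Here /var/log was the obvious suspect. When it isn't, a sketch like the following ranks directories by how many entries they directly contain. It assumes GNU find (for -printf '%h', the directory part of each path), and top_entry_dirs is a hypothetical name. It counts names rather than inodes, so a hard-linked file is counted once per name, which is close enough for finding a hotspot.

```shell
# top_entry_dirs (hypothetical helper): rank directories under ROOT by
# how many entries (files, dirs, symlinks, ...) they directly contain.
# -xdev stays on ROOT's filesystem, matching a single `df -i` row.
top_entry_dirs() (
  root=${1:-/}
  n=${2:-10}
  # 2>/dev/null hides "Permission denied" noise from unreadable dirs
  find "$root" -xdev -mindepth 1 -printf '%h\n' 2>/dev/null |
    sort | uniq -c | sort -rn | head -n "$n"
)
```

For example, top_entry_dirs /var/log 5 prints the five busiest directories with their entry counts.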
Over a million files in /var/log. That's the culprit. Let me see what they look like:
$ ls /var/log/ | head
session_000001.log
session_000002.log
session_000003.log
session_000004.log
session_000005.log
...
Session logs. Let me check the sizes:
$ find /var/log -name 'session_*' -printf '%s\n' | sort -u
0
Every single one is 0 bytes. Over a million empty files. Someone wrote a debug script, forgot to clean up, and it had been creating empty session logs for months. Each file takes 0 bytes of data but still consumes one inode.
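To put numbers on "lots of files, almost no bytes" in one pass, a sketch like this counts every match and sums its size. It assumes GNU find (-printf '%s' prints the size in bytes); summarize_matches is a hypothetical helper.

```shell
# summarize_matches (hypothetical helper): count regular files under ROOT
# whose names match PATTERN, and sum their sizes in bytes.
summarize_matches() (
  root=$1
  pattern=$2
  find "$root" -xdev -type f -name "$pattern" -printf '%s\n' |
    awk '{ n++; bytes += $1 }
         END { printf "files=%d bytes=%.0f\n", n, bytes }'
)
```

Something like summarize_matches /var/log 'session_*' would report a file count in the millions next to a byte total of 0.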
The Fix
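Delete the empty files. Don't use rm with a glob like rm /var/log/session_*: the shell expands it into over a million arguments, and the command fails with "Argument list too long". Use find, which unlinks each match itself: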
$ find /var/log -type f -name 'session_*' -delete
This took about 30 seconds on that machine. Then verify:
$ df -i
Filesystem      Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1      6553600    2883 6550717    1% /
/dev/sda2      1310720    1003 1309717    1% /var/log
Restart the service:
$ systemctl restart rsyslog
$ systemctl status rsyslog
Active: active (running)
Fixed.
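The find -delete command above removed every session_* file regardless of size. A more cautious variant, sketched here as an assumption rather than what I actually ran, previews the matches first and restricts deletion to empty files with -empty, so a real log that happens to match the pattern survives. cleanup_empty_matches is a hypothetical helper; -empty and -delete are GNU find actions. Because find unlinks files itself, it never builds an argument list, which is why it sidesteps the limit that getconf ARG_MAX reports.

```shell
# cleanup_empty_matches (hypothetical helper): list, or with MODE=delete
# remove, empty regular files under ROOT whose names match PATTERN.
cleanup_empty_matches() (
  root=$1
  pattern=$2
  mode=${3:-dry-run}
  if [ "$mode" = delete ]; then
    find "$root" -xdev -type f -name "$pattern" -empty -delete
  else
    # default: only print what would be deleted
    find "$root" -xdev -type f -name "$pattern" -empty -print
  fi
)
```

Typical use: check cleanup_empty_matches /var/log 'session_*' | wc -l first, then rerun with delete as the third argument.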
Prevention
A few things I put in place after this:
- Monitor inode usage, not just disk space. Most monitoring setups check df -h but forget df -i. Add an alert at 85% inode usage.
- Set up logrotate for any directory that accumulates log files. The default logrotate config handles most system logs, but custom paths need their own config.
- Code review any script that creates files in production. The script that caused this was "just a debug helper" that was never removed.
- Use find ... -delete for cleanup, not rm with glob patterns. With millions of files, the expanded glob exceeds ARG_MAX (the kernel's limit on the size of arguments passed to exec), and rm fails with "Argument list too long".
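For the first item in that list, here is a minimal check you could call from cron or a monitoring agent's script hook. It assumes GNU df (for --output=ipcent,target); flag_high_inodes and check_inodes are hypothetical names, and 85 is the threshold suggested above. Filesystems for which df prints - instead of a percentage (no inode total reported) are skipped.

```shell
# flag_high_inodes (hypothetical helper): read `df --output=ipcent,target`
# output on stdin, print each mount at or above THRESHOLD percent inode
# use, and exit 1 if any were found.
flag_high_inodes() {
  awk -v t="${1:-85}" '
    NR == 1 { next }                  # skip the header line
    $1 == "-" { next }                # no inode total reported for this fs
    {
      pct = $1; sub(/%$/, "", pct)
      mount = $0; sub(/^[ ]*[^ ]+[ ]+/, "", mount)  # rest of line, spaces kept
      if (pct + 0 >= t + 0) { print "inodes " $1 " used on " mount; bad = 1 }
    }
    END { exit bad }'
}

# check_inodes (hypothetical helper): run the check against live mounts.
# Extra arguments go to df, e.g. check_inodes 85 / /var/log
check_inodes() (
  threshold=${1:-85}
  [ "$#" -gt 0 ] && shift
  df --output=ipcent,target "$@" | flag_high_inodes "$threshold"
)
```

Splitting the parsing from the df call keeps the threshold logic testable with canned df output; the non-zero exit status is what an alerting wrapper would key on.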
What Interviewers Look For
If this comes up in an SRE interview, the interviewer isn't just checking if you know df -i. They want to see:
- Do you read the actual error message carefully? ("No space left on device" can mean out of bytes or out of inodes)
- Do you form a hypothesis before running commands? (df -h, df -i, and find each answer a specific question)
- Can you explain the underlying concept? (inodes as a separate resource)
- Do you think about prevention, not just the immediate fix?
Practice This Interactively
I built scenar.site to practice exactly these kinds of scenarios. You describe your debugging approach in plain English; an AI simulates a broken server, returns realistic command output, and tracks your reasoning. This scenario is one of 18 built-in ones. The free tier gets you started, no credit card required.