Disk Has Space But Can't Create Files? (Linux Inode Exhaustion)

#linux #devops #sysadmin #sre

One of the most confusing Linux errors I've debugged: a production server reporting "No space left on device" while df -h clearly showed 50GB free. I lost an hour to it the first time. Here's what was actually going on.

I turned scenarios like this into an interactive practice tool at scenar.site - you debug simulated servers by talking to an AI interviewer. More at the end.

The Setup

I was on-call for a logging pipeline. Rsyslog kept crashing, and the logs were full of this:

rsyslog[8421]: cannot create '/var/log/syslog.1': No space left on device
systemd[1]: rsyslog.service: Main process exited, code=exited, status=1/FAILURE

First instinct: the disk is full. Easy fix, right?

The Investigation

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       100G   45G   50G  48% /
/dev/sda2        20G  8.0G   11G  42% /var/log

Wait, what? 50GB free on root, 11GB free on /var/log. The disk isn't full. But the error clearly said "No space left on device". So what's going on?

This is the moment where a lot of people (including past me) start doing random things: restarting services, clearing caches, rebooting the machine. None of it works.

The Key Insight

A Linux filesystem tracks two resources, not one:

Disk space (what df -h shows) - how many bytes of data are stored
Inodes (what df -i shows) - how many files, directories, and other objects can exist

Every file, directory, and symlink on the filesystem consumes one inode (hard links share the inode of the file they point to). On ext4 the number of inodes is fixed when the filesystem is created, so once they run out you can't create new files, even with terabytes of free space. The kernel returns ENOSPC, which userspace tools print as "No space left on device". That is the same message you get when the disk is genuinely full, and that's where the confusion comes from.
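If you want the same two numbers from a script, both come from the statvfs(3) call that df itself relies on. A minimal Python sketch, assuming Linux; FsUsage and fs_usage are names I made up for illustration:

```python
import os
from dataclasses import dataclass


@dataclass
class FsUsage:
    """Byte and inode counters for one filesystem, as reported by statvfs."""
    bytes_total: int
    bytes_avail: int   # space available to unprivileged users (df's "Avail")
    inodes_total: int  # df -i "Inodes"
    inodes_free: int   # df -i "IFree"

    @property
    def inode_use_pct(self) -> float:
        # Some filesystems (e.g. btrfs) report 0 total inodes because they
        # allocate inodes dynamically; treat that as "not applicable".
        if self.inodes_total == 0:
            return 0.0
        used = self.inodes_total - self.inodes_free
        return 100.0 * used / self.inodes_total


def fs_usage(path: str) -> FsUsage:
    """Read both resources for the filesystem containing `path`."""
    st = os.statvfs(path)
    return FsUsage(
        bytes_total=st.f_blocks * st.f_frsize,
        bytes_avail=st.f_bavail * st.f_frsize,
        inodes_total=st.f_files,
        inodes_free=st.f_ffree,
    )


if __name__ == "__main__":
    u = fs_usage("/")
    print(f"bytes available: {u.bytes_avail / 2**30:.1f} GiB")
    print(f"inodes used:     {u.inode_use_pct:.1f}%")
```

Plugging in the /var/log numbers from this incident (1,310,720 inodes, 2 free) gives an inode_use_pct of about 99.9998%, which df rounds up to 100%.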

The Actual Diagnosis

$ df -i /var/log
Filesystem      Inodes    IUsed IFree IUse% Mounted on
/dev/sda2      1310720  1310718     2  100% /var/log

There it is. 100% inode usage, with only 2 inodes free. For practical purposes, the filesystem cannot create another file.
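When df -i shows a filesystem at 100% but you don't yet know where the inodes went, counting entries per directory narrows it down quickly. A rough Python version of that hunt; busiest_dirs is a hypothetical helper, and it stays on one filesystem by comparing st_dev, similar in spirit to find -xdev:

```python
import os
from collections import Counter


def busiest_dirs(root: str, top: int = 5) -> list[tuple[str, int]]:
    """Return the `top` directories under `root` with the most entries,
    without crossing into other mounted filesystems."""
    root_dev = os.stat(root).st_dev
    counts: Counter = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Each entry (file, subdirectory, symlink) is normally backed by its
        # own inode, so the entry count is a decent proxy for inode usage.
        counts[dirpath] = len(dirnames) + len(filenames)
        # Prune subdirectories that live on a different device (mount points)
        # so os.walk does not descend into them.
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev
        ]
    return counts.most_common(top)


if __name__ == "__main__":
    for path, n in busiest_dirs("/tmp"):
        print(f"{n:>10}  {path}")
```

Pointed at /var/log on this server, it should have put /var/log itself at the top with over a million entries.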

Now: where are all these inodes going? Inode exhaustion almost always means "a lot of small files in one place". Time to find them:

$ find /var/log -type f | wc -l
1310715

Over a million files in /var/log. That's the culprit. Let me see what they look like:

$ ls /var/log/ | head
session_000001.log
session_000002.log
session_000003.log
session_000004.log
session_000005.log
...

Session logs. Let me check the sizes:

$ find /var/log -name 'session_*' -printf '%s\n' | sort -u
0

Every single one is 0 bytes. Over a million empty files. Someone wrote a debug script, forgot to clean up, and it had been creating empty session logs for months. Each file takes 0 bytes of data but still consumes one inode.
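To see the "0 bytes, one inode" point for yourself, this sketch creates empty files in a temporary directory and checks that each one gets its own inode number while adding no content. make_empty_files is a made-up helper, and the free-inode delta it prints is only approximate, since other processes may be creating or deleting files on the same filesystem:

```python
import os
import tempfile


def make_empty_files(directory: str, n: int) -> list[os.stat_result]:
    """Create n zero-byte files in `directory` and return their stat results."""
    results = []
    for i in range(n):
        path = os.path.join(directory, f"session_{i:06d}.log")
        # Mode "x" creates the file and fails if it already exists.
        with open(path, "x"):
            pass
        results.append(os.stat(path))
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        before = os.statvfs(d).f_ffree
        stats = make_empty_files(d, 1000)
        after = os.statvfs(d).f_ffree
        print("total bytes:         ", sum(s.st_size for s in stats))
        print("distinct inodes:     ", len({s.st_ino for s in stats}))
        print("free inodes consumed:", before - after)  # approximate
```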

The Fix

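Delete the empty files. Don't use rm with a glob like rm /var/log/session_*: the shell expands it into over a million arguments, and the command fails with "Argument list too long". Use find instead, which deletes each match as it walks the directory: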

$ find /var/log -type f -name 'session_*' -delete

This took about 30 seconds on that machine. Then verify:

$ df -i /var/log
Filesystem      Inodes    IUsed   IFree IUse% Mounted on
/dev/sda2      1310720     1003 1309717    1% /var/log

Restart the service:

$ systemctl restart rsyslog
$ systemctl status rsyslog
   Active: active (running)

Fixed.
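find ... -delete never builds an argument list: it unlinks each match as it reads the directory, so it doesn't matter whether there are a thousand files or a million. The same streaming idea as a Python sketch, assuming the files sit directly in one directory (no recursion, unlike find). delete_matching and its only_empty flag are hypothetical; the empty-only check is an extra guard I'd add, not something the find command above does:

```python
import fnmatch
import os


def delete_matching(directory: str, pattern: str, only_empty: bool = True) -> int:
    """Delete regular files directly inside `directory` whose names match the
    shell-style `pattern`, one entry at a time. Returns the number removed."""
    removed = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if not fnmatch.fnmatchcase(entry.name, pattern):
                continue
            # Skip directories, symlinks, sockets, etc.
            if not entry.is_file(follow_symlinks=False):
                continue
            # Safety net: leave anything with content alone.
            if only_empty and entry.stat(follow_symlinks=False).st_size != 0:
                continue
            os.unlink(entry.path)
            removed += 1
    return removed
```

For example, delete_matching("/var/log", "session_*") would remove only the zero-byte session_* files and leave everything else in place.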

Prevention

A few things I put in place after this:

Monitor inode usage, not just disk space. Most monitoring setups check df -h but forget df -i. Add an alert at 85% inode usage.
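Set up logrotate for any directory that accumulates log files. The default logrotate config handles most system logs, but custom paths need their own config. For one-file-per-session logs like these, logrotate rotates files rather than expiring them, so add an age-based cleanup job as well (for example, a scheduled find /var/log -name 'session_*' -mtime +7 -delete).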
Code review any script that creates files in production. The script that caused this was "just a debug helper" that was never removed.
Use find ... -delete for cleanup, not rm with glob patterns. Glob expansion will hit the ARG_MAX limit with millions of files.
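For the 85% inode alert, a minimal check could look like this, assuming a Linux host and that you pass in the mount points to watch. inode_use_pct and inode_alerts are illustrative names; in practice you'd feed the result into whatever alerting you already run:

```python
import os
from typing import Optional

INODE_ALERT_PCT = 85.0  # threshold suggested above


def inode_use_pct(total: int, free: int) -> Optional[float]:
    """Percent of inodes in use, or None when the filesystem reports no fixed
    inode count (total == 0)."""
    if total == 0:
        return None
    return 100.0 * (total - free) / total


def inode_alerts(mountpoints: list[str],
                 threshold: float = INODE_ALERT_PCT) -> list[tuple[str, float]]:
    """Return (mount point, percent) for every mount at or above threshold."""
    alerts = []
    for mp in mountpoints:
        st = os.statvfs(mp)
        pct = inode_use_pct(st.f_files, st.f_ffree)
        if pct is not None and pct >= threshold:
            alerts.append((mp, pct))
    return alerts
```

inode_alerts(["/", "/var/log"]) returns the mounts at or above the threshold; an empty list means everything is under 85%.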

What Interviewers Look For

If this comes up in an SRE interview, the interviewer isn't just checking if you know df -i. They want to see:

Do you check the actual error message carefully? ("No space left" has two possible causes)
Do you form a hypothesis before running commands? (Running df -h, df -i, find, each answering a specific question)
Can you explain the underlying concept? (inodes as a separate resource)
Do you think about prevention, not just the immediate fix?

Practice This Interactively

I built scenar.site to practice exactly these kinds of scenarios. You describe your debugging approach in plain English; an AI simulates a broken server, returns realistic command output, and tracks your reasoning. This scenario is one of 18 built-in ones. The free tier gets you started, no credit card required.