Introduction
In this lab, we will explore the du (disk usage) command in Linux, a powerful tool for estimating and analyzing disk space usage. Imagine you're a system administrator tasked with managing a rapidly growing file server. Your mission is to identify space-consuming directories and files, helping optimize storage utilization. The du command will be your trusty detective tool in this disk space investigation.
Understanding the Basics of du
The du command is your first line of defense in understanding disk space usage. Let's start by examining its basic functionality.
First, let's navigate to the project directory where we'll conduct our investigation:
cd ~/project
Now, let's run a basic du command:
du
Tips: Files and folders are created randomly, and their sizes are also random, so the results may vary each time you run it.
You'll see output similar to this:
0 ./documents/reports
0 ./documents
10240 ./backups
0 ./logs/archive
0 ./logs/system
5120 ./logs/application
5120 ./logs
15360 .
Each line shows two pieces of information:
- The disk usage, in units of 1024 bytes (KB) by default with GNU du
- The corresponding directory path
The numbers might seem cryptic at first. They count 1024-byte blocks, so 10240 means roughly 10 MB. But don't worry, we can make them more readable!
Let's run the command with the -h (human-readable) option:
du -h
Now you'll see output like this:
0 ./documents/reports
0 ./documents
10M ./backups
0 ./logs/archive
0 ./logs/system
5.0M ./logs/application
5.0M ./logs
15M .
The -h option converts the sizes to a more human-friendly format (K for kilobytes, M for megabytes, and so on, each a power of 1024). This makes the output much easier to read at a glance.
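The conversion can be reproduced by hand. As a rough sketch, assuming the numfmt utility from GNU coreutils is installed (it is not part of this lab), the three block counts from the earlier sample output convert like this:

```shell
# Illustration only: treat each number as a count of 1024-byte blocks
# and print it in IEC units (powers of 1024), similar to what du -h shows.
# The values 10240, 5120 and 15360 are taken from the sample output above.
converted=$(LC_ALL=C numfmt --from-unit=1024 --to=iec 10240 5120 15360)
echo "$converted"
```

The three results should line up with the sizes shown for ./backups, ./logs and . in the human-readable listing.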
A few things to note:
- The . at the end represents the current directory (~/project in this case).
- The disk usage of a directory includes the usage of all its subdirectories.
- The sizes you see might be slightly different, as the setup script generates random file sizes.
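To see the second point in action, here is a small sketch you can run anywhere. The sandbox directory and the file names are made up for this illustration and are not part of the lab environment:

```shell
# Hypothetical sandbox: a throwaway directory with one nested level.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/logs/application"
# Random bytes are used so the files occupy real disk blocks.
head -c 1048576 /dev/urandom > "$sandbox/logs/application/app.log"   # 1 MiB
head -c 1048576 /dev/urandom > "$sandbox/logs/top.log"               # 1 MiB, one level up

# -k asks for 1 KB units explicitly.
du -k "$sandbox"

# Extract the two totals (the size is the first tab-separated field).
child=$(du -k "$sandbox/logs/application" | tail -n 1 | cut -f1)
parent=$(du -k "$sandbox/logs" | tail -n 1 | cut -f1)
echo "application: ${child} KB, logs (including application): ${parent} KB"

rm -rf "$sandbox"
```

The total for logs should be at least as large as the total for logs/application, because the parent's figure already contains the child's.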
Investigating Specific Directories
Now that we understand the basics, let's dive deeper into specific directories. We'll focus on the logs directory, which seems to be using a significant amount of space.
First, let's change to the logs directory:
cd ~/project/logs
Now, let's use du to examine this directory:
du -h
You might see output like this:
0 ./archive
0 ./system
5.0M ./application
5.0M .
This gives us a breakdown of the disk usage for each subdirectory within the logs directory. But what if we only want to see the total for the logs directory?
We can use the --max-depth option to limit how many levels of the directory structure du reports:
du -h --max-depth=0
This will output only the total for the current directory:
5.0M .
The --max-depth=0 option tells du to print only the total for the current directory. du still scans every subdirectory to compute that total; it just doesn't list them.
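As a quick check, GNU du documents --max-depth=0 as equivalent to the -s (summarize) option. The sketch below uses a made-up sandbox directory to compare the two; the names are illustrative only:

```shell
# Hypothetical sandbox with a file two levels down.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/application/2024"
head -c 524288 /dev/urandom > "$sandbox/application/2024/app.log"   # 512 KiB

depth0=$(du -k --max-depth=0 "$sandbox")   # a single line: the grand total
summary=$(du -sk "$sandbox")               # -s prints the same single line
echo "$depth0"
echo "$summary"

rm -rf "$sandbox"
```

Both commands print one line, and the total still accounts for the file buried in application/2024.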
To see just the immediate subdirectories, use --max-depth=1:
du -h --max-depth=1
Output:
0 ./archive
0 ./system
5.0M ./application
5.0M .
This gives us a clearer picture of which subdirectories are using the most space.
The --max-depth option is particularly useful when you're dealing with deeply nested directory structures and you want to focus on a specific level of the hierarchy.
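To get a feel for how the depth limit changes the amount of output, here is a small sketch on a made-up nested tree. It uses -d, the short form of --max-depth in GNU du; the directory names are hypothetical:

```shell
# Hypothetical tree: six directories in total, including the sandbox itself.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/a/b/c" "$sandbox/x/y"

counts=""
for depth in 0 1 2 3; do
    # Count how many lines du prints at this depth.
    lines=$(du -k -d "$depth" "$sandbox" | wc -l | tr -d ' ')
    echo "depth $depth: $lines line(s)"
    counts="${counts:+$counts }$lines"
done
# Prints 1, 3, 5 and 6 lines for depths 0 to 3.

rm -rf "$sandbox"
```

Each extra level reveals more directories, while the total on the last line stays the same.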
Sorting and Analyzing Disk Usage
Now that we've identified the subdirectories using the most space, let's learn how to sort the results. This will help us quickly identify the largest consumers of disk space.
We'll use the sort command in combination with du. Don't worry if you're not familiar with sort - we'll explain how it works.
First, let's sort the output of du by size:
du -h | sort -h
This command has three parts:
- du -h: Runs the disk usage command with human-readable output
- |: This is a pipe. It takes the output of the command on the left and feeds it as input to the command on the right.
- sort -h: Sorts the input numerically based on human-readable sizes
You might see output like this:
0 ./archive
0 ./system
5.0M .
5.0M ./application
The output is sorted from smallest to largest. But often, we're more interested in the largest directories first. To reverse the order, we can add the -r option to sort:
du -h | sort -hr
Output:
5.0M ./application
5.0M .
0 ./system
0 ./archive
Now we can clearly see which subdirectories within the logs folder are using the most space, in descending order.
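You may wonder why we use sort -h rather than the more familiar sort -n. The sketch below feeds a few made-up sizes (not from the lab) to both, so you can compare the results:

```shell
# Made-up sample sizes in the style of du -h output.
sizes=$(printf '5.0M\n900K\n1.2G\n0\n')

# sort -n only looks at the leading number and ignores the K/M/G suffix.
numeric=$(printf '%s\n' "$sizes" | LC_ALL=C sort -n)
# sort -h understands the suffixes.
human=$(printf '%s\n' "$sizes" | LC_ALL=C sort -h)

echo "sort -n:"; echo "$numeric"
echo "sort -h:"; echo "$human"
```

With sort -n, 900K ends up last because 900 is the biggest bare number. With sort -h, 1.2G is correctly placed last.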
To focus only on the immediate subdirectories and sort them, we can combine the techniques we've learned:
du -h --max-depth=1 | sort -hr
This command will show the immediate subdirectories of the current directory, plus the total for the current directory itself, sorted from largest to smallest.
Remember, the power of the command line comes from combining simple commands to perform complex operations. We've just combined du, sort, and various options to quickly analyze disk usage!
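One caveat: sort -h is an extension rather than part of POSIX, so it may be missing on older or minimal systems. A possible fallback, sketched here on a made-up sandbox, is to ask du for plain KB numbers with -k and sort them numerically:

```shell
# Hypothetical sandbox; directory and file names are illustrative only.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/application" "$sandbox/system"
head -c 2097152 /dev/urandom > "$sandbox/application/app.log"   # 2 MiB
head -c 262144  /dev/urandom > "$sandbox/system/sys.log"        # 256 KiB
head -c 65536   /dev/urandom > "$sandbox/notes.txt"             # 64 KiB at the top level

# -k prints plain numbers in KB, so an ordinary reverse numeric sort works.
ranked=$(cd "$sandbox" && du -k | sort -rn)
echo "$ranked"

largest=$(printf '%s\n' "$ranked" | head -n 1 | cut -f2)
echo "Largest entry: $largest"

rm -rf "$sandbox"
```

The trade-off is readability: the numbers are exact KB counts instead of friendly 5.0M-style sizes.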
Finding the Largest Files
So far, we've been looking at directory sizes. But what if we want to find the specific files that are taking up the most space? By default, the du command reports on directories only, but we can combine it with other commands to find large files.
We'll use the find command along with du. Don't worry if you're not familiar with find - we'll explain how it works.
First, let's navigate back to the project directory:
cd ~/project
Now, let's use find and du to locate the largest files:
find . -type f -exec du -h {} + | sort -hr | head -n 5
This command might look complex, but let's break it down:
- find . -type f: Finds all files (-type f) in the current directory (.) and its subdirectories
- -exec du -h {} +: Executes du -h on each file found. The {} is replaced with the filename, and the + tells find to pass as many filenames as possible to each invocation of du.
- sort -hr: Sorts the results by size in reverse order (largest first)
- head -n 5: Shows only the top 5 results
You might see output like this:
10M ./backups/large_backup.bak
5.0M ./logs/application/large_app_log.log
0 ./logs/system/placeholder.log
0 ./logs/archive/placeholder.log
0 ./logs/application/placeholder.log
This output shows us the five largest files in the project directory and their sizes.
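If the + at the end of -exec seems mysterious, the following sketch may help. It uses echo as a stand-in for du and three made-up empty files, then counts how many lines (and therefore how many invocations) each form produces:

```shell
# Hypothetical sandbox with three empty files.
sandbox=$(mktemp -d)
touch "$sandbox/a.log" "$sandbox/b.log" "$sandbox/c.log"

# With +, find batches the names: echo runs once and prints a single line.
batched=$(find "$sandbox" -type f -exec echo {} + | wc -l | tr -d ' ')
# With \; find runs echo once per file, so three lines are printed.
per_file=$(find "$sandbox" -type f -exec echo {} \; | wc -l | tr -d ' ')

echo "batched: $batched line(s), one per file: $per_file line(s)"

rm -rf "$sandbox"
```

Fewer invocations means less overhead, which is why the + form is usually the better choice when the command accepts many filenames at once, as du does.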
To focus on files larger than a specific size, we can modify our command. Let's find files larger than 1MB:
find . -type f -size +1M -exec du -h {} + | sort -hr
This command adds -size +1M to filter for files larger than 1 megabyte.
These commands are incredibly useful when you're trying to free up disk space. They allow you to quickly identify the largest files, which are often the best candidates for deletion or archiving.
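A word of caution about -size: GNU find rounds file sizes up to whole units before comparing. The sketch below, built on made-up files in a temporary directory, shows what that means for +1M and -1M:

```shell
# Hypothetical sandbox with files around the 1 MiB boundary.
sandbox=$(mktemp -d)
: > "$sandbox/empty.bin"                                 # 0 bytes
head -c 524288  /dev/zero > "$sandbox/half.bin"          # 512 KiB
head -c 1048576 /dev/zero > "$sandbox/exact.bin"         # exactly 1 MiB
head -c 1048577 /dev/zero > "$sandbox/just_over.bin"     # 1 MiB plus 1 byte

# +1M: size rounded up to whole MiB must be greater than 1.
over=$(cd "$sandbox" && find . -type f -size +1M | sort)
# -1M: size rounded up to whole MiB must be less than 1, i.e. 0.
under=$(cd "$sandbox" && find . -type f -size -1M | sort)

echo "-size +1M matches: $over"
echo "-size -1M matches: $under"

rm -rf "$sandbox"
```

Only just_over.bin matches +1M, and only the empty file matches -1M, since even the 512 KiB file rounds up to one whole unit. If you need files smaller than 1 MiB, a byte or kilobyte unit such as -size -1024k is usually closer to what you want.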
Generating a Disk Usage Report
As the final step in our disk space investigation, let's create a comprehensive disk usage report for the entire project directory. This report will help us summarize our findings and present them to the team.
First, let's make sure we're in the project directory:
cd ~/project
Now, let's create a detailed report using du and save it to a file:
du -h --max-depth=2 | sort -hr > disk_usage_report.txt
Let's break down this command:
- du -h --max-depth=2: Shows disk usage up to two levels deep in human-readable format
- sort -hr: Sorts the results by size in reverse order (largest first)
- > disk_usage_report.txt: Saves the output to a file named disk_usage_report.txt. The > is called a redirection operator - it takes the output that would normally go to the screen and "redirects" it to a file instead.
Now that we've created our report, let's view its contents:
cat disk_usage_report.txt
You should see a comprehensive list of directories and their sizes, sorted from largest to smallest.
To get a summary of the largest directories, we can use the head command to view just the top entries:
head -n 10 disk_usage_report.txt
This will show you the 10 largest entries in the report. The first line is the total for the project directory itself.
This report is a valuable tool for identifying which areas of your project are consuming the most disk space. It can help guide your efforts in optimizing storage usage or in discussions with your team about resource allocation.
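If you plan to share reports regularly, a dated file name and a short header can make them easier to tell apart. This is only one possible sketch; the sandbox directory, the file naming scheme, and the header wording are all assumptions rather than part of the lab:

```shell
# Hypothetical sandbox standing in for ~/project.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/logs/application" "$sandbox/backups"
head -c 262144 /dev/urandom > "$sandbox/backups/backup.bak"   # 256 KiB

# Assumed naming scheme: one report per day.
report="$sandbox/disk_usage_$(date +%Y-%m-%d).txt"
(
    cd "$sandbox" || exit 1
    # Group the header lines and the du output into a single redirection.
    {
        echo "Disk usage report for $(pwd)"
        echo "Generated: $(date)"
        du -h --max-depth=2 | sort -hr
    } > "$report"
)

cat "$report"
header=$(head -n 1 "$report")
report_lines=$(wc -l < "$report" | tr -d ' ')

rm -rf "$sandbox"
```

The braces group several commands so that a single > captures all of their output in one file.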
Summary
In this lab, we've explored the powerful du command and its applications in managing disk space. We've learned how to:
- Use the basic du command to estimate disk usage
- Make the output human-readable with the -h option
- Investigate specific directories and limit depth with --max-depth
- Sort and analyze disk usage results
- Find the largest files in a directory
- Generate comprehensive disk usage reports
These skills are essential for any system administrator or power user managing storage resources.
Additional du options not covered in this lab include:
- -s: Display only a total for each argument
- -c: Produce a grand total
- -a: Show disk usage for files as well as directories
- --time: Show the time of last modification for each directory
- --exclude=PATTERN: Exclude files or directories matching PATTERN
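If you'd like to try some of these, here is a brief sketch using a made-up sandbox (the file and directory names are hypothetical). It exercises -s, -c, -a and --exclude, and leaves --time for you to experiment with:

```shell
# Hypothetical sandbox: two directories and three small files.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/logs" "$sandbox/backups"
head -c 4096 /dev/zero > "$sandbox/logs/app.log"
head -c 4096 /dev/zero > "$sandbox/logs/notes.txt"
head -c 4096 /dev/zero > "$sandbox/backups/data.bak"

summary=$(cd "$sandbox" && du -sk logs backups)        # -s: one line per argument
with_total=$(cd "$sandbox" && du -sck logs backups)    # -c: adds a final "total" line
all_entries=$(cd "$sandbox" && du -ak)                 # -a: files as well as directories
no_logs=$(cd "$sandbox" && du -ak --exclude='*.log')   # --exclude: skip names matching the pattern

echo "--- du -sk logs backups";  echo "$summary"
echo "--- du -sck logs backups"; echo "$with_total"
echo "--- du -ak";               echo "$all_entries"
echo "--- du -ak --exclude";     echo "$no_logs"

rm -rf "$sandbox"
```

Note how -a lists all three files alongside the directories, and how the --exclude run drops app.log from the listing.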