File Management Beyond Copy-Paste: Compressing, Splitting, and Why tar.gz Isn't Scary
The Problem: Moving and Managing Large Files
You need to transfer a project with 1,000 files to another server. Do you copy them one by one? That's slow and error-prone.
Or you have a 10GB log file. You can't email it. Can't open it easily. Need to break it into smaller chunks.
Or you download project.tar.gz and have no idea how to open it.
Linux has tools for all of this. Once you understand them, managing files becomes simple.
Comparing Files: diff and cmp
Before we dive into compression, let's cover comparing files.
diff: Line-by-Line Comparison
diff file1.txt file2.txt
Shows which lines are different:
2c2
< This is the old line
---
> This is the new line
Useful for:
- Comparing config files
- Checking what changed between versions
- Reviewing code changes
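You can try both output formats on two throwaway config files (the names and contents below are made up). Besides the default format shown above, `diff -u` prints the unified format that git and patch files use. Scripts usually care only about diff's exit status: 0 means the files are the same, 1 means they differ, and 2 means an error.

```shell
#!/usr/bin/env bash
# Two hypothetical config files that differ only on line 2
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'port=80\nhost=old.example\n' > "$tmp/old.conf"
printf 'port=80\nhost=new.example\n' > "$tmp/new.conf"

# Default ("normal") format, like the 2c2 output above
diff "$tmp/old.conf" "$tmp/new.conf"

# Unified format: "-" marks removed lines, "+" marks added lines, with context around them
diff -u "$tmp/old.conf" "$tmp/new.conf"

# -q only reports whether the files differ; the exit status carries the answer
diff -q "$tmp/old.conf" "$tmp/new.conf" > /dev/null
status=$?
echo "diff exit status: $status"   # 1 here, because the files differ
```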
cmp: Byte-by-Byte Comparison
cmp file1.txt file2.txt
Stops at the first difference:
file1.txt file2.txt differ: byte 45, line 3
Useful for:
- Binary files
- Quick check if files are identical
- Finding corruption
If files are identical, no output is shown.
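In scripts, `cmp -s` is the common form. It prints nothing and reports its answer only through the exit status: 0 if the files are identical, 1 if they differ. The sketch below wraps it in a small helper (`same_file_contents` is a hypothetical name, not a standard command) and uses throwaway sample files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) only if both files have identical bytes
same_file_contents() {
    cmp -s "$1" "$2"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'hello\n' > "$tmp/original.bin"
cp "$tmp/original.bin" "$tmp/copy.bin"     # an exact copy
printf 'hellO\n' > "$tmp/corrupted.bin"    # one byte changed

# Without -s, cmp reports the first differing byte and line
cmp "$tmp/original.bin" "$tmp/corrupted.bin"

if same_file_contents "$tmp/original.bin" "$tmp/copy.bin"; then
    echo "copy.bin: identical"
fi
if ! same_file_contents "$tmp/original.bin" "$tmp/corrupted.bin"; then
    echo "corrupted.bin: differs"
fi
```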
Understanding tar: The Tape Archive
tar (tape archive) bundles multiple files into one container. By itself it doesn't compress anything; it just packages.
Creating a tar Archive
tar cvf filename.tar /path/to/directory/
Breaking it down:
- c - Create archive
- v - Verbose (show progress)
- f - File (specify filename)
Example:
# Archive entire project directory
tar cvf project.tar /home/user/project/
# Archive multiple directories
tar cvf backup.tar /etc/ /var/log/
Extracting a tar Archive
tar xvf filename.tar
- x - Extract
- v - Verbose
- f - File
Example:
tar xvf project.tar
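# Extracts into the current directory, recreating the paths stored in the archive
# (tar strips the leading /, so the archive above unpacks to ./home/user/project/)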
Important: tar Alone Doesn't Compress
# This creates 1000MB tar from 1000MB of files
tar cvf files.tar large_directory/
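The archive is roughly the same size as the original files. It is actually slightly larger, because tar adds a 512-byte header for each file and pads the data into fixed-size blocks. To actually compress, we need gzip.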
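To see the difference for yourself, the sketch below builds a sample directory, archives it with and without `z`, and prints the three sizes. The sample data is made up: highly repetitive log lines, which compress far better than typical real files.

```shell
#!/usr/bin/env bash
# Sketch: raw data vs. plain tar vs. tar.gz (sample data is hypothetical)
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/large_directory"
for i in $(seq 1 2000); do
    echo "log line $i: request handled OK"
done > "$tmp/large_directory/app.log"

tar cf  "$tmp/files.tar"    -C "$tmp" large_directory   # bundle only
tar czf "$tmp/files.tar.gz" -C "$tmp" large_directory   # bundle + gzip

raw_size=$(wc -c < "$tmp/large_directory/app.log")
tar_size=$(wc -c < "$tmp/files.tar")
gz_size=$(wc -c < "$tmp/files.tar.gz")
echo "original: $raw_size bytes"
echo "tar:      $tar_size bytes (headers + padding, no compression)"
echo "tar.gz:   $gz_size bytes"
```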
gzip: Actual Compression
gzip compresses files to save space.
Compress a File
gzip filename.tar
# Creates: filename.tar.gz
# Original filename.tar is deleted
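If you want to keep the original, `gzip -c` writes the compressed data to standard output and leaves the input alone (newer gzip releases, 1.6 and later, also accept `-k`/`--keep`). `gzip -t` checks that a compressed file is intact. A sketch with throwaway sample files:

```shell
#!/usr/bin/env bash
# Sketch: gzip's default behaviour vs. keeping the original (sample files are hypothetical)
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
seq 1 10000 > "$tmp/numbers.txt"
tar cf "$tmp/filename.tar" -C "$tmp" numbers.txt

# Default: filename.tar is replaced by filename.tar.gz
gzip "$tmp/filename.tar"

# -c sends the compressed data to stdout, so numbers.txt stays in place
gzip -c "$tmp/numbers.txt" > "$tmp/numbers.txt.gz"

# -t tests the compressed file's integrity without writing anything out
gzip -t "$tmp/filename.tar.gz" && echo "filename.tar.gz passed the integrity test"
ls -l "$tmp"
```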
The One-Step tar.gz
Most commonly, you create and compress in one command:
tar cvzf filename.tar.gz /path/to/directory/
Added flag:
- z - Compress with gzip
Example:
# Create compressed archive
tar cvzf project.tar.gz /home/user/project/
# Result: One compressed file containing everything
Decompress gzip Files
Two ways:
# Method 1: gzip -d
gzip -d filename.tar.gz
# Creates: filename.tar (decompressed)
# Method 2: gunzip (same thing)
gunzip filename.tar.gz
# Creates: filename.tar
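The two steps (decompress, then unpack with tar) can also be chained in a pipeline that never writes an intermediate .tar file to disk; that is essentially what the `z` flag does for you. In this sketch, `gzip -dc` writes the decompressed stream to stdout and `tar xf -` reads the archive from stdin. The archive and directory names are throwaway examples.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/src/project" "$tmp/out1" "$tmp/out2"
echo "hello" > "$tmp/src/project/README"
tar czf "$tmp/filename.tar.gz" -C "$tmp/src" project

# Two steps: decompress into filename.tar (-c keeps the .gz), then unpack it
gzip -dc "$tmp/filename.tar.gz" > "$tmp/filename.tar"
tar xf "$tmp/filename.tar" -C "$tmp/out1"

# One pipeline: tar reads the decompressed archive from "-" (stdin)
gzip -dc "$tmp/filename.tar.gz" | tar xf - -C "$tmp/out2"

diff -r "$tmp/out1" "$tmp/out2" && echo "both methods produced the same files"
```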
Extract tar.gz in One Step
tar xvzf filename.tar.gz
- x - Extract
- v - Verbose
- z - Decompress with gzip
- f - File
Example:
tar xvzf project.tar.gz
# Decompresses and extracts everything
Understanding tar.gz
When you see filename.tar.gz:
- It's a tar archive (many files bundled)
- Compressed with gzip (smaller size)
Think of it like a zip file in Windows.
Real-World Example: Backing Up a Website
# Create compressed backup
tar cvzf website-backup-2024-12-09.tar.gz /var/www/html/
# Check size
ls -lh website-backup-2024-12-09.tar.gz
# Output: -rw-r--r-- 1 user user 45M Dec 09 10:00 website-backup-2024-12-09.tar.gz
# Later, restore it
tar xvzf website-backup-2024-12-09.tar.gz
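A backup is only useful if it restores. One way to check, sketched below, is to extract it into a scratch directory instead of over the live site and compare the two trees with `diff -r`. Here `site/` stands in for /var/www/html, and every path is a throwaway example.

```shell
#!/usr/bin/env bash
# Sketch: verify a backup by restoring it somewhere harmless (paths are hypothetical)
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/site/css" "$tmp/restore-test"
echo "<h1>Home</h1>" > "$tmp/site/index.html"
echo "body { margin: 0; }" > "$tmp/site/css/style.css"

backup="$tmp/website-backup-$(date +%F).tar.gz"   # date +%F gives YYYY-MM-DD
tar czf "$backup" -C "$tmp" site

# Restore into a scratch directory, not over the original
tar xzf "$backup" -C "$tmp/restore-test"

# diff -r compares the trees recursively; no output and exit status 0 mean they match
if diff -r "$tmp/site" "$tmp/restore-test/site"; then
    echo "backup verified: $backup"
fi
```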
Truncate: Shrink or Extend File Size
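truncate sets a file to an exact size. Shrinking cuts off everything past the new size; growing pads the file with zero bytes. Warning: shrinking permanently loses data.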
Basic Usage
truncate -s 10 filename
Sets file size to 10 bytes. Everything beyond is deleted.
Common Uses
# Shrink file to 1MB
truncate -s 1M largefile.log
# Shrink file to 10 bytes
truncate -s 10 file.txt
# Empty a file (set to 0 bytes)
truncate -s 0 file.txt
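Here is a quick way to see both directions on a throwaway file. Shrinking keeps only the leading bytes, growing appends NUL (zero) bytes, and truncating to 0 leaves an empty file that still exists.

```shell
#!/usr/bin/env bash
# Sketch: shrink, grow, and empty a throwaway file with truncate
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
f="$tmp/file.txt"
printf 'Hello, world!\n' > "$f"        # 14 bytes

truncate -s 5 "$f"                     # keep only the first 5 bytes
after_shrink=$(cat "$f")
echo "after shrink: $after_shrink"     # Hello

truncate -s 10 "$f"                    # grow to 10 bytes; the added bytes are zeros
size_after_extend=$(wc -c < "$f")
echo "after extend: $size_after_extend bytes"
od -c "$f"                             # "H e l l o" followed by five \0 bytes

truncate -s 0 "$f"                     # empty the file; the file itself remains
```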
Real Example: Clear Log File
# Log file is 5GB, causing disk issues
ls -lh /var/log/app.log
# -rw-r--r-- 1 root root 5.0G Dec 09 10:00 app.log
# Empty it
sudo truncate -s 0 /var/log/app.log
# Verify
ls -lh /var/log/app.log
# -rw-r--r-- 1 root root 0 Dec 09 10:01 app.log
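Note: truncate empties the file but doesn't delete it. That makes it a good fit for live log files: if a process still has the file open, deleting it with rm won't free the disk space until that process closes it, while truncating frees the space right away.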
Combining Files: cat
cat (concatenate) combines multiple files into one.
Basic Syntax
cat file1 file2 file3 > combined.txt
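This writes file1, file2, and file3, in that order, into combined.txt. The > redirect overwrites combined.txt if it already exists; use >> to append to it instead.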
Example: Combine Logs
# Multiple log files from different days
cat app-2024-12-01.log app-2024-12-02.log app-2024-12-03.log > december.log
# Verify
wc -l december.log
# Shows total lines from all three files
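The redirect matters when you combine files more than once: > replaces the target, while >> adds to its end. The file names in this sketch are throwaway examples.

```shell
#!/usr/bin/env bash
# Sketch: > overwrites, >> appends (file1..file3 are hypothetical sample files)
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'one\n'   > "$tmp/file1"
printf 'two\n'   > "$tmp/file2"
printf 'three\n' > "$tmp/file3"

# Concatenate in argument order: one, two, three
cat "$tmp/file1" "$tmp/file2" "$tmp/file3" > "$tmp/combined.txt"
# Running the same command again overwrites the file, so it still has 3 lines
cat "$tmp/file1" "$tmp/file2" "$tmp/file3" > "$tmp/combined.txt"
# >> appends instead: now 4 lines (one, two, three, three)
cat "$tmp/file3" >> "$tmp/combined.txt"
wc -l < "$tmp/combined.txt"
```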
Example: Combine Parts
# Downloaded file in parts
cat download.part1 download.part2 download.part3 > complete-file.zip
Splitting Files: split
split breaks large files into smaller chunks.
Split by Lines
split -l 300 file.txt childfile
- -l 300 - Split every 300 lines
- file.txt - Input file
- childfile - Prefix for output files
Output:
childfileaa (first 300 lines)
childfileab (next 300 lines)
childfileac (next 300 lines)
...
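To see how the chunks come out, the sketch below splits a 1,000-line sample file (generated with seq) every 300 lines. That yields three full chunks plus a last one holding the remaining 100 lines.

```shell
#!/usr/bin/env bash
# Sketch: split a 1,000-line sample file every 300 lines
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
seq 1 1000 > "$tmp/file.txt"

split -l 300 "$tmp/file.txt" "$tmp/childfile"

# Expect childfileaa, childfileab, childfileac (300 lines each) and childfilead (100 lines)
for part in "$tmp"/childfile*; do
    printf '%s: %s lines\n' "$(basename "$part")" "$(wc -l < "$part")"
done
```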
Split by Size
# Split into 10MB chunks
split -b 10M largefile.log part_
# Output: part_aa, part_ab, part_ac, ...
Real Example: Split Large Database Dump
# Database dump is 5GB
ls -lh database.sql
# -rw-r--r-- 1 user user 5.0G Dec 09 10:00 database.sql
# Split into 100MB chunks for easier transfer
split -b 100M database.sql db_part_
# Results:
ls -lh db_part_*
# db_part_aa (100MB)
# db_part_ab (100MB)
# db_part_ac (100MB)
# ...
# Later, recombine on destination server
cat db_part_* > database.sql
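Recombining works because the shell expands `db_part_*` in sorted order (aa, ab, ac, ...), which matches the order split wrote the chunks in. It is still worth confirming the result is byte-for-byte identical, for example with cmp from earlier. In the sketch below, 1 MB of random bytes stands in for a real dump. When the two copies live on different machines, comparing checksums (e.g. `sha256sum` output) on each side is the usual alternative.

```shell
#!/usr/bin/env bash
# Sketch: split, reassemble, and verify (database.sql here is just random sample data)
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
head -c 1048576 /dev/urandom > "$tmp/database.sql"

# 300,000-byte chunks; the last chunk holds whatever is left over
split -b 300000 "$tmp/database.sql" "$tmp/db_part_"

# The glob expands in sorted order, matching the order split used
cat "$tmp"/db_part_* > "$tmp/database-restored.sql"

if cmp -s "$tmp/database.sql" "$tmp/database-restored.sql"; then
    echo "reassembled file is byte-for-byte identical"
fi
```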
Real-World Scenarios
Scenario 1: Backup and Transfer
# Create compressed backup
tar cvzf backup.tar.gz /home/user/documents/
# Transfer to remote server
scp backup.tar.gz user@remote-server:/backups/
# On remote server, extract
ssh user@remote-server
cd /backups
tar xvzf backup.tar.gz
Scenario 2: Send Large File via Email
Email has size limits. Split the file:
# Split into 10MB chunks
split -b 10M presentation.pdf part_
# Email part_aa, part_ab, part_ac separately
# Recipient combines them
cat part_* > presentation.pdf
Scenario 3: Archive Old Logs
# Logs taking up space
du -sh /var/log/app/
# 10G /var/log/app/
# Archive and compress old logs
tar cvzf old-logs-2024-11.tar.gz /var/log/app/2024-11-*.log
# Verify compression
ls -lh old-logs-2024-11.tar.gz
# -rw-r--r-- 1 user user 500M Dec 09 10:00 old-logs-2024-11.tar.gz
# (Compressed from 2GB to 500MB)
# Delete originals after verifying
rm /var/log/app/2024-11-*.log
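"After verifying" can be built into the command itself. In the sketch below, the originals are deleted only if `tar tzf` can read the whole new archive back; a non-zero exit status means the archive is damaged or incomplete. The log directory and file names are hypothetical stand-ins for /var/log/app/.

```shell
#!/usr/bin/env bash
# Sketch: archive November logs, then delete them only if the archive reads back cleanly
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
logdir="$tmp/app"
mkdir -p "$logdir"
for day in 01 02 03; do
    echo "sample entry for 2024-11-$day" > "$logdir/2024-11-$day.log"
done
echo "current entry" > "$logdir/2024-12-01.log"

archive="$tmp/old-logs-2024-11.tar.gz"
# cd first so the glob and the stored paths are relative to the log directory
(cd "$logdir" && tar czf "$archive" 2024-11-*.log)

if tar tzf "$archive" > /dev/null; then
    rm "$logdir"/2024-11-*.log
    echo "archived and removed November logs"
else
    echo "archive check failed; originals kept" >&2
fi
```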
Scenario 4: Clear Growing Log Files
# Check disk space
df -h
# Filesystem Size Used Avail Use% Mounted on
# /dev/sda1 50G 48G 0 98% /
# Find large log files
du -sh /var/log/* | sort -h | tail -5
# Truncate the largest
sudo truncate -s 0 /var/log/massive-app.log
# Check disk space again
df -h
# Filesystem Size Used Avail Use% Mounted on
# /dev/sda1 50G 20G 28G 42% /
Scenario 5: Deploy Application
# Developer creates release (-C /opt stores paths as myapp/..., not opt/myapp/...)
tar cvzf myapp-v1.2.3.tar.gz -C /opt myapp/
# Upload to server
scp myapp-v1.2.3.tar.gz user@prod-server:/tmp/
# On production server
ssh user@prod-server
cd /tmp
tar xvzf myapp-v1.2.3.tar.gz -C /opt/
# -C flag extracts to specific directory
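Where files land on extraction depends on the paths stored inside the archive, so it pays to list them with `tar tzf` first. In this sketch, a throwaway directory replaces /opt. An archive created from an absolute path keeps every parent directory (minus the leading slash). An archive created with `-C` stores paths relative to that directory, which is what makes `-C /opt/` on the extracting side produce exactly /opt/myapp.

```shell
#!/usr/bin/env bash
# Sketch: compare stored paths with and without -C (paths are hypothetical)
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/opt/myapp" "$tmp/prod-opt"
echo "v1.2.3" > "$tmp/opt/myapp/VERSION"

# Absolute path: tar drops the leading "/" but keeps every directory above myapp
tar czf "$tmp/absolute.tar.gz" "$tmp/opt/myapp" 2>/dev/null
tar tzf "$tmp/absolute.tar.gz"

# -C first: paths are stored relative to opt/, i.e. myapp/VERSION
tar czf "$tmp/myapp-v1.2.3.tar.gz" -C "$tmp/opt" myapp
tar tzf "$tmp/myapp-v1.2.3.tar.gz"

# Extracting with -C recreates exactly <target>/myapp
tar xzf "$tmp/myapp-v1.2.3.tar.gz" -C "$tmp/prod-opt"
cat "$tmp/prod-opt/myapp/VERSION"
```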
Common tar Flags
Creating Archives
tar cvf file.tar dir/ # Create, verbose
tar czf file.tar.gz dir/ # Create, gzip
tar cjf file.tar.bz2 dir/ # Create, bzip2 (usually smaller than gzip, but slower)
Extracting Archives
tar xvf file.tar # Extract, verbose
tar xzf file.tar.gz # Extract, gzip
tar xjf file.tar.bz2 # Extract, bzip2
Viewing Contents
tar tvf file.tar # List contents without extracting
tar tzf file.tar.gz # List contents of gzipped archive
Extract to Specific Directory
tar xvzf file.tar.gz -C /destination/path/
Quick Reference
Comparison
diff file1 file2 # Line-by-line
cmp file1 file2 # Byte-by-byte
tar Operations
# Create
tar cvf file.tar dir/
tar cvzf file.tar.gz dir/ # With gzip
# Extract
tar xvf file.tar
tar xvzf file.tar.gz # With gzip
# View
tar tvf file.tar
Compression
# Compress
gzip file.tar # Creates file.tar.gz
tar cvzf file.tar.gz dir/ # Create and compress
# Decompress
gzip -d file.tar.gz # Creates file.tar
gunzip file.tar.gz # Same thing
tar xvzf file.tar.gz # Extract compressed
File Operations
# Truncate
truncate -s 10M file.log # Shrink to 10MB
truncate -s 0 file.log # Empty file
# Combine
cat file1 file2 > combined
# Split
split -l 300 file.txt prefix # By lines
split -b 10M file.log prefix # By size
Common Mistakes
Mistake #1: Forgetting the 'z' flag
# Wrong - creates tar but doesn't compress
tar cvf file.tar.gz dir/
# Right - actually compresses
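tar cvzf file.tar.gz dir/
# (When extracting, GNU tar detects gzip on its own, so tar xvf file.tar.gz also works; z matters most when creating)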
Mistake #2: Extracting in wrong directory
# Extracts to current directory (messy)
tar xvzf file.tar.gz
# Better - extract to specific location
tar xvzf file.tar.gz -C /target/directory/
Mistake #3: Truncating without backup
# Data is lost forever
truncate -s 0 important.log
# Better - backup first
cp important.log important.log.backup
truncate -s 0 important.log
Mistake #4: Wrong split prefix
# Splits to: childfileaa, childfileab
split -l 300 file.txt childfile
# If you wanted: part_aa, part_ab
split -l 300 file.txt part_
# Note the underscore
Tips for Efficiency
Tip 1: View before extracting
# Check what's inside first
tar tvzf file.tar.gz | less
# Then extract
tar xvzf file.tar.gz
Tip 2: Compress multiple directories
# Backup multiple locations at once
tar cvzf backup.tar.gz /etc/ /var/www/ /home/user/
Tip 3: Exclude files
# Archive but skip certain files
tar cvzf backup.tar.gz --exclude='*.log' --exclude='*.tmp' /home/user/
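To confirm the exclusions did what you expected, list the archive afterwards. In this sketch the directory and file names are throwaway examples. The patterns are quoted so the shell passes them to tar unexpanded, and they are placed before the directory being archived.

```shell
#!/usr/bin/env bash
# Sketch: exclude *.log and *.tmp, then check the archive listing (sample files are hypothetical)
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/user/docs"
echo "notes" > "$tmp/user/docs/notes.txt"
echo "debug" > "$tmp/user/docs/debug.log"
echo "cache" > "$tmp/user/scratch.tmp"

tar czf "$tmp/backup.tar.gz" --exclude='*.log' --exclude='*.tmp' -C "$tmp" user

# notes.txt should be listed; debug.log and scratch.tmp should not
tar tzf "$tmp/backup.tar.gz"
```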
Tip 4: Use pbzip2 for faster compression
# Much faster than plain bzip2 on multi-core systems (pbzip2 is usually a separate package)
tar cvf - dir/ | pbzip2 > file.tar.bz2
Key Takeaways
- tar bundles files - Doesn't compress by itself
- gzip compresses - Makes files smaller
- tar.gz = bundled + compressed - Like a zip file
- Use the 'z' flag - tar cvzf and tar xvzf
- truncate loses data - Use carefully
- cat combines files - Simple concatenation
- split breaks files - By lines or size
- Always test with 't' flag - View before extracting
Managing files isn't just about copy-paste. Compression, splitting, and archiving are essential skills for working with servers, backups, and large datasets.
What's your most common tar command? Share your file management workflows in the comments.