DEV Community

NJEI

File Management Beyond Copy-Paste: Compressing, Splitting, and Why tar.gz Isn't Scary

The Problem: Moving and Managing Large Files

You need to transfer a project with 1,000 files to another server. Do you copy them one by one? That's slow and error-prone.

Or you have a 10GB log file. You can't email it or open it easily, so you need to break it into smaller chunks.

Or you download project.tar.gz and have no idea how to open it.

Linux has tools for all of this. Once you understand them, managing files becomes simple.

Comparing Files: diff and cmp

Before we dive into compression, let's cover comparing files.

diff: Line-by-Line Comparison

diff file1.txt file2.txt

Shows which lines are different:

2c2
< This is the old line
---
> This is the new line

Useful for:

  • Comparing config files
  • Checking what changed between versions
  • Reviewing code changes
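For reviewing changes, many people find the unified format easier to read than the default output. Here is a minimal sketch, assuming two throwaway config files (old.conf and new.conf are hypothetical names created only for this demo):

```shell
# Hypothetical sample files, created in a temporary directory
work=$(mktemp -d)
printf 'port=80\nhost=localhost\n' > "$work/old.conf"
printf 'port=8080\nhost=localhost\n' > "$work/new.conf"

# -u prints a unified diff: removed lines start with "-", added lines with "+"
# diff exits with 0 if the files match, 1 if they differ, 2 on error
status=0
diff -u "$work/old.conf" "$work/new.conf" > "$work/changes.diff" || status=$?

cat "$work/changes.diff"
echo "diff exit status: $status"
```

The exit status is what makes diff handy in scripts: you can branch on it without parsing the output.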

cmp: Byte-by-Byte Comparison

cmp file1.txt file2.txt

Stops at the first difference:

file1.txt file2.txt differ: byte 45, line 3

Useful for:

  • Binary files
  • Quick check if files are identical
  • Finding corruption

If files are identical, no output is shown.
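Because identical files produce no output, scripts usually rely on the exit status instead. A small sketch, assuming three hypothetical sample files and a helper function named check that exists only for this demo:

```shell
# Hypothetical sample files, created in a temporary directory
work=$(mktemp -d)
printf 'same bytes\n' > "$work/original.bin"
cp "$work/original.bin" "$work/copy.bin"
printf 'same bytez\n' > "$work/damaged.bin"

# -s (silent) prints nothing; the exit status alone says whether the files match
check() {
    if cmp -s "$1" "$2"; then echo "identical"; else echo "different"; fi
}

copy_result=$(check "$work/original.bin" "$work/copy.bin")
damaged_result=$(check "$work/original.bin" "$work/damaged.bin")
echo "copy: $copy_result, damaged: $damaged_result"
```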

Understanding tar: The Tape Archive

tar (short for tape archive) bundles multiple files into a single archive file. It doesn't compress anything; it only packages.

Creating a tar Archive

tar cvf filename.tar /path/to/directory/

Breaking it down:

  • c - Create archive
  • v - Verbose (list each file as it is processed)
  • f - File (specify filename)

Example:

# Archive entire project directory
tar cvf project.tar /home/user/project/

# Archive multiple directories
tar cvf backup.tar /etc/ /var/log/

Extracting a tar Archive

tar xvf filename.tar
  • x - Extract
  • v - Verbose
  • f - File

Example:

tar xvf project.tar
# Extracts all files to current directory

Important: tar Alone Doesn't Compress

# This creates a tar of roughly 1000MB from 1000MB of files
tar cvf files.tar large_directory/

The archive is about the same size as the original files. It is actually slightly larger, because tar adds a header for each file and pads data to 512-byte blocks. To actually compress, we need gzip.
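You can check the size claim yourself. The sketch below assumes a small generated log file as stand-in data (the directory and file names are hypothetical) and uses the z flag, which is covered in the next section, for comparison:

```shell
# Hypothetical sample data: 2000 repetitive lines, which compress well
work=$(mktemp -d)
mkdir "$work/large_directory"
seq -f 'request %g handled OK' 1 2000 > "$work/large_directory/app.log"

# -C changes into $work first, so the archive stores the relative path
tar cf  "$work/files.tar"    -C "$work" large_directory
tar czf "$work/files.tar.gz" -C "$work" large_directory

orig_size=$(wc -c < "$work/large_directory/app.log")
tar_size=$(wc -c < "$work/files.tar")
gz_size=$(wc -c < "$work/files.tar.gz")
echo "original: $orig_size bytes, tar: $tar_size bytes, tar.gz: $gz_size bytes"
```

The plain tar comes out no smaller than the input, while the gzipped one shrinks considerably on repetitive text like this.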

gzip: Actual Compression

gzip compresses files to save space.

Compress a File

gzip filename.tar
# Creates: filename.tar.gz
# Original filename.tar is replaced by the compressed file (use gzip -k to keep it)

The One-Step tar.gz

Most commonly, you create and compress in one command:

tar cvzf filename.tar.gz /path/to/directory/

Added flag:

  • z - Compress with gzip

Example:

# Create compressed archive
tar cvzf project.tar.gz /home/user/project/

# Result: One compressed file containing everything

Decompress gzip Files

Two ways:

# Method 1: gzip -d
gzip -d filename.tar.gz
# Creates: filename.tar (decompressed)

# Method 2: gunzip (same thing)
gunzip filename.tar.gz
# Creates: filename.tar
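Both gzip and gunzip replace their input file by default. If you would rather keep the input around, one option is the -c flag, which writes to standard output. A sketch, assuming a hypothetical numbers.txt created for the demo:

```shell
# Hypothetical sample file, created in a temporary directory
work=$(mktemp -d)
seq 1 1000 > "$work/numbers.txt"

# -c writes to standard output, so the input file is left in place
gzip -c "$work/numbers.txt" > "$work/numbers.txt.gz"
gunzip -c "$work/numbers.txt.gz" > "$work/restored.txt"

# cmp -s exits 0 when the two files are byte-identical
if cmp -s "$work/numbers.txt" "$work/restored.txt"; then
    roundtrip="ok"
else
    roundtrip="mismatch"
fi
echo "round trip: $roundtrip"
```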

Extract tar.gz in One Step

tar xvzf filename.tar.gz
  • x - Extract
  • v - Verbose
  • z - Decompress with gzip
  • f - File

Example:

tar xvzf project.tar.gz
# Decompresses and extracts everything

Understanding tar.gz

When you see filename.tar.gz:

  1. It's a tar archive (many files bundled)
  2. Compressed with gzip (smaller size)

Think of it like a zip file in Windows.
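To see the two layers for yourself, you can peel them off separately and compare the result with the one-step extraction. This is only a sketch with hypothetical directory names (project, one_step, two_step):

```shell
# Hypothetical sample project, created in a temporary directory
work=$(mktemp -d)
mkdir -p "$work/project" "$work/one_step" "$work/two_step"
echo 'hello' > "$work/project/readme.txt"
tar czf "$work/project.tar.gz" -C "$work" project

# One step: tar handles the gzip layer itself
tar xzf "$work/project.tar.gz" -C "$work/one_step"

# Two steps: gunzip removes the gzip layer, tar reads the plain archive from stdin
gunzip -c "$work/project.tar.gz" | tar xf - -C "$work/two_step"

# diff -r compares the two extracted trees recursively
if diff -r "$work/one_step" "$work/two_step"; then same_tree="yes"; else same_tree="no"; fi
echo "same result: $same_tree"
```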

Real-World Example: Backing Up a Website

# Create compressed backup
tar cvzf website-backup-2024-12-09.tar.gz /var/www/html/

# Check size
ls -lh website-backup-2024-12-09.tar.gz
# Output: -rw-r--r-- 1 user user 45M Dec 09 10:00 website-backup-2024-12-09.tar.gz

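# Later, restore it (GNU tar drops the leading / when archiving,
# so this extracts to ./var/www/html/ under the current directory)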
tar xvzf website-backup-2024-12-09.tar.gz

Truncate: Shrink or Extend File Size

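truncate sets a file to an exact size. Warning: Shrinking a file discards everything past the new size, and that data cannot be recovered. Extending a file pads it with zero bytes.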

Basic Usage

truncate -s 10 filename

Sets file size to 10 bytes. Everything beyond is deleted.

Common Uses

# Shrink file to 1MB (M means 1024*1024 bytes; use MB for 1000*1000)
truncate -s 1M largefile.log

# Shrink file to 10 bytes
truncate -s 10 file.txt

# Empty a file (set to 0 bytes)
truncate -s 0 file.txt

Real Example: Clear Log File

# Log file is 5GB, causing disk issues
ls -lh /var/log/app.log
# -rw-r--r-- 1 root root 5.0G Dec 09 10:00 app.log

# Empty it
sudo truncate -s 0 /var/log/app.log

# Verify
ls -lh /var/log/app.log
# -rw-r--r-- 1 root root 0 Dec 09 10:01 app.log

Note: Truncate doesn't delete the file, just its contents.
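Here is a small sketch of what truncate does to a file's contents, assuming a hypothetical 26-byte file.txt created just for this demo:

```shell
# Hypothetical sample file holding the 26 lowercase letters (no trailing newline)
work=$(mktemp -d)
printf 'abcdefghijklmnopqrstuvwxyz' > "$work/file.txt"

# Shrink: only the first 10 bytes survive
truncate -s 10 "$work/file.txt"
shrunk=$(cat "$work/file.txt")
echo "after shrinking: $shrunk"

# Extend: the file grows to 20 bytes, padded with zero bytes (the lost letters do not come back)
truncate -s 20 "$work/file.txt"
extended_size=$(wc -c < "$work/file.txt")

# Empty: the file still exists, but holds nothing
truncate -s 0 "$work/file.txt"
emptied_size=$(wc -c < "$work/file.txt")
echo "extended to $extended_size bytes, then emptied to $emptied_size bytes"
```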

Combining Files: cat

cat (concatenate) combines multiple files into one.

Basic Syntax

cat file1 file2 file3 > combined.txt

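This writes the contents of file1, file2, and file3, in that order, into combined.txt. Note that > overwrites combined.txt if it already exists. Use >> if you want to append to it instead.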
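The difference between > and >> is easy to miss, so here is a quick sketch using three hypothetical one-line files:

```shell
# Hypothetical sample files, created in a temporary directory
work=$(mktemp -d)
echo 'first'  > "$work/file1"
echo 'second' > "$work/file2"
echo 'third'  > "$work/file3"

# > replaces whatever combined.txt held before
echo 'stale content' > "$work/combined.txt"
cat "$work/file1" "$work/file2" > "$work/combined.txt"
overwrite_lines=$(wc -l < "$work/combined.txt")

# >> keeps the existing lines and adds file3 at the end
cat "$work/file3" >> "$work/combined.txt"
append_lines=$(wc -l < "$work/combined.txt")

cat "$work/combined.txt"
```

After the first cat the stale line is gone and the file has two lines. After the second it has three.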

Example: Combine Logs

# Multiple log files from different days
cat app-2024-12-01.log app-2024-12-02.log app-2024-12-03.log > december.log

# Verify
wc -l december.log
# Shows total lines from all three files

Example: Combine Parts

# Downloaded file in parts
cat download.part1 download.part2 download.part3 > complete-file.zip

Splitting Files: split

split breaks large files into smaller chunks.

Split by Lines

split -l 300 file.txt childfile
  • -l 300 - Split every 300 lines
  • file.txt - Input file
  • childfile - Prefix for output files

Output:

childfileaa  (first 300 lines)
childfileab  (next 300 lines)
childfileac  (next 300 lines)
...

Split by Size

# Split into 10MB chunks
split -b 10M largefile.log part_

# Output: part_aa, part_ab, part_ac, ...

Real Example: Split Large Database Dump

# Database dump is 5GB
ls -lh database.sql
# -rw-r--r-- 1 user user 5.0G Dec 09 10:00 database.sql

# Split into 100MB chunks for easier transfer
split -b 100M database.sql db_part_

# Results:
ls -lh db_part_*
# db_part_aa (100MB)
# db_part_ab (100MB)
# db_part_ac (100MB)
# ...

# Later, recombine on destination server
cat db_part_* > database.sql
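After recombining, it is worth confirming that the result matches the original byte for byte. A scaled-down sketch, assuming a small generated file standing in for the real dump and 10K chunks instead of 100M:

```shell
# Hypothetical stand-in for a real database dump
work=$(mktemp -d)
seq 1 5000 > "$work/database.sql"

# 10K chunks keep the demo small
split -b 10K "$work/database.sql" "$work/db_part_"
part_count=$(ls "$work"/db_part_* | wc -l)
echo "parts created: $part_count"

# The shell expands db_part_* in sorted order (aa, ab, ac, ...),
# which matches the order split wrote the chunks in
cat "$work"/db_part_* > "$work/restored.sql"

# cmp -s exits 0 only if the recombined file is byte-identical
if cmp -s "$work/database.sql" "$work/restored.sql"; then
    restored="ok"
else
    restored="mismatch"
fi
echo "recombined file: $restored"
```

When the parts travel to another machine, comparing a checksum of the original against the recombined file serves the same purpose.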

Real-World Scenarios

Scenario 1: Backup and Transfer

# Create compressed backup
tar cvzf backup.tar.gz /home/user/documents/

# Transfer to remote server
scp backup.tar.gz user@remote-server:/backups/

# On remote server, extract
ssh user@remote-server
cd /backups
tar xvzf backup.tar.gz

Scenario 2: Send Large File via Email

Email has size limits. Split the file:

# Split into 10MB chunks
split -b 10M presentation.pdf part_

# Email part_aa, part_ab, part_ac separately

# Recipient combines them
cat part_* > presentation.pdf

Scenario 3: Archive Old Logs

# Logs taking up space
du -sh /var/log/app/
# 10G    /var/log/app/

# Archive and compress old logs
tar cvzf old-logs-2024-11.tar.gz /var/log/app/2024-11-*.log

# Verify compression
ls -lh old-logs-2024-11.tar.gz
# -rw-r--r-- 1 user user 500M Dec 09 10:00 old-logs-2024-11.tar.gz
# (Compressed from 2GB to 500MB)

# Delete originals after verifying
rm /var/log/app/2024-11-*.log
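The "after verifying" step deserves a concrete form. One possible approach, sketched here with hypothetical log files in a temporary directory, is to test the gzip stream, list the archive, and only then delete:

```shell
# Hypothetical sample logs, created in a temporary directory
work=$(mktemp -d)
mkdir "$work/logs"
echo 'entry one' > "$work/logs/2024-11-01.log"
echo 'entry two' > "$work/logs/2024-11-02.log"
tar czf "$work/old-logs.tar.gz" -C "$work/logs" 2024-11-01.log 2024-11-02.log

# gzip -t checks the compressed stream; tar tzf confirms the member list is readable
if gzip -t "$work/old-logs.tar.gz" && tar tzf "$work/old-logs.tar.gz" > "$work/contents.txt"; then
    archived=$(wc -l < "$work/contents.txt")
    expected=$(ls "$work/logs" | wc -l)
    # Delete the originals only when every file made it into the archive
    if [ "$archived" -eq "$expected" ]; then
        rm "$work"/logs/2024-11-*.log
        echo "verified $archived files, originals removed"
    fi
else
    echo "archive failed verification, originals kept" >&2
fi
```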

Scenario 4: Clear Growing Log Files

# Check disk space
df -h
# Filesystem      Size  Used Avail Use% Mounted on
# /dev/sda1        50G   48G     0 100% /

# Find large log files
du -sh /var/log/* | sort -h | tail -5

# Truncate the largest
sudo truncate -s 0 /var/log/massive-app.log

# Check disk space again
df -h
# Filesystem      Size  Used Avail Use% Mounted on
# /dev/sda1        50G   20G   28G  42% /

Scenario 5: Deploy Application

# Developer creates release
# (-C /opt stores paths as myapp/... so extracting with -C /opt/ recreates /opt/myapp/)
tar cvzf myapp-v1.2.3.tar.gz -C /opt myapp

# Upload to server
scp myapp-v1.2.3.tar.gz user@prod-server:/tmp/

# On production server
ssh user@prod-server
cd /tmp
tar xvzf myapp-v1.2.3.tar.gz -C /opt/
# -C flag extracts to specific directory
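Where files end up depends on two things: the paths stored inside the archive and the directory given to -C. A sketch with a hypothetical layout under a temporary directory (opt/myapp and target are made-up names):

```shell
# Hypothetical application directory, created in a temporary directory
work=$(mktemp -d)
mkdir -p "$work/opt/myapp" "$work/target"
echo 'v1.2.3' > "$work/opt/myapp/VERSION"

# -C at creation time: change into opt/ first, then archive the relative path "myapp"
tar czf "$work/myapp.tar.gz" -C "$work/opt" myapp

# List the stored names before extracting
tar tzf "$work/myapp.tar.gz" > "$work/stored-paths.txt"
cat "$work/stored-paths.txt"

# -C at extraction time: files land under target/ using the stored names
tar xzf "$work/myapp.tar.gz" -C "$work/target"
ls "$work/target/myapp"
```

Listing the archive first shows exactly which directory levels will be created under the -C target.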

Common tar Flags

Creating Archives

tar cvf file.tar dir/       # Create, verbose
tar czf file.tar.gz dir/    # Create, gzip
tar cjf file.tar.bz2 dir/   # Create, bzip2 (usually smaller, but slower)

Extracting Archives

tar xvf file.tar            # Extract, verbose
tar xzf file.tar.gz         # Extract, gzip
tar xjf file.tar.bz2        # Extract, bzip2

Viewing Contents

tar tvf file.tar            # List contents without extracting
tar tzf file.tar.gz         # List contents of gzipped archive

Extract to Specific Directory

tar xvzf file.tar.gz -C /destination/path/

Quick Reference

Comparison

diff file1 file2            # Line-by-line
cmp file1 file2             # Byte-by-byte

tar Operations

# Create
tar cvf file.tar dir/
tar cvzf file.tar.gz dir/   # With gzip

# Extract
tar xvf file.tar
tar xvzf file.tar.gz        # With gzip

# View
tar tvf file.tar

Compression

# Compress
gzip file.tar               # Creates file.tar.gz
tar cvzf file.tar.gz dir/   # Create and compress

# Decompress
gzip -d file.tar.gz         # Creates file.tar
gunzip file.tar.gz          # Same thing
tar xvzf file.tar.gz        # Extract compressed

File Operations

# Truncate
truncate -s 10M file.log    # Set size to 10MB (10*1024*1024 bytes)
truncate -s 0 file.log      # Empty file

# Combine
cat file1 file2 > combined

# Split
split -l 300 file.txt prefix    # By lines
split -b 10M file.log prefix    # By size

Common Mistakes

Mistake #1: Forgetting the 'z' flag

# Wrong - creates tar but doesn't compress
tar cvf file.tar.gz dir/

# Right - actually compresses
tar cvzf file.tar.gz dir/
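If you are unsure whether a .tar.gz file was really compressed, gzip -t can tell you. A sketch, assuming two hypothetical archives of the same directory and a helper function named is_gzip that exists only for this demo:

```shell
# Hypothetical sample directory, created in a temporary directory
work=$(mktemp -d)
mkdir "$work/dir"
seq 1 500 > "$work/dir/data.txt"

tar cf  "$work/mislabeled.tar.gz" -C "$work" dir   # no z: plain tar despite the name
tar czf "$work/real.tar.gz"       -C "$work" dir   # z: actually gzip-compressed

# gzip -t tests integrity and fails on input that is not gzip data
is_gzip() {
    if gzip -t "$1" 2>/dev/null; then echo "gzip"; else echo "not gzip"; fi
}

mislabeled_kind=$(is_gzip "$work/mislabeled.tar.gz")
real_kind=$(is_gzip "$work/real.tar.gz")
echo "mislabeled.tar.gz: $mislabeled_kind"
echo "real.tar.gz: $real_kind"
```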

Mistake #2: Extracting in wrong directory

# Extracts to current directory (messy)
tar xvzf file.tar.gz

# Better - extract to specific location
tar xvzf file.tar.gz -C /target/directory/

Mistake #3: Truncating without backup

# Data is lost forever
truncate -s 0 important.log

# Better - backup first
cp important.log important.log.backup
truncate -s 0 important.log

Mistake #4: Wrong split prefix

# Splits to: childfileaa, childfileab
split -l 300 file.txt childfile

# If you wanted: part_aa, part_ab
split -l 300 file.txt part_
# Note the underscore

Tips for Efficiency

Tip 1: View before extracting

# Check what's inside first
tar tvzf file.tar.gz | less

# Then extract
tar xvzf file.tar.gz

Tip 2: Compress multiple directories

# Backup multiple locations at once
tar cvzf backup.tar.gz /etc/ /var/www/ /home/user/

Tip 3: Exclude files

# Archive but skip certain files
tar cvzf backup.tar.gz --exclude='*.log' --exclude='*.tmp' /home/user/
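It is a good habit to list the archive afterwards and confirm the exclusions did what you expected. A sketch with a hypothetical home directory containing one file of each kind:

```shell
# Hypothetical sample directory, created in a temporary directory
work=$(mktemp -d)
mkdir -p "$work/home/user"
echo 'notes'   > "$work/home/user/notes.txt"
echo 'noise'   > "$work/home/user/debug.log"
echo 'scratch' > "$work/home/user/cache.tmp"

# Quote the patterns so tar, not the shell, interprets the wildcards
tar czf "$work/backup.tar.gz" --exclude='*.log' --exclude='*.tmp' -C "$work/home" user

# List what actually went into the archive
tar tzf "$work/backup.tar.gz" > "$work/listing.txt"
cat "$work/listing.txt"
```

Only notes.txt and its parent directory should appear in the listing.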

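Tip 4: Use pbzip2 for faster bzip2 compression

# Parallel bzip2: much faster than plain bzip2 on multi-core systems
# (pbzip2 is usually a separate package you need to install)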
tar cvf - dir/ | pbzip2 > file.tar.bz2

Key Takeaways

  1. tar bundles files - Doesn't compress by itself
  2. gzip compresses - Makes files smaller
  3. tar.gz = bundled + compressed - Like a zip file
  4. Use 'z' flag - tar cvzf and tar xvzf
  5. truncate loses data - Use carefully
  6. cat combines files - Simple concatenation
  7. split breaks files - By lines or size
  8. Always check with the 't' flag first - List the contents before extracting

Managing files isn't just about copy-paste. Compression, splitting, and archiving are essential skills for working with servers, backups, and large datasets.


What's your most common tar command? Share your file management workflows in the comments.