File compression is one of the classic problems in computer science and systems programming. Operating systems and many applications rely on compression algorithms to reduce disk usage, lower network bandwidth consumption, and speed up data transfer.
In this article, I will explain how I built a simple file compressor in C from scratch. The program reads a file, compresses its contents using Run-Length Encoding (RLE), and writes the compressed output into another file.
The goal of this project is not to compete with production compressors like gzip or zstd, but to understand how compression works internally.
Understanding the Compression Idea
Before writing code, we must understand the compression algorithm.
The algorithm used here is Run-Length Encoding (RLE).
The idea is very simple:
If a character repeats many times, we store:
character + count
Example:
Original text:
aaaaabbbbcc
Compressed form:
a5b4c2
Instead of storing the same character repeatedly, we store the character and how many times it appears consecutively.
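This works best for files that contain long runs of repeated characters. On data with few repeats the output grows instead of shrinking: abc becomes a1b1c1, twice the original size.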
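The idea above can be tried on a string in memory before any file handling is involved. The following is a minimal sketch, not the project's code: the function name rle_encode_string and its buffer-based interface are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the project files): encodes the
 * NUL-terminated string `src` as character + decimal count pairs.
 * The result is written to `dst`, which holds `dst_size` bytes.
 * Returns the encoded length, or -1 if `dst` is too small. */
int rle_encode_string(const char *src, char *dst, size_t dst_size)
{
    size_t used = 0;

    if (dst_size == 0)
        return -1;
    dst[0] = '\0';

    for (size_t i = 0; src[i] != '\0'; ) {
        /* Measure the run that starts at position i. The terminating
         * NUL never equals src[i], so the scan stops at the end. */
        size_t run = 1;
        while (src[i + run] == src[i])
            run++;

        /* Append "<char><count>" and check for truncation. */
        int written = snprintf(dst + used, dst_size - used,
                               "%c%zu", src[i], run);
        if (written < 0 || (size_t)written >= dst_size - used)
            return -1;

        used += (size_t)written;
        i += run;
    }
    return (int)used;
}
```

Called with "aaaaabbbbcc" and a large enough buffer, this produces a5b4c2, matching the example above.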
Project Structure
To keep the code clean and maintainable, we split the program into multiple files.
file-compressor/
│
├── main.c
├── compressor.h
├── compressor.c
├── decompressor.c
├── decompressor.h
└── Makefile
This structure separates responsibilities:
- main.c → program entry point
- compressor.* → compression logic
- decompressor.* → decompression logic
Header File (compressor.h)
#ifndef COMPRESSOR_H
#define COMPRESSOR_H
int compress_file(const char *input_file, const char *output_file);
#endif
This header declares the compression function so other files can use it.
Compression Implementation (compressor.c)
#include <stdio.h>
#include "compressor.h"

int compress_file(const char *input_file, const char *output_file)
{
    FILE *in = fopen(input_file, "rb");
    if (!in)
    {
        perror("Input file error");
        return 1;
    }

    FILE *out = fopen(output_file, "wb");
    if (!out)
    {
        perror("Output file error");
        fclose(in);
        return 1;
    }

    /* Read the first byte; an empty input yields an empty output. */
    int prev = fgetc(in);
    if (prev == EOF)
    {
        fclose(in);
        fclose(out);
        return 0;
    }

    int count = 1; /* length of the current run */
    int curr;

    while ((curr = fgetc(in)) != EOF)
    {
        if (curr == prev)
        {
            count++;
        }
        else
        {
            /* The run ended: write the byte followed by its decimal count. */
            fprintf(out, "%c%d", prev, count);
            prev = curr;
            count = 1;
        }
    }

    /* Write the final run, which the loop never flushes. */
    fprintf(out, "%c%d", prev, count);

    fclose(in);
    fclose(out);
    return 0;
}
This function reads the input file byte-by-byte and counts repeated characters.
Whenever a new character appears, it writes the previous character and its count to the output file.
Header File (decompressor.h)
#ifndef DECOMPRESSOR_H
#define DECOMPRESSOR_H
int decompress_file(const char *input_file, const char *output_file);
#endif
Decompression Implementation (decompressor.c)
#include <stdio.h>
#include <ctype.h>
#include "decompressor.h"

int decompress_file(const char *input_file, const char *output_file)
{
    FILE *in = fopen(input_file, "rb");
    if (!in)
    {
        perror("Input file error");
        return 1;
    }

    FILE *out = fopen(output_file, "wb");
    if (!out)
    {
        perror("Output file error");
        fclose(in);
        return 1;
    }

    int ch;
    while ((ch = fgetc(in)) != EOF)
    {
        int count = 0;
        int digit;

        /* Accumulate the decimal count that follows the byte. */
        while ((digit = fgetc(in)) != EOF && isdigit(digit))
        {
            count = count * 10 + (digit - '0');
        }

        for (int i = 0; i < count; i++)
        {
            fputc(ch, out);
        }

        /* The non-digit that ended the count starts the next pair. */
        if (digit != EOF)
            ungetc(digit, in);
    }

    fclose(in);
    fclose(out);
    return 0;
}
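This function reverses the compression by reading:
character + number
and writing the character that many times.
Because the count is stored as decimal text, the format is ambiguous when the original data itself contains digit characters. For example, the input a1 compresses to a111, which decompresses to 111 copies of a. This text format is therefore only safe for input that contains no digits.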
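The same decoding loop can be exercised on a string in memory, which makes it easy to experiment with different inputs. This is a minimal sketch under assumptions: rle_decode_string is a hypothetical helper, not one of the project's functions, and it uses a caller-supplied buffer instead of files.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of the project files): expands
 * character + decimal count pairs from the NUL-terminated string `src`
 * into `dst`, which holds `dst_size` bytes including the final NUL.
 * Returns the decoded length, or -1 if `dst` is too small. */
int rle_decode_string(const char *src, char *dst, size_t dst_size)
{
    size_t used = 0;
    size_t i = 0;

    if (dst_size == 0)
        return -1;

    while (src[i] != '\0') {
        char ch = src[i++];
        size_t count = 0;

        /* Accumulate the decimal count that follows the character. */
        while (isdigit((unsigned char)src[i])) {
            count = count * 10 + (size_t)(src[i] - '0');
            if (count > dst_size)   /* cannot fit; also avoids overflow */
                return -1;
            i++;
        }

        /* Keep one byte free for the terminating NUL. */
        if (count >= dst_size - used)
            return -1;

        for (size_t k = 0; k < count; k++)
            dst[used++] = ch;
    }

    dst[used] = '\0';
    return (int)used;
}
```

Decoding a5b4c2 with this sketch gives back aaaaabbbbcc. A character with no digits after it gets a count of zero and is dropped, which matches the behavior of the file-based loop above.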
Main Program (main.c)
#include <stdio.h>
#include <string.h>
#include "compressor.h"
#include "decompressor.h"

int main(int argc, char *argv[])
{
    /* Expect exactly: program name, mode flag, input path, output path. */
    if (argc != 4)
    {
        fprintf(stderr, "Usage:\n");
        fprintf(stderr, "compress: %s -c input output\n", argv[0]);
        fprintf(stderr, "decompress: %s -d input output\n", argv[0]);
        return 1;
    }

    if (strcmp(argv[1], "-c") == 0)
    {
        return compress_file(argv[2], argv[3]);
    }
    else if (strcmp(argv[1], "-d") == 0)
    {
        return decompress_file(argv[2], argv[3]);
    }

    fprintf(stderr, "Unknown option\n");
    return 1;
}
This gives the program two modes:
- compression (-c)
- decompression (-d)
Makefile
CC=gcc
CFLAGS=-Wall -O2
OBJ=main.o compressor.o decompressor.o

all: compressor

compressor: $(OBJ)
	$(CC) $(OBJ) -o compressor

main.o: main.c compressor.h decompressor.h
	$(CC) $(CFLAGS) -c main.c

compressor.o: compressor.c compressor.h
	$(CC) $(CFLAGS) -c compressor.c

decompressor.o: decompressor.c decompressor.h
	$(CC) $(CFLAGS) -c decompressor.c

clean:
	rm -f *.o compressor
Compile the project using:
make
Running the Program
Compress a file:
./compressor -c input.txt compressed.txt
Decompress it:
./compressor -d compressed.txt output.txt
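After a round trip it is worth confirming that output.txt is byte-for-byte identical to input.txt. One way to do that from C is sketched below. The helper files_identical is hypothetical and not part of the project; it only illustrates the check.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the project files): compares two
 * files byte by byte.
 * Returns 1 if they are identical, 0 if they differ,
 * and -1 if either file cannot be opened or read. */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    if (!a)
        return -1;

    FILE *b = fopen(path_b, "rb");
    if (!b) {
        fclose(a);
        return -1;
    }

    int ca, cb;
    /* Advance through both files until a mismatch or the end of one. */
    do {
        ca = fgetc(a);
        cb = fgetc(b);
    } while (ca == cb && ca != EOF);

    /* Identical only if both streams reached EOF at the same time. */
    int result = (ca == cb) ? 1 : 0;
    if (ferror(a) || ferror(b))
        result = -1;

    fclose(a);
    fclose(b);
    return result;
}
```

Calling files_identical("input.txt", "output.txt") after the two commands above should return 1 when the round trip preserved the data.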
Possible Improvements
This project is intentionally simple, but real compressors use more advanced algorithms such as:
- Huffman Coding
- LZ77 / LZ78
- LZW
- Arithmetic Coding
You could improve this project by:
- switching to binary encoding instead of text
- adding bit-level compression
- implementing Huffman trees
- using buffered I/O for performance
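As an illustration of the first improvement, binary encoding, each run can be stored as two raw bytes: a count from 1 to 255 followed by the value. Because the count is no longer decimal text, input that contains digit characters round-trips correctly. This is a minimal in-memory sketch under assumptions: the names rle_pack and rle_unpack and the two-byte layout are choices made here, not the project's format.

```c
#include <stddef.h>

/* Hypothetical binary RLE: each run is stored as (count, value),
 * where count is one byte in the range 1..255. Longer runs are split.
 * Returns the number of bytes written, or -1 if `dst` is too small. */
long rle_pack(const unsigned char *src, size_t n,
              unsigned char *dst, size_t cap)
{
    size_t out = 0;
    size_t i = 0;

    while (i < n) {
        /* Measure the run, capped at what one count byte can hold. */
        size_t run = 1;
        while (i + run < n && run < 255 && src[i + run] == src[i])
            run++;

        if (cap - out < 2)
            return -1;
        dst[out++] = (unsigned char)run;
        dst[out++] = src[i];
        i += run;
    }
    return (long)out;
}

/* Reverses rle_pack. Returns the number of bytes written, or -1 if
 * the input is malformed or `dst` is too small. */
long rle_unpack(const unsigned char *src, size_t n,
                unsigned char *dst, size_t cap)
{
    size_t out = 0;

    if (n % 2 != 0)
        return -1;              /* pairs only */

    for (size_t i = 0; i < n; i += 2) {
        size_t run = src[i];
        if (run == 0 || run > cap - out)
            return -1;
        for (size_t k = 0; k < run; k++)
            dst[out++] = src[i + 1];
    }
    return (long)out;
}
```

With this layout the five bytes a1111 pack into four bytes (1, 'a', 4, '1') and unpack back to the original. The worst case is still a doubling in size when no byte repeats.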
Final Thoughts
Building a compressor from scratch is an excellent exercise for understanding:
- file I/O
- memory efficiency
- algorithmic thinking
- low-level data representation
Even though the algorithm used here is simple, the architecture we built (modular .h and .c files) mirrors how real systems software is designed.
If you're learning systems programming in C, this is a great foundational project.