Farhad Rahimi Klie

Posted on Mar 8

Building a File Compressor in C (From Scratch)

#c #filecompressor #systemprogramming

File compression is one of the classic problems in computer science and systems programming. Operating systems, file formats, and network protocols rely on compression to reduce disk usage and network bandwidth, which in turn speeds up data transfer.

In this article, I will explain how I built a simple file compressor in C from scratch. The program reads a file, compresses its contents using Run-Length Encoding (RLE), and writes the compressed output into another file.

The goal of this project is not to compete with production compressors like gzip or zstd, but to understand how compression works internally.

Understanding the Compression Idea

Before writing code, we must understand the compression algorithm.

The algorithm used here is Run-Length Encoding (RLE).

The idea is very simple:

For each run of consecutive identical characters, we store:

character + count

Example:

Original text:

aaaaabbbbcc

Compressed form:

a5b4c2

Instead of storing the same character repeatedly, we store the character and how many times it appears consecutively.

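This pays off only when the input contains long runs of the same byte. In this format a run of length one still costs two characters, so abc becomes a1b1c1 and the output is twice the size of the input.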
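The idea above can be tried on an in-memory string before any file handling is involved. The following is a minimal sketch, not the project's code: the helper name rle_encode_text and its buffer-based signature are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the project): run-length encode the
 * NUL-terminated string `src` into `dst` using "character + decimal count".
 * Returns the encoded length, or -1 if `dst` (of size `cap`) is too small. */
int rle_encode_text(const char *src, char *dst, size_t cap)
{
    assert(src != NULL && dst != NULL);
    if (cap == 0) return -1;

    size_t used = 0;
    dst[0] = '\0';

    for (size_t i = 0; src[i] != '\0'; ) {
        /* Measure the run that starts at position i. */
        size_t run = 1;
        while (src[i + run] == src[i]) run++;

        /* Append the character and its count; snprintf reports truncation. */
        int n = snprintf(dst + used, cap - used, "%c%zu", src[i], run);
        if (n < 0 || (size_t)n >= cap - used) return -1;

        used += (size_t)n;
        i += run;
    }
    return (int)used;
}
```

Calling rle_encode_text("aaaaabbbbcc", buf, sizeof buf) fills buf with a5b4c2, while "abc" yields a1b1c1, which shows the growth on input without runs.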

Project Structure

To keep the code clean and maintainable, we split the program into multiple files.

file-compressor/
│
├── main.c
├── compressor.h
├── compressor.c
├── decompressor.c
├── decompressor.h
└── Makefile

This structure separates responsibilities:

main.c → program entry point
compressor.* → compression logic
decompressor.* → decompression logic

Header File (compressor.h)

#ifndef COMPRESSOR_H
#define COMPRESSOR_H

int compress_file(const char *input_file, const char *output_file);

#endif

This header declares the compression function so other files can use it.

Compression Implementation (compressor.c)

#include <stdio.h>
#include "compressor.h"

int compress_file(const char *input_file, const char *output_file)
{
    FILE *in = fopen(input_file, "rb");
    if (!in)
    {
        perror("Input file error");
        return 1;
    }

    FILE *out = fopen(output_file, "wb");
    if (!out)
    {
        perror("Output file error");
        fclose(in);
        return 1;
    }

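    /* Read the first byte; an empty input produces an empty output. */
    int prev = fgetc(in);
    if (prev == EOF)
    {
        fclose(in);
        fclose(out);
        return 0;
    }

    int count = 1; /* length of the current run */
    int curr;

    while ((curr = fgetc(in)) != EOF)
    {
        if (curr == prev)
        {
            count++; /* same byte: extend the run */
        }
        else
        {
            /* Run ended: emit the byte followed by its decimal count. */
            fprintf(out, "%c%d", prev, count);
            prev = curr;
            count = 1;
        }
    }

    /* Flush the final run, which the loop never writes. */
    fprintf(out, "%c%d", prev, count);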

    fclose(in);
    fclose(out);

    return 0;
}

This function reads the input file byte by byte and tracks the length of the current run.

Whenever a different byte appears, it writes the previous byte followed by its count as decimal text. The last run is written after the loop, because no later byte arrives to trigger it.
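One caveat of the "%c%d" output is that counts are stored as decimal text, so a digit in the input can later be mistaken for part of a count. A simple guard is to check the input for digit bytes before compressing. The sketch below shows one way to do that; the function file_contains_digits is hypothetical and is not part of the project.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical pre-check, not part of the original project.
 * Returns 1 if the file contains any ASCII digit byte, 0 if it does not,
 * and -1 if the file cannot be opened. */
int file_contains_digits(const char *path)
{
    assert(path != NULL);

    FILE *in = fopen(path, "rb");
    if (!in) return -1;

    int ch;
    int found = 0;
    while ((ch = fgetc(in)) != EOF) {
        if (isdigit(ch)) {   /* fgetc returns 0..255, safe for isdigit */
            found = 1;
            break;
        }
    }

    fclose(in);
    return found;
}
```

A caller could refuse to compress, or print a warning, when this returns 1.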

Header File (decompressor.h)

#ifndef DECOMPRESSOR_H
#define DECOMPRESSOR_H

int decompress_file(const char *input_file, const char *output_file);

#endif

Decompression Implementation (decompressor.c)

#include <stdio.h>
#include <ctype.h>
#include "decompressor.h"

int decompress_file(const char *input_file, const char *output_file)
{
    FILE *in = fopen(input_file, "rb");
    if (!in)
    {
        perror("Input file error");
        return 1;
    }

    FILE *out = fopen(output_file, "wb");
    if (!out)
    {
        perror("Output file error");
        fclose(in);
        return 1;
    }

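    int ch;
    while ((ch = fgetc(in)) != EOF)
    {
        int count = 0;
        int digit;

        /* Accumulate the decimal count that follows the character. */
        while ((digit = fgetc(in)) != EOF && isdigit(digit))
        {
            count = count * 10 + (digit - '0');
        }

        /* Expand the run. */
        for (int i = 0; i < count; i++)
        {
            fputc(ch, out);
        }

        /* The first non-digit belongs to the next pair, so push it back. */
        if (digit != EOF)
            ungetc(digit, in);
    }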

    fclose(in);
    fclose(out);

    return 0;
}

This function reverses the compression by reading:

character + number

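and writing the character that many times.

Every digit that follows a character is consumed as part of its count, so input that itself contains digit characters does not always round-trip: the text a1 compresses to a111, which decodes as 111 copies of a.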
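The same decoding step can be sketched on an in-memory string, which makes the parsing easy to experiment with. The helper rle_decode_text and its signature are assumptions for illustration, not the project's API.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper (not part of the project): expand "a5b4c2"-style
 * text from `src` into `dst`. Returns the decoded length, or -1 if `dst`
 * (of size `cap`) is too small or a character has no count after it. */
int rle_decode_text(const char *src, char *dst, size_t cap)
{
    assert(src != NULL && dst != NULL);
    if (cap == 0) return -1;

    size_t used = 0;
    while (*src != '\0') {
        char ch = *src++;

        /* A well-formed pair always has at least one digit. */
        if (!isdigit((unsigned char)*src)) return -1;

        size_t count = 0;
        while (isdigit((unsigned char)*src)) {
            count = count * 10 + (size_t)(*src++ - '0');
            /* Stop early if the run cannot fit (leaves room for the NUL). */
            if (count >= cap - used) return -1;
        }

        memset(dst + used, ch, count);
        used += count;
    }

    dst[used] = '\0';
    return (int)used;
}
```

Decoding "a5b4c2" into a large enough buffer gives back aaaaabbbbcc.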

Main Program (main.c)

#include <stdio.h>
#include <string.h>

#include "compressor.h"
#include "decompressor.h"

int main(int argc, char *argv[])
{
    if (argc != 4)
    {
        /* Usage errors go to stderr so they do not mix with normal output. */
        fprintf(stderr, "Usage:\n");
        fprintf(stderr, "compress:   %s -c input output\n", argv[0]);
        fprintf(stderr, "decompress: %s -d input output\n", argv[0]);
        return 1;
    }

    if (strcmp(argv[1], "-c") == 0)
    {
        return compress_file(argv[2], argv[3]);
    }
    else if (strcmp(argv[1], "-d") == 0)
    {
        return decompress_file(argv[2], argv[3]);
    }

    fprintf(stderr, "Unknown option: %s\n", argv[1]);
    return 1;
}

This allows two modes:

compression
decompression

Makefile

CC=gcc
CFLAGS=-Wall -O2

OBJ=main.o compressor.o decompressor.o

.PHONY: all clean

all: compressor

# Recipe lines must begin with a tab character, not spaces.
compressor: $(OBJ)
	$(CC) $(OBJ) -o compressor

main.o: main.c compressor.h decompressor.h
	$(CC) $(CFLAGS) -c main.c

compressor.o: compressor.c compressor.h
	$(CC) $(CFLAGS) -c compressor.c

decompressor.o: decompressor.c decompressor.h
	$(CC) $(CFLAGS) -c decompressor.c

clean:
	rm -f *.o compressor

Compile the project using:

make

Running the Program

Compress a file:

./compressor -c input.txt compressed.txt

Decompress it:

./compressor -d compressed.txt output.txt
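After a round trip, output.txt should be byte-for-byte identical to input.txt. The sketch below is one way to check that from C; files_identical is a hypothetical helper and is not part of the project (on Unix-like systems the cmp utility does the same job).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of the project).
 * Returns 1 if both files contain the same bytes, 0 if they differ,
 * and -1 if either file cannot be opened. */
int files_identical(const char *path_a, const char *path_b)
{
    assert(path_a != NULL && path_b != NULL);

    FILE *a = fopen(path_a, "rb");
    if (!a) return -1;

    FILE *b = fopen(path_b, "rb");
    if (!b) {
        fclose(a);
        return -1;
    }

    /* Read both files in lockstep until they differ or one ends. */
    int ca, cb;
    do {
        ca = fgetc(a);
        cb = fgetc(b);
    } while (ca == cb && ca != EOF);

    fclose(a);
    fclose(b);

    /* Equal here means both reached EOF at the same position. */
    return ca == cb;
}
```

A return value of 1 for files_identical("input.txt", "output.txt") means the round trip preserved the data.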

Possible Improvements

This project is intentionally simple, but real compressors use more advanced algorithms such as:

Huffman Coding
LZ77 / LZ78
LZW
Arithmetic Coding

You could improve this project by:

switching to binary encoding instead of text
adding bit-level compression
implementing Huffman trees
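processing data in larger blocks with fread and fwrite (stdio already buffers fgetc and fputc, but every call still carries some overhead)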
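As an illustration of the first item, here is a minimal sketch of a binary variant. It assumes a format that is not the article's: each run is stored as two raw bytes, a count from 1 to 255 followed by the byte value, and longer runs are split. The function names are hypothetical. Because the count is a raw byte rather than decimal text, digits in the input are no longer ambiguous.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical binary RLE (assumed format): count byte, then value byte. */
int rle_compress_binary(FILE *in, FILE *out)
{
    assert(in != NULL && out != NULL);

    int prev = fgetc(in);
    if (prev == EOF) return ferror(in) ? 1 : 0;  /* empty input */

    int count = 1;
    int curr;
    while ((curr = fgetc(in)) != EOF) {
        if (curr == prev && count < 255) {
            count++;                 /* extend the run, up to one byte's max */
        } else {
            fputc(count, out);       /* run ended or hit the 255 cap */
            fputc(prev, out);
            prev = curr;
            count = 1;
        }
    }
    fputc(count, out);               /* flush the final run */
    fputc(prev, out);

    return (ferror(in) || ferror(out)) ? 1 : 0;
}

int rle_decompress_binary(FILE *in, FILE *out)
{
    assert(in != NULL && out != NULL);

    int count, value;
    while ((count = fgetc(in)) != EOF) {
        value = fgetc(in);
        if (value == EOF || count == 0) return 1;  /* truncated or malformed */
        for (int i = 0; i < count; i++) fputc(value, out);
    }
    return (ferror(in) || ferror(out)) ? 1 : 0;
}
```

Both functions take already opened streams, so they could be called from compress_file and decompress_file after the existing fopen checks. The worst case is still a doubling in size for input without runs.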

Final Thoughts

Building a compressor from scratch is an excellent exercise for understanding:

file I/O
memory efficiency
algorithmic thinking
low-level data representation

Even though the algorithm used here is simple, the modular layout (interfaces in .h files, implementations in .c files) is the same one larger C codebases use.

If you're learning systems programming in C, this is a great foundational project.
