DEV Community

Farhad Rahimi Klie


Building a File Compressor in C (From Scratch)

File compression is one of the classic problems in computer science and systems programming. Operating systems and many applications rely on compression algorithms to save disk space, reduce the bandwidth needed to send data, and speed up data transfer.

In this article, I will explain how I built a simple file compressor in C from scratch. The program reads a file, compresses its contents using Run-Length Encoding (RLE), and writes the compressed output into another file.

The goal of this project is not to compete with production compressors like gzip or zstd, but to understand how compression works internally.


Understanding the Compression Idea

Before writing code, we must understand the compression algorithm.

The algorithm used here is Run-Length Encoding (RLE).

The idea is very simple:

For every run of identical consecutive characters (even a run of length 1), we store:

character + count

Example:

Original text:

aaaaabbbbcc

Compressed form:

a5b4c2

Instead of storing the same character repeatedly, we store the character and how many times it appears consecutively.

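This works best for files that contain long runs of repeated characters. On data without such runs, RLE makes the output larger: every isolated character still needs its own count, so abc becomes a1b1c1, twice the original size.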
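To see this trade-off in numbers, here is a small sketch that computes how many bytes the character-plus-count format would produce for a buffer, without writing anything. The function name rle_text_size is hypothetical, and it works on an in-memory buffer instead of a file to keep it short.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns how many bytes the "character + decimal count"
 * format would need for data[0..n-1], without writing any output. */
size_t rle_text_size(const unsigned char *data, size_t n)
{
    size_t total = 0;
    size_t i = 0;

    while (i < n)
    {
        /* Measure the run of identical bytes starting at i. */
        size_t run = 1;
        while (i + run < n && data[i + run] == data[i])
            run++;

        /* One byte for the character plus the digits of the count.
         * snprintf with a NULL buffer and size 0 only reports the length. */
        total += 1 + (size_t)snprintf(NULL, 0, "%zu", run);
        i += run;
    }
    return total;
}
```

For aaaaabbbbcc this returns 6 (the size of a5b4c2) instead of the original 11 bytes, while for abc it also returns 6, twice the input size.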


Project Structure

To keep the code clean and maintainable, we split the program into multiple files.

file-compressor/
│
├── main.c
├── compressor.h
├── compressor.c
├── decompressor.c
├── decompressor.h
└── Makefile

This structure separates responsibilities:

  • main.c → program entry point
  • compressor.* → compression logic
  • decompressor.* → decompression logic

Header File (compressor.h)

#ifndef COMPRESSOR_H
#define COMPRESSOR_H

int compress_file(const char *input_file, const char *output_file);

#endif

This header declares the compression function so other files can use it.


Compression Implementation (compressor.c)

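#include <stdio.h>
#include "compressor.h"

/* Compress input_file into output_file using text RLE: each run is
 * written as the byte followed by its decimal count.
 * Returns 0 on success, 1 on error. */
int compress_file(const char *input_file, const char *output_file)
{
    FILE *in = fopen(input_file, "rb");
    if (!in)
    {
        perror("Input file error");
        return 1;
    }

    FILE *out = fopen(output_file, "wb");
    if (!out)
    {
        perror("Output file error");
        fclose(in);
        return 1;
    }

    /* An empty input produces an empty output file. */
    int prev = fgetc(in);
    if (prev == EOF)
    {
        fclose(in);
        fclose(out);
        return 0;
    }

    int count = 1;
    int curr;

    while ((curr = fgetc(in)) != EOF)
    {
        if (curr == prev)
        {
            count++;                           /* same byte: extend the run */
        }
        else
        {
            fprintf(out, "%c%d", prev, count); /* run ended: write it out */
            prev = curr;
            count = 1;
        }
    }

    /* Flush the final run; the loop only writes a run when the next one starts. */
    fprintf(out, "%c%d", prev, count);

    /* Report read or write errors instead of silently returning success. */
    int status = 0;
    if (ferror(in) || ferror(out))
    {
        fprintf(stderr, "I/O error while compressing\n");
        status = 1;
    }
    fclose(in);
    if (fclose(out) != 0)
    {
        perror("Output file error");
        status = 1;
    }

    return status;
}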

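This function reads the input file byte by byte and counts runs of identical bytes.

Whenever a different byte appears, it writes the previous byte followed by its run length as decimal text, then starts a new run. The final run is written after the loop, because no following byte triggers it.

Because the counts are stored as text, this format cannot represent input that contains the digits 0 to 9. For example, a1 compresses to a111 (a with count 1, then 1 with count 1), and the decompressor reads that back as 111 copies of a. The tool is therefore only reliable for input without digit characters.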
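The same loop can also be written against an in-memory buffer, which makes it easy to check the output without creating files. This is a sketch rather than part of the original program: rle_encode_text and its return convention ((size_t)-1 when the output buffer is too small) are hypothetical. It produces the same bytes as compress_file, so aaaaabbbbcc becomes a5b4c2 and a1 becomes a111.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical in-memory counterpart of compress_file().
 * Writes "byte + decimal count" pairs into out (capacity cap),
 * NUL-terminates the result, and returns the number of bytes written
 * (not counting the NUL), or (size_t)-1 if out is too small. */
size_t rle_encode_text(const unsigned char *in, size_t n, char *out, size_t cap)
{
    size_t used = 0;
    size_t i = 0;

    if (cap == 0)
        return (size_t)-1;

    while (i < n)
    {
        /* Measure the run of identical bytes starting at i. */
        size_t run = 1;
        while (i + run < n && in[i + run] == in[i])
            run++;

        /* Same text as fprintf(out, "%c%d", prev, count) in compress_file(). */
        int len = snprintf(out + used, cap - used, "%c%zu", in[i], run);
        if (len < 0 || (size_t)len >= cap - used)
            return (size_t)-1; /* no room for this pair plus the NUL */

        used += (size_t)len;
        i += run;
    }

    out[used] = '\0'; /* snprintf already wrote it, but n == 0 skips the loop */
    return used;
}
```

Separating the encoding loop from fopen/fclose like this lets you unit-test the algorithm with plain strings, while a thin wrapper handles the files.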


Header File (decompressor.h)

#ifndef DECOMPRESSOR_H
#define DECOMPRESSOR_H

int decompress_file(const char *input_file, const char *output_file);

#endif

Decompression Implementation (decompressor.c)

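#include <stdio.h>
#include <ctype.h>
#include "decompressor.h"

/* Expand a file produced by compress_file(): each byte is followed by
 * its decimal repeat count. Returns 0 on success, 1 on error. */
int decompress_file(const char *input_file, const char *output_file)
{
    FILE *in = fopen(input_file, "rb");
    if (!in)
    {
        perror("Input file error");
        return 1;
    }

    FILE *out = fopen(output_file, "wb");
    if (!out)
    {
        perror("Output file error");
        fclose(in);
        return 1;
    }

    int status = 0;
    int ch;
    while ((ch = fgetc(in)) != EOF)
    {
        int count = 0;
        int digit;

        /* Read the decimal count that follows the character. */
        while ((digit = fgetc(in)) != EOF && isdigit(digit))
        {
            count = count * 10 + (digit - '0');
        }

        /* compress_file() always writes a count of at least 1. */
        if (count == 0)
        {
            fprintf(stderr, "Malformed input: character without a count\n");
            status = 1;
            break;
        }

        for (int i = 0; i < count; i++)
        {
            fputc(ch, out);
        }

        /* The first non-digit byte starts the next run; push it back. */
        if (digit != EOF)
            ungetc(digit, in);
    }

    /* Report read or write errors instead of silently returning success. */
    if (ferror(in) || ferror(out))
    {
        fprintf(stderr, "I/O error while decompressing\n");
        status = 1;
    }
    fclose(in);
    if (fclose(out) != 0)
        status = 1;

    return status;
}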

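This function reverses the compression by reading:

character + number

and writing the character that many times. The digit loop stops at the first non-digit byte, which already belongs to the next run, so ungetc() pushes it back onto the stream before the next iteration reads it as a character.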
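A matching in-memory decoder parses the same format from a buffer. Again, this is a hypothetical sketch (rle_decode_text is not part of the project). It rejects a character that is not followed by a count, since compress_file never produces one, and it refuses to write past the end of the output buffer.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory counterpart of decompress_file().
 * Reads "byte + decimal count" pairs from in[0..n-1] and writes the
 * expanded bytes into out (capacity cap). Returns the number of bytes
 * written, or (size_t)-1 on malformed input or if out is too small. */
size_t rle_decode_text(const char *in, size_t n, unsigned char *out, size_t cap)
{
    size_t used = 0;
    size_t i = 0;

    while (i < n)
    {
        unsigned char ch = (unsigned char)in[i++];

        /* Parse the decimal count that follows the byte. */
        size_t count = 0;
        while (i < n && isdigit((unsigned char)in[i]))
        {
            if (count > (SIZE_MAX - 9) / 10)
                return (size_t)-1; /* count would overflow size_t */
            count = count * 10 + (size_t)(in[i] - '0');
            i++;
        }

        /* compress_file() always writes a count of at least 1. */
        if (count == 0)
            return (size_t)-1;
        if (count > cap - used)
            return (size_t)-1; /* output buffer too small */

        for (size_t k = 0; k < count; k++)
            out[used++] = ch;
    }
    return used;
}
```

Decoding a111 with this sketch still yields 111 copies of a, which shows that the text format itself, not the decoder, is what loses information when the input contains digits.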


Main Program (main.c)

#include <stdio.h>
#include <string.h>

#include "compressor.h"
#include "decompressor.h"

int main(int argc, char *argv[])
{
    if (argc != 4)
    {
        /* Errors go to stderr so they don't mix with normal output. */
        fprintf(stderr, "Usage:\n");
        fprintf(stderr, "compress:   %s -c input output\n", argv[0]);
        fprintf(stderr, "decompress: %s -d input output\n", argv[0]);
        return 1;
    }

    if (strcmp(argv[1], "-c") == 0)
    {
        return compress_file(argv[2], argv[3]);
    }
    else if (strcmp(argv[1], "-d") == 0)
    {
        return decompress_file(argv[2], argv[3]);
    }

    fprintf(stderr, "Unknown option: %s\n", argv[1]);
    return 1;
}

This allows two modes: -c for compression and -d for decompression. A wrong number of arguments or any other option prints a message and returns 1.

Makefile

CC=gcc
CFLAGS=-Wall -O2

OBJ=main.o compressor.o decompressor.o

all: compressor

compressor: $(OBJ)
    $(CC) $(OBJ) -o compressor

main.o: main.c compressor.h decompressor.h
    $(CC) $(CFLAGS) -c main.c

compressor.o: compressor.c compressor.h
    $(CC) $(CFLAGS) -c compressor.c

decompressor.o: decompressor.c decompressor.h
    $(CC) $(CFLAGS) -c decompressor.c

clean:
    rm -f *.o compressor

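Each command line under a target must begin with a tab character, not spaces; otherwise GNU make stops with a "missing separator" error.

Compile the project using: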

make

Running the Program

Compress a file:

./compressor -c input.txt compressed.txt

Decompress it:

./compressor -d compressed.txt output.txt
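To confirm that a round trip restored the original, input.txt and output.txt should be identical byte for byte. On Unix-like systems, cmp input.txt output.txt performs this check; the sketch below does the same in C. The names streams_identical and files_identical are hypothetical helpers, not part of the project.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if both streams yield exactly the same
 * bytes from their current positions to EOF, 0 otherwise.
 * Read errors are treated like EOF, because fgetc() returns EOF for both. */
int streams_identical(FILE *a, FILE *b)
{
    int ca, cb;
    do
    {
        ca = fgetc(a);
        cb = fgetc(b);
        if (ca != cb)
            return 0;
    } while (ca != EOF);
    return 1;
}

/* Hypothetical wrapper: 1 if identical, 0 if different,
 * -1 if either file cannot be opened. */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    int result = (a && b) ? streams_identical(a, b) : -1;

    if (a)
        fclose(a);
    if (b)
        fclose(b);
    return result;
}
```

After a successful round trip, files_identical("input.txt", "output.txt") should return 1.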

Possible Improvements

This project is intentionally simple, but real compressors use more advanced algorithms such as:

  • Huffman Coding
  • LZ77 / LZ78
  • LZW
  • Arithmetic Coding
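To give a feel for how a frequency-based method such as Huffman coding differs from RLE (it exploits how often each byte occurs, not whether bytes repeat consecutively), the sketch below computes only the size of a Huffman-coded payload, not the codes themselves. It repeatedly merges the two least frequent nodes, and the total encoded length in bits equals the sum of all merged weights. The name huffman_bits is hypothetical; the sketch ignores the space needed to store the code table and assumes 1 bit per symbol when the input contains only one distinct byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the number of bits a Huffman code built
 * for data[0..n-1] would need for the payload (the code table is not counted).
 * Repeatedly merging the two lightest nodes builds the Huffman tree; the sum
 * of all merged weights equals the total encoded length in bits. */
uint64_t huffman_bits(const unsigned char *data, size_t n)
{
    uint64_t freq[256] = {0};
    for (size_t i = 0; i < n; i++)
        freq[data[i]]++;

    /* The leaves are the byte values that actually occur. */
    uint64_t node[256];
    int count = 0;
    for (int s = 0; s < 256; s++)
        if (freq[s] > 0)
            node[count++] = freq[s];

    /* Assumption: with one distinct byte, use a 1-bit code per symbol. */
    if (count == 1)
        return node[0];

    uint64_t total = 0;
    while (count > 1)
    {
        /* Find the two smallest weights (a linear scan is fine for <= 256 nodes). */
        int lo1 = 0, lo2 = 1;
        if (node[lo2] < node[lo1])
        {
            lo1 = 1;
            lo2 = 0;
        }
        for (int k = 2; k < count; k++)
        {
            if (node[k] < node[lo1])
            {
                lo2 = lo1;
                lo1 = k;
            }
            else if (node[k] < node[lo2])
            {
                lo2 = k;
            }
        }

        /* Merge them; the new node's weight is added to the running total. */
        uint64_t merged = node[lo1] + node[lo2];
        total += merged;

        /* Keep the merged node in lo1's slot and drop lo2 by moving
         * the last node into its place. */
        node[lo1] = merged;
        node[lo2] = node[count - 1];
        count--;
    }
    return total;
}
```

For aaaaabbbbcc this returns 17 bits (a gets a 1-bit code, b and c get 2-bit codes), compared with 88 bits for the raw 11 bytes and 48 bits for the 6-byte RLE output a5b4c2.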

You could improve this project by:

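  • switching from text counts to a binary encoding, so counts can no longer be confused with data bytes
  • packing output at the bit level instead of in whole bytes
  • implementing Huffman trees
  • processing data in larger blocks with fread/fwrite instead of one fgetc call per byte (FILE streams are already buffered by stdio, but per-byte calls still add overhead)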
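A binary encoding, the first item above, could store each run as two bytes: a count from 1 to 255 followed by the byte value, with longer runs split into several pairs. The sketch below shows one such hypothetical layout; rle_encode_bin and rle_decode_bin are illustrative names. Because counts sit at fixed positions instead of being parsed as text, every byte value, including digits and NUL, round-trips.

```c
#include <stddef.h>

/* Hypothetical binary RLE layout: each run is stored as two bytes,
 * (count, value), with count in 1..255. Longer runs are split. */

/* Returns the number of bytes written to out, or (size_t)-1 if out is too small. */
size_t rle_encode_bin(const unsigned char *in, size_t n,
                      unsigned char *out, size_t cap)
{
    size_t used = 0;
    size_t i = 0;

    while (i < n)
    {
        /* Extend the run, but never past 255 so the count fits in one byte. */
        size_t run = 1;
        while (i + run < n && in[i + run] == in[i] && run < 255)
            run++;

        if (cap - used < 2)
            return (size_t)-1;
        out[used++] = (unsigned char)run;
        out[used++] = in[i];
        i += run;
    }
    return used;
}

/* Returns the number of bytes written to out, or (size_t)-1 on malformed
 * input (odd length or a zero count) or if out is too small. */
size_t rle_decode_bin(const unsigned char *in, size_t n,
                      unsigned char *out, size_t cap)
{
    size_t used = 0;

    if (n % 2 != 0)
        return (size_t)-1;

    for (size_t i = 0; i < n; i += 2)
    {
        size_t count = in[i];
        if (count == 0 || count > cap - used)
            return (size_t)-1;
        for (size_t k = 0; k < count; k++)
            out[used++] = in[i + 1];
    }
    return used;
}
```

This layout still doubles data that has no runs, since every byte gets its own count. Formats such as PackBits avoid that by adding a separate header for literal, non-repeating stretches.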

Final Thoughts

Building a compressor from scratch is an excellent exercise for understanding:

  • file I/O
  • memory efficiency
  • algorithmic thinking
  • low-level data representation

Even though the algorithm used here is simple, the architecture we built (modular .h and .c files) follows the same header/source split that larger C projects use to separate interfaces from implementations.

If you're learning systems programming in C, this is a great foundational project.
