Techie Kho

Bag: A header-only, simplistic, C++20, zipping & unzipping library.

Hello! It has been a while since I last built something for fun and learning, so I took up a fairly simple C++ project that will hopefully be useful to fellow C++ devs.

Problem Statement

While playing around with raylib, the small, easy, and exceptionally cool game library, I ran into a small issue: there is no elegant way of packing files together. To use a texture, an audio clip, or any other external resource, I either need to convert it into a C++ header that declares a byte array filled with that resource, or make sure the user has the resource at a particular location. The first approach is no good when it comes to loading new resources after compilation, while the second is awkward and too easy to break.

I'm quite fond of the embedded zip approach. Zip files have a quirky structure: their index and metadata (the central directory) sit at the end of the file, whereas other formats, such as executables, keep their headers at the beginning. This quirk makes it possible to append a zip file to the end of an executable, so that the resulting file behaves as both an executable and a zip file at the same time.
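To make the trick concrete, here is a minimal sketch of the idea using a much simpler trailer than ZIP's. Everything in it is an assumption made for illustration: the names `append_payload` and `read_payload` are hypothetical, and the 8-byte little-endian length trailer is neither the ZIP format nor libbag's. The point is only that a reader that starts from the end of the file never needs to know what sits in front of the payload.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <optional>
#include <ostream>
#include <sstream>  // std::stringstream stands in for a real file in quick experiments
#include <string>
#include <string_view>

// Hypothetical trailer (not ZIP, not libbag): the last 8 bytes of the stream
// hold the payload length as a little-endian integer.
void append_payload(std::ostream& out, std::string_view payload) {
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    const auto length = static_cast<std::uint64_t>(payload.size());
    for (int i = 0; i < 8; ++i) {
        out.put(static_cast<char>((length >> (8 * i)) & 0xFF));
    }
}

// Reads the payload back by looking only at the end of the stream, so any
// bytes in front of it (for example an executable image) are ignored.
std::optional<std::string> read_payload(std::istream& in) {
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size < 8) return std::nullopt;  // too short to hold the trailer

    in.seekg(size - 8);
    std::uint64_t length = 0;
    for (int i = 0; i < 8; ++i) {
        length |= static_cast<std::uint64_t>(static_cast<unsigned char>(in.get())) << (8 * i);
    }
    if (!in || length > static_cast<std::uint64_t>(size - 8)) return std::nullopt;

    std::string payload(static_cast<std::size_t>(length), '\0');
    in.seekg(size - 8 - static_cast<std::streamoff>(length));
    in.read(payload.data(), static_cast<std::streamsize>(length));
    if (!in) return std::nullopt;
    return payload;
}
```

Writing some stand-in "executable" bytes into a `std::stringstream`, calling `append_payload`, and then calling `read_payload` returns the payload alone. A `std::fstream` opened in binary mode should behave the same way.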

I'm sure other zip libraries already do what I want, but I would like to have some fun with C++20. So let's get started.

Overview of bag

With that reasoning in mind, I wrote libbag, a header-only library for simple zipping and unzipping. It also comes with utilities for zipping (bag) and unzipping (unbag).

By itself, it is sort of a serialiser and deserialiser for key-value pairs, where each key is a simple C-style string and each value is a byte array. The bag file structure is fairly simple:

|------------------------|
| key 0 (C-style string) |
|------------------------|
| Content 0 (Byte array) |
|------------------------|
|       ......           |
|------------------------|
| key N (C-style string) |
|------------------------|
| Content N (Byte array) |
|------------------------|
| indices                |
|------------------------|
| metadata               |
|------------------------|

So the key-value pairs are tightly packed together, and some data is appended at the end of the pack. The metadata specifies the whole bag's byte count (so that any excess bytes in front of the bag can be ignored), the position and length of the indices, and lastly some magic bytes that mark the data as a bag.

Using the position and length of the indices given in the metadata, we can locate the indices slice within the bag. It holds a list with the position and length of each saved key-value pair.

To get a key, just treat the key-value pair as a C-style string, since a null terminator marks the end of the key. To get the content, skip past the first null terminator (that is, past the whole key) to reach the start of the content; its end can then be calculated with some pointer arithmetic from the pair's position and length in the indices.
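The lookup described above can be sketched as follows. This is not libbag's code or its actual on-disk format: the field widths, byte order, the `"BAG!"` magic, and the names `Bytes`, `pack`, and `unpack` are all assumptions chosen to keep the sketch short. It only mirrors the described layout of entries, then indices, then metadata.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical layout (NOT libbag's real one): every integer is 8 bytes,
// little-endian. Metadata = [bag size][index position][index length]["BAG!"].
using Bytes = std::vector<std::uint8_t>;
constexpr char kMagic[4] = {'B', 'A', 'G', '!'};
constexpr std::size_t kMetaSize = 3 * 8 + 4;

void put_u64(Bytes& out, std::uint64_t v) {
    for (int i = 0; i < 8; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

std::uint64_t get_u64(const Bytes& in, std::size_t pos) {
    assert(pos + 8 <= in.size());
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i) v |= static_cast<std::uint64_t>(in[pos + i]) << (8 * i);
    return v;
}

Bytes pack(const std::map<std::string, Bytes>& items) {
    Bytes bag, index;
    for (const auto& [key, content] : items) {
        put_u64(index, bag.size());                       // entry position
        put_u64(index, key.size() + 1 + content.size());  // entry length
        bag.insert(bag.end(), key.begin(), key.end());
        bag.push_back(0);                                 // null terminator ends the key
        bag.insert(bag.end(), content.begin(), content.end());
    }
    const std::uint64_t index_pos = bag.size();
    bag.insert(bag.end(), index.begin(), index.end());
    put_u64(bag, bag.size() + kMetaSize);                 // whole bag's byte count
    put_u64(bag, index_pos);
    put_u64(bag, index.size());
    bag.insert(bag.end(), kMagic, kMagic + 4);
    return bag;
}

std::optional<std::map<std::string, Bytes>> unpack(const Bytes& file) {
    if (file.size() < kMetaSize) return std::nullopt;
    const std::size_t meta = file.size() - kMetaSize;     // metadata is read from the end
    if (!std::equal(kMagic, kMagic + 4, file.begin() + static_cast<std::ptrdiff_t>(meta + 24)))
        return std::nullopt;
    const std::uint64_t total = get_u64(file, meta);
    const std::uint64_t index_pos = get_u64(file, meta + 8);
    const std::uint64_t index_len = get_u64(file, meta + 16);
    if (total > file.size() || total < kMetaSize) return std::nullopt;
    const std::uint64_t body = total - kMetaSize;         // entries plus indices
    if (index_len % 16 != 0 || index_pos > body || index_len != body - index_pos)
        return std::nullopt;

    const std::size_t base = file.size() - static_cast<std::size_t>(total);  // skip bytes in front
    std::map<std::string, Bytes> out;
    for (std::uint64_t i = 0; i < index_len; i += 16) {
        const std::size_t slot = base + static_cast<std::size_t>(index_pos + i);
        const std::uint64_t pos = get_u64(file, slot);
        const std::uint64_t len = get_u64(file, slot + 8);
        if (pos > index_pos || len > index_pos - pos) return std::nullopt;
        const auto first = file.begin() + static_cast<std::ptrdiff_t>(base + pos);
        const auto last = first + static_cast<std::ptrdiff_t>(len);
        const auto nul = std::find(first, last, 0);       // end of the key
        if (nul == last) return std::nullopt;             // key is not terminated
        out.emplace(std::string(first, nul), Bytes(nul + 1, last));
    }
    return out;
}
```

Because `unpack` derives `base` from the stored byte count, it returns the same map whether the bag stands alone or has arbitrary bytes prepended to it, which is the property that makes appending a bag to an executable workable. Content may contain zero bytes, since only the first null terminator in an entry is treated as the end of the key.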

Here is some example usage.

#include <libbag.hpp>
#include <iterator>
#include <map>
#include <sstream>

int main() {
    std::map<libbag::key_type, libbag::content_type> input_map;
    // Add items into `input_map`.
    std::basic_ostringstream<libbag::unit_type> packed_stream; // Can be replaced with std::basic_ofstream to output to a file.
    libbag::pack(input_map, packed_stream); // Pack.

    libbag::unit_string_type packed = packed_stream.str();
    std::map<libbag::key_type, libbag::content_type> result;
    libbag::unpack_all(libbag::bag_type(packed.begin(), packed.end()), std::inserter(result, result.end())); // Unpack.
    // Do something with the unpacked data.
}


The API relies on iterators, streams, and inserters to reduce allocations.

I also wrote a simple CLI to zip and unzip files, which serves as a good reference as well.
