DEV Community

Techie Kho


Bag: A header-only, simplistic, C++20, zipping & unzipping library.

Hello! It has been a while since I last did something for fun and learning. So I took up a fairly simple C++ project that will hopefully be useful to fellow C++ devs.

Problem Statement

While playing around with raylib, the small, easy, and exceptionally cool game library, I ran into a small issue: there is no elegant way of packing files together. When I use a texture, audio, or any other external resource, I either need to convert it into a C++ header that declares a byte array filled with that resource, or make sure the user has the resource at a particular place on disk. The first approach is no good when it comes to loading new resources after compilation, while the latter is awkward and too easy to break.

I'm quite fond of the embedded zip approach. Zip files have a quirky structure: the directory and metadata sit at the end of the file, in contrast with other file formats, such as executables, whose headers sit at the beginning. This quirk opens up a possibility: appending the zip file to the end of an executable. The result behaves like an executable and a zip file at the same time.

I'm sure there are other zip libraries that already do what I want. But I would like to have some fun with C++20, so let's get started.

Overview of bag

With the aforementioned reasoning in mind, I wrote libbag, a header-only library for simple zipping and unzipping. It also includes the zipping (bag) and unzipping (unbag) utilities.

By itself, it is sort of a serialiser and deserialiser for key-value pairs, with each key being a simple C-style string and each value a byte array. The bag file structure is fairly simple:

|------------------------|
| key 0 (C-style string) |
|------------------------|
| Content 0 (Byte array) |
|------------------------|
|       ......           |
|------------------------|
| key N (C-style string) |
|------------------------|
| Content N (Byte array) |
|------------------------|
| indices                |
|------------------------|
| metadata               |
|------------------------|

So the key-value pairs are tightly packed together, and some data is appended at the end of the pack. The metadata specifies the whole bag's byte count (so that any excess bytes in front can be ignored), the position and length of the indices, and lastly some magic bytes that mark the file as a bag.

From the position and length of the indices given in the metadata, we can locate the indices slice within the bag. It holds a list of the position and length of each saved key-value pair.

To get a key, treat the key-value pair as a C-style string: a null terminator marks the end of the key. To get the content, skip past the first null terminator (that is, the whole key) to reach the beginning of the content; its end can then be calculated with some pointer arithmetic using the pair's position and length from the indices.

Here is some example usage.

#include <libbag.hpp>
#include <iterator> // std::inserter
#include <map>
#include <sstream>  // std::basic_ostringstream


int main() {
    std::map<libbag::key_type, libbag::content_type> input_map;
    /// Add items into `input_map`.
    std::basic_ostringstream<libbag::unit_type> packed_stream; // Can be replaced with std::basic_ofstream to output to a file.
    libbag::pack(input_map, packed_stream); // Pack.

    libbag::unit_string_type packed = packed_stream.str();
    std::map<libbag::key_type, libbag::content_type> result;
    libbag::unpack_all(libbag::bag_type(packed.begin(), packed.end()), std::inserter(result, result.end())); // Unpack.
    /// Do something with the unpacked data.
}


The API depends on iterators, streams, and inserters to reduce allocations.

I also wrote a simple CLI to zip and unzip files, which doubles as a reference for using the library.
