Yes, it's true. I created an image file format using only Python. struct is a module that ships with Python, and it was used extensively in this project for packing binary data as well as unpacking it.
Firstly, what is an image file?
It has two primary components:
- Header
- Data

The header holds the metadata of the image, while the data holds the pixel intensity values. Within a given format, the header fields have a fixed layout that never changes position, but the layout varies from format to format. These "fields" are defined by the format's spec sheet. Here is mine:
AVJ File Format Specification
| Field | Size (bytes) | Type | Description |
|---|---|---|---|
| Magic Number | 4 | ASCII (`4s`) | File signature, fixed to "AVJ1" |
| Version | 2 | Unsigned short (`H`) | File format version. Current = 1 |
| Image Height | 4 | Unsigned int (`I`) | Image height in pixels |
| Image Width | 4 | Unsigned int (`I`) | Image width in pixels |
| Color Mode | 1 | Unsigned byte (`B`) | Color representation. 3 = RGB (only mode supported in v1) |
| Alt Text Length | 2 | Unsigned short (`H`) | Length of alt text (UTF-8 encoded) in bytes |
| Mode String Length | 1 | Unsigned byte (`B`) | Length of the Pillow mode string (e.g. "RGB") in bytes |
| Embedding 1 Length | 4 | Unsigned int (`I`) | Length of first embedding vector (bytes) |
| Embedding 2 Length | 4 | Unsigned int (`I`) | Length of second embedding vector (bytes) |
| Header Total | 26 | — | Fixed header size before variable sections |
| Alt Text | Variable | UTF-8 string | Textual description for accessibility / metadata |
| Mode String | Variable | UTF-8 string | Pillow image mode (e.g. "RGB") |
| Embedding Vector 1 | Variable | Raw bytes | Embedding of alt text |
| Embedding Vector 2 | Variable | Raw bytes | Embedding of image pixel data |
| Image Pixel Data | Variable | Raw RGB bytes | Pixel matrix (width × height × 3 bytes) |
The file extension is '.avj'.
It's an uncompressed file format that carries vector embeddings of both the alt text and the image data itself. Tailored for ML use cases!
In the spec sheet you also need to record the type of each field.
You can pick the types from the following:
Commonly Used Struct Format Characters
| Format Character | Python Type | Size (bytes) | Notes |
|---|---|---|---|
| `B` | int | 1 | Unsigned byte (0–255), great for flags like color mode |
| `H` | int | 2 | Unsigned short (0–65,535), good for version numbers |
| `I` | int | 4 | Unsigned int (0–4 billion), perfect for dimensions or lengths |
| `f` | float | 4 | 32-bit float, useful for storing embedding values if needed |
| `d` | float | 8 | 64-bit float, more precise embeddings |
| `s` | bytes | count given | Fixed-length string (e.g. `4s` for "AVJ1") |
Endianness / Byte Order Prefixes
| Prefix | Meaning |
|---|---|
| `<` | Little-endian (recommended for simplicity) |
| `>` | Big-endian |
| `!` | Network order (big-endian) |
Note: if you don't know what endianness is, just choose little-endian.
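To see how the format characters and an endianness prefix fit together, here's a tiny standalone example (the field values are made up purely for illustration):

```python
import struct

# Little-endian: a 4-byte tag, a 2-byte version and a 4-byte width.
packed = struct.pack('<4s H I', b'DEMO', 1, 640)

print(struct.calcsize('<4s H I'))        # 10  -> 4 + 2 + 4 bytes
print(struct.unpack('<4s H I', packed))  # (b'DEMO', 1, 640)
```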
Feel free to include a description for yourself. This makes things easier when coding. After you are done with that, let's get to coding:
So first, let's import the required libraries:
```python
from fastapi import FastAPI, UploadFile, File
from fastapi.responses import StreamingResponse, JSONResponse
import struct
from PIL import Image
import io
import numpy as np
import torch
from transformers import CLIPProcessor, CLIPModel
```
I am building this project so that I can interact with it as an API, which is why I import FastAPI. Pillow handles image processing, NumPy handles array work and a few other tasks, and PyTorch plus the Transformers library handle embedding generation. Since we want to use the CLIP model for the embeddings, we import that too.
```python
app = FastAPI(title=".avj Encoder/Decoder with Embeddings")

# ------------------- AVJ Format -------------------
HEADER_FORMAT = '<4s H I I B H B I I'
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)
```
Here we define the header format of our image file, based on the table we discussed earlier, and compute the header size with struct.calcsize.
In addition to this, I also create the FastAPI app.
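As a quick sanity check (my own, not from the original code), the little-endian format string has no padding, so the fixed header size works out like this:

```python
# 4s + H + I + I + B + H + B + I + I
#  4 + 2 + 4 + 4 + 1 + 2 + 1 + 4 + 4 = 26
print(HEADER_SIZE)  # 26
```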
```python
def image_to_bytes(image_file):
    img = Image.open(image_file).convert("RGB")
    return img.tobytes(), img.width, img.height, img.mode
```
Here we define a function that takes an image file (a path or file-like object) as its argument. It opens the image, converts it to RGB, and uses Pillow's tobytes method to turn it into raw binary. It then returns that binary along with some image metadata (width, height and mode).
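For example (the file name is just a placeholder), a 640×480 RGB image yields 640 × 480 × 3 = 921,600 bytes:

```python
raw, w, h, mode = image_to_bytes("photo.jpg")  # hypothetical input file
print(w, h, mode, len(raw))                    # e.g. 640 480 RGB 921600
```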
```python
def encode_headers_with_embeddings(raw_bytes, h, w, mode, alt_text, alt_emb, img_emb):
    alt_text_encoded = alt_text.encode("utf-8")
    len_alt_text_encoded = len(alt_text_encoded)

    mode_encoded = mode.encode("utf-8")
    len_mode_encoded = len(mode_encoded)

    alt_emb_bytes = np.array(alt_emb, dtype=np.float32).tobytes()
    img_emb_bytes = np.array(img_emb, dtype=np.float32).tobytes()

    header = struct.pack(
        HEADER_FORMAT,
        b'AVJ1',  # magic
        1,        # version
        int(h),
        int(w),
        3,        # channels RGB
        len_alt_text_encoded,
        len_mode_encoded,
        len(alt_emb_bytes),
        len(img_emb_bytes)
    )

    return header + alt_text_encoded + mode_encoded + alt_emb_bytes + img_emb_bytes + raw_bytes
```
Now, this is the encoder function. It packs the metadata, everything we need the header to have, together with the image binary.
The function accepts 7 arguments:
- raw_bytes - the image in bytes
- h - image height
- w - image width
- mode - the Pillow mode string (e.g. "RGB", "RGBA")
- alt_text - well... the alt text!
- alt_emb - embeddings of the alt text
- img_emb - embeddings of the image itself (not including the headers)
So here we first encode the alt text into a standard format, i.e. UTF-8, and do the same for the mode string. Then we convert the embeddings into bytes. We use the pack function from the struct module to "pack" all the header info into a fixed binary layout, then append the variable sections and the raw pixel bytes. And that's the encoder!
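Here's a rough sketch of how you might call it, using zero vectors as stand-in embeddings (the real ones come from CLIP later) and a hypothetical file name:

```python
raw, w, h, mode = image_to_bytes("photo.jpg")  # hypothetical input file
alt_emb = np.zeros(512, dtype=np.float32)      # placeholder embedding
img_emb = np.zeros(512, dtype=np.float32)      # placeholder embedding

avj_bytes = encode_headers_with_embeddings(raw, h, w, mode, "a sample photo", alt_emb, img_emb)
with open("photo.avj", "wb") as f:
    f.write(avj_bytes)
```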
```python
def decode_headers_with_embeddings(encoded_bytes):
    header = encoded_bytes[:HEADER_SIZE]
    magic, version, height, width, channels, alt_text_len, mode_len, alt_emb_len, img_emb_len = struct.unpack(HEADER_FORMAT, header)

    start = HEADER_SIZE
    alt_text = encoded_bytes[start:start+alt_text_len].decode("utf-8")
    start += alt_text_len

    mode = encoded_bytes[start:start+mode_len].decode("utf-8")
    start += mode_len

    alt_emb_bytes = encoded_bytes[start:start+alt_emb_len]
    alt_embedding = np.frombuffer(alt_emb_bytes, dtype=np.float32)
    start += alt_emb_len

    img_emb_bytes = encoded_bytes[start:start+img_emb_len]
    image_embedding = np.frombuffer(img_emb_bytes, dtype=np.float32)
    start += img_emb_len

    image_bytes = encoded_bytes[start:]

    return {
        "magic": magic.decode("utf-8", errors="ignore"),
        "version": version,
        "height": height,
        "width": width,
        "channels": channels,
        "alt_text": alt_text,
        "mode": mode,
        "alt_embedding": alt_embedding.tolist(),
        "image_embedding": image_embedding.tolist(),
        "image_bytes": image_bytes
    }
```
```python
def reconstruct_image(image_bytes, width, height, mode="RGB"):
    return Image.frombytes(mode, (width, height), image_bytes)
```
```python
# ------------------- CLIP Embeddings -------------------
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
```
The decoder function does the opposite: it takes an .avj file and unpacks it. It extracts the header, grabs the alt text, mode, and embeddings, and then returns all of that in a Python dictionary.
The decoder works by first reading the fixed-size header from the file and unpacking it into fields like magic number, version, dimensions, and the lengths of the variable sections. Using those lengths, it moves through the file step by step: first extracting and decoding the alt text, then the image mode, followed by the two embeddings which are converted into NumPy arrays. Whatever remains after that is the raw pixel data. Finally, all of this information is returned neatly in a dictionary so the image and metadata can be reconstructed.
The restoration function simply takes the raw pixel bytes along with the image’s width, height, and mode, and feeds them into Pillow’s frombytes method. This rebuilds the original image exactly as it was, using the metadata we extracted from the file.
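Putting the two together, a round trip might look like this (file names are placeholders):

```python
with open("photo.avj", "rb") as f:
    decoded = decode_headers_with_embeddings(f.read())

img = reconstruct_image(decoded["image_bytes"], decoded["width"], decoded["height"], decoded["mode"])
img.save("restored.png")
print(decoded["alt_text"], decoded["width"], decoded["height"], len(decoded["alt_embedding"]))
```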
We use the CLIP model and processor loaded above to generate both embeddings: one from the alt text and one from the image pixels.
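A minimal sketch of how those embeddings could be produced with that model (this helper is my own illustration, not the post's exact code):

```python
def make_embeddings(pil_image, alt_text):
    with torch.no_grad():
        # Text embedding from the alt text.
        text_inputs = clip_processor(text=[alt_text], return_tensors="pt", padding=True)
        alt_emb = clip_model.get_text_features(**text_inputs)[0].numpy()

        # Image embedding from the pixel data.
        image_inputs = clip_processor(images=pil_image, return_tensors="pt")
        img_emb = clip_model.get_image_features(**image_inputs)[0].numpy()
    return alt_emb, img_emb
```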
Then we just define the API endpoints and set up the flow. Voilà! Our own image file format!
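For illustration, a pair of endpoints could look something like this (the route names, parameters, and the make_embeddings helper are my assumptions, not the original code):

```python
@app.post("/encode")
async def encode(file: UploadFile = File(...), alt_text: str = ""):
    raw, w, h, mode = image_to_bytes(io.BytesIO(await file.read()))
    alt_emb, img_emb = make_embeddings(Image.frombytes(mode, (w, h), raw), alt_text)
    avj = encode_headers_with_embeddings(raw, h, w, mode, alt_text, alt_emb, img_emb)
    return StreamingResponse(io.BytesIO(avj), media_type="application/octet-stream")

@app.post("/decode")
async def decode(file: UploadFile = File(...)):
    decoded = decode_headers_with_embeddings(await file.read())
    decoded.pop("image_bytes")  # raw pixels aren't JSON-serialisable, so drop them from the response
    return JSONResponse(decoded)
```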