Eugene

Posted on Apr 26

pg_ilib: Compact Typed Binary Serialization for PostgreSQL

#postgres #database #c #opensource

The problem

PostgreSQL's bytea type is powerful for storing raw binary data, but it carries no type information. Once you store a value as bytes, you need out-of-band metadata to know whether those bytes represent a number, a UUID, a timestamp, or a JSON object.

This gets painful when you're building dynamic schemas — EAV tables, schemaless document stores, or audit logs — where a single column holds values of different types. You end up carrying a separate type column everywhere, writing CASE expressions to decode it, and hoping they stay in sync.

pg_ilib solves this with a simple idea: prefix every serialized value with a 2-byte typed header.

The format

Byte 0: [ op_id (4 bits) | params_hi (4 bits) ]
Byte 1: [ params_lo (8 bits) ]
Bytes 2…N: payload

The op_id identifies the type. The params field carries type-specific metadata: decimal scale for numerics, timezone offset in minutes for timestamps.

op_id	Type	params
`0x01`	text	—
`0x02`	numeric / bigint	decimal scale
`0x03`	bool	—
`0x04`	timestamp / date	tz offset (signed minutes)
`0x08`	uuid	—
`0x0E`	jsonb	—
`0x0F`	hex bytes	—

Installation

# requires libgmp-dev (Debian/Ubuntu) or gmp-devel (RHEL/CentOS)
make && sudo make install

CREATE EXTENSION pg_ilib;

Works with PostgreSQL 11+ on any platform where pg_config is on PATH.

Basic usage

Each type has a symmetric pair of functions:

-- bigint
SELECT bytea_to_bigint(bigint_to_bytea(123456789));
-- 123456789

-- text
SELECT bytea_to_str(str_to_bytea('hello world'));
-- hello world

-- numeric with scale
SELECT bytea_to_numeric(numeric_to_bytea(3.14159, 5));
-- 3.14159

-- uuid
SELECT bytea_to_uuid(uuid_to_bytea('ac861c64-52ae-b223-4d6a-5c26fc34994c'));
-- ac861c64-52ae-b223-4d6a-5c26fc34994c

-- jsonb
SELECT bytea_to_jsonb(jsonb_to_bytea('{"name":"Alice","age":30}'::jsonb));
-- {"age": 30, "name": "Alice"}

Implicit CASTs are registered for most types:

SELECT 42::bigint::bytea;
SELECT '\x200000000000002a'::bytea::bigint;

The killer feature: value_to_jsonb

Because the type is embedded in the header, a single function can decode anything:

SELECT value_to_jsonb(bigint_to_bytea(42));        -- 42
SELECT value_to_jsonb(str_to_bytea('hello'));       -- "hello"
SELECT value_to_jsonb(bool_to_bytea(true));         -- true
SELECT value_to_jsonb(numeric_to_bytea(3.14, 2));   -- 3.14
SELECT value_to_jsonb(uuid_to_bytea('ac861c64-52ae-b223-4d6a-5c26fc34994c'));
-- "ac861c64-52ae-b223-4d6a-5c26fc34994c"

This makes EAV-style tables genuinely usable:

CREATE TABLE entity_attributes (
    entity_id uuid,
    key       text,
    value     bytea   -- holds any type, self-describing
);

-- Store mixed types in one column
INSERT INTO entity_attributes VALUES
    ('ac861c64-...', 'age',      bigint_to_bytea(30)),
    ('ac861c64-...', 'score',    numeric_to_bytea(9.75, 2)),
    ('ac861c64-...', 'active',   bool_to_bytea(true)),
    ('ac861c64-...', 'joined',   timestamp_to_bytea(1700000000)),
    ('ac861c64-...', 'tag',      str_to_bytea('premium'));

-- Decode everything to JSON in one query, no type column needed
SELECT key, value_to_jsonb(value)
FROM entity_attributes
WHERE entity_id = 'ac861c64-...';

    key    | value_to_jsonb
-----------+----------------
 age       | 30
 score     | 9.75
 active    | true
 joined    | "2023-11-14T22:13:20Z"
 tag       | "premium"

Timestamps and timezones

The params field stores the timezone offset in signed minutes, so the original offset survives the round-trip:

-- Store UTC
SELECT timestamp_to_bytea(1766323245);

-- Decode as plain timestamp (offset ignored)
SELECT bytea_to_timestamp(timestamp_to_bytea(1766323245));
-- 2026-01-21 00:00:45

-- Decode as timestamptz with UTC+2 offset baked in
SELECT bytea_to_timestamptz(timestamp_to_bytea(1766323245, 120));
-- 2026-01-21 02:00:45+02

Built-in corruption detection

Every decoder calls pg_ilib_check_header() before touching payload bytes. Impossible (op_id, params, payload_size) combinations raise ERRCODE_DATA_CORRUPTED instead of crashing the server:

-- Scale 4095 is impossible for a 1-byte payload (max = 3)
SELECT bytea_to_numeric('\x2FFF2a');
-- ERROR: pg_ilib bytea_to_numeric: numeric scale 4095 is impossible
--        for 1 payload byte(s) (max scale = 3)

-- Timezone offset out of IANA range [-840, 840] minutes
SELECT bytea_to_timestamp('\x4FFF12345678');
-- ERROR: pg_ilib bytea_to_timestamp[tz]: timezone offset 4095 min
--        is out of valid range [-840, 840]

-- Unknown op_id
SELECT value_to_jsonb('\x9000ff');
-- ERROR: pg_ilib value_to_jsonb: unknown op_id 0x09

Repository structure

The repo contains three independent extensions that share a single directory and build system:

Extension	Description	Build
`pg_ilib`	Typed bytea serialization (this article)	`make`

Build all three at once:

make all-ext && sudo make install-ext

Testing without installing

make testdb        # create pg_ilib_test database (once)
make quicktest     # compile and run test/quick_test.sql against a /tmp copy

# Override host/user if needed
make quicktest PG_HOST=10.0.0.1 PG_USER=myuser

Why we built this

pg_ilib started as an internal component of LedgyX — a low-code API platform that generates FastAPI applications directly from PostgreSQL schemas.

The core challenge in LedgyX is that every table, column, and type is defined dynamically at runtime. We needed a way to store values of any SQL type in a single bytea column and decode them correctly later — without carrying a separate type column everywhere. pg_ilib is the solution we built and have been running in production.

We decided to open-source it because the problem is general enough to be useful beyond LedgyX. If you're building EAV tables, audit logs, dynamic schemas, or any system where a column holds mixed types — this is for you.

Feedback and PRs welcome. If you're using this in production or have ideas for new op_id types, open an issue!

DEV Community