The problem
PostgreSQL's bytea type is powerful for storing raw binary data, but it carries no type information. Once you store a value as bytes, you need out-of-band metadata to know whether those bytes represent a number, a UUID, a timestamp, or a JSON object.
This gets painful when you're building dynamic schemas — EAV tables, schemaless document stores, or audit logs — where a single column holds values of different types. You end up carrying a separate type column everywhere, writing CASE expressions to decode it, and hoping they stay in sync.
pg_ilib solves this with a simple idea: prefix every serialized value with a 2-byte typed header.
The format
Byte 0: [ op_id (4 bits) | params_hi (4 bits) ]
Byte 1: [ params_lo (8 bits) ]
Bytes 2…N: payload
The op_id identifies the type. The params field carries type-specific metadata: decimal scale for numerics, timezone offset in minutes for timestamps.
| op_id | Type | params |
|---|---|---|
0x01 |
text | — |
0x02 |
numeric / bigint | decimal scale |
0x03 |
bool | — |
0x04 |
timestamp / date | tz offset (signed minutes) |
0x08 |
uuid | — |
0x0E |
jsonb | — |
0x0F |
hex bytes | — |
Installation
# requires libgmp-dev (Debian/Ubuntu) or gmp-devel (RHEL/CentOS)
make && sudo make install
CREATE EXTENSION pg_ilib;
Works with PostgreSQL 11+ on any platform where pg_config is on PATH.
Basic usage
Each type has a symmetric pair of functions:
-- bigint
SELECT bytea_to_bigint(bigint_to_bytea(123456789));
-- 123456789
-- text
SELECT bytea_to_str(str_to_bytea('hello world'));
-- hello world
-- numeric with scale
SELECT bytea_to_numeric(numeric_to_bytea(3.14159, 5));
-- 3.14159
-- uuid
SELECT bytea_to_uuid(uuid_to_bytea('ac861c64-52ae-b223-4d6a-5c26fc34994c'));
-- ac861c64-52ae-b223-4d6a-5c26fc34994c
-- jsonb
SELECT bytea_to_jsonb(jsonb_to_bytea('{"name":"Alice","age":30}'::jsonb));
-- {"age": 30, "name": "Alice"}
Implicit CASTs are registered for most types:
SELECT 42::bigint::bytea;
SELECT '\x200000000000002a'::bytea::bigint;
The killer feature: value_to_jsonb
Because the type is embedded in the header, a single function can decode anything:
SELECT value_to_jsonb(bigint_to_bytea(42)); -- 42
SELECT value_to_jsonb(str_to_bytea('hello')); -- "hello"
SELECT value_to_jsonb(bool_to_bytea(true)); -- true
SELECT value_to_jsonb(numeric_to_bytea(3.14, 2)); -- 3.14
SELECT value_to_jsonb(uuid_to_bytea('ac861c64-52ae-b223-4d6a-5c26fc34994c'));
-- "ac861c64-52ae-b223-4d6a-5c26fc34994c"
This makes EAV-style tables genuinely usable:
CREATE TABLE entity_attributes (
entity_id uuid,
key text,
value bytea -- holds any type, self-describing
);
-- Store mixed types in one column
INSERT INTO entity_attributes VALUES
('ac861c64-...', 'age', bigint_to_bytea(30)),
('ac861c64-...', 'score', numeric_to_bytea(9.75, 2)),
('ac861c64-...', 'active', bool_to_bytea(true)),
('ac861c64-...', 'joined', timestamp_to_bytea(1700000000)),
('ac861c64-...', 'tag', str_to_bytea('premium'));
-- Decode everything to JSON in one query, no type column needed
SELECT key, value_to_jsonb(value)
FROM entity_attributes
WHERE entity_id = 'ac861c64-...';
key | value_to_jsonb
-----------+----------------
age | 30
score | 9.75
active | true
joined | "2023-11-14T22:13:20Z"
tag | "premium"
Timestamps and timezones
The params field stores the timezone offset in signed minutes, so the original offset survives the round-trip:
-- Store UTC
SELECT timestamp_to_bytea(1766323245);
-- Decode as plain timestamp (offset ignored)
SELECT bytea_to_timestamp(timestamp_to_bytea(1766323245));
-- 2026-01-21 00:00:45
-- Decode as timestamptz with UTC+2 offset baked in
SELECT bytea_to_timestamptz(timestamp_to_bytea(1766323245, 120));
-- 2026-01-21 02:00:45+02
Built-in corruption detection
Every decoder calls pg_ilib_check_header() before touching payload bytes. Impossible (op_id, params, payload_size) combinations raise ERRCODE_DATA_CORRUPTED instead of crashing the server:
-- Scale 4095 is impossible for a 1-byte payload (max = 3)
SELECT bytea_to_numeric('\x2FFF2a');
-- ERROR: pg_ilib bytea_to_numeric: numeric scale 4095 is impossible
-- for 1 payload byte(s) (max scale = 3)
-- Timezone offset out of IANA range [-840, 840] minutes
SELECT bytea_to_timestamp('\x4FFF12345678');
-- ERROR: pg_ilib bytea_to_timestamp[tz]: timezone offset 4095 min
-- is out of valid range [-840, 840]
-- Unknown op_id
SELECT value_to_jsonb('\x9000ff');
-- ERROR: pg_ilib value_to_jsonb: unknown op_id 0x09
Repository structure
The repo contains three independent extensions that share a single directory and build system:
| Extension | Description | Build |
|---|---|---|
pg_ilib |
Typed bytea serialization (this article) | make |
Build all three at once:
make all-ext && sudo make install-ext
Testing without installing
make testdb # create pg_ilib_test database (once)
make quicktest # compile and run test/quick_test.sql against a /tmp copy
# Override host/user if needed
make quicktest PG_HOST=10.0.0.1 PG_USER=myuser
Links
- GitHub: github.com/ineron/pg_ilib.git
- License: Apache 2.0
- Dependencies: PostgreSQL 11+, libgmp
Why we built this
pg_ilib started as an internal component of LedgyX — a low-code API platform that generates FastAPI applications directly from PostgreSQL schemas.
The core challenge in LedgyX is that every table, column, and type is defined dynamically at runtime. We needed a way to store values of any SQL type in a single bytea column and decode them correctly later — without carrying a separate type column everywhere. pg_ilib is the solution we built and have been running in production.
We decided to open-source it because the problem is general enough to be useful beyond LedgyX. If you're building EAV tables, audit logs, dynamic schemas, or any system where a column holds mixed types — this is for you.
Feedback and PRs welcome. If you're using this in production or have ideas for new op_id types, open an issue!
Top comments (0)