DEV Community: Eugene

🚀 I Built UUIDs That Look Random But Sort Like Timestamps (50% Smaller Indexes!)

Eugene — Thu, 28 May 2026 10:22:24 +0000

TL;DR

Built a PostgreSQL extension that creates UUIDs looking like random v4 but containing hidden timestamps. Result: 50% smaller indexes with perfect privacy protection.

Available now on PGXN: pgxn install pg_uuid_v8

The Problem Every Backend Dev Faces 🤔

You know this pain:

UUID v4: Random, secure, but terrible for DB indexing (fragmentation nightmare)
UUID v7: Great for indexing, but reveals creation timing (privacy issues)

For years, we've been stuck choosing between performance and privacy.

The Solution: Steganographic UUIDs 🔮

What if UUIDs could be both random AND fast? Enter steganography - hiding encrypted timestamps inside random-looking UUIDs.

-- Looks like normal UUID v4
SELECT uuid_v8_generate();
-- bf3fcf45-9476-4138-bf48-03933d90dc2d

-- But contains hidden timestamp!
SELECT uuid_stego_extract_timestamp('bf3fcf45-9476-4138-bf48-03933d90dc2d');
-- 1714127712849302 (microseconds since epoch)

Technical Deep Dive 🛠️

UUID Structure

Standard v4:    xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx  
Steganographic: TTTTTTTT-TTTT-4RRR-yRRR-RRRRRRRRRRRR

T = Encrypted timestamp bits (48 bits)
R = Random bits  
4 = Version marker (v4 compliant)

Encryption Algorithm

// XOR encryption with SHA-256 key derivation
uint64 encrypt_timestamp(uint64 timestamp, const char* seed) {
    uint64 key = sha256_derive_key(seed);
    return (timestamp ^ key) & 0xFFFFFFFFFFFFULL;
}

PostgreSQL C Extension Implementation

PG_FUNCTION_INFO_V1(uuid_v8_generate);
Datum uuid_v8_generate(PG_FUNCTION_ARGS) {
    pg_uuid_t *uuid = palloc(sizeof(pg_uuid_t));

    uint64 timestamp = get_current_timestamp_us();
    uint64 key = generate_key_from_seed(stego_seed);
    uint64 encrypted = crypt_timestamp(timestamp, key);

    // Embed in first 48 bits + random fill
    embed_in_uuid(uuid, encrypted);
    set_uuid_v4_bits(uuid);

    PG_RETURN_UUID_P(uuid);
}

Performance Results 📈

Tested on 500k records:

Index Type	Size	Performance
UUID v4 Full	3.1 MB	Sequential Scan
Stego Functional	1.5 MB	Index Scan

-- The magic: functional indexing
CREATE INDEX events_time_idx ON events 
USING btree (uuid_stego_extract_timestamp(id));

-- Fast time-based queries
EXPLAIN ANALYZE SELECT * FROM events 
WHERE uuid_stego_extract_timestamp(id) BETWEEN start_ts AND end_ts;
-- Index Scan using events_time_idx (fast!)

Getting Started (Now Available on PGXN!) 📦

Easy Installation

# Install from PGXN (recommended)
pgxn install pg_uuid_v8

# Or build from source
git clone https://github.com/ineron/pg_uuid_v8
cd pg_uuid_v8
make && sudo make install

Setup

CREATE EXTENSION pg_uuid_v8;
SELECT uuid_v8_set_seed('your_secret_2024');

CREATE TABLE events (
    id uuid PRIMARY KEY DEFAULT uuid_v8_generate(),
    data jsonb
);

CREATE INDEX ON events (uuid_stego_extract_timestamp(id));

Queries That Actually Work

-- Efficient pagination
SELECT * FROM events 
WHERE uuid_stego_extract_timestamp(id) > last_timestamp
ORDER BY uuid_stego_extract_timestamp(id) LIMIT 100;

-- Time range analytics  
SELECT count(*) FROM events 
WHERE uuid_stego_in_range(id, '2024-01-01', '2024-12-31');

The Secret Sauce: Functional Indexes 🔑

PostgreSQL's functional indexes are the key. Instead of indexing the full 16-byte UUID, we index the extracted 8-byte timestamp:

-- Traditional approach (large index)
CREATE INDEX ON table (uuid_column);  -- 16 bytes per entry

-- Steganographic approach (compact index)  
CREATE INDEX ON table (uuid_stego_extract_timestamp(uuid_column));  -- 8 bytes per entry

Result: 50% space savings + Index Scan performance!

Security & Encryption Modes 🔒

The extension supports multiple encryption modes:

-- Fast privacy protection (default)
SELECT uuid_v8_set_encryption_mode('XOR');

-- GDPR/compliance ready
SELECT uuid_v8_set_encryption_mode('AES128');

-- Maximum security
SELECT uuid_v8_set_encryption_mode('AES256');

Threat Model

✅ Prevents: Timing analysis attacks, creation pattern discovery

✅ Protects: Privacy-sensitive applications (healthcare, finance)

⚠️ Note: XOR mode provides privacy, AES modes provide cryptographic security

Real-World Use Cases 🎯

Perfect For:

High-volume APIs (millions of records daily)
Privacy-sensitive systems (healthcare, finance)
Multi-tenant platforms (SaaS applications)
Time-series data with privacy requirements

Migration Example:

-- Existing UUID v4 tables
ALTER TABLE existing_table 
ADD COLUMN new_id uuid DEFAULT uuid_v8_generate();

-- Create functional index
CREATE INDEX ON existing_table (uuid_stego_extract_timestamp(new_id));

-- Gradually migrate queries

Performance Deep Dive 📊

Index Size Analysis

-- Compare index sizes
SELECT 
    indexname,
    pg_size_pretty(pg_total_relation_size(indexname::regclass)) as size
FROM pg_indexes 
WHERE tablename = 'test_table';

--           indexname           |  size  
-- ------------------------------+--------
--  test_table_uuid_idx          | 3.1 MB
--  test_table_stego_func_idx    | 1.5 MB

Query Performance

-- Query plan comparison
EXPLAIN (ANALYZE, BUFFERS) 
SELECT * FROM large_table 
WHERE uuid_stego_extract_timestamp(id) BETWEEN start_ts AND end_ts;

-- Result: Index Scan using large_table_stego_time_idx
-- Buffers: shared hit=3, Execution Time: 0.040 ms

Lessons Learned 📚

What Worked:

Functional indexes are PostgreSQL's superpower
Standard compliance (UUID v4) ensures easy adoption
PGXN distribution makes installation trivial
Multiple encryption modes satisfy different security requirements

Challenges Overcome:

LLVM bitcode compilation issues (solved with NO_LLVM=1)
Memory management in PostgreSQL extensions (palloc vs malloc)
PGXN metadata spec compliance (v1.0.0 validation)

Architecture Decisions 🏗️

Why C Extension vs PL/pgSQL?

Performance: Native speed for cryptographic operations
Integration: Deep PostgreSQL type system integration
Security: Compiled code vs interpreted SQL

Why Functional Indexes vs Custom Index Types?

Compatibility: Works with all PostgreSQL versions
Maintenance: Leverages existing B-tree infrastructure
Flexibility: Standard PostgreSQL query optimization

Community & Open Source 🌟

License: PostgreSQL License (permissive)
PGXN: https://pgxn.org/dist/pg_uuid_v8/
GitHub: Full source + comprehensive docs
Tests: Regression suite across PostgreSQL versions

Try It Yourself! 🚀

# Quick start
pgxn install pg_uuid_v8
psql -c "CREATE EXTENSION pg_uuid_v8; SELECT uuid_v8_generate();"

Benchmark Your Use Case

-- Create test table
CREATE TABLE benchmark (
    id uuid DEFAULT uuid_v8_generate(),
    data text DEFAULT 'sample data'
);

-- Insert test data
INSERT INTO benchmark (data) 
SELECT 'test-' || generate_series(1,100000);

-- Create functional index
CREATE INDEX benchmark_time_idx ON benchmark 
(uuid_stego_extract_timestamp(id));

-- Test performance
EXPLAIN ANALYZE 
SELECT * FROM benchmark 
WHERE uuid_stego_extract_timestamp(id) > 
    (EXTRACT(EPOCH FROM NOW() - INTERVAL '1 hour') * 1000000)::bigint;

What's Next? 🔮

Working on:

Version management for seamless upgrades
Monitoring functions for performance analytics
ORM integrations for popular frameworks
Cloud platform deployment guides (AWS RDS, Google Cloud SQL)

Discussion 💬

Have you hit UUID performance walls in your applications?
What's your current approach to time-based indexing?
Tried steganographic UUIDs yet? Drop your benchmark results!

The PostgreSQL ecosystem is amazing for solving real-world problems. What database challenges are you tackling?

ineron / pg_uuid_v8

pg_uuid_v8

A PostgreSQL extension for steganographic UUIDs with embedded timestamps.

Overview

pg_uuid_v8 addresses the performance vs privacy trade-off in UUID usage by implementing steganographic UUIDs. These UUIDs maintain full compatibility with the UUID v4 format while embedding hidden timestamps that enable efficient indexing and range queries.

Features

UUID v4 Compatibility: Generated UUIDs pass standard v4 validation (correct version and variant bits)
Hidden Timestamps: Microsecond-precision timestamps embedded using steganographic techniques
Configurable Encryption: XOR, AES-128, and AES-256 modes for timestamp obfuscation
Functional Indexing: Support for PostgreSQL functional indexes on extracted timestamps
Range Queries: Efficient time-based queries using hidden timestamp data
Seed Management: Configurable encryption seeds via PostgreSQL GUC variables

Technical Approach

Standard UUID implementations present a trade-off between indexing performance and timestamp privacy:

UUID v4: Random values provide good privacy but result in poor B-tree index performance due to random insertion patterns
UUID v7…

View on GitHub

Building better databases one UUID at a time 🛠️

Links:

Most AI Agents Do Not Have a Memory Problem. They Have a Coordination Problem.

Eugene — Thu, 21 May 2026 22:44:16 +0000

Most AI agents do not have a memory problem.

They have a coordination problem.

A lot of current AI agent demos focus on memory.

The agent remembers the user.

The agent remembers previous tasks.

The agent remembers documents, preferences, decisions, and context.

That is useful.

But in real enterprise workflows, the harder problem appears one step later:

How do multiple agents stay synchronized?

Because enterprise AI rarely ends with one assistant.

You may have:

a sales agent talking to customers
a proposal agent preparing commercial documents
a compliance agent checking risks and policies
a support agent handling tickets
an operations agent updating internal systems
a human approver reviewing important decisions

Each of them may have access to different tools, different context, different permissions, and different pieces of memory.

The real challenge is not whether one agent can remember something.

The challenge is whether all agents can share the same operational reality.

The Problem With Isolated Agent Memory

Many agent systems treat memory as something local to the agent.

That memory can be stored in different forms:

local files
JSON state
conversation summaries
vector databases
agent-specific notes
tool outputs
runtime context

This works well for demos.

But it becomes fragile when multiple agents are involved.

Imagine this simple workflow:

Sales Agent      -> speaks with the customer
Proposal Agent   -> prepares the offer
Compliance Agent -> checks GDPR requirements
Ops Agent        -> updates internal systems

Now imagine the sales agent learns that the customer wants an on-prem deployment.

But the proposal agent still thinks this is a SaaS deal.

The compliance agent knows that GDPR restrictions apply.

But the ops agent does not.

The result is predictable:

outdated proposals
inconsistent statuses
duplicated work
weak auditability
conflicting decisions
no clear source of truth

This is not just a memory issue.

It is a coordination issue.

Memory Is Not the Same as Operational State

Agent memory is often unstructured.

It may contain things like:

"The customer seemed worried about deployment complexity."
"They prefer on-prem."
"Budget sensitivity is high."
"Proposal should probably include compliance language."

Some of this is useful.

But enterprise systems need more than useful notes.

They need structured operational state:

{
  "customer_id": "acme",
  "deployment_preference": "on_prem",
  "compliance_required": true,
  "proposal_status": "draft",
  "approval_required": true
}

They also need to know:

Who changed this?
When did it change?
Why did it change?
Which event triggered it?
Which agent made the decision?
Was human approval required?
Can we replay or audit the process?

That is where private agent memory starts to fail.

The Missing Layer: Shared Operational State

For multi-agent systems to work in enterprise environments, agents need a shared state layer.

Not just shared memory.

Not just a vector database.

Not just chat history.

They need a layer where:

events are recorded
state is structured
changes are auditable
workflows are triggered
agents can subscribe to updates
humans can review important steps
APIs and systems can be called safely

In other words, agents need an operational backend.

A simple conceptual flow could look like this:

Customer message received
        |
        v
Event: customer.message.received
        |
        v
AI extracts requirements
        |
        v
Event: customer.requirements.updated
        |
        v
Database state changes
        |
        v
Proposal Agent, Compliance Agent, and Ops Agent receive updates
        |
        v
Workflow continues

Instead of each agent maintaining its own private notebook, all agents work through the same event-driven source of truth.

Why a Database-Native Approach Makes Sense

Most enterprise applications already depend on databases for operational truth.

CRMs, ERPs, PMS systems, ticketing systems, billing systems, and internal tools all rely on structured state.

So the question becomes:

Why should AI agents coordinate outside the database?

A database-native approach gives multi-agent systems several important properties.

1. Shared Source of Truth

All agents read and write through the same operational layer.

Sales Agent      -> shared state
Proposal Agent   -> shared state
Compliance Agent -> shared state
Ops Agent        -> shared state

No agent has to guess whether its local memory is still current.

2. Event-Driven Workflows

Every important change can become an event.

customer.created
requirement.extracted
proposal.requested
risk.detected
approval.required
document.generated
ticket.updated

These events can trigger agents, API calls, webhooks, notifications, or human approvals.

3. Auditability

Enterprise AI systems need traceability.

It should be possible to answer:

What happened?
Which agent changed the state?
What data was used?
Which workflow was triggered?
Was the decision approved?
Can we review the history?

Without auditability, multi-agent automation becomes risky.

4. Human-in-the-Loop Control

Not every agent action should be executed automatically.

Some actions should create approval tasks.

For example:

discount approval
contract generation
compliance exception
customer-facing email
system configuration change
financial action

A shared state layer makes it easier to place humans inside the workflow rather than outside of it.

5. Better Synchronization

Agents do not need to ask each other what happened.

They can react to the same event stream and read the same structured state.

That makes the system more predictable.

The Pattern

The pattern looks like this:

Agents do not own the truth.
The database owns the truth.

Agents do not just remember.
They emit events.

Events update state.
State changes trigger workflows.
Workflows activate agents, APIs, or humans.

This is the difference between:

agent memory

and:

enterprise operational state

Where LedgyX Fits

This is one of the core ideas behind LedgyX.

LedgyX turns the database into an operational control plane for AI agents.

Agents can use LedgyX to:

emit events
update structured state
trigger workflows
call APIs
send webhooks
notify users
coordinate with other agents
keep an auditable history

The goal is not to replace AI agent frameworks.

The goal is to give them a reliable operational backend.

A place where agents can stay synchronized, act through shared state, and leave a clear audit trail.

In this model, the database is not just storage.

It becomes the coordination layer.

A Simple Example

Imagine a customer onboarding workflow.

1. Sales Agent receives a customer request.
2. The request is stored as an event.
3. AI extracts requirements from the message.
4. Requirements are written into structured state.
5. Compliance Agent checks whether restrictions apply.
6. Proposal Agent generates a draft based on current state.
7. Human Approver reviews the proposal.
8. Ops Agent updates the internal system.
9. The customer receives the final response.

At every step, the system knows:

what happened
who or what triggered it
which state changed
which agent acted
which workflow continued
whether approval was required

That is much closer to how enterprise systems need to work.

The Next Question for AI Infrastructure

The first question was:

Can your agent remember?

The next question is:

Can all your agents stay synchronized?

And after that:

Can they explain what changed?
Can they act through the same source of truth?
Can humans review important decisions?
Can the system be audited?
Can workflows be replayed or debugged?

This is where multi-agent systems need to mature.

Not more isolated memory.

Not more private notebooks.

Not more disconnected vector stores.

But shared, structured, auditable operational state.

That is the infrastructure layer multi-agent AI will need in enterprise environments.

And that is the problem we are building LedgyX to solve.

Stop Guessing Which Weights Your Neural Network Actually Learned: Deterministic Initialization That Tracks Every Change

Eugene — Sun, 10 May 2026 11:18:35 +0000

The Problem Nobody Talks About

You've spent hours training your neural network. The loss converged, metrics look good, and you're ready to deploy. But here's a question you probably can't answer:

Which weights actually learned during training?

With standard initialization methods (PyTorch's kaiming_normal_, TensorFlow's he_normal), the answer is: you have no idea. Once those random values are generated, they're gone forever. You can't tell which weights changed by 0.001 and which changed by 5.0. You can't identify the "dead" neurons that never activated. And you certainly can't safely prune your model without risking quality loss.

I built a solution that fixes this — and it revealed something surprising.

The "Aha!" Moment

After implementing deterministic weight initialization with full addressability, I ran a simple experiment:

# Initialize a 6,100-parameter network
gen = DeterministicNoiseGenerator(seed=42)
for layer_id, layer in enumerate(network):
    layer.weights = gen.init_matrix(layer_id, layer.shape)

# Train normally...
train(network, epochs=50)

# Now check: which weights actually changed?
for layer_id, layer in enumerate(network):
    stats = gen.analyze_weight_matrix(layer.weights, layer_id)
    print(f"Layer {layer_id}: {stats['changed_percentage']:.1f}% active")

Results:

Layer 0 (input):  39.1% active  ← 60.9% weights did NOTHING
Layer 1 (hidden): 24.0% active  ← 76% sleeping!
Layer 2 (output): 14.0% active  ← 86% unused

Over 60% of my network's weights never meaningfully participated in learning. And I could prove it, precisely, for every single parameter.

What Is Deterministic Initialization?

Instead of generating random weights and forgetting their values, make every weight addressable by its coordinates:

w[layer_id][i][j] = f(seed, layer_id, i, j)

Where f is a pure function (no hidden state) that always returns the same value for the same inputs.

This means you can:

Generate a weight: w0 = gen.init_weight(0, 5, 10, fan_in, fan_out)
Train your model for weeks
Recover that exact weight: w0_recovered = gen.init_weight(0, 5, 10, fan_in, fan_out)
Compare: delta = current_weight - w0_recovered

Zero storage overhead. Perfect precision.

How It Works: The Technical Details

Counter-Based PRNG (SplitMix64)

Instead of sequential random number generation:

# Traditional (stateful)
rng = np.random.RandomState(42)
w = rng.randn(256, 784)  # State advances, can't recreate w[0,0] easily

Use a hash function that maps coordinates → values:

def init_weight(self, layer_id, i, j, fan_in, fan_out, mode="he"):
    # Pure function - no state
    noise = self.gaussian(layer_id, i, j)  # Deterministic N(0,1)
    std = sqrt(2.0 / fan_in) if mode == "he" else ...
    return std * noise

The gaussian() function uses SplitMix64 hash + Box-Muller transform:

def gaussian(self, *indices):
    # Hash the coordinates
    h = self.seed
    for idx in indices:
        h = self._hash64(h ^ self._hash64(idx))

    # Convert to U(0,1]
    u1 = (h >> 11) / (1 << 53)
    u2 = self._u01(*indices, 1)

    # Box-Muller → N(0,1)
    r = sqrt(-2.0 * log(u1))
    return r * cos(2 * pi * u2)

Key properties:

Deterministic: same inputs → same output
No state: can query any weight in any order
Fast: ~10 CPU cycles per value
Correct statistics: exact He/Xavier/LeCun initialization

Real-World Example: Targeted Pruning

Here's the full workflow I used to achieve 62.3% sparsity with zero accuracy loss:

from deterministic_init import DeterministicNoiseGenerator

# 1. Initialize network deterministically
gen = DeterministicNoiseGenerator(seed=42)
network = SimpleNet(input_dim=784, hidden=[256, 128, 64], output=10)

for layer_id, layer in enumerate(network.layers):
    layer.weight = gen.init_matrix(
        layer_id, 
        layer.weight.shape, 
        mode="he"
    )

# 2. Train normally (nothing special here)
train(network, train_loader, epochs=50)

# 3. Analyze which weights changed
threshold = 1e-5  # "Changed" if |w - w0| > threshold
masks = {}

for layer_id, layer in enumerate(network.layers):
    mask = gen.get_awakened_mask(
        layer.weight.numpy(), 
        layer_id, 
        threshold=threshold
    )
    masks[layer.name] = mask

    active_pct = mask.sum() / mask.size * 100
    print(f"{layer.name}: {active_pct:.1f}% active")

# 4. Prune ONLY the sleeping weights
for layer_id, layer in enumerate(network.layers):
    mask = masks[layer.name]
    layer.weight[~mask] = 0.0  # Zero out sleeping weights

# 5. Verify minimal impact
test_accuracy_before = evaluate(network_original, test_loader)
test_accuracy_after = evaluate(network_pruned, test_loader)

print(f"Before pruning: {test_accuracy_before:.4f}")
print(f"After pruning:  {test_accuracy_after:.4f}")
print(f"Difference:     {abs(test_accuracy_after - test_accuracy_before):.4f}")

My results:

input_layer:  39.1% active (60.9% pruned)
hidden1:      24.0% active (76.0% pruned)
hidden2:      14.0% active (86.0% pruned)

Before pruning: 0.9423
After pruning:  0.9419
Difference:     0.0004  ← Negligible!

This isn't magnitude-based pruning (which can destroy important small weights) or lottery ticket hypothesis (which requires storing a full copy of initial weights). This is precision pruning — removing only weights we know didn't participate.

Interactive Testing Tool

I also built a CLI tool to explore weight initialization visually:

# Generate a matrix with specific seed
python test_matrix_generator.py --seed 42 --rows 10 --cols 20

# Compare He vs Xavier vs LeCun
python test_matrix_generator.py --seed 42 --rows 8 --cols 8 --compare-modes

# Test reproducibility (generates same matrix 3 times)
python test_matrix_generator.py --seed 42 --rows 5 --cols 5 --test-repro

Output example:

GENERATED MATRIX (seed=42, layer_id=0, mode=he)
================================================

         0          1          2          3     ...
  0   0.960776   0.273809   0.253874   0.063188 ...
  1  -0.280019  -0.300499  -0.373002  -0.000792 ...
  2  -0.626875   0.343619  -0.583797   0.326972 ...

Statistics:
  Shape:          10 x 20
  Mean:           0.00123456 (near 0 ✓)
  Std:            0.31622777 (target: 0.31622777 ✓)
  Min:           -1.23456789
  Max:            1.56789012

✓ Reproducibility test passed (3/3 trials identical)

Bonus: Orthogonal Initialization

For RNNs and very deep networks, you can also generate orthogonal matrices:

# Normal initialization: condition number ~495
W_normal = gen.init_matrix(0, (128, 128), mode="he")
print(f"Condition: {np.linalg.cond(W_normal):.0f}")
# → 495

# Orthogonal initialization: condition number ~1
W_ortho = gen.init_matrix(1, (128, 128), mode="he", orthogonal=True)
print(f"Condition: {np.linalg.cond(W_ortho):.0f}")
# → 1

# Improvement: 495x better conditioning!

This uses QR decomposition on the deterministic Gaussian matrix, giving you the best of both worlds: proper variance scaling and excellent conditioning.

Transformer-Specific Initialization

The tool also handles special cases like Transformer attention:

d_model = 512
std_qkv = 1.0 / sqrt(d_model)  # Critical for attention stability

Q = gen.init_matrix(0, (d_model, d_model), mode="custom")
K = gen.init_matrix(1, (d_model, d_model), mode="custom")
V = gen.init_matrix(2, (d_model, d_model), mode="custom")

# All weights scaled to std = 1/√d_model
# Ensures attention scores stay in [-0.1, 0.1] range

Benchmarks

All numbers from a simple feedforward network (6,100 params):

Metric	Result
Reproducibility	100% (max diff: 0.0e+00)
Overhead per weight	O(1), ~10 CPU cycles
Memory overhead	0 bytes (pure function)
Generation time	<1ms for 1M weights
Pruning sparsity	60-70% typical
Accuracy loss	<0.001 typical

Comparison to Alternatives

Feature	This Method	PyTorch Init	Lottery Ticket	Magnitude Pruning
Deterministic	✅	❌	❌	N/A
Addressable	✅	❌	❌	N/A
Track changes	✅	❌	⚠️ (2x memory)	❌
Zero overhead	✅	✅	❌	✅
Precision pruning	✅	❌	⚠️ (approximate)	⚠️ (heuristic)

Try It Yourself

Full code on GitHub (MIT license):

git clone https://github.com/yourusername/deterministic-init
cd deterministic-init

# Install (NumPy only)
pip install numpy

# Run interactive tool
python test_matrix_generator.py

# Or see the full showcase
python showcase.py

Quick start:

from deterministic_init import DeterministicNoiseGenerator

gen = DeterministicNoiseGenerator(seed=42)

# Initialize weights
weights = gen.init_matrix(layer_id=0, shape=(256, 784), mode="he")

# After training, check what changed
stats = gen.analyze_weight_matrix(trained_weights, layer_id=0)
print(f"Active: {stats['changed_percentage']:.1f}%")
print(f"Sleeping: {100 - stats['changed_percentage']:.1f}%")

# Get mask of active weights
mask = gen.get_awakened_mask(trained_weights, layer_id=0)

# Safe pruning
trained_weights[~mask] = 0.0

What This Enables

Beyond pruning, this opens doors to:

Lottery Ticket Hypothesis experiments: Track which subnetworks learned
Neural Architecture Search: Identify important connections
Gradient flow analysis: Detect vanishing/exploding gradients early
Curriculum learning: Visualize learning progression by layer
Debugging: "Why isn't this layer learning?" → Now you can check!

The Math (For the Curious)

SplitMix64 Hash:

h ← (h + GOLDEN_RATIO) mod 2^64
h ← (h ⊕ (h >> 30)) × MIX1 mod 2^64
h ← (h ⊕ (h >> 27)) × MIX2 mod 2^64
h ← h ⊕ (h >> 31)

Box-Muller Transform:

U₁, U₂ ~ Uniform(0,1)
R = √(-2 ln U₁)
θ = 2π U₂
Z = R cos(θ)  →  Z ~ N(0,1)

He Initialization:

Var(y) = Var(Wx)
       = Var(W) · Var(x) · fan_in

To preserve variance through ReLU:
Var(W) = 2/fan_in
std(W) = √(2/fan_in)

Limitations & Future Work

Current limitations:

Not a drop-in replacement for framework initializers (requires manual integration)
Orthogonal init is O(n³) for QR decomposition (fast for reasonable sizes)
Pruning threshold selection is somewhat manual

Potential improvements:

Auto-tuned threshold based on gradient magnitude
Integration with PyTorch/TensorFlow as custom initializer
Distributed generation for massive models
Sparse storage format optimization

Conclusion

Every neural network has "dead weight" — parameters that never meaningfully contribute to the output. Traditional initialization makes this invisible. Deterministic, addressable initialization makes it measurable.

In my experiments, 60-70% of weights were sleeping. Your network might be carrying similar dead weight. Now you can find out exactly which ones.

The code is open source, MIT licensed, and production-ready. Give it a try and let me know what percentage of your network is actually working!

Links:

What percentage of your network is sleeping? 🤔

Drop a comment with your results if you try this out!

Tags: #machinelearning #python #neuralnetworks #deeplearning #pytorch #tensorflow #ai #pruning #optimization

Your PostgreSQL Already Has a Graph Engine — You Just Have to Build It

Eugene — Wed, 06 May 2026 21:46:19 +0000

We Replaced Recursive CTEs with a C Traversal Framework and Got ×207 Faster

TL;DR: We built pg_igraph — a graph engine inside PostgreSQL as a C extension. The first working version used recursive CTEs. It took 47 seconds to traverse a 335K-node tree. The final version uses an in-memory adjacency list with pure-C BFS. The same query takes 227ms. Here's why CTEs are the wrong tool for this, and what the right tool looks like.

The Setup

We had graph-shaped data — users, relationships, hierarchies — and it lived in PostgreSQL. The standard answer is "use Neo4j," but that means a second database to deploy, back up, and keep in sync. For a graph that fits on one server, that felt like unnecessary complexity.

So we built pg_igraph: a PostgreSQL C extension with BFS traversal, bidirectional shortest path, and a small query language. The SQL API looks like this:

SELECT * FROM graph_traverse(42, 'FOLLOWS', true, 5);
SELECT graph_shortest_path(42, 999, 'FOLLOWS');
SELECT igraph_query('MATCH (n:User)-[:FOLLOWS*1..3]->(m) RETURN m');

The data model is two partitioned tables:

nodes (id BIGSERIAL, label SMALLINT)
edges (from_id BIGINT, to_id BIGINT, rel_type SMALLINT, direction BOOL)
      PARTITION BY HASH(from_id)

Both forward and reverse edges are stored explicitly — direction = true for outgoing, direction = false for incoming. Traversal in both directions uses identical query plans.

Getting the schema right was the easy part. Getting the traversal fast took several attempts.

Attempt 1: Recursive CTE

The obvious first approach. PostgreSQL has built-in support for recursive queries:

WITH RECURSIVE bfs(node_id, d) AS (
    -- Base case: start node
    SELECT $1::bigint, 0
    UNION ALL
    -- Recursive step: expand one level
    SELECT e.to_id, b.d + 1
    FROM bfs b
    JOIN edges e ON e.from_id = b.node_id
        AND e.rel_type = $2::smallint
        AND e.direction = $3::bool
    WHERE b.d < $4::int
)
SELECT DISTINCT node_id FROM bfs;

This is clean and readable. It also took 47 seconds on a 335K-node tree.

Why? PostgreSQL's recursive executor works like this:

Evaluate the base case, materialize results into a working table
Join the working table with the recursive term, produce new rows
Materialize those rows, repeat
At the end: apply DISTINCT to collapse duplicates

The key problem is step 4. UNION ALL (required for depth tracking) produces duplicates — the same node can appear at multiple depths. PostgreSQL has no way to maintain a "visited" set across iterations, so every node at every depth flows through the pipeline and gets collapsed at the end. For a 335K-node tree with branching factor 6, the intermediate materialization at depth 7 alone contains 6^7 = 279,936 rows — many of them duplicates.

There is no optimization path out of this. The recursive CTE model fundamentally cannot do what a traversal framework does: maintain state across levels, skip already-visited nodes, and stop early.

Attempt 2: Per-Node SPI (First C Implementation)

We moved the BFS logic into a C extension using PostgreSQL's Server Programming Interface (SPI). The idea: maintain the BFS queue in C, issue one SQL query per node to fetch neighbors, track visited nodes in a C hash table.

// Pseudocode of the first C implementation
while (queue not empty) {
    cur_id = dequeue();

    SPI_execute_plan(plan_get_neighbors, [cur_id, rel_id, direction]);
    // process results, enqueue unvisited neighbors
}

This version correctly handles visited tracking and never revisits a node. The 47-second CTE query became... 19,531 SPI calls for a 19,531-node tree — one per node.

Each SPI round-trip costs ~0.04ms on our hardware (context switching, prepared plan execution, result materialization). That's 19,531 × 0.04ms ≈ 800ms for the small-scale tree. Better than 47 seconds, but still O(N) in SPI overhead.

For a 335K-node tree: 335,923 SPI calls → 47 seconds. Same number as the CTE, different reason.

Attempt 3: Batch Per Level with ANY($1)

Instead of one SPI call per node, fetch neighbors for an entire BFS frontier level in one query:

SELECT from_id, to_id FROM edges
WHERE from_id = ANY($1::bigint[]) AND rel_type = $2 AND direction = $3

Pass the entire frontier as a bigint array. One SPI call per depth level instead of one per node. For a 335K-node tree with 7 depth levels: 7 SPI calls.

Results:

Tree 335K: 47s → 2.7s ✓
Chain 1K: 42ms → 155ms ✗ (×3.7 regression)

The regression on chains revealed a fundamental issue with ANY($1) on HASH-partitioned tables.

The partition pruning trap. The edges table is partitioned by HASH(from_id). For a point lookup WHERE from_id = $1, PostgreSQL can compute HASH($1) at planning time and target exactly one partition. For WHERE from_id = ANY($1::bigint[]), it cannot — the array contents are unknown at plan time, so it generates a plan that scans all 16 partitions and filters.

On a chain traversal with frontier size 1, this means: 1,000 depth levels × 16 partition scans × (full partition read + filter) = significant wasted I/O.

Attempt 4: LATERAL unnest — Restoring Partition Pruning

The fix: use LATERAL with unnest() instead of ANY.

SELECT f.id, e.to_id
FROM unnest($1::bigint[]) AS f(id)
JOIN LATERAL (
    SELECT to_id FROM edges
    WHERE from_id = f.id AND rel_type = $2 AND direction = $3
) AS e ON true

LATERAL forces a Nested Loop plan. For each element from unnest(), PostgreSQL executes the inner query independently — with full HASH partition pruning on from_id = f.id.

Explain output:
Nested Loop
  -> Function Scan on unnest
  -> Index Only Scan on edges_pN
       Index Cond: (from_id = f.id) AND (rel_type = ...)

Results:

Tree 335K: 47s → 46ms ✓ (×1020 improvement)
Chain 1K: 42ms → 435ms ✗ (×10 regression)

The LATERAL unnest has overhead per call — unnest() has setup cost that dwarfs a simple index seek when the frontier is 1 element. 1,000 depth levels × (unnest overhead) = visible regression on chains.

The insight: neither ANY nor LATERAL is universally better. The right tool depends on frontier size.

Attempt 5: Hybrid Dispatch

if (frontier_size == 1) {
    // Simple prepared statement: single index seek, no array overhead
    SPI_execute_plan(plan_get_neighbors, [cur_id, rel_id, direction]);
} else {
    // LATERAL unnest: one round-trip for the whole frontier
    SPI_execute_plan(plan_get_neighbors_batch, [array, rel_id, direction]);
}

This recovered chain performance while keeping the tree improvement. But the fundamental issue remained: for any graph where frontier eventually explodes (trees, random dense graphs), you're still paying SPI overhead per level, plus the cost of building and passing progressively larger arrays.

The Right Approach: Load Once, Traverse in C

The insight from all the failed attempts: as long as BFS is driven by SQL, you're fighting the impedance mismatch between a set-based query engine and an iterative graph algorithm.

pg_routing, the PostgreSQL routing extension, solved this the right way years ago: load the graph into memory, route in C. We needed to do the same.

One SPI call loads all edges for the requested rel_type into a C-level adjacency hash map:

// Build adjacency list: from_id → int64[] neighbors
static AdjList *
build_adj_list(int16 rel_id, bool direction, MemoryContext ctx)
{
    Datum args[] = { Int16GetDatum(rel_id), BoolGetDatum(direction) };
    SPI_execute_plan(plan_load_all_edges, args, NULL, true, 0);
    // Copy (from_id, to_id) pairs out of SPI tuptable
    // Build HTAB: from_id → {neighbors[], n, cap}
    // Return AdjList in ctx
}

Then BFS runs entirely in C over the hash map:

// After build_adj_list, SPI is closed. Zero SQL during traversal.
while (queue not empty) {
    cur_id = dequeue();

    AdjNode *node = hash_search(adj->htab, &cur_id, HASH_FIND, &found);
    if (!found) continue;

    for (int i = 0; i < node->n; i++) {
        int64 nbr = node->neighbors[i];
        hash_search(visited, &nbr, HASH_ENTER, &found);
        if (found) continue;
        enqueue(nbr);
        result[res_size++] = nbr;
    }
}

For the 335K-node tree: 2 SPI calls total (rel_id lookup + edge load), then 335K hash table lookups in C. No SQL during traversal.

Results vs CTE baseline:

	Recursive CTE	Load-all + C BFS
Tree 335K full	47,000ms	227ms
Chain 10K full	~400ms	72ms

But Load-All Has a Fixed Cost

Loading 335K edges takes ~50-80ms on HDD even with a good index. For a query that only needs 8 results (find ancestors of a leaf node, depth=6), this is wasteful.

The signal isn't query depth — it's frontier size. A chain of depth 10,000 has frontier=1 at every level. A 6-branch tree hits frontier=7,776 by level 5. When frontier is small, per-level SPI is cheaper. When frontier explodes, load-all pays off immediately.

The final implementation starts per-level and switches at runtime:

#define ADAPTIVE_FRONTIER_THRESHOLD 200

while (depth < max_depth && frontier_size > 0) {
    expand_frontier_with_spi();  // per-level batch or single
    depth++;

    if (frontier_size > ADAPTIVE_FRONTIER_THRESHOLD) {
        // Frontier is growing fast — load-all will be net cheaper
        adj = build_adj_list(rel_id, direction, work_ctx);
        SPI_finish();  // no more SQL
        switched = true;
        break;
    }
}

if (switched)
    bfs_pure_c(adj, visited, frontier, depth, max_depth);

The visited HTAB is shared between phases. Nodes discovered in Phase 1 are already marked when Phase 2 starts.

Final benchmark — medium scale, HDD server:

Test	Recursive CTE	Final (v10)	Factor
BFS tree 335K, full traversal	47,000 ms	227 ms	×207
Shortest path, 10K chain	618 ms	49 ms	×12
BFS chain, depth=100	14 ms	16 ms	≈ same
BFS tree, depth=3	12 ms	3.6 ms	×3 ↓
Reverse BFS, 8 ancestors	6 ms	6 ms	≈ same
Query language MATCH depth=10	3 ms	3 ms	≈ same

One More Thing: The Covering Index

build_adj_list runs:

SELECT from_id, to_id FROM edges WHERE rel_type = $1 AND direction = $2

The table is partitioned by HASH(from_id). rel_type and direction have no relationship to the partition key — without a dedicated index, this scans all 16 partitions and all rel_types, then filters. With 1.1M total edges across three rel_types, that reads everything to return 200K rows.

The fix: one covering index per partition:

CREATE INDEX ON edges_pN (rel_type, direction) INCLUDE (from_id, to_id);

Index-only scan, reads only the requested rel_type. graph_shortest_path on a 10K chain dropped from 618ms to 49ms from this single index.

Conclusion

The recursive CTE approach is appealing — it's SQL, it's readable, it feels like it should work. But PostgreSQL's recursive executor is fundamentally an iterative set processor, not a traversal framework. It cannot maintain visited state across iterations, cannot skip already-explored nodes, and collapses duplicates only at the end. For graphs of any meaningful size, this becomes unacceptable.

Moving BFS into C and loading the subgraph once — the same approach pg_routing has used for years — resolves the impedance mismatch. The traversal is just pointer arithmetic in a hash table.

The adaptive frontier-based switching gives good performance across all workload shapes: narrow traversals (chains, ancestor lookups) pay no preload cost; wide traversals (trees, dense random graphs) preload once and traverse in C.

pg_igraph is open source under Apache 2.0:
👉 https://github.com/ineron/pg_igraph

The companion pg_ilib binary property serialization extension is in a separate repository.

pg_ilib: Compact Typed Binary Serialization for PostgreSQL

Eugene — Sun, 26 Apr 2026 16:14:29 +0000

The problem

PostgreSQL's bytea type is powerful for storing raw binary data, but it carries no type information. Once you store a value as bytes, you need out-of-band metadata to know whether those bytes represent a number, a UUID, a timestamp, or a JSON object.

This gets painful when you're building dynamic schemas — EAV tables, schemaless document stores, or audit logs — where a single column holds values of different types. You end up carrying a separate type column everywhere, writing CASE expressions to decode it, and hoping they stay in sync.

pg_ilib solves this with a simple idea: prefix every serialized value with a 2-byte typed header.

The format

Byte 0: [ op_id (4 bits) | params_hi (4 bits) ]
Byte 1: [ params_lo (8 bits) ]
Bytes 2…N: payload

The op_id identifies the type. The params field carries type-specific metadata: decimal scale for numerics, timezone offset in minutes for timestamps.

op_id	Type	params
`0x01`	text	—
`0x02`	numeric / bigint	decimal scale
`0x03`	bool	—
`0x04`	timestamp / date	tz offset (signed minutes)
`0x08`	uuid	—
`0x0E`	jsonb	—
`0x0F`	hex bytes	—

Installation

# requires libgmp-dev (Debian/Ubuntu) or gmp-devel (RHEL/CentOS)
make && sudo make install

CREATE EXTENSION pg_ilib;

Works with PostgreSQL 11+ on any platform where pg_config is on PATH.

Basic usage

Each type has a symmetric pair of functions:

-- bigint
SELECT bytea_to_bigint(bigint_to_bytea(123456789));
-- 123456789

-- text
SELECT bytea_to_str(str_to_bytea('hello world'));
-- hello world

-- numeric with scale
SELECT bytea_to_numeric(numeric_to_bytea(3.14159, 5));
-- 3.14159

-- uuid
SELECT bytea_to_uuid(uuid_to_bytea('ac861c64-52ae-b223-4d6a-5c26fc34994c'));
-- ac861c64-52ae-b223-4d6a-5c26fc34994c

-- jsonb
SELECT bytea_to_jsonb(jsonb_to_bytea('{"name":"Alice","age":30}'::jsonb));
-- {"age": 30, "name": "Alice"}

Implicit CASTs are registered for most types:

SELECT 42::bigint::bytea;
SELECT '\x200000000000002a'::bytea::bigint;

The killer feature: value_to_jsonb

Because the type is embedded in the header, a single function can decode anything:

SELECT value_to_jsonb(bigint_to_bytea(42));        -- 42
SELECT value_to_jsonb(str_to_bytea('hello'));       -- "hello"
SELECT value_to_jsonb(bool_to_bytea(true));         -- true
SELECT value_to_jsonb(numeric_to_bytea(3.14, 2));   -- 3.14
SELECT value_to_jsonb(uuid_to_bytea('ac861c64-52ae-b223-4d6a-5c26fc34994c'));
-- "ac861c64-52ae-b223-4d6a-5c26fc34994c"

This makes EAV-style tables genuinely usable:

CREATE TABLE entity_attributes (
    entity_id uuid,
    key       text,
    value     bytea   -- holds any type, self-describing
);

-- Store mixed types in one column
INSERT INTO entity_attributes VALUES
    ('ac861c64-...', 'age',      bigint_to_bytea(30)),
    ('ac861c64-...', 'score',    numeric_to_bytea(9.75, 2)),
    ('ac861c64-...', 'active',   bool_to_bytea(true)),
    ('ac861c64-...', 'joined',   timestamp_to_bytea(1700000000)),
    ('ac861c64-...', 'tag',      str_to_bytea('premium'));

-- Decode everything to JSON in one query, no type column needed
SELECT key, value_to_jsonb(value)
FROM entity_attributes
WHERE entity_id = 'ac861c64-...';

    key    | value_to_jsonb
-----------+----------------
 age       | 30
 score     | 9.75
 active    | true
 joined    | "2023-11-14T22:13:20Z"
 tag       | "premium"

Timestamps and timezones

The params field stores the timezone offset in signed minutes, so the original offset survives the round-trip:

-- Store UTC
SELECT timestamp_to_bytea(1766323245);

-- Decode as plain timestamp (offset ignored)
SELECT bytea_to_timestamp(timestamp_to_bytea(1766323245));
-- 2026-01-21 00:00:45

-- Decode as timestamptz with UTC+2 offset baked in
SELECT bytea_to_timestamptz(timestamp_to_bytea(1766323245, 120));
-- 2026-01-21 02:00:45+02

Built-in corruption detection

Every decoder calls pg_ilib_check_header() before touching payload bytes. Impossible (op_id, params, payload_size) combinations raise ERRCODE_DATA_CORRUPTED instead of crashing the server:

-- Scale 4095 is impossible for a 1-byte payload (max = 3)
SELECT bytea_to_numeric('\x2FFF2a');
-- ERROR: pg_ilib bytea_to_numeric: numeric scale 4095 is impossible
--        for 1 payload byte(s) (max scale = 3)

-- Timezone offset out of IANA range [-840, 840] minutes
SELECT bytea_to_timestamp('\x4FFF12345678');
-- ERROR: pg_ilib bytea_to_timestamp[tz]: timezone offset 4095 min
--        is out of valid range [-840, 840]

-- Unknown op_id
SELECT value_to_jsonb('\x9000ff');
-- ERROR: pg_ilib value_to_jsonb: unknown op_id 0x09

Repository structure

The repo contains three independent extensions that share a single directory and build system:

Extension	Description	Build
`pg_ilib`	Typed bytea serialization (this article)	`make`

Build all three at once:

make all-ext && sudo make install-ext

Testing without installing

make testdb        # create pg_ilib_test database (once)
make quicktest     # compile and run test/quick_test.sql against a /tmp copy

# Override host/user if needed
make quicktest PG_HOST=10.0.0.1 PG_USER=myuser

Why we built this

pg_ilib started as an internal component of LedgyX — a low-code API platform that generates FastAPI applications directly from PostgreSQL schemas.

The core challenge in LedgyX is that every table, column, and type is defined dynamically at runtime. We needed a way to store values of any SQL type in a single bytea column and decode them correctly later — without carrying a separate type column everywhere. pg_ilib is the solution we built and have been running in production.

We decided to open-source it because the problem is general enough to be useful beyond LedgyX. If you're building EAV tables, audit logs, dynamic schemas, or any system where a column holds mixed types — this is for you.

Feedback and PRs welcome. If you're using this in production or have ideas for new op_id types, open an issue!

JSON vs JSONB in PostgreSQL: I tested 1M rows to find out

Eugene — Sun, 12 Apr 2026 07:52:39 +0000

Recently I tried to resolve a recurring question in our team:

Is JSON or JSONB actually faster in PostgreSQL?

I couldn’t find a clear answer that matched real-world usage, so I ran my own benchmark.

Setup

I loaded 1 million records with identical data into both JSON and JSONB columns and tested common operations.

Hardware:
Dell PowerEdge R450
2x Intel Xeon Silver 4310 (24/48 cores @ 2.1GHz)

I intentionally used mid-range hardware so the differences would be easier to see.

What I tested

INSERT performance
Key-based search (data->>'field' = 'value')
Nested updates
Complex multi-condition queries
Array access (data->'items'[0])
Key existence (data ? 'key')
Path queries (data #> '{user,profile,name}')
Aggregations
Storage size

Results

Insert speed

JSON was faster:

JSON: 8.6s
JSONB: 11.3s

~31% difference, which makes sense - JSON is stored as plain text.

Query performance

This is where things got interesting.

JSONB was significantly faster across all read operations:

Simple key extraction → 6.2x faster
Nested field access → 7.6x faster
Array operations → 7.3x faster
Complex conditions → 9.1x faster

On average: ~7x faster

GIN indexes + binary format make a huge difference here.

Updates

For partial updates:

JSONB was ~71% faster

Storage

JSON: ~1200 MB
JSONB: ~888 MB

JSONB used ~26% less space, mainly due to key deduplication.

Key takeaway

If your workload is read-heavy (which is most backend systems):

JSONB pays for itself very quickly.

Even with slower inserts, the performance gain in queries dominates after relatively few operations.

One important detail

Operators like:

@>
?
?&

only work with JSONB.

Without them, efficient querying becomes very limited.

Repo

I published the full benchmark here:

https://github.com/ineron/postgresql-json-jsonb-benchmark

Includes:

SQL scripts
Python automation
Docker setup

Final thought

I expected JSONB to be faster, but not by this margin.

Curious if anyone has seen similar results in production?