DEV Community: BN

I built a token-level debugger for comparing two LLMs

BN — Tue, 26 May 2026 00:14:28 +0000

Same prompt, two models, different outputs. No tooling was actually showing me where they diverged.
So I built tokenflame, which gives you entropy heatmaps, tokenizer diffs, divergence markers, and token-by-token replay. One command, one HTML file.
pip install tokenflame
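
To make "entropy" and "divergence marker" concrete, here is a minimal sketch of the two underlying calculations. This is not tokenflame's API; `token_entropy` and `first_divergence` are hypothetical names, and the inputs are assumed to be plain Python lists.

```python
import math
from itertools import zip_longest


def token_entropy(probs):
    """Shannon entropy (in bits) of one next-token probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def first_divergence(tokens_a, tokens_b):
    """Index of the first position where two token sequences differ, else None.

    If one sequence is a strict prefix of the other, the divergence point is
    the length of the shorter sequence.
    """
    for i, (a, b) in enumerate(zip_longest(tokens_a, tokens_b)):
        if a != b:
            return i
    return None


# Hypothetical outputs from two models for the same prompt.
model_a = ["The", " cat", " sat"]
model_b = ["The", " cat", " slept"]
divergence_index = first_divergence(model_a, model_b)
```

A high entropy at the divergence index suggests the model was genuinely uncertain there, while a low entropy suggests the two models confidently disagree.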

I built a vector embedding cache that makes stale hits structurally impossible

BN — Sat, 16 May 2026 21:49:52 +0000

Wrote up the design behind embcache, a GPU-native two-tier cache for embeddings and KV states.

The problem it solves: embedding caches that key on content hash alone silently return stale vectors after a model upgrade or tokenizer change. The cache looks healthy. The vectors are wrong.

The fix is a composite EmbeddingFingerprint covering model_id, tokenizer hash, chunking strategy, normalization version, prompt template, and dataset version. No partial matches, so no path to a stale hit from a pipeline change.
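
A rough sketch of the idea, assuming a flat string field per axis. The field names beyond `model_id` and the `cache_key` helper are illustrative assumptions, not embcache's actual schema or API.

```python
import hashlib
from dataclasses import astuple, dataclass, replace


@dataclass(frozen=True)
class EmbeddingFingerprint:
    # One field per axis listed above; exact names are assumed for illustration.
    model_id: str
    tokenizer_hash: str
    chunking_strategy: str
    normalization_version: str
    prompt_template: str
    dataset_version: str


def cache_key(fp: EmbeddingFingerprint, content: str) -> str:
    """Hash every fingerprint field together with the content (hypothetical helper)."""
    h = hashlib.sha256()
    for part in (*astuple(fp), content):
        data = part.encode("utf-8")
        # Length prefix keeps ("ab", "c") and ("a", "bc") from colliding.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


fp_v1 = EmbeddingFingerprint(
    "embed-v1", "tok-abc", "fixed-512", "l2-v1", "passage: {text}", "2026-05"
)
fp_v2 = replace(fp_v1, model_id="embed-v2")  # simulate a model upgrade

cache = {cache_key(fp_v1, "hello world"): [0.1, 0.2]}
stale_hit = cache_key(fp_v2, "hello world") in cache  # False: key changed with the model
```

Because every field feeds the key, changing any single axis produces a miss rather than a silently stale vector.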

Full writeup with benchmarks (98.3% hit rate, 400-450x speedup on KV cache hits) on Medium: https://bh3r1th.medium.com/the-vector-embedding-cache-bug-that-costs-nothing-and-corrupts-everything-157be6c575e8

Repo: https://github.com/bh3r1th/embcache

Not on PyPI yet. Looking for feedback, especially on whether the fingerprint schema covers all the axes that could cause a stale hit in your pipeline.

Most RAG failures don’t crash. They silently return bad answers. I built a repair layer for that.

BN — Sun, 10 May 2026 01:51:54 +0000

Most RAG tooling provides a score but fails to specify what actually went wrong.

I had retrieval failures, grounding issues, and generation going sideways, all showing up as a single number. No way to know which failure broke which run. No way to fix it without guessing.

So I built ragbolt.

ragbolt is a failure-aware repair layer for RAG pipelines that:

Detects whether the failure originated from retrieval, generation, or grounding
Applies one bounded repair at a time
Re-verifies the result
Emits a full trace to show exactly what changed and why

It’s not a framework.
Not an agent.
Not "self-healing RAG".

Just a small wrapper around existing RAG pipelines with explicit repair limits, auditability, and a hard stop when confidence breaks down.
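
The detect, repair, re-verify loop can be sketched roughly as below. This is an illustration of the pattern under stated assumptions, not ragbolt's API: `repair_loop`, `RepairTrace`, and the toy diagnose/repair functions are all hypothetical, and the "hard stop" is modeled simply as a repair budget plus a stop when no repair is registered.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class RepairTrace:
    steps: List[str] = field(default_factory=list)
    resolved: bool = False


def repair_loop(
    answer: str,
    diagnose: Callable[[str], Optional[str]],
    repairs: Dict[str, Callable[[str], str]],
    max_repairs: int = 2,
) -> Tuple[str, RepairTrace]:
    """Apply at most max_repairs single repairs, re-checking after each one."""
    trace = RepairTrace()
    for _ in range(max_repairs):
        failure = diagnose(answer)  # e.g. "retrieval", "generation", "grounding"
        if failure is None:
            trace.resolved = True
            return answer, trace
        repair = repairs.get(failure)
        if repair is None:
            # Hard stop: no known repair for this failure type.
            trace.steps.append(f"{failure}: no repair available, stopping")
            return answer, trace
        answer = repair(answer)
        trace.steps.append(f"{failure}: applied {repair.__name__}")
    # Budget exhausted: verify one last time, then stop either way.
    trace.resolved = diagnose(answer) is None
    return answer, trace


def toy_diagnose(answer: str) -> Optional[str]:
    return "grounding" if "[unsupported]" in answer else None


def strip_unsupported(answer: str) -> str:
    return answer.replace(" [unsupported]", "")


fixed, trace = repair_loop(
    "Paris is the capital. [unsupported]",
    toy_diagnose,
    {"grounding": strip_unsupported},
)
```

The trace records each repair in order, so a reviewer can see which failure type triggered which change.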

It runs standalone and integrates with LangChain + LlamaIndex.

pip install ragbolt

Deterministic reliability stack for LLM pipelines

BN — Sat, 09 May 2026 18:28:13 +0000

I have spent the last few months wiring up a deterministic reliability stack for structured LLM pipelines.

Today, LLM Contract Check (locc) and Release Governor went live on PyPI. EGA went live last week.

The stack is straightforward:
LLM Contract Check - CI contract testing to catch schema regressions.
Release Governor - Blocks staging promotion if malformed outputs leak.
EGA - Runtime enforcement. Forces outputs to ground against source evidence before they move downstream.

The idea is simple:
don’t wait until production logs or human evals tell you something broke.

Try to catch:

unstable contracts in CI
leakage before deploy
unsupported outputs at runtime
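
As an illustration of the first item, a contract test in CI might look roughly like this. It is a standard-library sketch, not locc's API; `CONTRACT` and `contract_violations` are hypothetical names, and the contract is assumed to be a flat mapping of field name to expected Python type.

```python
import json

# Hypothetical contract: required top-level fields and their expected types.
CONTRACT = {"title": str, "score": float, "tags": list}


def contract_violations(raw_output, contract=CONTRACT):
    """Return a list of human-readable violations; an empty list means the output conforms."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for key, expected in contract.items():
        if key not in data:
            problems.append(f"missing field: {key}")
        elif not isinstance(data[key], expected):
            problems.append(f"wrong type for {key}: expected {expected.__name__}")
    return problems


good = contract_violations('{"title": "x", "score": 0.9, "tags": []}')
bad = contract_violations('{"title": "x", "score": "high"}')
```

A CI job could run this over a fixed set of recorded model outputs and fail the build on any non-empty result.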

Still early.
Not benchmarked.
Definitely not claiming this "solves AI safety."

I'm mainly looking for engineers building RAG or structured-output systems who are willing to plug pieces of this in and tell me where the assumptions break.

pip install llm-locc
pip install llm-release-governor
pip install ega

EGA: Runtime Enforcement for LLM Outputs (v1.0.0)

BN — Fri, 01 May 2026 01:36:39 +0000

I built EGA, a runtime enforcement layer for LLM outputs.

The problem: eval tools usually score after something already went wrong.

They do not stop bad outputs from going downstream.

EGA sits in the runtime path and checks the model output against the source before letting it pass through.

If something does not have support, it gets dropped or flagged.
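
To show the shape of that gate, here is a deliberately crude sketch that uses word overlap as a stand-in for a support check. This post does not describe EGA's actual verification method, so treat `gate_output`, the overlap heuristic, and the `min_overlap` threshold as assumptions for illustration only.

```python
import re


def _words(text):
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def gate_output(output, source, min_overlap=0.6):
    """Split output into sentences; keep those mostly covered by source words.

    Returns (kept, flagged). Lexical overlap is a placeholder for a real
    support check and will miss paraphrases.
    """
    source_words = _words(source)
    kept, flagged = [], []
    for sentence in re.split(r"(?<=[.!?])\s+", output.strip()):
        words = _words(sentence)
        if not words:
            continue
        support = len(words & source_words) / len(words)
        (kept if support >= min_overlap else flagged).append(sentence)
    return kept, flagged


source = "The Eiffel Tower is in Paris. It was completed in 1889."
output = "The Eiffel Tower is in Paris. It has a secret underground casino."
kept, flagged = gate_output(output, source)
```

Only the `kept` sentences would move downstream; the `flagged` ones are either dropped or surfaced for review.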

v1.0.0 is live on PyPI today.

This is still early:

not benchmarked yet
not production-grade calibration yet
needs real RAG pipeline feedback

I am looking for engineers building RAG pipelines who are willing to plug this in and tell me where it breaks.

pip install ega
GitHub: https://github.com/bh3r1th/llm-evidence-gated-generation
PyPI: https://pypi.org/project/ega/1.0.0/