PSBigBig

Posted on Aug 5

I tried fixing my RAG system. ended up building a graveyard.

#webdev #ai #programming #devops

16 reasons your retrieval-augmented generation pipeline fails even when everything looks fine.

i didn’t set out to build a framework.
i just wanted my retrieval system to stop lying.

the docs were clean.
the vector search was sharp.
the top-k chunks came back as expected.

and yet.

the answers were wrong.
not wildly wrong — just wrong enough to fail.

no errors. no crashes. just... silence.

you know the kind.

Q: what is the capital of France?  
Doc: Paris is the capital of France.  
A: France has several prominent cities including Lyon and Marseille.

looks plausible. fails production.
the worst kind of error — semantic drift with confidence.
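here is a minimal sketch of how you could catch that specific case, where the right chunk came back and the answer ignored it. it is not the WFGY patch, just a crude word-overlap check i am assuming for illustration. the names `content_words`, `doc_support`, and the tiny stopword list are all hypothetical.

```python
import re

# Hand-picked stopwords for this toy example only (an assumption, not a standard list).
STOPWORDS = {"the", "is", "of", "a", "an", "what", "and", "in", "to", "has"}


def content_words(text: str) -> set[str]:
    """Lowercase word tokens, minus the toy stopword list."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def doc_support(answer: str, doc: str) -> float:
    """Fraction of the doc's content words that reappear in the answer."""
    doc_words = content_words(doc)
    if not doc_words:
        return 0.0
    return len(doc_words & content_words(answer)) / len(doc_words)


doc = "Paris is the capital of France."
bad = "France has several prominent cities including Lyon and Marseille."
good = "The capital of France is Paris."

# The drifting answer reuses only "france"; the grounded one reuses all three content words.
print(doc_support(bad, doc))
print(doc_support(good, doc))
```

a low score does not prove the answer is wrong, and a high score does not prove it is right. it only flags answers that barely touch the chunk they were given, which is a cheap first alarm for drift that throws no error.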

i opened a notebook.

then a repo.
then a map.

one by one, i listed all the weird bugs that didn't show up as bugs.

chunk retrieved, but logic broken
LLM hallucinated across chunks
query mismatch despite exact match
answer contradicts doc, but only in passive voice

this wasn’t about reranking.
this wasn’t about prompt tuning.

this was about structural failure modes.

i called them “RAG collapse types”.

and the list grew.
16 total.

some had names. some didn’t.

but all of them live in the repo now:
→ WFGY/ProblemMap/README.md

and each one comes with an actual patch.
not a vibe. not a "maybe try RAGAS".
a patch. in code. MIT licensed.

real issues. real users. real saves.

eventually, i started replying to people on GitHub, Reddit, LangChain Discussions.
quietly.

they post a bug.
i reply with the exact failure number.
they stare.
then they DM.

one by one,
i log the saves here:
→ Hero Log

this is not theory.
this is debug-level archaeology of semantic failure.

what is WFGY?

it's a repo.
but also a worldview.

a way to treat retrieval logic as first-class reasoning, not pre-processing.

it’s built on:

16-part problem map
a set of internal modules (some math-heavy)
stability protocols (like ΔS = 0.5, symbolic filters, collapse detectors)

also:

backed by real stars (👀 Tesseract.js creator)
zero funding
one-human team
300+ stars in 50 days, no promo

closing note

you don’t have to believe me.
you just have to wait.

because if you’re building RAG systems, you’re either:

already hitting these 16
about to hit them next month
pretending your users won’t notice

and if you ever get tired of pretending:

→ the repo

see you in the Hero Log.

DEV Community