PSBigBig

I tried fixing my RAG system. ended up building a graveyard.

16 reasons your retrieval-augmented generation pipeline fails even when everything looks fine.


i didn’t set out to build a framework.
i just wanted my retrieval system to stop lying.

  • the docs were clean.
  • the vector search was sharp.
  • the top-k chunks came back as expected.

and yet.

the answers were wrong.
not wildly wrong — just wrong enough to fail.


no errors. no crashes. just... silence.

you know the kind.

Q: what is the capital of France?  
Doc: Paris is the capital of France.  
A: France has several prominent cities including Lyon and Marseille.

looks plausible. fails production.
the worst kind of error — semantic drift with confidence.
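
to make that failure concrete, here is a minimal sketch of a check that would at least flag the example above. it is not the WFGY patch and not anything from the repo. it is a crude lexical grounding test, and every name in it (`STOPWORDS`, `content_words`, `support_ratio`, `looks_grounded`, the 0.5 threshold) is an assumption made up for illustration.

```python
import re

# Hypothetical, tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "is", "of", "has", "in", "to", "what"}


def content_words(text: str) -> set[str]:
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def support_ratio(answer: str, doc: str) -> float:
    """Fraction of the answer's content words that also appear in the doc."""
    answer_words = content_words(answer)
    if not answer_words:
        return 0.0
    return len(answer_words & content_words(doc)) / len(answer_words)


def looks_grounded(answer: str, doc: str, threshold: float = 0.5) -> bool:
    """Assumed threshold: at least half of the answer must be backed by the doc."""
    return support_ratio(answer, doc) >= threshold


doc = "Paris is the capital of France."
drifted = "France has several prominent cities including Lyon and Marseille."
faithful = "The capital of France is Paris."

print(looks_grounded(drifted, doc))   # only "france" overlaps, so this is flagged
print(looks_grounded(faithful, doc))  # every content word is in the doc
```

word overlap is a blunt tool. it will miss paraphrases and it will pass an answer that reuses the doc's words to say the opposite. it is only here to show that "retrieved correctly" and "answered from what was retrieved" are two separate checks.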


i opened a notebook.

then a repo.
then a map.

one by one, i listed all the weird bugs that didn't show up as bugs.

  • chunk retrieved, but logic broken
  • LLM hallucinated across chunks
  • query mismatch despite exact match
  • answer contradicts doc, but only in passive voice

this wasn’t about reranking.
this wasn’t about prompt tuning.

this was about structural failure modes.


i called them “RAG collapse types”.

and the list grew.
16 total.

some had names. some didn’t.

but all of them lived in the repo now:
WFGY/ProblemMap/README.md

and each one comes with an actual patch.
not a vibe. not a "maybe try RAGAS".
a patch. in code. MIT licensed.


real issues. real users. real saves.

eventually, i started replying to people on GitHub, Reddit, LangChain Discussions.
quietly.

they post a bug.
i reply with the exact failure number.
they stare.
then they DM.

one by one, i log the saves here:
Hero Log

this is not theory.
this is debug-level archaeology of semantic failure.


what is WFGY?

it's a repo.
but also a worldview.

a way to treat retrieval logic as first-class reasoning, not pre-processing.

it’s built on:

  • 16-part problem map
  • a set of internal modules (some math-heavy)
  • stability protocols (like ΔS = 0.5, symbolic filters, collapse detectors)

also:

  • backed by real stars (👀 Tesseract.js creator)
  • zero funding
  • one-human team
  • 300+ stars in 50 days, no promo

closing note

you don’t have to believe me.
you just have to wait.

because if you’re building RAG systems, you’re either:

  1. already hitting these 16
  2. about to hit them next month
  3. pretending your users won’t notice

and if you ever get tired of pretending:

→ the repo

see you in the Hero Log.