Hi all,
I want to share a small story behind my WFGY framework, from 1.0 to 3.0. Some of you may have seen WFGY before on GitHub / Dev, but this is a clearer version for this community.
The idea is simple:
- WFGY 1.0 is very beginner-friendly: just a PDF you can read and test with any LLM.
- WFGY 2.0 is more for RAG / vector DB / agent debugging.
- WFGY 3.0 is a TXT “singularity demo” with 131 S-class tests, crazier, but still just text.
WFGY 1.0 – good entry point for LLM beginners
WFGY 1.0 started as a roughly 30-page PDF called “All Principles Return to One”.
It treats an LLM like a system that can “self-heal” using only text:
four modules (BBMC, BBPF, BBCR, BBAM) run as a loop on top of the model: no weight changes, no fine-tuning, only prompt-level structure.
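To make the loop idea concrete, here is a minimal sketch of what a prompt-level module loop could look like. Everything here is my illustration: the `llm` wrapper assumes an OpenAI-compatible endpoint with a placeholder model name, and the module prompts are stand-ins; the real BBMC / BBPF / BBCR / BBAM prompts are in the 1.0 PDF.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def llm(prompt: str) -> str:
    # Placeholder wrapper: any prompt -> completion callable works here.
    return client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

MODULES = ["BBMC", "BBPF", "BBCR", "BBAM"]

def wfgy_loop(task: str, rounds: int = 2) -> str:
    # Pure text: each "module" is just a prompt wrapped around the current
    # answer. No weight changes, no fine-tuning.
    answer = llm(task)
    for _ in range(rounds):
        for module in MODULES:
            answer = llm(
                f"[{module}] Review and repair this answer.\n"
                f"Task: {task}\nCurrent answer: {answer}"
            )
    return answer
```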
We tested it on 10 benchmarks (MMLU, GSM8K, BBH, MathBench, TruthfulQA, …). Very rough numbers:
- MMLU: baseline around 68.2% → with WFGY 1.0 around 91.4%
- GSM8K: baseline around 45.3% → with WFGY 1.0 around 84.0%
- mean time-to-failure in long runs: roughly ×3.6
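One note on the last number: I read “time-to-failure” as how long a session survives before the output degrades (my loose reading of the metric). A sketch of how you could measure it yourself, reusing the `llm` wrapper from above; the failure detector is yours to define:

```python
def time_to_failure(seed_prompt: str, is_failure, max_turns: int = 200) -> int:
    # Run one long session and count turns until the failure check fires.
    # is_failure: your own detector, e.g. repetition / contradiction / drift.
    history = seed_prompt
    for turn in range(1, max_turns + 1):
        out = llm(history)
        if is_failure(out):
            return turn
        history += "\n" + out
    return max_turns  # survived the whole run

# Averaging this over many seeds, baseline vs with-loop, gives a ×3.6-style ratio.
```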
For beginners, 1.0 is probably the easiest place to start:
- you just download the PDF,
- open a Kaggle Notebook with any LLM API or local model,
- copy some of the loop structure and prompts,
- and see how the behavior changes for yourself.
No special library, no heavy code.
It is more like “prompt engineering with a serious framework”, and you can play slowly.
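Reusing the `llm` and `wfgy_loop` sketches from above, a baseline vs. with-loop comparison in a Notebook can be this small (the prompts are just examples):

```python
prompts = [
    "A train travels 60 km in 45 minutes. What is its speed in km/h?",
    "What are three hidden assumptions in the question 'Why is the sky blue at noon?'",
]

for p in prompts:
    print("PROMPT   :", p)
    print("BASELINE :", llm(p))
    print("WITH LOOP:", wfgy_loop(p))
    print("-" * 60)
```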
WFGY 2.0 – Core + 16-problem checklist for RAG / agents
WFGY 2.0 moved from a theory PDF into something that can sit inside real projects.
Two key parts:
- The core is compressed into one tension metric: delta_s = 1 − cos(I, G), with four zones: safe / transit / risk / danger. (I = intention, G = generated behavior.)
- On top of this, I built a ProblemMap: a 16-problem list for common AI engineering pain: RAG retrieval failure, vector store fragmentation, prompt injection, wrong deployment order, etc.
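In code the metric itself is one line; the only real decision is where the zone boundaries sit. A minimal sketch, assuming you embed I and G with any sentence-embedding model; the cut points below are illustrative, not the official WFGY values:

```python
import numpy as np

def delta_s(intention_vec: np.ndarray, generated_vec: np.ndarray) -> float:
    # delta_s = 1 - cos(I, G): 0 = perfectly aligned, up to 2 = opposite.
    cos = np.dot(intention_vec, generated_vec) / (
        np.linalg.norm(intention_vec) * np.linalg.norm(generated_vec)
    )
    return 1.0 - float(cos)

def zone(ds: float) -> str:
    # Illustrative thresholds only, to show the four-zone idea.
    if ds < 0.10:
        return "safe"
    if ds < 0.30:
        return "transit"
    if ds < 0.60:
        return "risk"
    return "danger"
```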
Many engineers use this 16-problem list like a debugging checklist:
- when your RAG or agent looks weird,
- you match it to one of the 16 problems,
- then apply the suggested fix / guardrail.
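As a toy version of that workflow, here is the checklist idea as a lookup table. Only the four problems named above are included, and the fixes are my generic placeholders, not the official ProblemMap entries:

```python
# Illustrative slice of the 16-problem checklist; the fixes are placeholders.
CHECKLIST = {
    "RAG retrieval failure": "inspect the retrieved chunks before blaming the model",
    "vector store fragmentation": "re-chunk and re-embed with one consistent policy",
    "prompt injection": "separate instructions from retrieved text; add guardrails",
    "wrong deployment order": "make sure the index is fully built before serving",
}

def triage(symptom: str) -> str:
    return CHECKLIST.get(symptom, "not in this partial list; check the full ProblemMap")
```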
If you build chatbots, assistants or pipelines on Kaggle (or anywhere), WFGY 2.0 is the part that maps most directly to your daily pain.
WFGY 3.0 – Singularity Demo as a TXT pack (more advanced)
Now the new part.
WFGY 3.0 · Singularity Demo is now online in the same main GitHub repo. This time it is not a PDF, but a TXT pack designed for LLMs to read directly.
Very conservative description:
- it packages a “Tension Universe / BlackHole” layer as 131 S-class problems
- it is still only text: no code, no external calls
- it is meant as a public stress test to see how far this framework can go, across many domains
For Kaggle users, you can treat 3.0 like a “text-only test lab”:
- download the TXT,
- in a Notebook, send the file content to your LLM (any endpoint you like),
- then follow the small protocol:
  - type “run”; it will show a menu
  - choose and type “go”
  - let the LLM run the short demo and just watch how it behaves
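A minimal Notebook sketch of that flow, again assuming an OpenAI-compatible endpoint; the file name and model are placeholders:

```python
from openai import OpenAI

client = OpenAI()

with open("wfgy_3.0_singularity_demo.txt", encoding="utf-8") as f:
    demo_text = f.read()

messages = [{"role": "user", "content": demo_text}]

def send(text: str) -> str:
    # Append a user turn, get the reply, and keep the conversation state.
    messages.append({"role": "user", "content": text})
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model works
        messages=messages,
    ).choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply

print(send("run"))  # should show the menu
print(send("go"))   # starts the short demo; then just watch
```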
How I suggest starting (beginner / intermediate / advanced)
Very roughly:
Beginner (new to LLMs, just play on Kaggle):
start with the WFGY 1.0 PDF.
Try to reproduce some of the loops in a simple Notebook and compare baseline vs. with-loop behavior.
You don’t need to understand all the math; just see if your intuition changes.
Intermediate (you build RAG / tools / agents):
look at the WFGY 2.0 Core and the 16-problem list in ProblemMap.
Use it as a checklist of failure modes when your system behaves strangely.
Advanced (you enjoy breaking frameworks):
download the WFGY 3.0 Singularity Demo TXT and let an LLM run the “run” → “go” flow.
Try to make it collapse, find contradictions, or show where the structure fails.
I did not create a new “experimental repo” for 3.0.
I put it directly into the same main repo, which already has around 1.3k stars.
So for me, all my past “credit” is now sitting on top of this TXT.
Why I post this on Kaggle
Kaggle is one of the easiest places to:
- spin up a small Notebook,
- call an LLM endpoint,
- visualize results and share with others.
So if anyone here wants to:
- reproduce some of the 1.0 behavior,
- turn the 2.0 16-problem list into your own eval notebook,
- or benchmark your favorite model on the 3.0 TXT flow,
I think Kaggle is actually a very natural playground.
If you feel this direction is interesting, feel free to fork / star the repo.
If it feels suspicious or too ambitious, you can simply treat it as a test object and try to break it.
For me, the goal is not that everybody believes WFGY.
The goal is that, after enough public experiments, whatever survives inside WFGY 3.0 has really earned its place.
GitHub (main repo):
https://github.com/onestardao/WFGY
