PSBigBig OneStarDao

WFGY 1.0 → 3.0: from a simple PDF for beginners to a TXT stress test for LLMs

Hi all,

I want to share a short story behind my WFGY framework, from 1.0 to 3.0. Some of you may have seen WFGY before on GitHub / Dev, but this is a clearer version for this community.

The idea is simple:

  • WFGY 1.0 is very beginner friendly, just a PDF you can read and test with any LLM.
  • WFGY 2.0 is more for RAG / vector DB / agent debugging.
  • WFGY 3.0 is a TXT “singularity demo” with 131 S-class tests, wilder, but still just text.

WFGY 1.0 – good entry point for LLM beginners

WFGY 1.0 started as a roughly 30-page PDF called “All Principles Return to One”.

It treats an LLM like a system that can “self-heal” using only text:
four modules (BBMC, BBPF, BBCR, BBAM) run as a loop on top of the model, with no weight changes and no fine-tuning, only prompt-level structure.

We tested 10 benchmarks (MMLU, GSM8K, BBH, MathBench, TruthfulQA, …). Very rough numbers:

  • MMLU: baseline around 68.2% → with WFGY 1.0 around 91.4%
  • GSM8K: baseline around 45.3% → with WFGY 1.0 around 84.0%
  • mean time-to-failure in long runs: roughly ×3.6

For beginners, 1.0 is probably the easiest place to start:

  • you just download the PDF,
  • open a Kaggle Notebook with any LLM API or local model,
  • copy some of the loop structure and prompts,
  • and see for yourself how the behavior changes.

No special library, no heavy code.
It is more like “prompt engineering with a serious framework”, and you can explore it at your own pace.
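
If you want something concrete to paste into a Notebook, here is a minimal sketch of that baseline-vs-loop comparison. It assumes an OpenAI-compatible endpoint; the `WFGY_LOOP` system prompt below is a placeholder I wrote for illustration, not the actual prompts from the PDF, so copy the real loop structure from the 1.0 document itself.

```python
# Minimal sketch: compare a baseline answer with a WFGY-style "loop" answer.
# Assumes an OpenAI-compatible endpoint; the loop prompt below is a placeholder,
# the real module prompts (BBMC, BBPF, BBCR, BBAM) come from the 1.0 PDF.
from openai import OpenAI

client = OpenAI()        # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4o-mini"    # swap in whatever model or local endpoint you use

WFGY_LOOP = (
    "Before answering, run a short self-check loop:\n"
    "1. Restate the question in one sentence.\n"
    "2. List two alternative solution paths.\n"
    "3. If the paths disagree, say so and pick the more consistent one.\n"
    "4. Answer concisely, flagging any remaining uncertainty.\n"
)

def ask(question: str, system: str | None = None) -> str:
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": question})
    resp = client.chat.completions.create(model=MODEL, messages=messages)
    return resp.choices[0].message.content

question = "A train travels 120 km in 1.5 hours. What is its average speed in km/h?"
print("baseline: ", ask(question))
print("with loop:", ask(question, system=WFGY_LOOP))
```

The point is not this specific prompt but the shape of the experiment: same question, same model, with and without the loop structure sitting on top.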

WFGY 2.0 – Core + 16-problem checklist for RAG / agents

WFGY 2.0 moved from a theory PDF into something that can sit inside real projects.

Two key parts:

  • The core is compressed into one tension metric, delta_s = 1 − cos(I, G), with four zones: safe / transit / risk / danger. (I = intention, G = generated behavior; a small sketch of this metric follows the list.)
  • On top of this, I built a ProblemMap with a 16-problem list for common AI engineering pain points: RAG retrieval failure, vector store fragmentation, prompt injection, wrong deployment order, etc.
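
Here is that sketch. It assumes you already have embedding vectors for the intention I and the generated behavior G (from any embedding model you like); the zone thresholds are illustrative placeholders, not the official boundaries from the ProblemMap.

```python
# Minimal sketch of the tension metric: delta_s = 1 - cos(I, G).
# Assumes you already have embedding vectors for the intention (I) and the
# generated behavior (G); the zone thresholds below are illustrative only.
import numpy as np

def delta_s(intention_vec: np.ndarray, generated_vec: np.ndarray) -> float:
    cos = np.dot(intention_vec, generated_vec) / (
        np.linalg.norm(intention_vec) * np.linalg.norm(generated_vec)
    )
    return 1.0 - float(cos)

def zone(ds: float) -> str:
    # Placeholder thresholds for the four zones: safe / transit / risk / danger.
    if ds < 0.2:
        return "safe"
    if ds < 0.4:
        return "transit"
    if ds < 0.6:
        return "risk"
    return "danger"

I = np.array([0.9, 0.1, 0.3])   # toy "intention" embedding
G = np.array([0.7, 0.4, 0.2])   # toy "generated behavior" embedding
ds = delta_s(I, G)
print(f"delta_s = {ds:.3f} -> zone: {zone(ds)}")
```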

Many engineers use this 16-problem list like a debugging checklist:

  • when your RAG or agent looks weird,
  • you match it to one of the 16 problems,
  • then apply the suggested fix / guardrail.

If you build chatbots, assistants or pipelines on Kaggle (or anywhere), WFGY 2.0 is the part that maps most directly to your daily pain.

WFGY 3.0 – Singularity Demo as a TXT pack (more advanced)

Now the new part.

WFGY 3.0 · Singularity Demo is now online in the same main GitHub repo. This time it is not a PDF, but a TXT pack designed for LLMs to read directly.

Very conservative description:

  • it packages a “Tension Universe / BlackHole” layer as 131 S-class problems
  • it is still only text: no code, no external calls
  • it is meant as a public stress test to see how far this framework can go, across many domains

For Kaggle users, you can treat 3.0 like a “text-only test lab”:

  • download the TXT,
  • in a Notebook, send the file content to your LLM (any endpoint you like; a minimal sketch follows this list),
  • then follow the small protocol:
    • type run
    • it will show a menu
    • choose go
  • let the LLM run the short demo and just watch how it behaves
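
Here is that sketch. It assumes an OpenAI-compatible chat endpoint and that you have already downloaded the TXT; the file name and model below are placeholders, so swap in whatever you actually use.

```python
# Minimal sketch of the 3.0 flow: load the TXT pack, hand it to the model,
# then send "run" and "go" as follow-up turns. Assumes an OpenAI-compatible
# endpoint; the file name and model are placeholders.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"

with open("wfgy_3_singularity_demo.txt", "r", encoding="utf-8") as f:
    wfgy_txt = f.read()

messages = [{"role": "user", "content": wfgy_txt}]

def send(text: str) -> str:
    messages.append({"role": "user", "content": text})
    resp = client.chat.completions.create(model=MODEL, messages=messages)
    reply = resp.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply

print(send("run"))   # the pack should respond with a menu
print(send("go"))    # start the short demo and watch how the model behaves
```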

How I suggest you start (beginner / intermediate / advanced)

Very roughly:

Beginner (new to LLMs, just play on Kaggle):

start with the WFGY 1.0 PDF.

Try to reproduce some of the loops in a simple Notebook and compare baseline vs. with-loop behavior.

You don’t need to understand all the math, just see if your intuition changes.

Intermediate (you build RAG / tools / agents):

look at the WFGY 2.0 Core plus the 16-problem list in the ProblemMap.

Use it as a checklist for failure modes when your system behaves strangely (one way to make that systematic is sketched below).
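
Here is a minimal sketch of how the checklist could become a small triage helper in a Notebook. The three entries are my own paraphrases for illustration; the actual 16 problems live in the ProblemMap in the repo.

```python
# Minimal sketch: treat the ProblemMap as a debugging checklist you walk
# through when a RAG pipeline or agent misbehaves. The entries below are
# paraphrased examples only; the full 16-problem list lives in the repo.
CHECKLIST = {
    "rag_retrieval_failure": "Are the retrieved chunks actually relevant to the query?",
    "vector_store_fragmentation": "Were documents chunked and indexed consistently across updates?",
    "prompt_injection": "Could retrieved or user text be overriding your system prompt?",
    # ... the remaining problems from the ProblemMap would go here
}

def triage(observations: dict[str, bool]) -> list[str]:
    """Return the checklist items whose failure condition was observed."""
    return [name for name, failed in observations.items() if failed]

# Example: after inspecting your pipeline, record what you actually saw.
observed = {
    "rag_retrieval_failure": True,
    "vector_store_fragmentation": False,
    "prompt_injection": False,
}
for name in triage(observed):
    print(f"check ProblemMap entry: {name} -> {CHECKLIST[name]}")
```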

Advanced (you enjoy breaking frameworks):

download the WFGY 3.0 Singularity Demo TXT and let an LLM walk through the run → go flow.

Try to make it collapse, find contradictions, or show where the structure fails.

I did not create a new “experimental repo” for 3.0.
I put it directly into the same main repo, which already has around 1.3k stars.
So for me, all of my past “credit” is now sitting on top of this TXT.

Why I post this on Kaggle

Kaggle is one of the easiest places to:

  • spin up a small Notebook,
  • call an LLM endpoint,
  • visualize results and share with others.

So if anyone here wants to:

  • reproduce some of the 1.0 behavior,
  • turn the 2.0 16-problem list into your own eval notebook,
  • or benchmark your favorite model on the 3.0 TXT flow,

I think Kaggle is actually a very natural playground.

If you feel this direction is interesting, feel free to fork / star the repo.
If it feels suspicious or too ambitious, you can simply treat it as a test object and try to break it.

For me, the goal is not that everybody believes in WFGY.
The goal is that, after enough public experiments, whatever survives inside WFGY 3.0 has really earned its place.

GitHub (main repo):

https://github.com/onestardao/WFGY
