PSBigBig OneStarDao

WFGY 1.0 → 3.0: from a simple PDF for beginners to a TXT stress test for LLMs

Hi all,

I want to share a short story behind my WFGY framework, from 1.0 to 3.0. Some of you may have seen WFGY before on GitHub / Dev, but this is a clearer version for this community.

The idea is simple:

  • WFGY 1.0 is very beginner friendly, just a PDF you can read and test with any LLM.
  • WFGY 2.0 is more for RAG / vector DB / agent debugging.
  • WFGY 3.0 is a TXT “singularity demo” with 131 S-class tests, wilder, but still just text.

WFGY 1.0 – good entry point for LLM beginners

WFGY 1.0 started as a roughly 30-page PDF called “All Principles Return to One”.

It treats an LLM like a system that can “self-heal” using only text:
four modules (BBMC, BBPF, BBCR, BBAM) run as a loop on top of the model, with no weight changes and no fine-tuning, only prompt-level structure.

We tested 10 benchmarks (MMLU, GSM8K, BBH, MathBench, TruthfulQA, …). Very rough numbers:

  • MMLU: baseline around 68.2% → with WFGY 1.0 around 91.4%
  • GSM8K: baseline around 45.3% → with WFGY 1.0 around 84.0%
  • mean time-to-failure in long runs: roughly ×3.6

For beginners, 1.0 is probably the easiest place to start:

  • you just download the PDF,
  • open a Kaggle Notebook with any LLM API or local model,
  • copy some of the loop structure and prompts,
  • and see for yourself how the behavior changes.

No special library, no heavy code.
It is more like “prompt engineering with a serious framework”, and you can explore it at your own pace.
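
If you want something concrete to paste into a Notebook, here is a minimal sketch of that baseline-vs-loop comparison. It assumes an OpenAI-compatible endpoint; the `WFGY_LOOP` system prompt below is a placeholder I wrote for illustration, not the actual prompts from the PDF, so copy the real loop structure from the 1.0 document itself.

```python
# Minimal sketch: compare a baseline answer with a WFGY-style "loop" answer.
# Assumes an OpenAI-compatible endpoint; the loop prompt below is a placeholder,
# the real module prompts (BBMC, BBPF, BBCR, BBAM) come from the 1.0 PDF.
from openai import OpenAI

client = OpenAI()        # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4o-mini"    # swap in whatever model or local endpoint you use

WFGY_LOOP = (
    "Before answering, run a short self-check loop:\n"
    "1. Restate the question in one sentence.\n"
    "2. List two alternative solution paths.\n"
    "3. If the paths disagree, say so and pick the more consistent one.\n"
    "4. Answer concisely, flagging any remaining uncertainty.\n"
)

def ask(question: str, system: str | None = None) -> str:
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": question})
    resp = client.chat.completions.create(model=MODEL, messages=messages)
    return resp.choices[0].message.content

question = "A train travels 120 km in 1.5 hours. What is its average speed in km/h?"
print("baseline: ", ask(question))
print("with loop:", ask(question, system=WFGY_LOOP))
```

The point is not this specific prompt but the shape of the experiment: same question, same model, with and without the loop structure sitting on top.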

WFGY 2.0 – Core + 16-problem checklist for RAG / agents

WFGY 2.0 moved from a theory PDF into something that can sit inside real projects.

Two key parts:

  • The core is compressed into one tension metric, delta_s = 1 − cos(I, G), with four zones: safe / transit / risk / danger. (I = intention, G = generated behavior; a small sketch of this metric follows the list.)
  • On top of this, I built a ProblemMap with a 16-problem list for common AI engineering pain points: RAG retrieval failure, vector store fragmentation, prompt injection, wrong deployment order, etc.
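
Here is that sketch. It assumes you already have embedding vectors for the intention I and the generated behavior G (from any embedding model you like); the zone thresholds are illustrative placeholders, not the official boundaries from the ProblemMap.

```python
# Minimal sketch of the tension metric: delta_s = 1 - cos(I, G).
# Assumes you already have embedding vectors for the intention (I) and the
# generated behavior (G); the zone thresholds below are illustrative only.
import numpy as np

def delta_s(intention_vec: np.ndarray, generated_vec: np.ndarray) -> float:
    cos = np.dot(intention_vec, generated_vec) / (
        np.linalg.norm(intention_vec) * np.linalg.norm(generated_vec)
    )
    return 1.0 - float(cos)

def zone(ds: float) -> str:
    # Placeholder thresholds for the four zones: safe / transit / risk / danger.
    if ds < 0.2:
        return "safe"
    if ds < 0.4:
        return "transit"
    if ds < 0.6:
        return "risk"
    return "danger"

I = np.array([0.9, 0.1, 0.3])   # toy "intention" embedding
G = np.array([0.7, 0.4, 0.2])   # toy "generated behavior" embedding
ds = delta_s(I, G)
print(f"delta_s = {ds:.3f} -> zone: {zone(ds)}")
```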

Many engineers use this 16-problem list like a debugging checklist:

  • when your RAG or agent looks weird,
  • you match it to one of the 16 problems,
  • then apply the suggested fix / guardrail.

If you build chatbots, assistants or pipelines on Kaggle (or anywhere), WFGY 2.0 is the part that maps most directly to your daily pain.

WFGY 3.0 – Singularity Demo as a TXT pack (more advanced)

Now the new part.

WFGY 3.0 · Singularity Demo is now online in the same main GitHub repo. This time it is not a PDF, but a TXT pack designed for LLMs to read directly.

Very conservative description:

  • it packages a “Tension Universe / BlackHole” layer as 131 S-class problems
  • it is still only text: no code, no external calls
  • it is meant as a public stress test to see how far this framework can go, across many domains

For Kaggle users, you can treat 3.0 like a “text-only test lab”:

  • download the TXT,
  • in a Notebook, send the file content to your LLM (any endpoint you like; a minimal sketch follows this list),
  • then follow the small protocol:
    • type run
    • it will show a menu
    • choose go
  • let the LLM run the short demo and just watch how it behaves
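
Here is that sketch. It assumes an OpenAI-compatible chat endpoint and that you have already downloaded the TXT; the file name and model below are placeholders, so swap in whatever you actually use.

```python
# Minimal sketch of the 3.0 flow: load the TXT pack, hand it to the model,
# then send "run" and "go" as follow-up turns. Assumes an OpenAI-compatible
# endpoint; the file name and model are placeholders.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"

with open("wfgy_3_singularity_demo.txt", "r", encoding="utf-8") as f:
    wfgy_txt = f.read()

messages = [{"role": "user", "content": wfgy_txt}]

def send(text: str) -> str:
    messages.append({"role": "user", "content": text})
    resp = client.chat.completions.create(model=MODEL, messages=messages)
    reply = resp.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply

print(send("run"))   # the pack should respond with a menu
print(send("go"))    # start the short demo and watch how the model behaves
```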

How I suggest you start (beginner / intermediate / advanced)

Very roughly:

Beginner (new to LLMs, just play on Kaggle):

start with the WFGY 1.0 PDF.

Try to reproduce some of the loops in a simple Notebook and compare baseline vs. with-loop behavior.

You don’t need to understand all the math, just see if your intuition changes.

Intermediate (you build RAG / tools / agents):

look at the WFGY 2.0 Core plus the 16-problem list in the ProblemMap.

Use it as a checklist for failure modes when your system behaves strangely (one way to make that systematic is sketched below).
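
Here is a minimal sketch of how the checklist could become a small triage helper in a Notebook. The three entries are my own paraphrases for illustration; the actual 16 problems live in the ProblemMap in the repo.

```python
# Minimal sketch: treat the ProblemMap as a debugging checklist you walk
# through when a RAG pipeline or agent misbehaves. The entries below are
# paraphrased examples only; the full 16-problem list lives in the repo.
CHECKLIST = {
    "rag_retrieval_failure": "Are the retrieved chunks actually relevant to the query?",
    "vector_store_fragmentation": "Were documents chunked and indexed consistently across updates?",
    "prompt_injection": "Could retrieved or user text be overriding your system prompt?",
    # ... the remaining problems from the ProblemMap would go here
}

def triage(observations: dict[str, bool]) -> list[str]:
    """Return the checklist items whose failure condition was observed."""
    return [name for name, failed in observations.items() if failed]

# Example: after inspecting your pipeline, record what you actually saw.
observed = {
    "rag_retrieval_failure": True,
    "vector_store_fragmentation": False,
    "prompt_injection": False,
}
for name in triage(observed):
    print(f"check ProblemMap entry: {name} -> {CHECKLIST[name]}")
```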

Advanced (you enjoy breaking frameworks):

download the WFGY 3.0 Singularity Demo TXT and let an LLM walk through the run → go flow.

Try to make it collapse, find contradictions, or show where the structure fails.

I did not create a new “experimental repo” for 3.0.
I put it directly into the same main repo, which already has around 1.3k stars.
So for me, all of my past “credit” is now sitting on top of this TXT.

Why I post this on Kaggle

Kaggle is one of the easiest places to:

  • spin up a small Notebook,
  • call an LLM endpoint,
  • visualize results and share with others.

So if anyone here wants to:

  • reproduce some of the 1.0 behavior,
  • turn the 2.0 16-problem list into your own eval notebook,
  • or benchmark your favorite model on the 3.0 TXT flow,

I think Kaggle is actually a very natural playground.

If you feel this direction is interesting, feel free to fork / star the repo.
If it feels suspicious or too ambitious, you can simply treat it as a test object and try to break it.

For me, the goal is not that everybody believes in WFGY.
The goal is that, after enough public experiments, whatever survives inside WFGY 3.0 has really earned its place.

GitHub (main repo):

https://github.com/onestardao/WFGY
