DEV Community

Jingcheng-xie

My tests were green. My code was wrong. The spec was 3,000 years old.

Most of the code we write can be checked against something. A tax calculator has the tax code. A JSON parser has a spec. A physics sim has reality. You write a test, you assert the expected value, and the "expected value" comes from somewhere authoritative.

But what do you do when there is no authority? When the "spec" is 3,000 years old and written in classical Chinese, when multiple competing schools flatly contradict each other, and when the human experts you'd ask for the "right answer" disagree among themselves and are sometimes just wrong?

That was my problem. I set out to build a deterministic engine for I Ching (六爻 / Liù Yáo) hexagram casting — the part that takes a moment in time and a coin toss and mechanically derives a fully annotated chart. Not the interpretation. Just the hard, mechanical facts: which hexagram, its palace, the stem-branch assignments, the "six relatives," the moving lines, the void days.

And I hit the wall every developer in an under-specified domain hits: the self-validation paradox.

The self-validation paradox

Here's the trap. I write the engine. Then I write tests for it. But I am the one who decided what the engine should output — so my tests just encode the same assumptions my code does. If I misunderstood a rule, my code is wrong and my test asserting that wrong behavior is green. The bug and its "proof of correctness" share a single brain: mine.

   My understanding of the rules
            │
      ┌─────┴─────┐
      ▼           ▼
   the code    the test
      │           │
      └─────┬─────┘
            ▼
     both agree → green ✓   (even when both are wrong)

You can't test your way out of a misconception using tests you wrote from the same misconception.
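A toy illustration of the trap, not taken from the engine. It uses the rule that this tradition starts the next day's pillar at 23:00 rather than midnight; the function and test names are made up for the sketch. The code is written from the misreading "the day flips at midnight", and the test's expected values come from the same misreading, so everything passes.

```python
def pillar_day_offset(hour: int) -> int:
    """Days to add to the civil date to get the date of the day pillar.

    Written from a misreading: it assumes the day flips at midnight,
    so it never adds a day.
    """
    return 0


def test_pillar_day_offset():
    assert pillar_day_offset(22) == 0
    # Same misreading, now frozen into the answer key. Under the 23:00 rule
    # the expected value here would be 1, yet this assertion still passes.
    assert pillar_day_offset(23) == 0


test_pillar_day_offset()  # runs without raising
print("green")
```

Nothing inside this file can reveal the mistake. Only a source that learned the rule somewhere else can.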

The usual escape is "ask an expert." But for this domain that fails on every axis I care about as an engineer:

  • Not reproducible. An expert's ruling is a one-off. Six months later I can't re-run it.
  • Not scalable. Each of the six lines can be moving or static, so there are 64 hexagrams × 64 moving-line patterns = 4,096 casts, × 60 day-pillars × … The combinatorial space is huge. No human is checking thousands of charts.
  • Not infallible. Experts disagree, and any single one can be wrong. "Trust me" is not a test suite.

I wanted correctness to be a reproducible engineering fact, not an appeal to authority. So I borrowed a technique from compilers and numerical code: differential testing against an independent oracle.

Differential testing: diff against someone who solved it independently

The idea is simple. If I can't trust my own answer key, I find a second, completely independent implementation of the same rules and diff against it on a large sample of inputs. Any disagreement is a bug — in mine or in the reference — and either way it's something I must run down and reproduce.

The key word is independent. If I test one of my modules against another of my modules, they share my misconceptions and the diff proves nothing. The oracle has to come from a different author, a different codebase, ideally a different community that arrived at the rules on their own.
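Here is the shape of the technique in a domain where an independent oracle ships with Python. This is a minimal sketch, not code from the engine: `my_weekday` stands in for "my implementation" (Sakamoto's day-of-week algorithm), `datetime.date.weekday()` plays the oracle, and `differential_test` is a hypothetical helper name.

```python
import random
from datetime import date, timedelta


def my_weekday(y: int, m: int, d: int) -> int:
    """My implementation: Sakamoto's algorithm, returning 0=Monday .. 6=Sunday."""
    offsets = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4]
    if m < 3:
        y -= 1
    sunday_based = (y + y // 4 - y // 100 + y // 400 + offsets[m - 1] + d) % 7
    return (sunday_based + 6) % 7  # shift from 0=Sunday to 0=Monday


def oracle_weekday(y: int, m: int, d: int) -> int:
    """Independent oracle: the standard library, written by someone else."""
    return date(y, m, d).weekday()


def differential_test(mine, oracle, samples):
    """Return (args, mine, oracle) for every input where the two disagree."""
    mismatches = []
    for args in samples:
        got, expected = mine(*args), oracle(*args)
        if got != expected:
            mismatches.append((args, got, expected))
    return mismatches


rng = random.Random(20260616)  # fixed seed, so every run sees the same sample
start = date(1900, 1, 1)
samples = []
for _ in range(1000):
    day = start + timedelta(days=rng.randrange(73000))  # roughly 1900 to 2099
    samples.append((day.year, day.month, day.day))

mismatches = differential_test(my_weekday, oracle_weekday, samples)
assert not mismatches, mismatches[:20]
```

The harness knows nothing about calendars. It only needs two callables with the same signature and a reproducible sample of inputs.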

For hexagram casting I picked najia (MIT-licensed) as the primary oracle:

  1. License is clean — MIT, safe to depend on.
  2. It's programmable — I can drive it from a script over thousands of combinations, not click through a UI.
  3. It's community-validated — other people already rely on it.

Then I stacked a second independent source on top: yigram-najia-rules (also MIT, a machine-readable rule table). A field is only trusted when all three agree: mine, najia, and yigram. Having three sources sharply lowers the odds that "two implementations share the same bug."
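The three-way rule can be sketched as a per-field agreement check. The helper name `three_way_check` and the chart values below are invented for illustration; they are not output from najia, yigram-najia-rules, or the engine.

```python
_MISSING = object()  # sentinel for "this source has no value for the field"


def three_way_check(sources: dict[str, dict]) -> dict[str, bool]:
    """Map each field to True only when every source reports the same value."""
    fields = set().union(*(chart.keys() for chart in sources.values()))
    trusted = {}
    for field in sorted(fields):
        values = [chart.get(field, _MISSING) for chart in sources.values()]
        present = values[0] is not _MISSING
        trusted[field] = present and all(v == values[0] for v in values)
    return trusted


# Made-up values: the third source deliberately disagrees on one field.
sources = {
    "mine":   {"palace": "乾", "self line": 6},
    "najia":  {"palace": "乾", "self line": 6},
    "yigram": {"palace": "乾", "self line": 5},
}
report = three_way_check(sources)
print(report)
```

A field missing from any one source counts as untrusted, which is the conservative choice: silence from an oracle is not agreement.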

Here's the actual test. It samples 200 random hexagrams across random dates and compares every field of the chart:

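import random

# _DATES, _najia_chart, GAN, cast_chart, and get_najia come from the surrounding
# test module and its imports (not shown in this excerpt).

def test_differential_vs_najia():
    """200 random hexagrams × random dates, field-by-field agreement with najia."""
    rng = random.Random(20260616)  # fixed seed → reproducible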
    mismatches = []

    for _ in range(200):
        params = [rng.choice([1, 2, 3, 4]) for _ in range(6)]  # the six lines
        date = rng.choice(_DATES)
        d = _najia_chart(params, date)
        bits = d["mark"]

        # Feed najia's OWN day-stem into my engine → isolate casting logic
        # from calendar logic, so this test compares ONLY the charting rules.
        day_gan = GAN.index(d["lunar"]["gz"]["day"][0])
        chart = cast_chart(bits, day_gan)

        checks = {
            "hexagram name": (chart.name, d["name"]),
            "palace":        (chart.palace, d["gong"]),
            "stem-branch":   ([y.najia for y in chart.yaos], list(get_najia(bits))),
            "six relatives": ([y.liuqin for y in chart.yaos], list(d["qin6"])),
            "six gods":      ([y.liushen for y in chart.yaos], list(d["god6"])),
            "self line":     (chart.shi_pos, d["shiy"][0]),
            "response line": (chart.ying_pos, d["shiy"][1]),
        }
        for field, (mine, theirs) in checks.items():
            if mine != theirs:
                mismatches.append(f"{bits} {field}: mine={mine} najia={theirs}")

    assert not mismatches, "differential mismatch:\n" + "\n".join(mismatches[:20])

Two details worth stealing:

  • Fixed seed (random.Random(20260616)). The sample is random but the run is deterministic. A failure names a specific hexagram-and-date I can reproduce forever. Random-but-reproducible is the sweet spot: broad coverage, zero flakiness.
  • Isolating the layers. Notice I feed najia's own computed day-stem into my engine before comparing. That deliberately takes the calendar out of the equation so this test measures only the charting rules. The calendar gets its own differential test (below). One oracle per concern; never let two unknowns hide each other.
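To make the fixed-seed point concrete, here is a small sketch. `sample_casts` is a hypothetical helper; the 1 to 4 line coding and the seed are copied from the test above.

```python
import random


def sample_casts(seed: int, n: int) -> list[tuple[int, ...]]:
    """Draw n casts of six lines each from a private, seeded generator."""
    rng = random.Random(seed)  # private RNG: other code cannot disturb it
    return [tuple(rng.choice([1, 2, 3, 4]) for _ in range(6)) for _ in range(n)]


first_run = sample_casts(20260616, 200)
second_run = sample_casts(20260616, 200)

# Same seed, same sample: a failing cast today is the same failing cast tomorrow.
assert first_run == second_run
```

Using a dedicated `random.Random` instance instead of the module-level functions matters here: the sample stays the same even if another test calls `random.seed()` or consumes random numbers first.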

The payoff: correctness stopped being "I'm pretty sure I read the book right" and became a green check in CI that anyone can re-run. If a contributor changes a lookup table and breaks a rule, the test goes red and the failure message names the exact hexagrams and fields that broke (the first 20 mismatches are printed).

The foundation had to be diffed too: the calendar

There's a layer below charting that is even less forgiving. Everything in this system is derived from the four pillars — the stem-branch coordinates of the casting moment (year, month, day, hour). If the day pillar is off by one, every downstream fact — the line strengths, the void days, the month authority — is silently wrong. The whole chart is garbage and nothing throws an error.

And the bugs all live at the boundaries:

  • Months are divided by solar terms, not the lunar 1st. Whether a moment belongs to "this month" or "next" depends on the exact instant the sun crosses a solar-term boundary.
  • The day changes at 23:00, not midnight. In this tradition the next day's pillar begins at the start of the 子 (zǐ) hour — 11 PM — not at 00:00.
  • Cross-timezone casting. Someone casting from California has a wall-clock time that must be reconciled with the absolute instant of a solar term.
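The 23:00 flip and the timezone issue are easy to get subtly wrong, so here is a minimal sketch of both. It assumes the flip is applied to the caster's local wall-clock time, which is my assumption for the example and not a statement of how the engine reconciles timezones. `pillar_date` is a hypothetical name, and a fixed UTC-8 offset stands in for California.

```python
from datetime import date, datetime, timedelta, timezone


def pillar_date(moment: datetime, local_tz: timezone) -> date:
    """Civil date whose day pillar applies to `moment` under the 23:00 flip.

    Assumption for this sketch: the flip is judged on local wall-clock time.
    """
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    local = moment.astimezone(local_tz)
    if local.hour >= 23:  # the 子 hour has begun: the next day's pillar applies
        return (local + timedelta(days=1)).date()
    return local.date()


PST = timezone(timedelta(hours=-8))  # fixed offset standing in for California

before_flip = pillar_date(datetime(2024, 1, 1, 22, 59, tzinfo=PST), PST)
after_flip = pillar_date(datetime(2024, 1, 1, 23, 30, tzinfo=PST), PST)

# The same absolute instant expressed in UTC must give the same answer.
same_instant_utc = datetime(2024, 1, 2, 7, 30, tzinfo=timezone.utc)
from_utc = pillar_date(same_instant_utc, PST)

print(before_flip, after_flip, from_utc)
```

Rejecting naive datetimes is deliberate. A timestamp without an offset forces the code to guess a timezone, and a wrong guess gives a wrong pillar without raising any error.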

Zero-tolerance foundations get the same treatment: I compute with sxtwl (a high-precision astronomical calendar library) as the primary and cross-validate against lunar-python, a completely separate implementation. Solar-term crossings, the 23:00 day flip, and timezone edge cases each have unit tests that diff the two libraries, so any disagreement surfaces immediately.
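Boundary bugs hide between samples, so a calendar diff is most useful when it probes just before, at, and just after each boundary. Below is a sketch of that idea. The two "calendars" are stand-in functions I wrote for the example, not the sxtwl or lunar-python APIs; in a real test, thin adapters around those libraries would take their place.

```python
from datetime import date, datetime, timedelta


def probes_around(boundary: datetime, step: timedelta = timedelta(seconds=1)):
    """Moments just before, exactly at, and just after a boundary."""
    return [boundary - step, boundary, boundary + step]


def diff_calendars(primary, secondary, moments):
    """Return (moment, primary_result, secondary_result) for each disagreement."""
    results = []
    for moment in moments:
        a, b = primary(moment), secondary(moment)
        if a != b:
            results.append((moment, a, b))
    return results


# Stand-in implementations (hypothetical): one flips the day at 23:00,
# the other at midnight.
def flip_at_23(moment: datetime) -> date:
    return (moment + timedelta(hours=1)).date()


def flip_at_midnight(moment: datetime) -> date:
    return moment.date()


moments = (
    probes_around(datetime(2024, 1, 1, 23, 0))
    + probes_around(datetime(2024, 1, 2, 0, 0))
)
disagreements = diff_calendars(flip_at_23, flip_at_midnight, moments)
for moment, a, b in disagreements:
    print(moment, a, b)
```

The two stand-ins agree outside the 23:00 to midnight window, so a sample drawn at noon would never have caught the difference.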

The principle underneath: only cross-validatable things get into the core

Once you build around differential testing, it starts making design decisions for you. It became the single rule the whole engine is organized around:

Anything that enters the deterministic core must be independently cross-validatable. Anything unfalsifiable stays out.

I Ching practice has a category called 神煞 (shén shà) — auspicious/inauspicious "stars." Different schools define them differently and the rules openly contradict. There is no single truth and no independent implementation to diff against. So they don't go in the core. Not because they're "wrong" — but because I have no way to prove the code computes them correctly, and unfalsifiable output has no business sitting next to facts I can guarantee.

Same reasoning killed "true solar time" correction in the core (which longitude? which equation-of-time formula? no single answer → can't diff), and it's why the engine commits to one school's rules instead of mixing several (mixing = contradictory rules in one engine = unfalsifiable by construction).

The stuff I deliberately leave out is a feature, not a gap. It's the same discipline that keeps a type system sound: if you can't prove it, don't let it in. Mark it optional or hand it to a different layer.
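One way to make that rule mechanical instead of a matter of willpower is to have the core refuse any field that cannot name an independent oracle. This is a hypothetical sketch of the idea; `FieldSpec` and `CoreRegistry` are invented names, and nothing here describes how liuyao-engine is actually structured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str
    oracles: tuple[str, ...] = ()  # independent sources this field is diffed against


class CoreRegistry:
    """Accepts a field only if at least one independent oracle is declared."""

    def __init__(self) -> None:
        self._fields: dict[str, FieldSpec] = {}

    def register(self, spec: FieldSpec) -> None:
        if not spec.oracles:
            raise ValueError(
                f"{spec.name!r} has no independent oracle; keep it out of the core"
            )
        self._fields[spec.name] = spec

    def names(self) -> list[str]:
        return sorted(self._fields)


registry = CoreRegistry()
registry.register(FieldSpec("palace", oracles=("najia", "yigram-najia-rules")))

rejected = []
try:
    registry.register(FieldSpec("shen_sha"))  # nothing to diff against
except ValueError:
    rejected.append("shen_sha")

print(registry.names(), rejected)
```

A rejected field is not deleted from the project. It just has to live in an optional layer that makes no correctness guarantee.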

The takeaway (works far outside divination)

Strip away the I Ching and this is a pattern for any domain where you can't fully trust your own answer key — spec-compliant parsers, tax and payroll engines, protocol implementations, financial calculators, anything porting a fuzzy standard into code:

  1. Name the self-validation paradox. Tests you wrote encode the bugs you have. Own it.
  2. Find an independent oracle. A second implementation by someone who never talked to you. Diff every field, not just the final answer.
  3. Two oracles beat one. Three-way agreement crushes shared-bug risk.
  4. Random but seeded. Broad coverage, reproducible failures, zero flake.
  5. Isolate layers. One oracle per concern so two unknowns can't hide each other.
  6. Let it gate your design. If a feature can't be cross-validated, it doesn't belong in the part of the system you call "correct."

The engine is open source (Apache-2.0), the differential tests run in CI, and you can clone it and re-run every claim in this post:

github.com/yaomancy/liuyao-engine

Deterministic I Ching (Liu Yao / 六爻) hexagram engine — cross-validated, Apache-2.0

liuyao-engine

Deterministic I Ching (Liù Yáo, 六爻) chart-assembly and judgment-rule engine by Yaomancy · Apache-2.0


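Converts a casting moment and six lines into a structured set of chart facts: stem-branch four pillars → chart assembly (纳甲 najia / 六亲 six relatives / 六神 six gods / 世应 self and response lines / 伏神 hidden lines) → 旺衰 (wàng-shuāi) strength skeleton → 用神 (yòng shén). Pure rules, zero network, regression-verifiable item by item.

Conventions: chart assembly follows 《卜筮正宗》 (Bǔshì Zhèngzōng) and judgment rules follow 《增删卜易》 (Zēngshān Bǔyì); 神煞 (shén shà) stay out of the core logic; schools are not mixed. The engine is responsible only for the "hard facts" and contains no AI interpretation.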


English quick start

liuyao-engine is a deterministic engine for Liù Yáo (六爻) / coin-oracle I Ching divination. Given a cast and a timestamp it returns a fully assembled, machine-readable hexagram chart — Heavenly-Stem/Earthly-Branch four pillars, najia stem-branch attribution, six-relations, six-spirits, self/other lines, hidden lines, and wàng-shuāi (strength) — as plain Python data. No AI, no network, no I/O. Its lookup tables are cross-validated against najia and yigram-najia-rules; the calendar is double-checked against sxtwl + lunar-python. Classical terms stay in Chinese (with pinyin) — see the glossary; interpretive meanings are intentionally not translated.

Install

pip install git+https://github.com/yaomancy/liuyao-engine

Dependencies: sxtwl only (calendar). Python ≥ 3.11.

30-second quick start / Killer example

from liuyao import

It's the deterministic core behind yaomancy.com. I pulled it out and open-sourced it precisely because the interesting part isn't the mysticism — it's the engineering discipline of proving a computation correct when nobody can hand you the answer key.

If you've solved the "no ground truth" problem in your own domain, I'd genuinely like to hear how — the comments are open.
