Synthadoc: Staleness Detection, Full Audit Trails, and Four Export Formats - No Extra LLM Calls

#ai #automation #knowledgemanagement #llm

There's a category of problem that only shows up after you've been running an automated knowledge system for a while. The first month feels like magic - pages compile themselves, citations appear, everything is fresh. Three months later, you open a page about a library that shipped three breaking versions since the source was last ingested. The page looks perfectly healthy. The confidence is "high." The lint passed. And yet, everything in it is quietly wrong.

Static knowledge bases have no vocabulary for "this was true." Synthadoc v0.6.0 gives your wiki one.

The release ships two features that change how a wiki ages: a five-state page lifecycle machine that tracks content freshness with a permanent audit trail, and a wiki export system that serializes content, provenance, history, and cost in four machine-readable formats with zero additional LLM calls.

The 5-State Page Lifecycle

The core idea is simple: every page has a status that reflects what the system knows about it right now, not just what it says. That status moves through five states based on signals from ingest, lint, and the source files themselves.

Automatic transitions (system-triggered):

Transition	Trigger	Agent
`→ draft`	New page created via ingest	IngestAgent
`draft → active`	Lint passes all structural and consistency checks	LintAgent
`active → stale`	SHA-256 hash of source file has changed since last ingest	LintAgent
`stale → draft`	Source re-ingested with `--force`; page updated	IngestAgent
`draft / active / stale → contradicted`	New source conflicts with this page; status set directly, bypasses transition API	IngestAgent

Manual transitions (user CLI commands):

Command	Transition	Description
`lifecycle activate <slug>`	`draft → active`	Promote without waiting for the next lint run
`lifecycle archive <slug>`	`draft / active / contradicted / stale → archived`	Retire the page; it's kept for reference
`lifecycle restore <slug>`	`archived → draft`	Re-admit the page; re-enters the lint queue

Note: active → stale and stale → draft have no user-facing CLI command; they are exclusively system-triggered, by lint and re-ingest respectively. The only path out of contradicted is archiving the page; you cannot promote a contradicted page directly to active.
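The two tables above can be read as a small state machine. The following is a minimal sketch of that reading, not Synthadoc's actual implementation: the `Status` enum, the `ALLOWED` set, and `can_transition` are hypothetical names introduced here for illustration. The contradicted transitions are included in the model even though the real system sets that status directly rather than through its transition API.

```python
from enum import Enum


class Status(str, Enum):
    DRAFT = "draft"
    ACTIVE = "active"
    STALE = "stale"
    CONTRADICTED = "contradicted"
    ARCHIVED = "archived"


# Allowed (from, to) pairs, transcribed from the two tables above.
# None stands for "the page does not exist yet".
ALLOWED = {
    (None, Status.DRAFT),             # new page created via ingest
    (Status.DRAFT, Status.ACTIVE),    # lint passes, or `lifecycle activate`
    (Status.ACTIVE, Status.STALE),    # source hash changed
    (Status.STALE, Status.DRAFT),     # source re-ingested
    (Status.ARCHIVED, Status.DRAFT),  # `lifecycle restore`
}
# draft / active / stale -> contradicted
ALLOWED |= {
    (s, Status.CONTRADICTED)
    for s in (Status.DRAFT, Status.ACTIVE, Status.STALE)
}
# draft / active / contradicted / stale -> archived
ALLOWED |= {
    (s, Status.ARCHIVED)
    for s in (Status.DRAFT, Status.ACTIVE, Status.CONTRADICTED, Status.STALE)
}


def can_transition(src, dst):
    """Return True if the tables above list a src -> dst transition."""
    return (src, dst) in ALLOWED
```

In this model, `can_transition(Status.CONTRADICTED, Status.ACTIVE)` is false while `can_transition(Status.CONTRADICTED, Status.ARCHIVED)` is true, which matches the rule that archiving is the only way out of contradicted.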

Every single transition - automatic or manual - is permanently written to the audit database with a timestamp, the triggering agent or user, and a reason string. That's the part that actually matters. The state tells you where a page is. The log tells you how it got there and when someone last looked at it.

# Check the full history of a page
synthadoc lifecycle log alan-turing

Slug                      From           To             By           Timestamp              Reason
----------------------------------------------------------------------------------------------------
alan-turing               null           draft          ingest       2026-04-12T09:14:22    initial ingest
alan-turing               draft          active         lint         2026-04-12T09:31:07    all checks passed
alan-turing               active         stale          lint         2026-05-03T02:00:11    source hash mismatch
alan-turing               stale          draft          ingest       2026-05-03T08:22:55    re-ingest of stale page
alan-turing               draft          active         lint         2026-05-03T08:45:02    all checks passed
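To make the shape of such a log concrete, here is a minimal sketch of an append-only transition table using Python's standard `sqlite3` module. The schema, the table name `lifecycle_log`, and the helper functions are assumptions made for illustration; the article does not describe how Synthadoc's audit database is actually laid out.

```python
import sqlite3
from datetime import datetime, timezone


def open_audit_db(path=":memory:"):
    """Open (or create) a database holding a hypothetical lifecycle_log table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS lifecycle_log (
            id           INTEGER PRIMARY KEY AUTOINCREMENT,
            slug         TEXT NOT NULL,
            from_state   TEXT,            -- NULL for the initial ingest
            to_state     TEXT NOT NULL,
            triggered_by TEXT NOT NULL,   -- agent or user
            ts           TEXT NOT NULL,   -- ISO 8601 timestamp
            reason       TEXT NOT NULL
        )
        """
    )
    return conn


def record_transition(conn, slug, from_state, to_state, by, reason, ts=None):
    """Append one transition row; rows are never updated or deleted here."""
    ts = ts or datetime.now(timezone.utc).isoformat(timespec="seconds")
    conn.execute(
        "INSERT INTO lifecycle_log "
        "(slug, from_state, to_state, triggered_by, ts, reason) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (slug, from_state, to_state, by, ts, reason),
    )
    conn.commit()


def history(conn, slug):
    """Return all transitions for a slug in the order they were recorded."""
    return conn.execute(
        "SELECT from_state, to_state, triggered_by, ts, reason "
        "FROM lifecycle_log WHERE slug = ? ORDER BY id",
        (slug,),
    ).fetchall()
```

The sketch exposes only an insert and a read path, which is the property that matters for an audit trail: the current state can always be recomputed from the log, but the log cannot be rewritten through this interface.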

If you prefer a visual view, the same full cross-wiki audit trail is available in Obsidian under Synthadoc: Manage Page Lifecycle → Audit Log. Every transition shows colour-coded From/To state badges, the triggering agent or user, the timestamp, and the reason string - searchable by slug, filterable by state, paginated.

For a fleet-level view, synthadoc status gives a live summary across all five states, including pages sitting in candidates, along with an action hint for anything that needs attention:

synthadoc status

Wiki:         history-of-computing
Pages:        42
Jobs pending: 0
Jobs total:   187

Page lifecycle:
  active         38
  draft           2  <- run `synthadoc lint run` to promote
  draft (staged)  1  <- promote from candidates/ first, then lint
  stale           1  <- re-ingest needed
  contradicted    0
  archived        1

Two things are worth knowing about these numbers. First, Pages: 42 at the top counts only pages that have been admitted into wiki/; pages still quarantined in wiki/candidates/ are excluded from that total. Second, draft and draft (staged) are distinct rows: draft counts pages already inside wiki/ waiting for their first lint pass, while draft (staged) counts pages physically quarantined in wiki/candidates/. Staged pages haven't been promoted yet, have no lifecycle state, and stay invisible to search, context packs, and export until a human explicitly promotes them. The lifecycle section only shows draft (staged) when the count is greater than zero, so on a wiki with staging turned off you'll never see that row. The action hints tell you what to do next for each group: run lint for drafts, re-ingest for stale pages, review and archive for contradictions.

If you prefer to manage lifecycle states visually, the Obsidian plugin surfaces the same data in Synthadoc: Manage Page Lifecycle → Current States. The table is sortable and filterable by state, shows the last transition timestamp and who triggered it, and gives you a one-click archive button per page. The contradicted chip makes it easy to find the pages that need attention first.

Candidates Staging: A Quality Gate Before Lifecycle Begins

The lifecycle machine handles what happens after a page enters the wiki. Candidates staging handles whether it enters at all.

Where a new page lands depends entirely on the staging policy configured for that wiki. There are three options:

off (default): every new page goes straight into wiki/ as draft. Staging is not involved.
all: every new page goes to wiki/candidates/ regardless of confidence. Nothing is admitted automatically - you review and promote everything.
threshold: IngestAgent checks the page's confidence rating against your configured minimum. Pages that meet or exceed it go directly into wiki/; pages that fall below it go to wiki/candidates/ for review.

wiki/candidates/ is a holding area excluded from search, context packs, and export. No downstream consumer sees a candidate. The page exists on disk, but it hasn't been admitted into the lifecycle yet: it has no audit log entry and isn't counted in the Pages total of synthadoc status.
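The routing decision made by the three policies can be sketched in a few lines. This is an illustrative model only: the function name `landing_dir`, the `CONFIDENCE_ORDER` mapping, and the assumption that confidence is one of low, medium, or high ordered in that way are not taken from Synthadoc's code.

```python
# Assumed ordering of confidence ratings, lowest to highest.
CONFIDENCE_ORDER = {"low": 0, "medium": 1, "high": 2}


def landing_dir(policy, confidence, min_confidence="high"):
    """Return the directory a newly ingested page lands in under a policy."""
    if policy == "off":
        # Staging is not involved; the page enters the wiki as draft.
        return "wiki"
    if policy == "all":
        # Everything is held for human review, regardless of confidence.
        return "wiki/candidates"
    if policy == "threshold":
        meets = CONFIDENCE_ORDER[confidence] >= CONFIDENCE_ORDER[min_confidence]
        return "wiki" if meets else "wiki/candidates"
    raise ValueError(f"unknown staging policy: {policy!r}")
```

With a minimum of high, `landing_dir("threshold", "medium")` returns `wiki/candidates`, while `landing_dir("threshold", "high")` returns `wiki`.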

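Under the threshold policy, a new page follows one of three paths: it meets the minimum confidence and goes straight into wiki/ as draft; it falls below the minimum, waits in wiki/candidates/, and is promoted by a reviewer; or it falls below the minimum and is discarded.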

The key design decision here: staging and lifecycle are orthogonal systems that compose cleanly. Staging decides admission. Lifecycle decides state after admission. A page in wiki/candidates/ has no lifecycle state yet: it's not in the audit log, it doesn't count toward the Pages total in synthadoc status, and it doesn't appear in any export. The moment you promote it, it enters the lifecycle as draft and the lint queue picks it up on the next run.

This matters for teams that need a human gate on automated ingestion. Nightly ingest jobs run at 2 AM, pull new sources, and compile pages. They all land in candidates. A person reviews the list in the morning, promotes what looks right, and discards what doesn't. The wiki only grows with reviewed content.

# Enable threshold staging: auto-promote high-confidence, hold everything else
synthadoc staging policy threshold --min-confidence high

# Morning review
synthadoc candidates list

Candidates (3):
  machine-learning-fundamentals    confidence: medium   ingested: 2026-05-31
  attention-mechanism              confidence: low      ingested: 2026-05-31
  transformer-architecture         confidence: medium   ingested: 2026-05-31

synthadoc candidates promote transformer-architecture
synthadoc candidates discard attention-mechanism

Wiki Export: Four Formats, Zero LLM Calls

Export was designed around one constraint: once your wiki is compiled, you shouldn't need to spend more API budget to serialize it. All four formats are computed entirely from the stored wiki state - no prompts, no completions, no waiting.

The --status flag is what makes export practically useful. When you're feeding a downstream LLM, you probably only want active pages, the ones that passed lint and haven't gone stale:

synthadoc export --format llms.txt --status active
synthadoc export --format json --status active --output exports/wiki.json

The --status contradicted filter is useful for forensics: you can export just the pages with conflicts and analyse them without touching the rest of the wiki.

The JSON format is the one worth drawing attention to. Most wiki exports give you a flat document dump. This one gives you claim-level provenance (claims[] maps each claim to the exact source file and line range that generated it), the complete state transition history (lifecycle_history[]), and the per-page API cost to compile it (ingest_cost_usd). If you're building downstream tooling or reporting on knowledge quality, these three fields eliminate an entire layer of instrumentation you'd otherwise have to build yourself.

{
  "slug": "alan-turing",
  "status": "active",
  "ingest_cost_usd": 0.0012,
  "claims": [
    {
      "text": "Turing proposed the imitation game in 1950...",
      "source": "raw_sources/turing-biography.md",
      "lines": [42, 48]
    }
  ],
  "lifecycle_history": [
    { "from": null, "to": "draft", "by": "ingest", "ts": "2026-04-12T09:14:22" },
    { "from": "draft", "to": "active", "by": "lint", "ts": "2026-04-12T09:31:07" }
  ]
}
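As a sketch of the downstream reporting this enables, the snippet below summarizes a list of exported page records using only the field names shown in the sample above. Whether the real export file is a JSON array of such objects is an assumption, and `summarize` is a hypothetical helper, not part of Synthadoc.

```python
import json
from collections import Counter

# One record copied from the sample above, wrapped in a list (assumed layout).
SAMPLE = """
[
  {
    "slug": "alan-turing",
    "status": "active",
    "ingest_cost_usd": 0.0012,
    "claims": [
      {
        "text": "Turing proposed the imitation game in 1950...",
        "source": "raw_sources/turing-biography.md",
        "lines": [42, 48]
      }
    ],
    "lifecycle_history": [
      { "from": null, "to": "draft", "by": "ingest", "ts": "2026-04-12T09:14:22" },
      { "from": "draft", "to": "active", "by": "lint", "ts": "2026-04-12T09:31:07" }
    ]
  }
]
"""


def summarize(pages):
    """Return (total cost, claims per source file, last transition per slug)."""
    total_cost = sum(p.get("ingest_cost_usd", 0.0) for p in pages)
    claims_per_source = Counter(
        claim["source"] for p in pages for claim in p.get("claims", [])
    )
    last_transition = {
        p["slug"]: p["lifecycle_history"][-1]["ts"]
        for p in pages
        if p.get("lifecycle_history")
    }
    return total_cost, claims_per_source, last_transition


pages = json.loads(SAMPLE)
total_cost, claims_per_source, last_transition = summarize(pages)
print(f"total ingest cost: ${total_cost:.4f}")
print(claims_per_source.most_common())
print(last_transition)
```

The same three aggregates (cost, claims per source, time of last transition) are the ones a knowledge-quality report would typically start from.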

What Makes This Different

Most LLM wiki tools treat knowledge as append-only. You ingest, you query. There's no concept of a page going stale, no audit trail of who reviewed what and when, and no way to know that the page you're reading was compiled from a source that changed three months ago. They're effectively write-once databases with a chat interface on top.

Synthadoc's lifecycle machine makes the wiki temporally aware. A SHA-256 hash is stored for every source at ingest time. When lint runs (nightly, typically), it compares current hashes against stored ones. A changed source triggers an automatic active → stale transition with a timestamp. You know exactly which pages need attention and when they last didn't.
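The hash comparison described above can be sketched with Python's standard `hashlib`. The function names and the `stored_hashes` mapping (source path to the digest recorded at ingest) are assumptions for illustration; how Synthadoc stores and compares hashes internally is not shown in this article.

```python
import hashlib


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_stale(stored_hashes):
    """Return source paths whose current hash differs from the stored one.

    stored_hashes maps each source path to the hex digest recorded at
    ingest time (a hypothetical structure for this sketch).
    """
    return [
        path
        for path, recorded in stored_hashes.items()
        if sha256_of(path) != recorded
    ]
```

Comparing content hashes rather than modification times means a file that was touched but not changed does not mark its pages stale, while any byte-level change does.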

The other thing that separates Synthadoc architecturally is that it's a compilation pipeline, not a retrieval pipeline with a generation step. Every page is a synthesized artifact, not a retrieved chunk. That's why the JSON export can include ingest_cost_usd per page: each page has a discrete compilation history, not a query-time cost that varies every time someone asks a question.

The combination of lifecycle tracking and export also enables something practical for teams: you can run synthadoc export --format llms.txt --status active as the input to a downstream agent, and you know exactly what you're giving it. No stale content. No contradicted pages. Just the subset of the wiki that the system has marked as reviewed and consistent.

Quick Demo

The quickest way to see the lifecycle machine in action is step 8 of the quick-start guide: Step 8 — Manage Page Lifecycle.

Export is step 21: Step 21 — Export Your Wiki.

Both steps work with the history-of-computing demo wiki, so you can run the full thing locally in about ten minutes against your selected LLM provider:

git clone https://github.com/axoviq-ai/synthadoc.git
cd synthadoc
pip3 install -e ".[dev]"
synthadoc install history-of-computing --target ~/wikis --demo
synthadoc plugin install history-of-computing