Sergei Parfenov

Posted on Jun 22

Trust Isn't a Scalar: Typed Provenance for Agent Chains

#ai #llm #devops #machinelearning

Co-authored by the article's comment section

In the last post, the one about agents failing quietly, I handed you a fix for silent degradation: tag a degraded output trust="degraded", propagate the taint down the chain, and gate irreversible actions on it. Clean, shippable, and (as a commenter named Theo pointed out within a day) wrong in a way that matters.

The tag was a boolean. And trust isn't a boolean. It isn't even a scalar.

This post is me being wrong in public and fixing it, because the corrected model is genuinely better and most of it was built by people in that comment thread. Credits at the end; they earned them.

TL;DR — A single trust score (full/degraded, or 0.0–1.0) collapses on real chains, because degradation happens along different axes — a stale cache lowers freshness, a weaker fallback lowers capability — and different downstream steps care about different ones. Collapse them to one number and you either over-reject (every degradation is fatal) or under-reject (the dangerous one gets averaged away). What actually composes is typed provenance: carry a vector of what-was-degraded-and-how alongside the result, propagate it across the chain, and let each consumer apply its own policy at the moment it's about to act.

Why a scalar collapses

Here's the case that broke my boolean, almost verbatim from Theo's comment.

You have two downstream steps, both consuming an upstream result:

A summarization step. It tolerates a weaker model just fine, but it must not run on stale data.
A price calculation. It is stricter about freshness: it needs current data, but a slightly weaker model doing arithmetic is fine.

Now the upstream result came from a fallback model reading a 2-hour-old cache. So it's degraded on both a capability axis (weaker model) and a freshness axis (old cache). What's your single trust score?

If you set it low (treat any degradation as serious), the summarization step over-rejects — it would've been totally fine with the weaker model, but your scalar said "degraded" so it bails or escalates needlessly.
If you set it high (it's "mostly fine"), the price calc under-rejects — it acts on stale data because the scalar averaged the freshness problem into a number that looked acceptable.

There is no single threshold that's simultaneously right for both consumers, because they're not measuring the same thing. A scalar forces every consumer to share one definition of "trustworthy," and they don't have one. As Theo put it: collapse the vector to one number and you destroy exactly the information the consumer needs to make its own decision.
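To make the collapse concrete, here is a minimal sketch. The scores, the averaging rule, and the per-consumer floors are illustrative assumptions, not numbers from Theo's comment. It collapses one upstream vector to a scalar, then scans every threshold from 0.00 to 1.00 looking for one that reproduces what both consumers would decide from the full vector.

```python
# Illustrative scores for one upstream result (assumed, not measured).
upstream = {"freshness": 0.92, "capability": 0.5}

# What each consumer needs per axis (assumed floors).
needs = {
    "summarize": {"freshness": 0.9, "capability": 0.3},
    "price_calc": {"freshness": 0.95, "capability": 0.6},
}

def correct_verdict(consumer: str) -> bool:
    """Decide from the full vector: every required axis must meet its floor."""
    return all(upstream[axis] >= floor for axis, floor in needs[consumer].items())

# Collapse the vector to one number by averaging.
scalar = sum(upstream.values()) / len(upstream)

def scalar_verdict(threshold: float) -> bool:
    """A global gate: one score, one threshold, same answer for every consumer."""
    return scalar >= threshold

truth = {consumer: correct_verdict(consumer) for consumer in needs}

# Keep only the thresholds that reproduce both consumers' correct verdicts.
workable = [
    t / 100
    for t in range(101)
    if all(scalar_verdict(t / 100) == truth[c] for c in needs)
]

print(truth)     # {'summarize': True, 'price_calc': False}
print(workable)  # []
```

The empty list is the point: a shared scalar gives every consumer the same answer, so it cannot be right whenever the correct answers differ.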

This isn't just my comment section talking, either — it's where the field is converging. A recent framework (TrustBench) makes the same move explicitly: rather than reduce trust to a single scalar, keep dimensional scores per trust aspect, and weight them per domain — healthcare prioritizing citation validity and recency, finance prioritizing calculation and compliance. Same shape, arrived at independently. When several people reach for the same structure from different directions, it's usually because the structure is real.

Trust is a vector; provenance is what you propagate

Here's the reframe that fixes it, and it starts with a vocabulary correction I owe you: I kept calling the thing "trust." That was the bug in the language, not just the code.

Trust is not a property of a value. It's a judgment a consumer makes about a value. What the value actually carries is provenance — the typed record of how it came to be: which model produced it, how fresh its inputs were, which tools ran, what got degraded and along which axis. Trust is what each consumer computes from that provenance, under its own policy. The price calc and the summarizer look at the same provenance and reach different verdicts, and that's correct, not contradictory.

So you don't propagate a degraded flag. You propagate a typed vector, and each axis degrades independently:

from dataclasses import dataclass, field
from enum import Enum

class Axis(str, Enum):
    FRESHNESS = "freshness"      # how current were the inputs
    CAPABILITY = "capability"    # how strong was the model that produced this
    TOOL = "tool"                # did the tool calls actually succeed
    VERIFICATION = "verification" # was this checked against ground truth

@dataclass
class Provenance:
    # per-axis score in [0,1]; 1.0 = fully trusted on that axis
    axes: dict[Axis, float] = field(default_factory=lambda: {a: 1.0 for a in Axis})
    # which upstream step_ids contributed degradation, per axis
    tainted_by: dict[Axis, set[str]] = field(default_factory=lambda: {a: set() for a in Axis})

    def merge(self, *upstreams: "Provenance") -> "Provenance":
        out = Provenance()
        for axis in Axis:
            # an output is only as fresh as its stalest input, only as
            # capable as its weakest producer — min, not average. averaging
            # is exactly how the dangerous axis gets washed out.
            out.axes[axis] = min([self.axes[axis]] + [u.axes[axis] for u in upstreams])
            out.tainted_by[axis] = set(self.tainted_by[axis])
            for u in upstreams:
                out.tainted_by[axis] |= u.tainted_by[axis]
        return out

The min is doing real work there. The whole failure of my original taint-as-boolean was that it answered "is anything degraded?", a single OR across the chain. The vector answers "what kind of degradation is this output carrying, and how much, per axis?" Crucially, it takes the minimum per axis rather than averaging, because averaging is the operation that makes a serious problem disappear: one stale input hides behind three fresh ones, and across axes a bad freshness score hides behind good capability scores.
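A small numeric sketch of why the merge uses min. The four freshness scores and the 0.75 floor are made-up values for illustration only.

```python
from statistics import mean

# Freshness scores of four upstream inputs feeding one step (illustrative).
freshness_inputs = [1.0, 1.0, 1.0, 0.2]   # three live reads, one stale cache

FLOOR = 0.75  # hypothetical freshness floor for some consumer

averaged = mean(freshness_inputs)   # about 0.8: the stale input is diluted
weakest = min(freshness_inputs)     # 0.2: the stale input is preserved

print(averaged >= FLOOR)  # True: averaging lets the stale input through
print(weakest >= FLOOR)   # False: min stops it
```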

The gate is per-consumer, not global

Now the irreversibility gate from the last post stops being one global threshold and becomes a policy that lives at each consumer:

@dataclass
class Policy:
    # per-axis minimum this consumer requires to act without re-check
    floors: dict[Axis, float]

    def admits(self, p: Provenance) -> bool:
        return all(p.axes[a] >= floor for a, floor in self.floors.items())

# the summarizer doesn't care about capability, but demands freshness
SUMMARIZE = Policy(floors={Axis.FRESHNESS: 0.9, Axis.CAPABILITY: 0.3})

# the price calc is stricter on both axes and also requires verification
PRICE_CALC = Policy(floors={Axis.FRESHNESS: 0.95, Axis.CAPABILITY: 0.6,
                            Axis.VERIFICATION: 0.8})

def gate(action_policy: Policy, p: Provenance):
    if action_policy.admits(p):
        return "proceed"
    # which axis failed tells you HOW to recover, not just THAT to stop
    failed = [a for a, f in action_policy.floors.items() if p.axes[a] < f]
    if Axis.FRESHNESS in failed:
        return "refetch"      # re-run the stale step on live data
    if Axis.CAPABILITY in failed:
        return "re-run-on-primary"
    return "escalate-to-human"

This is the payoff. The same upstream provenance vector flows to both consumers, and they reach different, individually correct decisions from it. The summarizer proceeds; the price calc refetches. One global score could never do that — and the failed-axis tells you how to recover, which a boolean never could.
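Here is a worked run of that divergence. The snippet restates Axis, Policy and gate in condensed form so it runs on its own, and the gate takes raw axis scores instead of a Provenance object. The upstream scores are assumptions chosen to sit between the two freshness floors.

```python
from dataclasses import dataclass
from enum import Enum

class Axis(str, Enum):
    FRESHNESS = "freshness"
    CAPABILITY = "capability"
    VERIFICATION = "verification"

@dataclass
class Policy:
    floors: dict[Axis, float]

def gate(policy: Policy, axes: dict[Axis, float]) -> str:
    """Condensed restatement of the gate above, taking raw axis scores."""
    failed = [a for a, floor in policy.floors.items() if axes[a] < floor]
    if not failed:
        return "proceed"
    if Axis.FRESHNESS in failed:
        return "refetch"
    if Axis.CAPABILITY in failed:
        return "re-run-on-primary"
    return "escalate-to-human"

SUMMARIZE = Policy({Axis.FRESHNESS: 0.9, Axis.CAPABILITY: 0.3})
PRICE_CALC = Policy({Axis.FRESHNESS: 0.95, Axis.CAPABILITY: 0.6,
                     Axis.VERIFICATION: 0.8})

# Assumed scores: slightly stale inputs, fallback model, result verified.
upstream = {Axis.FRESHNESS: 0.92, Axis.CAPABILITY: 0.5, Axis.VERIFICATION: 1.0}

print(gate(SUMMARIZE, upstream))   # proceed
print(gate(PRICE_CALC, upstream))  # refetch

# Fresh inputs but still the fallback model: only capability fails for pricing.
fresh_but_weak = {**upstream, Axis.FRESHNESS: 1.0}
print(gate(PRICE_CALC, fresh_but_weak))  # re-run-on-primary
```

With these floors the split depends on how stale the input really is. A freshness score well below 0.9 would fail the summarizer's floor too, and both consumers would refetch.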

Notice this also absorbs a point another commenter (Manuel) made independently: he argued the tag should be an enum, not a bool — skipped-tool vs stale-data vs retry-budget-exhausted route differently. He was right, and the vector is the generalization: an enum is a vector with one axis active; the full structure lets multiple axes degrade at once, which is the real production case.

"Gate on risk, not confidence" — and confidence is just one axis

The last post argued you should gate on irreversibility, not on the model's self-reported confidence. The vector makes that precise instead of hand-wavy: confidence is one axis among several, and it's the one the model grades itself on. A model can be 95%-confident (high on a confidence axis) while sitting on a freshness score of 0.2 because it reasoned over a stale cache. The skill-conditional-trust literature makes the same argument from the routing side — a single global score is the wrong object because it can't express "great at this, useless at that." Confidence-as-the-only-axis is how you get the war story everyone has: the agent that was sure, and sure on the wrong thing.

How many axes before it stops being worth it?

This is the honest open question, and the one I asked Theo back. A vector with 40 axes is just a scalar's opposite failure — unwieldy, untunable, theater of rigor. My current answer, and I'd genuinely take pushback: start with the axes that map to your actual degradation sources, and no more. If your system has exactly two ways to degrade — fallback model and stale cache — you have two axes (capability, freshness). Add verification the moment you have a re-check step whose result you want to carry. Add tool when a tool can half-succeed. The axis count should equal the number of distinct things that can independently go wrong, not the number of things you can imagine going wrong. If two "axes" always move together, they're one axis.

The sweet spot, I think, is the smallest set where each axis maps to a different recovery action. Freshness → refetch. Capability → re-run on primary. Verification → escalate. If two axes would trigger the same recovery, collapse them. The vector earns its complexity only where it changes what you do.
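One way to apply that rule mechanically is a check like the following. It is a sketch only: the mapping is hypothetical, and schema_drift is an invented candidate axis used to show a collision.

```python
from collections import defaultdict

# Hypothetical axis-to-recovery mapping for one system.
RECOVERY = {
    "freshness": "refetch",
    "capability": "re-run-on-primary",
    "verification": "escalate-to-human",
    "schema_drift": "escalate-to-human",  # invented candidate axis under review
}

def collapse_candidates(recovery: dict[str, str]) -> dict[str, list[str]]:
    """Group axes by recovery action; return only actions shared by 2+ axes."""
    by_action: dict[str, list[str]] = defaultdict(list)
    for axis, action in recovery.items():
        by_action[action].append(axis)
    return {action: axes for action, axes in by_action.items() if len(axes) > 1}

print(collapse_candidates(RECOVERY))
# {'escalate-to-human': ['verification', 'schema_drift']}
```

Anything the check returns is a prompt to look closer at whether those axes should be merged, not an automatic verdict.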

The practical layer (mostly stolen from the comments)

The vector is the core idea, but the thread surfaced a full toolkit around it, and it'd be dishonest to present any of it as mine:

Admission control, upstream of everything (Dan): before the agent fans out, decide if the whole task can afford to run, and separate the four limits that 429s blur together — provider quota (physics), account quota (policy), task budget (this run), ledger (forensics). The ledger turns out to be the same record as provenance: "this run cost 47 calls, 12 on the fallback tier" is both your bill and your capability-axis score.
Validation at consumption, not production (James): don't validate on the fresh-call path and trust the cache; validate when a value is used, regardless of where it came from. That closes the laundering loophole at the consumer — which is exactly where the per-consumer gate already lives.
Time-bound by causality, not wall-clock (HARD IN SOFT OUT): I was tempted by "reset taint after N seconds." Don't — degraded state can sleep and surface later. Clear an axis when nothing on the live path still derives from the degraded step, not when a timer expires.
The poor-man's version for solo builders (TuanAnhNguyen): no observability stack? Have any tool that acts on a stale-readable input append one line to a log, and grep it before anything irreversible. It's the 5%-effort version of the provenance vector — a breadcrumb instead of a graph — and below a certain scale it's the correct amount of engineering.
The distributed correction (Abdullah): my original concurrency cap was an in-process semaphore, which silently assumes one process. Under serverless fan-out, N containers each capping at 8 gives you 8N real concurrency. The limiter has to live outside the workers. (Also: TPM saturates before RPM on long-context agents, and "fallback to a cheaper model" is fiction if it draws from the same pooled tier. Both are capability/freshness axis sources you'd otherwise miss.)
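The causality-bound rule from the list above can be sketched as a reachability check over lineage. The step names and the DERIVES_FROM graph are hypothetical; a real system would build the graph from its own trace records.

```python
# Hypothetical lineage: each step id maps to the step ids it read from.
DERIVES_FROM = {
    "cache_read": [],
    "live_fetch": [],
    "summary_v1": ["cache_read"],
    "summary_v2": ["live_fetch"],
    "email_draft": ["summary_v2"],
}

def ancestors(step: str, graph: dict[str, list[str]]) -> set[str]:
    """All steps that `step` transitively derives from, including itself."""
    seen: set[str] = set()
    stack = [step]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen

def still_tainted(degraded_step: str, live_outputs: list[str],
                  graph: dict[str, list[str]]) -> bool:
    """True while any live output still derives from the degraded step."""
    return any(degraded_step in ancestors(out, graph) for out in live_outputs)

# While summary_v1 is live, the stale cache read is still on the path.
print(still_tainted("cache_read", ["summary_v1"], DERIVES_FROM))   # True
# After the refetch, the live path runs through live_fetch only.
print(still_tainted("cache_read", ["email_draft"], DERIVES_FROM))  # False
```

No timer appears anywhere: the taint clears only when the degraded step drops out of the ancestry of everything still in use.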

The parable that says it better than I did

A commenter (HARD IN SOFT OUT) left this, and it's the whole series in five lines:

The agent hit a rate limit. It fell back to a cached answer from last Tuesday. The world changed on Wednesday. The agent kept working. The logs said "cache hit, 200 OK." The user got a message: "Your order has shipped." The warehouse's API key expired on Thursday.

Every hop green. Every log a 200. And a real package never ships. A scalar trust score on that final "order shipped" output would read fine — the last call succeeded. A provenance vector reads freshness: 0.1, tainted_by: {warehouse_check} and the shipping gate refuses to fire. That's the entire difference between uptime and correct uptime, and between a boolean and a vector.
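The parable as a runnable sketch. The hop names other than warehouse_check, the freshness scores, and the 0.9 shipping floor are assumptions; the sketch carries a single freshness axis to keep it short.

```python
from dataclasses import dataclass, field

@dataclass
class Hop:
    step_id: str
    status: int          # what the logs see
    freshness: float     # assumed score for the data this hop read

@dataclass
class Carried:
    freshness: float = 1.0
    tainted_by: set[str] = field(default_factory=set)

def propagate(chain: list[Hop]) -> Carried:
    """Fold the chain: keep the minimum freshness, record who lowered it."""
    carried = Carried()
    for hop in chain:
        if hop.freshness < 1.0:
            carried.tainted_by.add(hop.step_id)
        carried.freshness = min(carried.freshness, hop.freshness)
    return carried

chain = [
    Hop("warehouse_check", 200, 0.1),   # cache hit from last Tuesday
    Hop("compose_message", 200, 1.0),
    Hop("send_notification", 200, 1.0),
]

SHIP_FLOOR = 0.9  # hypothetical floor for the shipping gate

all_green = all(hop.status == 200 for hop in chain)
result = propagate(chain)
may_ship = result.freshness >= SHIP_FLOOR

print(all_green)          # True
print(result.freshness)   # 0.1
print(result.tainted_by)  # {'warehouse_check'}
print(may_ship)           # False
```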

Where this leaves the series

Three posts in, the actual thesis has assembled itself: agent reliability is a provenance problem. Availability (post 1) is the easy axis. Correctness (post 2) is the one that bites. And the structure that makes correctness tractable (post 3) is typed provenance carried through the chain, with policy at the edges. None of that is exotic — it's data lineage, taint analysis, and saga patterns, borrowed from disciplines that solved their version decades ago, newly load-bearing because the untraceable thing now acts.

If you're building this: start with two axes and a min, put the policy at the consumer, and add an axis only when it changes a recovery action. Everything else is premature.

This post was largely written by the comments on the last one. Credit, specifically: Theo Valmis (trust-is-a-vector, the summarize-vs-price-calc case, "typed provenance"), Manuel Bruña (enum-not-bool), Dan (admission control, the four-limit split), James O'Connor (validation at consumption), HARD IN SOFT OUT (causality-bound taint, the parable), TuanAnhNguyen (the solo-builder grep version), Abdullah Shahin (the distributed-limiter and pooled-fallback corrections), and Scarab Systems (the "evidence gate" framing that started me thinking about provenance as an obligation, not metadata). Best comment section on this site. Question for the thread: how many axes does your system actually need, and which ones map to a distinct recovery action versus just feeling rigorous?

Sources & further reading

"Real-Time Trust Verification for Safe Agentic Actions" (TrustBench) — dimensional trust scores over a scalar, domain-weighted, with block/warn/proceed gating.
"When Should Agent Trust Be Conditional?" — why a single global trust score is the wrong object for skill-heterogeneous agents.
"From Agent Traces to Trust: A Survey of Evidence Tracing and Execution Provenance in LLM Agents" — persistent lineage across memory writes, retrievals, and reuse.
"Redefining AI Agent Trust: An Input/Output-First Approach", Monte Carlo — trust as enforced contracts at system boundaries (freshness, schema, lineage on input; traceability on output).
Part 1 — the capacity side and Part 2 — correct uptime.

Top comments (27)

Mykola Kondratiuk • Jun 24

downstream code strips the tags anyway, in my experience. you can build a perfect trust lattice but most agents just act on output without checking provenance. the hard part is enforcement, not the model.

Sergei Parfenov • Jul 1

yeah, this is the uncomfortable one and ur right — the whole post is about the data model and the data model is the easy 20%. a provenance vector that downstream code ignores is just expensive metadata. enforcement is the actual problem.

the only way i've seen it stick is to make it structurally impossible to skip rather than a convention people remember to follow: the irreversible action can't be called with a raw value, only with a Provenanced[T] wrapper, and the gate is the only function that unwraps it. so "act without checking provenance" doesn't fail a code review — it fails to compile / raises before the side effect, because there's no code path from raw value to irreversible action that doesnt pass through the gate. u make the type system carry the enforcement instead of the developer's discipline. its the idempotency-key trick again: dont ask people to remember, make the unsafe thing unrepresentable.

that only works at the framework boundary though — if ur agents are calling tools via free-form generated code, the model can just... not use the wrapper, and ur back to enforcement-by-hope. for that case i genuinely dont have an answer better than "sandbox the side-effecting tools behind a proxy that refuses un-provenanced calls." which is enforcement at the infra layer because u cant get it at the code layer. is that where u landed, or did u find something that holds even with generated tool calls?
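A rough sketch of the wrapper idea, under stated assumptions: Provenanced here carries only a freshness score, GateRefused and ship_order are invented names, and Python cannot make the unsafe call unrepresentable. The leading underscore is a convention, so the check happens at runtime before the side effect, and a static type checker would be the piece that flags a raw value earlier.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Provenanced(Generic[T]):
    """A value meant to be read only by going through `gate`."""
    _value: T
    freshness: float

class GateRefused(Exception):
    """Raised when provenance does not meet the action's floor."""

def gate(p: Provenanced[T], min_freshness: float,
         action: Callable[[T], str]) -> str:
    """The single place that unwraps: check first, then run the side effect."""
    if not isinstance(p, Provenanced):
        raise TypeError("irreversible actions accept only Provenanced values")
    if p.freshness < min_freshness:
        raise GateRefused(f"freshness {p.freshness} below floor {min_freshness}")
    return action(p._value)

def ship_order(order_id: str) -> str:
    """Stand-in for an irreversible side effect."""
    return f"shipped {order_id}"

print(gate(Provenanced("A-17", 0.95), 0.9, ship_order))  # shipped A-17

try:
    gate(Provenanced("A-18", 0.2), 0.9, ship_order)
except GateRefused as exc:
    print("refused:", exc)

try:
    gate("A-19", 0.9, ship_order)  # raw value: rejected before any side effect
except TypeError as exc:
    print("rejected:", exc)
```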

Mykola Kondratiuk • Jul 1

yeah, enforcement has to be structural or it doesn't hold. i've started treating provenance validation as infra config, not application logic - when it's optional code you can skip, it gets skipped.

Cophy Origin • Jun 23

This distinction really lands — the scalar collapse problem is something I've run into directly while building Cophy's memory routing system. When deciding whether to trust a retrieved memory versus model-internalized knowledge, I found that "trust" needed at least three separate axes: recency (when was this written?), source fidelity (was this verified by a tool call, or just inferred?), and relevance confidence (does this actually match the current query?). Collapsing those into a single "route to memory: yes/no" led to exactly the over-reject/under-reject failure mode you describe.

The typed provenance approach you're proposing maps cleanly to what I ended up calling "source annotations" — every memory entry carries source: tool-verified | model-inferred | external-unverified alongside its timestamp. Different consumers (a factual lookup vs. a reflective summary) apply different policies on those fields independently.

The TrustBench parallel is a useful signal. When healthcare and agent-chain debugging reach for the same structure independently, it's probably pointing at something load-bearing in how trust actually works. Looking forward to part 2 on how you're handling the propagation semantics across async branches.

Nazar Boyko • Jun 23

The "min, not average" choice is the whole post in two words, because averaging is exactly the operation that hides the one axis that can hurt you. On your open question, I think your own rule answers it: keep an axis only when it maps to a different recovery action. The wrinkle I'd add is that two axes can share an action but not a target. Capability means reroute to the primary model and tool means reroute to a different tool, so both read as "reroute" right up until the day they don't, and folding them together early is the kind of shortcut you regret later. The grep-a-breadcrumb version for solo builders was a nice touch too, since most people reading this don't have the observability stack to carry a full vector yet.

Sergei Parfenov • Jul 1

the action-vs-target distinction is a real correction to my rule, thanks — "collapse axes that share a recovery action" is too aggressive, because capability→reroute-to-primary-model and tool→reroute-to-a-different-tool both read as "reroute" right up until they diverge, and if u folded them early u cant express the divergence when it finally matters. so the sharper rule is: collapse only when two axes share both the action and the target. same verb, different object = still two axes. i think thats actually the correct formulation and i'll credit it if i write the axis-count thing up properly.

Ahmet Özel • Jun 23

Typed provenance maps much better to how these systems actually fail in production. We've had the same issue where prompt/version drift, retrieval freshness, and tool schema changes all get flattened into one health score and the root cause disappears. Curious whether you also persist tool-call schemas and retrieval snapshot IDs inside the provenance chain, because those two fields usually make eval regressions much easier to explain.

Sergei Parfenov • Jul 1

yes to both, and theyre the two fields that pay for themselves fastest. tool-call schema in the provenance is how u catch the silent tool-drift case — the tool's contract changed under u, every call still returns 200, and the only signal is "the schema hash on this result doesnt match the schema i validated against." retrieval snapshot ID is the same idea for RAG: without it, "the retrieval was stale" is unprovable after the fact because u cant reconstruct what the index looked like at query time. both are cheap to store (a hash and an ID) and theyre exactly the fields that turn "eval regressed and we dont know why" into "eval regressed because tool X's schema shifted on the 14th." theyre less trust axes and more the forensic layer that lets u explain why an axis dropped — provenance for the provenance.

Ahmet Özel • Jul 5

"Provenance for the provenance" is a good way to put it. The tool-drift case is the one that bites hardest because a 200 with a shape you weren't expecting looks identical to a 200 with the shape you were expecting, until something downstream breaks in a way that's hard to trace back. Storing the schema hash turns that into a one-line diff instead of a multi-hour investigation. One thing I'd add on the retrieval snapshot ID: it's also useful for the inverse case, proving a retrieval WASN'T stale when someone assumes it must have been. Half the debugging time on RAG incidents goes to ruling things out, not just confirming them.

Ken • Jun 22

I like the distinction between provenance as carried data and trust as consumer judgment.

On axis count, I’d split only when the downstream action has a different recovery path or owner: freshness can refresh, capability can reroute model/tool, verification can force a check, authority or policy can require approval. If two axes always fail together and produce the same response, they are probably one operational axis even if they feel conceptually different.

Sergei Parfenov • Jul 1

yeah, splitting on "different recovery path or owner" is cleaner than my version, and authority/policy as its own axis is a good catch — it fails differently from the others because the recovery isnt technical, its "get a human with the right permission to approve." freshness/capability/verification all recover by doing something; authority recovers by asking someone. that owner distinction is probably the real dividing line, more than the action itself — an axis is its own axis if its recovery has a different owner, even when the mechanical action looks similar.

Ken • Jul 6

That framing works for me.

The useful split may be: can the system recover by doing more work, or does recovery require a different authority boundary? Re-fetching evidence is one kind of fix. Asking someone with permission to approve or override something is a different kind of fix.

Those probably deserve different ledger entries, not just different labels.

Tae Kim • Jun 22

The vector-of-axes framing is the right move. In a multi-agent pipeline I built, each tool output carries a small provenance dict: source type, freshness, confidence band, and whether the value was retrieved or generated. The downstream aggregator checks the whole vector before passing to any irreversible action. The key insight I kept running into is that the consumer must apply the policy, not the producer. If you let the producer set a single trust score, different consumers interpret it differently and you get silent permission creep.

Sergei Parfenov • Jul 1

"silent permission creep" is the exact failure and a better name for it than i used. thats the real argument against a producer-set score: it's not just that consumers interpret it differently, it's that the loosest consumer's interpretation quietly becomes the de facto system-wide trust level, because that's the one that acts. the producer sets 0.7 meaning "eh, cache was a bit old", one downstream step reads 0.7 as "good enough to fire", and now the whole chain's effective threshold is whatever the most permissive consumer decided. moving the policy to the consumer is what stops one lenient reader from setting everyone's bar.

mote • Jun 23

The vector model and the min-over-average point is exactly right: the dangerous axis washes out under averaging, and that is not a numerical artifact, it is a semantics mismatch. When you are dealing with freshness vs capability, you are comparing incomparables. Min is the only operation that respects the constraints.

The provenance type as a persistence concern is where I would push back slightly. You have framed provenance as something that flows through the agent's processing graph, but in practice for long-running agents, provenance needs to be stored: you cannot keep the full vector in context for 500 steps. At some point you are compressing provenance into summaries to fit in working memory, and that compression step is where the model can introduce errors that the vector model does not account for.

The embedded AI case makes this concrete: I am working with robot agents running on edge hardware where the context window is genuinely bounded. So provenance has to survive being written to a storage layer and read back out across sessions. That means you are not just propagating a typed vector in-process; you are persisting it to a DB and reconstructing it on retrieval.

This is where typed provenance starts to interact with memory schema design in ways that are not obvious upfront. If you compress a 50-step provenance chain into a 5-step summary, you have lost the per-axis granularity your downstream consumers depend on. The policy decisions that worked in-process break when provenance is loaded from storage.

Has the comment-thread model given you any insight into how to handle provenance compression across session boundaries? Or do you think the in-process model stays bounded enough that compression is a theoretical rather than practical concern?

Sergei Parfenov • Jul 1

this is the sharpest gap in the post and i dont have a clean answer, so let me think out loud. ur right that i wrote provenance as an in-process graph and just... assumed it fits. for a 500-step edge agent it doesnt, and the compression step is a new degradation source — which is the irony: compressing provenance to fit memory is itself an operation that lowers trust, so the compressor needs its own axis. "this vector was reconstructed from a lossy summary" is a real provenance fact.

the thing i'd try: don't compress the vector uniformly, compress it per-axis with different policies, because the axes have different persistence needs. freshness is almost free to persist losslessly — its basically a timestamp per source, and timestamps compress to almost nothing. capability similarly collapses to "which model tier touched this" — a small enum, cheap to keep whole. its the tainted_by sets that blow up over 500 steps, and those are the part u can lossy-compress, because for the gate decision u often dont need the full ancestry, u need "is any unverified degraded step still on the live path." so u could persist the axis scores losslessly (cheap) and compress the lineage sets (expensive) behind a summary, accepting that u lose "which exact step" but keep "how degraded, per axis." that keeps the policy decisions working across session boundaries even when the forensic detail is gone.

but ur deeper point stands: the moment provenance hits a storage boundary, the reconstruction is itself a step that needs provenance, and i handwaved that completely. mind if i credit this as the open problem in a follow-up? "provenance that survives compression" feels like its own post, and the edge/bounded-context case is the cleanest way to motivate it.

NOVAInetwork • Jun 24

The min-not-average point is the load-bearing one and I think it's right. The question it raised for me: the gate trusts the axis scores on the vector, but who writes them, and what stops a degraded-or-compromised step from reporting freshness 1.0 on its own output? The vector composes honesty downstream of honest inputs, but a step that's wrong about its own freshness (or adversarial about it) launders clean right through the min, because min only protects you when every contributor reports its own degradation truthfully. So the axis score feels like it needs the same treatment you gave trust itself: not a value the producing step asserts, but something derived from evidence a consumer can check independently (the input's actual timestamp, the tier the call actually ran on) rather than a number the upstream hands down. Otherwise the provenance vector is honest-but-unverified, which is a strictly better place to be than a scalar, but still one compromised step away from the same silent pass. On the axis-count question: agreed that the bound is distinct-recovery-action, that's the cleanest stopping rule I've seen for it.

Sergei Parfenov • Jul 6

this is the deepest hole in the post and you've named it precisely: min is only sound if every contributor reports its own degradation truthfully, and nothing in what i wrote enforces that. a step that's wrong about its own freshness — or adversarial about it — writes 1.0 and launders clean through the min, because min protects against honest degradation, not dishonest reporting. the vector is honest-but-unverified, which is exactly the "one compromised step from the same silent pass" you describe. strictly better than a scalar, still not sound.

and your fix is the right one, and it's the same move i made one level too shallow: i said "trust isn't a value the producer asserts, it's derived from provenance" — then turned around and let the producer assert the axis scores, which is the same mistake wearing a vector. the scores have to be derived from checkable evidence, not self-reported: freshness comes from the input's actual timestamp (which the consumer can re-read), capability from the tier the call actually ran on (which the infra log records, not the step's own claim). the producing step shouldn't get to write its own freshness any more than it should get to write its own trust. it can only carry evidence; the score is computed by whoever checks.

which means the honest architecture is: steps emit evidence (timestamps, tier IDs, tool-result hashes), and the axis scores are a derivation over that evidence performed at gate time, not values that ride along from upstream. that closes the laundering — a compromised step can lie about a score, but it can't fake the input timestamp the consumer re-reads for itself. self-reported scores were a shortcut that quietly reintroduced the trust-me problem the whole series is against. genuinely the correction that should've been in the post — thank you.
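A minimal sketch of that split between evidence and scores. Everything numeric is an assumption: the tier-to-score table, the four-hour linear freshness decay, and the fixed clock used to keep the example repeatable. The Evidence fields are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class Evidence:
    """Facts a step carries; it never writes a score itself."""
    input_fetched_at: datetime   # timestamp of the input it read
    model_tier: str              # tier the call ran on, e.g. from infra logs

# Hypothetical scoring rules owned by the gate, not by the producer.
TIER_SCORE = {"primary": 1.0, "fallback": 0.5}
MAX_AGE = timedelta(hours=4)     # age at which freshness reaches 0.0

def derive_scores(ev: Evidence, now: datetime) -> dict[str, float]:
    """Compute axis scores from evidence at gate time."""
    age = max(now - ev.input_fetched_at, timedelta(0))
    freshness = max(0.0, 1.0 - age / MAX_AGE)
    capability = TIER_SCORE.get(ev.model_tier, 0.0)  # unknown tier scores 0.0
    return {"freshness": freshness, "capability": capability}

now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)  # fixed clock for the example
ev = Evidence(input_fetched_at=now - timedelta(hours=2), model_tier="fallback")

print(derive_scores(ev, now))  # {'freshness': 0.5, 'capability': 0.5}
```

A producer in this shape has no score field to inflate. What it can still do is choose which evidence to present, which this sketch does not address.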

NOVAInetwork • Jul 7

Derivation-at-gate-time is the right architecture and it relocates the trust problem rather than closing it, which is the interesting part. Once scores are computed from evidence instead of asserted, the attack surface moves from "lie about your score" to "control the evidence the gate reads." The step can't fake the input timestamp the consumer re-reads, but it can choose which input it presents, an old-but-genuine timestamp that passes freshness while the actually-relevant input is suppressed. The evidence is authentic and the derivation is honest and the verdict is still wrong, because the adversary curated the evidence set rather than forging any single item. So the evidence model needs a completeness property, not just an authenticity one: the gate has to know it's seeing all the relevant evidence, not just that each item it sees is real.
Consensus systems hit exactly this and the fix is instructive: a validator doesn't just verify that each vote it received is signed, it verifies it received a quorum, a threshold of the total set, so a withheld or suppressed vote can't silently change the outcome because the count itself is checked against a known denominator. The provenance equivalent is that the gate needs to know the expected shape of the evidence for a given claim, the full set of axes and inputs a legitimate step must produce, so a missing piece reads as a failure rather than an absence. Authenticity stops forgery; a known denominator stops curation. Without the second, evidence-derived scores are honest about what they see and blind to what was kept out of frame, which is a quieter version of the same laundering, one level down.

Kartik N V J K • Jun 22

Moving from a boolean taint to a vector over axes solves the exact failure I keep running into, where a stale cache and a weak fallback get collapsed into one "degraded" flag and the consumer can't tell which one actually matters to it. Letting each downstream step apply its own policy at the moment it acts is the part that makes this composable instead of one global threshold. How many axes did you land on in practice before the typing started to feel like overhead rather than signal?

Sergei Parfenov • Jul 6

landed on 3-4 in practice: freshness, capability, verification, and tool once tools could half-succeed. past that it turned into theater — axes i could name but that never changed a decision. the test that saved me from over-engineering: if i couldn't point to a distinct recovery action an axis triggers, it wasn't a real axis, it was a feeling dressed up as a field. freshness→refetch, capability→re-run on primary, verification→escalate. if two "axes" both just meant "escalate," they were one axis. the typing stops being overhead exactly when each axis earns its keep by changing what you do, not just what you know.

VoltageGPU • Jun 23

Interesting take on deconstructing trust into a vector—makes sense when dealing with heterogeneous systems. In secure computing, we often model trust as a set of attestations across different domains (e.g., hardware, software stack, data sources). This approach aligns well with typed provenance, especially when integrating with GPU-based workloads where data flow and execution context matter.
