Part 3 of 3 — "Memory for AI agents"
Why the right metric isn't accuracy — it's zero confidently-wrong actions
Picture two scenarios.
In the first — a senior cardiac surgeon looks at a scan and says: "I don't know. There are two competing hypotheses here, the symptoms overlap. We need additional tests — these three specifically, and a CT with contrast. Until I see those, I won't commit to an answer I'd defend."
In the second — a bright-eyed intern confidently delivers a diagnosis in thirty seconds, leaning on a similar case from last week's textbook. Confident. Crisp. No doubt.
Which one would you trust to operate on your mother?
Right now, every AI agent we ship is the second doctor. Confident. Fast. Never says "I don't know." And that's exactly why you can't trust them with anything riskier than rewriting a README.
Today — how to change that. Not algorithmically. Architecturally.
The rotten metric that poisoned us all
There's an unspoken industry consensus that I think is a disaster: we measure models and systems by accuracy — the percentage of correct answers on a benchmark.
GPT-4 hits 86% on MMLU. Claude — 88%. Gemini — 90%. Better, better, even better. The number goes up.
What that number doesn't show: the remaining 10–14%. These aren't "answers the model didn't give." They're confidently generated wrong answers, visually indistinguishable from correct ones. The model has no warning light for "I'm not sure here." It generates everything with the same textual confidence.
When you use such a model to write notes — fine. When you use it for production code, medical decisions, legal opinions, financial transactions — 10% confident hallucinations means 10% of cases where the system is lying to you with a straight face.
The right metric for production AI sounds different:
0% confidently-wrong actions at an acceptable abstain rate.
Not "percentage of correct answers." But "percentage of wrong actions" — zero. And separately — abstain rate: how often the system honestly says "I don't know, I need data / verification / clarification." Zero wrong actions plus 30% abstain is ten times more production-ready than 90% accuracy with 10% confident hallucinations.
Notice: I didn't say "0% wrong answers." I said "0% wrong **actions**." The distinction matters. An answer is words. An action is a commit, a transaction, a diagnosis, an API call, a change in production. Words can be reread and discarded. An action has already happened.
And that separation between "answer" and "action" — that's what's architecturally absent from modern AI agents.
Abstain as a first-class outcome
In Part 2 of this series I laid out seven principles of real memory, and the second was strict mode. Quick recap: before a fact lands in prompt context, it passes through a gate — source, confidence, temporal validity, no unresolved contradictions. If no fact made it through — the system returns abstain = true, with an explicit reason.
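For concreteness, here is a rough sketch of what such a gate can look like, assuming a simplified fact schema. The field names and thresholds are illustrative, not braincore's actual types.

```go
package main

import (
	"fmt"
	"time"
)

// Fact is a simplified memory record for this sketch; the fields mirror the
// gate's four checks (source, confidence, temporal validity, contradictions)
// but are not braincore's actual schema.
type Fact struct {
	Statement        string
	SourceRef        string
	Confidence       float64
	ValidFrom        time.Time
	ValidUntil       time.Time
	HasContradiction bool
}

// GateResult is either a set of admitted facts or an explicit abstain with a reason.
type GateResult struct {
	Admitted []Fact
	Abstain  bool
	Reason   string
}

// StrictGate admits only facts that have a source, enough confidence,
// current temporal validity, and no unresolved contradictions.
func StrictGate(facts []Fact, minConfidence float64, now time.Time) GateResult {
	var admitted []Fact
	for _, f := range facts {
		switch {
		case f.SourceRef == "": // no provenance, no admission
		case f.Confidence < minConfidence:
		case now.Before(f.ValidFrom) || now.After(f.ValidUntil): // stale or not yet valid
		case f.HasContradiction:
		default:
			admitted = append(admitted, f)
		}
	}
	if len(admitted) == 0 {
		return GateResult{Abstain: true, Reason: "no fact passed source/confidence/validity/contradiction checks"}
	}
	return GateResult{Admitted: admitted}
}

func main() {
	res := StrictGate([]Fact{
		{Statement: "auth uses PKCE flow", Confidence: 0.9}, // no source_ref: rejected
	}, 0.7, time.Now())
	fmt.Printf("abstain=%v reason=%q\n", res.Abstain, res.Reason)
}
```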
There's a detail I want to underline separately. Abstain is not an error. It's a result. Every bit as first-class as "answer" or "action." If your AI has exactly two possible outcomes — "answered" and "got it wrong" — it has no architectural place for an honest "I don't know." Which means it's going to make things up.
In a sane system, there are at least four outcomes:
- answer — sufficient evidence, answer given, action executed
- clarification request — partial evidence, needs user input
- abstain → brain task — insufficient evidence, recorded as a backlog task with an explicit data request
- escalation — there's a contradiction that requires human review
And the last three aren't fallbacks. Not "when everything went wrong." They're full, expected, designed-in paths.
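In code, this separation is just a small sum type plus one rule: only a full answer is allowed to trigger a side effect. A sketch (the names are illustrative, not an actual braincore API):

```go
package main

import "fmt"

// OutcomeKind enumerates the four designed-in results. Names are illustrative.
type OutcomeKind int

const (
	Answer           OutcomeKind = iota // sufficient evidence: answer given, action allowed
	Clarification                       // partial evidence: needs user input
	AbstainBrainTask                    // insufficient evidence: recorded as a backlog task
	Escalation                          // contradiction: requires human review
)

// Outcome carries the result plus whatever the caller needs to act on it.
type Outcome struct {
	Kind        OutcomeKind
	Reply       string // set only for Answer
	Question    string // set only for Clarification
	BrainTaskID string // set only for AbstainBrainTask
	Conflict    string // set only for Escalation
}

// Handle performs a side effect only for a full Answer; the other three
// outcomes are surfaced to the user, never acted on.
func Handle(o Outcome) {
	switch o.Kind {
	case Answer:
		fmt.Println("acting on:", o.Reply)
	case Clarification:
		fmt.Println("ask the user:", o.Question)
	case AbstainBrainTask:
		fmt.Println("no action; backlog task created:", o.BrainTaskID)
	case Escalation:
		fmt.Println("no action; human review needed:", o.Conflict)
	}
}

func main() {
	Handle(Outcome{Kind: AbstainBrainTask, BrainTaskID: "brain-042: collect decisions on auth"})
}
```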
When I ask braincore to find a decision about auth flow on a project we've been working on for three months — it finds it. When I ask about a project I just started, where nothing's recorded yet — it doesn't make things up. It says: "I have no evidence on this question. Created a brain task: collect decisions on auth, source — our current design doc, owner — you. Once you fill it in, ask again."
This is not a bug. It's the right behavior. Notice what happened: the system didn't block me. Didn't say "error, no data." It turned the not-knowing into a task that now lives in its backlog and will resurface on its own.
Self-Tasking. A brain with a backlog, not a passive search engine
The thing that scares me most about modern "AI agents" is that they're passive. They wait for a prompt. Every. Single. Time. Remember nothing between sessions. Have no internal backlog. Don't realize they have unresolved questions.
That's not an "agent." That's a function in agent costume. A function takes input, returns output. An agent has goals, state, and its own tasks between requests.
In a real cognitive runtime, there's a separate entity — brain tasks. They get spawned automatically:
- `truth.contradiction` — a contradiction found in the knowledge graph → task to resolve
- `truth.staleness` — a fact hasn't been confirmed in a long time → task to verify
- `strict.abstain` — the system refused to answer → task to find evidence
- `selflearn.skill_scorecard` — a skill started failing often → task to repair
- `specs.evidence_gap` — a requirement without coverage proof → task to gather
- `tests.failing_coverage` — tests aren't passing → task to fix
- `learning.failure_pattern` — a recurring error pattern detected → task to generalize into a rule
Each task prioritizes itself by a simple formula:
priority = f(urgency, impact, confidence, risk, effort, dependency_readiness)
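Here is a rough sketch of a backlog item and one possible instantiation of that formula. The field names and weights are illustrative assumptions, not the real scoring code:

```go
package main

import (
	"fmt"
	"sort"
)

// BrainTask is a sketch of an item in the agent's own backlog. The field
// names and the weights below are illustrative assumptions, not braincore's
// actual schema or scoring code.
type BrainTask struct {
	ID                  string
	Trigger             string  // e.g. "truth.contradiction", "strict.abstain"
	Description         string
	Urgency             float64 // all scores normalized to [0, 1]
	Impact              float64
	Confidence          float64 // how sure the system is that the task is real
	Risk                float64 // risk of continuing to act on unresolved memory
	Effort              float64 // higher effort pushes priority down
	DependencyReadiness float64 // 1.0 means nothing is blocking it
}

// Priority is one possible instantiation of
// f(urgency, impact, confidence, risk, effort, dependency_readiness).
func (t BrainTask) Priority() float64 {
	return (0.3*t.Urgency + 0.3*t.Impact + 0.2*t.Confidence + 0.2*t.Risk) *
		t.DependencyReadiness / (1.0 + t.Effort)
}

func main() {
	backlog := []BrainTask{
		{ID: "bt-1", Trigger: "strict.abstain", Description: "collect decisions on auth",
			Urgency: 0.8, Impact: 0.9, Confidence: 0.9, Risk: 0.6, Effort: 0.3, DependencyReadiness: 1.0},
		{ID: "bt-2", Trigger: "truth.staleness", Description: "re-verify DB migration fact",
			Urgency: 0.4, Impact: 0.5, Confidence: 0.7, Risk: 0.4, Effort: 0.2, DependencyReadiness: 0.5},
	}
	// "Show the next tasks, highest priority first."
	sort.Slice(backlog, func(i, j int) bool { return backlog[i].Priority() > backlog[j].Priority() })
	for _, t := range backlog {
		fmt.Printf("%.2f  %-16s  %s\n", t.Priority(), t.Trigger, t.Description)
	}
}
```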
And at any moment, the user can ask: "show the next five tasks, why they matter, which I can safely do now, which need my input." That's not the same chat where you start with a blank slate every time. It's a working environment with its own memory of what's not done.
This is a flip in framing. Not "user shows up and asks, agent answers." But "agent runs in the background, accumulates open threads, and tells you — here's what matters now."
Show me a RAG stack that does this. Spoiler: there isn't one. Because RAG is a search engine, not an agent. And when someone says "our RAG-based AI has agency" — that's marketing fiction. Agency requires internal state, goals, a backlog, and self-assessment. RAG has none of these.
Cognitive Runtime > Model Size
The last myth to dismantle.
"When GPT-5 / Claude 5 / Gemini 3 ships — memory will solve itself." No. It won't. Ever.
Memory is not a property of the model. It's a property of the system the model runs in. The analogy:
A human has good memory not because neurons compute fast.
A human has good memory because there's a hippocampus, a neocortex, sleep-time consolidation, emotional gating through the amygdala, and an architectural separation between working / episodic / semantic / procedural memory.
It's infrastructure, not compute power.
Make the LLM ten times bigger — memory still doesn't appear. Build a runtime around the existing LLM that implements the seven principles from Part 2 plus abstain plus self-tasking — and a weak local model in that runtime starts doing things GPT-5 with RAG-memory architecturally cannot.
Not because it's smarter. But because the runtime does for it what it shouldn't have to do itself: remembers, verifies, abstains, tasks itself.
This is, by the way, the only meaningful path forward in a world where foundation models are a commodity. When everyone has roughly equivalent Claude/GPT/Gemini — competitive advantage can only come from what's around the model. Domain-specific cognitive runtime. Project-specific memory. Team-specific rules.
And this bet is also about privacy. About data sovereignty. About the fact that your project's memory is your capital, and handing it to a third-party vector DB to pay monthly rent on it is a strategic mistake you'll only notice three years in, when you can't leave anymore.
That's why, incidentally, braincore is a local Go binary that works by default without OpenAI and without Anthropic. Not because I'm against them (I'm a paying customer of both). But because the architecturally correct path is a runtime where the model is a swappable component, not the center of gravity.
A checklist for anyone building AI products right now
If you've read the whole series and you're thinking "okay, agreed, what do I do Monday morning?" — here are ten items you can start moving on regardless of whether you use braincore or not.
Drop the word "memory" from your stack if what you have is RAG. Call it retrieval or search — instantly removes 80% of inflated expectations.
Introduce
truth_statusfor every fact. Minimum:hypothesis | confirmed | deprecated. Disallowconfirmedwithoutsource_ref.Introduce
valid_from/valid_until. Any fact without temporal validity is a hypothesis, not a fact.Make abstain a first-class outcome. Not "when things go wrong" — but as one of four valid results.
Distinguish
staging | working | consolidated | archived. Don't dump everything into one collection.Negative memory. What broke — record it explicitly, with a link to the failing test or commit.
Entity disambiguation. Never auto-merge entities at low confidence. Create an
ambiguity recordinstead.Causal chains for decisions. Not "text" —
problem → alternatives → decision → reasoning → outcome.Local where possible. Project memory is your capital.
The metric is not "percentage of correct answers." It's
0% wrong actions at an acceptable abstain rate.
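Items 2 and 3 fit in a couple of dozen lines. A minimal sketch, with field names as assumptions rather than a prescribed schema:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// TruthStatus implements checklist item 2: every fact carries an explicit status.
type TruthStatus string

const (
	Hypothesis TruthStatus = "hypothesis"
	Confirmed  TruthStatus = "confirmed"
	Deprecated TruthStatus = "deprecated"
)

// FactRecord combines items 2 and 3. Field names are assumptions for
// illustration, not a prescribed schema.
type FactRecord struct {
	Statement  string
	Status     TruthStatus
	SourceRef  string     // required whenever Status == Confirmed
	ValidFrom  time.Time
	ValidUntil *time.Time // nil means "until superseded", never "true forever"
}

// Validate enforces the two gates: no confirmed fact without a source,
// and no fact without the start of its temporal validity.
func (f FactRecord) Validate() error {
	if f.Status == Confirmed && f.SourceRef == "" {
		return errors.New("confirmed fact without source_ref")
	}
	if f.ValidFrom.IsZero() {
		return errors.New("fact without valid_from is a hypothesis, not a fact")
	}
	return nil
}

func main() {
	f := FactRecord{Statement: "payments service uses gRPC", Status: Confirmed}
	if err := f.Validate(); err != nil {
		fmt.Println("rejected:", err) // demote to hypothesis or go find a source_ref
	}
}
```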
Not all at once. Pick two or three and start. In a month, you'll have an AI system you can trust more than most that exist.
Epilogue. Cognitive hygiene for the AI industry
I'm tired of the word "memory" getting slapped on every vector database with embeddings. It's a devaluation of the term — like calling a table with a single VARCHAR column a knowledge base. Technically — yes. Substantively — no.
Memory is:
- structure, not a flat list
- knowing the boundary, not confident bullshit
- causal chains, not chunks
- entity-aware, not string-aware
- temporal-aware, not "created yesterday, valid forever"
- self-correcting, not self-deceiving
- governed, not "dump whatever, sort later"
- abstain-capable, not "always answers"
If your "AI with memory" doesn't do at least half of those — your AI doesn't have memory. It has search results. These aren't the same thing.
One last thing. I'm not telling you to throw out RAG. RAG is an excellent tool for its class of tasks (find me the paragraph about X in 100 documents). I'm telling you to stop calling RAG memory and start building real cognitive runtimes — slower, more disciplined, with explicit gates and explicit abstain. It's the only path to AI systems you can trust with anything more important than rewriting a README.
If you're a startup with "our AI has long-term memory on a vector database" in your pitch deck — close that slide, redo it, and in two years you'll thank yourself.
If you're a developer fighting with an agent that forgets what you said yesterday — that's not the agent's fault. It's the fault of whoever sold you a search engine wrapped as a brain.
A good AI agent isn't the one that always answers. A good AI agent is the one that never takes a confidently wrong action. Between those two sentences lies the entire chasm separating 2024's AI tooling from AI tooling that will be trustworthy in 2027.
I've picked my side of the chasm. Building braincore — open, Apache-2.0, in the repo. If you recognize yourself in this series — we're in the same boat. If something works differently in your stack — tell me how, I genuinely want to know.
The one thing you can't do is stay silent.
TL;DR of the whole series:
- Part 1: RAG = Ctrl+F with embeddings. It's search, not memory. Mem0/Letta/Zep — RAG in wrappers. 1M context is RAM, not disk.
- Part 2: Real memory = seven principles in combination. Atomic units + lifecycle + truth_status + temporal + causal chains + AST identity + internal git + memory scoring + negative memory. Each already exists somewhere in isolation. Combined — a different product.
- Part 3: The metric for production AI isn't accuracy — it's 0% confidently-wrong actions. Abstain is a first-class outcome, not an error. Cognitive runtime > model size.
If your AI "remembers" via
vector_db.query(top_k=5)— it has dementia disguised as confidence. Fix the architecture, not the model.
Part 3 of 3. Series complete. If this resonated — share it. If you disagree — tell me in the comments, I love substantive arguments.