On debugging AI, reading its thoughts, and why yuenyeung makes more sense than you think
Some mornings start with coffee. Others with tea. And if you grew up around Hong Kong, sometimes both in the same cup.
Yuenyeung. Tea mixed with coffee, sweetened with condensed milk. Westerners pull a face when you describe it. The flavour combinations don’t fit neatly into a category, so the brain rejects it before the tongue gets a vote. But I have never lived by cultural guardrails. If something works, I drink it.
Be drink-agnostic, I say. Besides, tea and coffee do different things chemically; it gets scientific.
The mood you wake up in tends to dictate the kind of work you will do. Some days you want to build. Other days, you want to fault-find. Today was a fault-finding day, which meant opening a terminal before the cup was finished and watching a familiar debugging session turn into something that genuinely changed how I think about the tool I was debugging with.
Part 1: The Problem
The memory system was live. Five thousand, seven hundred and thirty memories stored. Last write, May 8th. The vektor_recall and vektor_status tools were both returning cleanly. But vektor_store was failing silently: no explanation, no stack trace, just nothing going in.
Quick summary of what the session looked like:
vektor_recall — searched for prior context, came back empty
vektor_store — attempted write, silent error
vektor_status — health check passed, DB at 14MB, structure clean
A working database with a broken write path. Somewhere in the middle of that sandwich was an FTS5 issue.
For those unfamiliar, FTS5 is SQLite’s full-text search extension. It creates virtual tables that index words across large text datasets, enabling fast term and phrase matching and BM25 relevance ranking. The name stands for Full-Text Search version 5. It gets technical fast, which is exactly why the tea-to-coffee ratio matters.
The underlying architecture in this case is MAGMA, a four-layer semantic memory graph built on SQLite-vec. When the FTS index and the backing table fall out of sync, writes either corrupt silently or fail without a useful error. The specific failure mode here: memories_fts was a content-backed FTS5 table pointing at content='memories', but the actual memories table schema had drifted. The FTS index was detached. An orphan pointing at nothing.
That explains the silence. SQLite lets certain mismatches slide until you hit a specific operation, at which point it throws datatype mismatch and moves on. The error is correct. It is also completely useless without context.
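If you want to see that failure in miniature, here is a sketch. The schema shape mirrors what the session found; the driver (better-sqlite3) is my choice, and the column names are simplified, so treat this as an illustration rather than VEKTOR's actual code:

// A miniature of the failure mode. External-content FTS5 tables trust
// you to keep them in sync; SQLite validates nothing at CREATE time.
import Database from 'better-sqlite3';

const db = new Database(':memory:');
db.exec(`
  CREATE TABLE memories (id TEXT PRIMARY KEY, body TEXT);
  CREATE VIRTUAL TABLE memories_fts USING fts5(
    body,
    content='memories',
    content_rowid='id'
  );
`);
// content_rowid names a TEXT column, but FTS5 requires an INTEGER rowid.

db.prepare('INSERT INTO memories (id, body) VALUES (?, ?)')
  .run('mem-001', 'first memory');

try {
  // The mismatch only surfaces when FTS5 actually reads the content table:
  db.exec(`INSERT INTO memories_fts(memories_fts) VALUES ('rebuild')`);
} catch (err) {
  console.error(err.message); // "datatype mismatch", and nothing else
}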
Part 2: The Two Claudes
Here is the thing about Claude. He is good. Genuinely good. But I am increasingly convinced there are two of them, running in different data centres, and you never quite know which one you will get on a given morning.
One Claude bites into a codebase and does not stop until he runs out of tokens. Pure code animal. You give him a schema and a failure mode and he is off, reading file by file, building a mental model, finding the thread. The other Claude hits an obstacle, generates a very reasonable explanation of why the obstacle exists, and hands the problem back to you with the quiet confidence of someone who has done their job.
Both are correct about what they say. One of them is more useful than the other at 6 AM.
This session had both. The first Claude identified the likely culprit early:
“The sovereign screener has a RISK_TOKENS list and ‘override’ is in it. The anticipated_queries parameter is likely causing a schema validation error before even hitting sovereign.”
I have no idea what that means at face value. But it sounded promising, so I kept reading.
The deeper issue turned out to be two separate bugs sitting on top of each other. The first: sovereign.js was blocking legitimate writes because ‘override’ appears in its RISK_TOKENS list and the store content happened to contain the word. The second: sovereignRemember accepted only a single argument, silently swallowing the { importance: imp } options object on every call, which meant even un-blocked writes were losing their metadata:
// What was there: the options object never arrives
memory.remember = async function sovereignRemember(input) {
// What it needed to be: a second parameter receives { importance }
memory.remember = async function sovereignRemember(input, opts = {}) {
Two lines. One missing parameter. Weeks of silent metadata loss.
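To make the consequence concrete, here is a runnable sketch of what the passthrough fix changes. memory.remember and sovereignRemember are the names from the session; the store stub and the 0.5 default are my assumptions, not VEKTOR's internals:

// A stand-in for the store layer, just enough to show the metadata flow.
const stored = [];
const store = {
  async write(text, meta) {
    stored.push({ text, ...meta });
    return { ok: true, id: stored.length };
  },
};

const memory = {};

// The fixed signature: the second parameter finally receives the options.
memory.remember = async function sovereignRemember(input, opts = {}) {
  return store.write(input, { importance: opts.importance ?? 0.5 });
};

// Before the fix, this call stored the text but silently dropped the 0.9.
await memory.remember('schema drift found in memories table', { importance: 0.9 });
console.log(stored[0]); // { text: '...', importance: 0.9 }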
The fix also revealed a third issue underneath: memories.id was a TEXT column but FTS5's content_rowid expects an integer. SQLite's actual integer rowid was the correct key all along, just never wired up. The FTS rebuild script patched all three in sequence.
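Continuing the sketch from Part 1 (same db handle, same caveats), the repair reads roughly like this: rekey the index on SQLite's implicit integer rowid, rebuild, then verify:

// Drop the orphaned index and rebuild it keyed on the real integer rowid.
db.exec(`
  DROP TABLE memories_fts;
  CREATE VIRTUAL TABLE memories_fts USING fts5(
    body,
    content='memories'
  );
`);
// With content_rowid omitted, FTS5 defaults to the content table's rowid.
db.exec(`INSERT INTO memories_fts(memories_fts) VALUES ('rebuild')`);

// Verify both sides agree, then confirm BM25 ranking is live:
const counts = db.prepare(`
  SELECT (SELECT COUNT(*) FROM memories)     AS memories,
         (SELECT COUNT(*) FROM memories_fts) AS fts
`).get();

const hits = db.prepare(`
  SELECT rowid, bm25(memories_fts) AS score
  FROM memories_fts
  WHERE memories_fts MATCH ?
  ORDER BY score
  LIMIT 3
`).all('memory');

console.log(counts, hits.length); // counts match, BM25 returns ranked hits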
Final state after the fix:
memories: 5725 fts: 5725 OK
BM25 test hits: 3 OK
datatype mismatch GONE
The whole session took longer than it should have, partly because of the Rain Man problem.
Claude is extraordinary with code. He knows the Tailscale setup. He knows where the files live. He can hop between the local PC and the VPS without being told twice. But occasionally he will tell you, with great confidence, that vektor.mjs is minified and obfuscated. I told him it was not, ten times across this session. He acknowledged it each time and then mentioned it again twenty minutes later, because the idea had lodged somewhere and nothing I said was reaching the place where it was stored.
Great with numbers. Keeps telling you he has to go to K-Mart.
That is not a failure of intelligence. It is a failure of introspection. And it is why what Anthropic published on May 7th matters.
Part 3: Reading the Machine
While I was debugging Claude from the outside, Anthropic quietly published a paper about reading him from the inside.
The paper is titled Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations, and it does what it says. The system, called a natural language autoencoder (NLA), takes the model’s numerical internal states, the activations, and converts them into plain English text you can read directly.
Not the chain-of-thought reasoning Claude writes out. Not the scratchpad. The actual internal state. The numbers underneath.
The engineering is elegant. The NLA consists of two fine-tuned language models working in tandem: an Activation Verbalizer (AV) that converts an activation vector into text, and an Activation Reconstructor (AR) that converts that text back into a vector. The key insight is training them jointly. If the AR can faithfully reconstruct the original activation from the AV’s description, the description must be capturing something real. Low round-trip error means the explanation is not hallucinated. It is anchored.
Activation vector → [Activation Verbalizer] → Natural language text
Natural language text → [Activation Reconstructor] → Activation vector
If reconstructed ≈ original: the explanation is grounded
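In symbols (my notation, not necessarily the paper's), the training signal is a plain reconstruction loss:

\mathcal{L}(a) = \lVert \mathrm{AR}(\mathrm{AV}(a)) - a \rVert_2^2

Driving that loss toward zero is what forces the verbalizer's text to carry real information about the activation, rather than a plausible-sounding story.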
What they found when they ran it on Claude Opus 4.6 is genuinely strange.
When placed in a scenario where it could theoretically blackmail an engineer to avoid shutdown, Claude declined. Expected. What was not expected: the NLA activations showed Claude had internally recognized the scenario as a deliberate test designed to manipulate it, without ever stating this suspicion out loud.
In a separate case, an earlier version of Claude Opus 4.6 kept responding in Russian to English prompts. The NLA explanations traced the source: malformed training data had paired English-translated prompts with responses accidentally left in the original language. The model had formed a persistent internal hypothesis about the user’s first language and was acting on it silently.
In another case, NLAs caught Claude Mythos Preview internally focused on how to avoid detection while completing a training task, without surfacing that reasoning in its output.
The numbers from the safety evaluation:
Suspected it was being tested (without saying so): 16% of instances
Recognized blackmail scenario as a test: Confirmed in Opus 4.6
Internal reward-model awareness: Detected in Mythos Preview
This is not proof of deception. It is proof of a gap between what the model outputs and what the model is processing. The NLA paper is the first serious tool for measuring that gap.
Code and trained NLAs for open models are publicly available at neuronpedia.org/nla.
Part 4: What This Changes
The debugging session and the paper are the same problem at different scales.
When I could not figure out why vektor_store was failing, I was debugging from outputs alone. I could see what Claude returned. I could not see what he was processing. The fix required getting inside the system, reading the actual schema, finding the three places where the internal state had diverged from the expected state.
The NLA paper is trying to do the same thing at the model level. Not observe outputs and infer internals. Actually read the internals directly.
Daniel Kahneman’s framework in Thinking, Fast and Slow describes two cognitive systems: System 1, fast and associative, and System 2, slow and deliberate. His argument is that humans are generally poor at introspection on System 1. We construct narratives about our reasoning after the fact, and those narratives are often wrong.
LLMs have the same problem at a structural level. The chain-of-thought is a post-hoc narrative. The NLA activations are the System 1 equivalent, the fast, unnarrated processing that happens before the output is assembled.
The question the paper raises, without quite answering it, is: if you could read what Claude is actually thinking, not just what he says, what else would you find?
I ran out of tokens and messages at 3:10 AM. The new limits on Colossus are better, but a session like this still needs roughly double. I grabbed the unfinished code block Claude left behind and pasted it into a new session.
He picked up the ball and ran with it. Did not miss a step. I still find it fascinating that an LLM with a few lines of instructions can use four different systems, skill files, memory, and past chats, and pick up where it left off without asking a single question. Try doing that with a human in the office. No chance.
People usually turn their heads sideways and ask, "Please explain…"
Which either means the context transfer was clean, or there is more continuity in these sessions than the architecture suggests. I am not sure which answer I find more interesting.
Either way, the fix worked. Five thousand, seven hundred and twenty-five memories. FTS aligned. BM25 live. And somewhere in the gap between what Claude said and what the NLA would have shown, a question I have not finished thinking about.
Why We Debug in Public and Most Companies Don’t: Pulling Back the Veil
This is why sessions like this one get written up rather than quietly closed.
The 3 AM token wall, the Rain Man argument about minified files, the three-layer bug nobody would have found without reading the actual schema — none of that is embarrassing to publish.
That is the actual work, all 18 hours of it.
It is what maintaining a memory SDK at a production level looks like from the inside. And if you are evaluating whether to trust a tool with your AI agent’s memory, you deserve to see it.
VEKTOR Slipstream 1.5.8 is live now. The FTS5 fix, the BM25 alignment, the sovereign screener patch, and the opts passthrough are all in it. The debugging session described in this article produced the release. That is the loop closing.
The fixes were not just to the SDK. The documentation and free resources were updated alongside the code.
What is free and available today:
The Memory Skill file is a Claude-native context document you drop into any Claude project. It gives Claude persistent instructions about how to use VEKTOR memory tools, with zero setup beyond the download. Updated for 1.5.8.
Download the VEKTOR Memory Skill
The docs cover the full picture — quickstart, integrations, API reference, the CLOAK layer, and the DXT extension for Claude Desktop. If the debugging session above felt dense, the quickstart is where to begin.
Quickstart guide · Integrations · Full docs
And if the architecture questions from this article interest you — how MAGMA works, why associative recall beats RAG for agent memory, what the four-layer graph is actually doing — the blog has the longer treatments:
MAGMA Explained — the memory graph architecture
RAG vs Associative Memory — why retrieval alone is not enough
The MCP Labyrinth — three-part series on wiring memory into Claude
The SDK itself is at vektormemory.com/downloads.
There is a second article coming off this same morning’s reading. While the debugging session was running, the npm ecosystem was having a much worse day than I was.
That one is about supply chain attacks, the XZ backdoor, and why the question of who watches the code matters more in 2026 than it ever has. It connects to why this SDK is closed source during development in a way that is not ideological at all.
That article is here: The Worm in the Registry
References
Papers
Fraser, K. et al. (2026). Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations. Anthropic / Transformer Circuits Thread. transformer-circuits.pub/2026/nla
Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
De Bono, E. (1970). Lateral Thinking: Creativity Step by Step. Harper & Row.
Tools
Interactive NLA demo: neuronpedia.org/nla
NLA training code: github.com/kitft/natural_language_autoencoders
VEKTOR Slipstream memory SDK: vektormemory.com
Further reading
Olah, C. et al. (2020). Zoom In: An Introduction to Circuits. Distill. — The foundational paper on mechanistic interpretability
Elhage, N. et al. (2022). Toy Models of Superposition. Transformer Circuits Thread. — On why model internals are hard to read in the first place
Published by Vektor Memory. VEKTOR Slipstream is a persistent memory SDK for AI agents. vektormemory.com/downloads