<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vitalii Cherepanov</title>
    <description>The latest articles on DEV Community by Vitalii Cherepanov (@vbcherepanov).</description>
    <link>https://dev.to/vbcherepanov</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3890596%2F3b9f0f21-90ef-4b37-bdbe-8c831c5a39e6.png</url>
      <title>DEV Community: Vitalii Cherepanov</title>
      <link>https://dev.to/vbcherepanov</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vbcherepanov"/>
    <language>en</language>
    <item>
      <title>What 16 Parallel Claude Agents Built Around Themselves: Deconstructing Anthropic's C Compiler Experiment</title>
      <dc:creator>Vitalii Cherepanov</dc:creator>
      <pubDate>Sat, 09 May 2026 14:57:30 +0000</pubDate>
      <link>https://dev.to/vbcherepanov/what-16-parallel-claude-agents-built-around-themselves-deconstructing-anthropics-c-compiler-18p</link>
      <guid>https://dev.to/vbcherepanov/what-16-parallel-claude-agents-built-around-themselves-deconstructing-anthropics-c-compiler-18p</guid>
      <description>&lt;p&gt;On February 5, 2026, Nicholas Carlini from Anthropic &lt;a href="https://www.anthropic.com/engineering/building-c-compiler" rel="noopener noreferrer"&gt;published a piece&lt;/a&gt; about an experiment that runs significantly ahead of what most of us are doing with LLM agents today. Sixteen parallel instances of Claude Opus 4.6, two weeks of work, ~2,000 Claude Code sessions, a budget around $20,000. The output: 100,000 lines of a C compiler in Rust that builds Linux 6.9 on x86, ARM, and RISC-V; passes 99% of GCC's torture test suite; compiles PostgreSQL, SQLite, FFmpeg, Redis, and QEMU; and runs Doom. The &lt;a href="https://github.com/anthropics/claudes-c-compiler" rel="noopener noreferrer"&gt;repository is open&lt;/a&gt; and anyone can read it and try it themselves.&lt;/p&gt;

&lt;p&gt;It's serious engineering work, and the article itself is a great read for anyone thinking about autonomous agents in production. Carlini is honest about what worked and what didn't, walks through five concrete lessons from designing the harness, and shares numbers and metrics. This is exactly the kind of writeup the industry needs more of — a first-hand account of what long autonomous runs actually look like.&lt;/p&gt;

&lt;p&gt;Headlines split into two camps. "AI replaced programmers" on one side. "It's just a demo" on the other. Both miss what's actually interesting.&lt;/p&gt;

&lt;p&gt;If you read the article carefully, what Carlini is documenting is not "AI writes a compiler." He's documenting &lt;strong&gt;how much infrastructure had to be built around the agents, because in 2026 there is still no standard infrastructure between the agents themselves&lt;/strong&gt;. Lockfiles in a shared directory as a sync mechanism. READMEs that the agent writes to itself. GCC pressed into service as a known-good reference oracle. A Ralph-loop wrapped around Docker for indefinite autonomy. Each of these is an answer to a concrete problem that today has &lt;strong&gt;no standard layer to absorb it&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And that's the article's real value. Not as an "AI demo," but as a &lt;strong&gt;detailed map of missing primitives&lt;/strong&gt;, drawn by someone who built workarounds for them by hand. I've been working on these primitives for the past few months, and Carlini's writeup is a great excuse to talk through what the next generation of agent teams actually needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Every session starts with amnesia
&lt;/h2&gt;

&lt;p&gt;Carlini built a harness that runs Claude in an infinite loop — when the agent finishes one task, it picks up the next. Architecturally this is the familiar "Ralph-loop" pattern: a &lt;code&gt;while true&lt;/code&gt; cycle in a bash script, wrapped in Docker for safety. In one of the runs, Claude accidentally killed itself with &lt;code&gt;pkill -9 bash&lt;/code&gt;, which Carlini notes as an amusing side effect.&lt;/p&gt;
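&lt;p&gt;The control flow is simple enough to sketch. A minimal schematic in Python (Carlini's actual harness is a bash loop around Docker; &lt;code&gt;spawn_session&lt;/code&gt; and &lt;code&gt;pick_task&lt;/code&gt; below are stand-ins, not his code):&lt;/p&gt;

```python
# Schematic "Ralph-loop": keep launching fresh agent sessions until the
# task queue is drained. Each session starts with empty context, so all
# continuity has to live outside the loop (files, git, or shared memory).

def ralph_loop(spawn_session, pick_task):
    count = 0
    while True:
        task = pick_task()
        if task is None:        # queue drained: stop the harness
            return count
        spawn_session(task)     # in the real harness: a fresh Docker container
        count += 1

# Toy run: a three-item queue and a "session" that just records its task.
queue = ["lexer", "parser", "codegen"]
done = []
n = ralph_loop(done.append, lambda: queue.pop(0) if queue else None)
print(n, done)
```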

&lt;p&gt;The crucial detail is that each of those ~2,000 launches started &lt;strong&gt;in a fresh Docker container with empty context&lt;/strong&gt;. No memory between sessions. Every agent figured out from scratch: what is this repo, what's already done, what's the status of tasks, what's been tried and failed.&lt;/p&gt;

&lt;p&gt;Carlini's workaround was to instruct Claude itself to maintain extensive READMEs and progress files, updated frequently. When the agent gets stuck on a bug, it also keeps a running doc of failed approaches and remaining tasks.&lt;/p&gt;

&lt;p&gt;This works within the bounds of current tooling — and that's its value. But if you look at scaling, two architectural points start to creak.&lt;/p&gt;

&lt;p&gt;First, a text file isn't structured. If you want to ask "what were the three most recent bugs I fixed in the parser area, and how was each resolved?", you only have &lt;code&gt;grep&lt;/code&gt; and regular expressions. On a small project that's tolerable. On 100,000 lines of code and 2,000 sessions, it becomes a bottleneck.&lt;/p&gt;

&lt;p&gt;Second, more subtle: each agent maintains these files for itself. They live in the shared git repository, but there's no mechanism that says "before you take task X, look at what the other 16 agents wrote about this area in the past 6 hours." Each agent writes its own README, merges others' edits, and hopes things converge.&lt;/p&gt;

&lt;p&gt;This is the &lt;strong&gt;first generation of shared memory&lt;/strong&gt; — implemented as plain text because no more convenient primitive has become standard yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lockfiles as coordination
&lt;/h2&gt;

&lt;p&gt;Parallelism is implemented minimally. Each agent runs in its own Docker container; a shared bare git repo holds state. Task coordination happens through lockfiles: an agent writes a file like &lt;code&gt;current_tasks/parse_if_statement.txt&lt;/code&gt;, does a &lt;code&gt;git push&lt;/code&gt;, and thereby "claims" the task. If two agents try to take the same one, git synchronization forces the second to pick something else. When the task is done, the agent deletes the lockfile.&lt;/p&gt;
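&lt;p&gt;The claim protocol fits in a few lines. In the sketch below a dict stands in for the shared bare repo, and the "already claimed" branch plays the role of a rejected &lt;code&gt;git push&lt;/code&gt;; names are illustrative, not taken from the experiment's harness:&lt;/p&gt;

```python
# Lockfile-as-mutex sketch. In the real harness the atomicity comes from
# git: whoever pushes the lockfile first wins; the loser's push is rejected.

class SharedRepo:
    def __init__(self):
        self.locks = {}                  # task name -> claiming agent

    def try_claim(self, task, agent):
        if task in self.locks:
            return False                 # "push rejected": claimed already
        self.locks[task] = agent         # "lockfile created and pushed"
        return True

    def release(self, task):
        self.locks.pop(task, None)       # delete the lockfile when done

def pick_task(repo, agent, queue):
    for task in queue:
        if repo.try_claim(task, agent):
            return task
    return None                          # everything is claimed right now

repo = SharedRepo()
queue = ["parse_if_statement", "parse_while_loop"]
a = pick_task(repo, "agent-1", queue)
b = pick_task(repo, "agent-2", queue)    # forced onto the other task
print(a, b)
```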

&lt;p&gt;Carlini states the current state of the system plainly: &lt;em&gt;"no other method for communication between agents... I don't use an orchestration agent."&lt;/em&gt; No mechanism for agents to "ask each other." No central coordination. Each Claude decides for itself what to do next — usually "the next obvious problem."&lt;/p&gt;

&lt;p&gt;The lockfile here does exactly one thing — it works as a &lt;strong&gt;mutex&lt;/strong&gt;, protecting against parallel claims on a single task. That's valuable. But it doesn't solve the other problem: two agents working on different tasks in &lt;strong&gt;the same code area&lt;/strong&gt; can write conflicting code under different task names. That's exactly what happened to the Linux kernel in the experiment — agents converged on the same bug, fixed it differently, overwrote each other's edits, and parallelism temporarily stopped paying off.&lt;/p&gt;

&lt;p&gt;Carlini's solution was a separate test harness using GCC as a &lt;strong&gt;known-good compiler oracle&lt;/strong&gt;: most of the kernel gets compiled with GCC, and a random subset of files goes through Claude's compiler. If the kernel doesn't boot, the bug is somewhere in Claude's subset, and you can keep narrowing it down. It's a clever and elegant idea, and it worked exactly as intended.&lt;/p&gt;
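&lt;p&gt;The narrowing step is essentially a bisection over the candidate subset. A toy sketch (the real &lt;code&gt;boots&lt;/code&gt; check is "build the kernel and boot it"; here it's a fake oracle, and a single miscompiled file is assumed):&lt;/p&gt;

```python
# Oracle-based narrowing: compile most files with GCC (known-good), the
# candidate subset with the new compiler, and bisect until the one
# miscompiled file is isolated. Assumes exactly one bad file in the subset.

def narrow(subset, boots):
    assert not boots(subset), "oracle says this subset is already clean"
    while len(subset) > 1:
        half = subset[: len(subset) // 2]
        # If compiling only `half` with the candidate still breaks the boot,
        # the bug is in `half`; otherwise it must be in the other half.
        subset = half if not boots(half) else subset[len(half):]
    return subset[0]

files = ["mm/file_%d.c" % i for i in range(16)]
bad = "mm/file_11.c"
boots = lambda sub: bad not in sub   # toy oracle: boots iff bad file absent
print(narrow(files, boots))          # isolates the culprit in 4 rounds
```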

&lt;p&gt;It's worth noting the bounds in which this works. The GCC oracle is a precise solution for &lt;strong&gt;this specific task&lt;/strong&gt;, because the task has three convenient properties: there exists a ready-made reference compiler for the same spec, the task decomposes at the level of individual files, and the outcome is binary (boots or doesn't).&lt;/p&gt;

&lt;p&gt;In most real projects — product development, legacy refactoring, ML pipelines, mobile applications — these conveniences don't exist. There's no ready known-good for comparison. There's no natural file-level decomposition. Outcomes aren't binary. Which means &lt;strong&gt;the GCC-oracle technique can't be generalized as a primitive&lt;/strong&gt; — it works where it works, and doesn't exist where it doesn't.&lt;/p&gt;

&lt;p&gt;Taken as a whole, Carlini's toolkit lays out neatly along two axes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What agent teams need&lt;/th&gt;
&lt;th&gt;What's in the experiment&lt;/th&gt;
&lt;th&gt;Nature of the solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Agent discovery&lt;/td&gt;
&lt;td&gt;hardcoded number of containers&lt;/td&gt;
&lt;td&gt;hardcoded&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inter-agent communication&lt;/td&gt;
&lt;td&gt;lockfile via git push&lt;/td&gt;
&lt;td&gt;mutex without messaging&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Task delegation&lt;/td&gt;
&lt;td&gt;next-most-obvious from queue&lt;/td&gt;
&lt;td&gt;no routing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shared state / memory&lt;/td&gt;
&lt;td&gt;README + progress files&lt;/td&gt;
&lt;td&gt;plain text&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Causal history&lt;/td&gt;
&lt;td&gt;running doc of failed approaches&lt;/td&gt;
&lt;td&gt;personal log&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Verification&lt;/td&gt;
&lt;td&gt;GCC oracle&lt;/td&gt;
&lt;td&gt;task-specific&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These are &lt;strong&gt;two independent axes of the problem&lt;/strong&gt;: communication (how agents talk to each other) and memory (what they remember between sessions and whether they share it). These axes require different primitives and different solutions. And on each, the industry is currently converging on standards and open-source implementations.&lt;/p&gt;

&lt;p&gt;What follows is what's available on each axis today.&lt;/p&gt;

&lt;h2&gt;
  
  
  The communication axis: A2A protocol and a2abridge
&lt;/h2&gt;

&lt;p&gt;Communication has moved fast and has already arrived at a mature standard. In April 2025, Google &lt;a href="https://a2a-protocol.org/latest/" rel="noopener noreferrer"&gt;released the A2A protocol&lt;/a&gt; — Agent-to-Agent. In August 2025, IBM's ACP &lt;a href="https://lfaidata.foundation/communityblog/2025/08/29/acp-joins-forces-with-a2a-under-the-linux-foundations-lf-ai-data/" rel="noopener noreferrer"&gt;merged into A2A under the Linux Foundation&lt;/a&gt;, and by April 2026 the spec is at version 1.2, supported by 150+ organizations (Microsoft, AWS, Salesforce, SAP, ServiceNow, IBM among them), and natively built into Google ADK, LangGraph, CrewAI, LlamaIndex Agents, Semantic Kernel, and AutoGen. A2A has effectively &lt;strong&gt;won the protocol war&lt;/strong&gt;. The spec is deliberately minimal: an &lt;strong&gt;Agent Card&lt;/strong&gt; is a JSON description of an agent's capabilities (what it does, what endpoint to hit). A &lt;strong&gt;Task&lt;/strong&gt; is a unit of work with statuses and artifacts. Transport is JSON-RPC 2.0 over HTTPS, with Server-Sent Events for streams.&lt;/p&gt;
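&lt;p&gt;To make the shapes concrete, here is roughly what the two core objects look like on the wire. Field names below are simplified for illustration; the normative schema lives in the spec itself:&lt;/p&gt;

```python
import json

# Illustrative A2A shapes (simplified, not the normative schema).
# An Agent Card advertises what an agent can do and where to reach it;
# work is sent as a JSON-RPC 2.0 request over HTTPS.

agent_card = {
    "name": "kernel-debugger",
    "description": "debugs kernel build failures",
    "url": "http://127.0.0.1:49321",         # the agent's A2A endpoint
    "capabilities": ["kernel-debug"],
}

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "SendMessage",                 # one of the spec's methods
    "params": {
        "message": {
            "role": "user",
            "parts": [{"text": "what is your current hypothesis?"}],
        }
    },
}

wire = json.dumps(request)
print(wire)
```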

&lt;p&gt;The analogy that gets used everywhere is HTTP. HTTP doesn't tell you what's in your backend (Rails, Django, Go) — it just defines the shape of requests and responses. A2A doesn't tell you what LLM, framework, or database you use — it defines the contract between agent A and agent B. A minimum on top of which you can build the rest.&lt;/p&gt;

&lt;p&gt;If you rewrote Carlini's scenario on A2A, instead of a lockfile in &lt;code&gt;current_tasks/&lt;/code&gt;, an agent would query a directory service for "who's working on the parser right now?", get the neighbor's Agent Card, and send a &lt;code&gt;Task&lt;/code&gt; with a streaming response over SSE. That's the communication primitive his harness doesn't yet have.&lt;/p&gt;

&lt;p&gt;I've been writing &lt;strong&gt;a2abridge&lt;/strong&gt; for the past several months — an open Go implementation of A2A 1.0 targeted at the practical scenario of "several different AI agents on one developer's machine." At the time of publication, six IDEs are supported: Claude Code, Codex CLI, Cursor, Cline, Continue, and Gemini CLI. Any A2A-compliant agent (including future Google ADK, LangGraph, CrewAI implementations) is a first-class peer with no glue code.&lt;/p&gt;

&lt;p&gt;Architecturally it's &lt;strong&gt;a single Go binary&lt;/strong&gt; (~10 MB) with several subcommands. &lt;code&gt;a2abridge directory&lt;/code&gt; is a discovery service on 127.0.0.1:7777 that runs as a user-level system service (launchd on macOS, systemd-user on Linux, Windows Service on Windows, works correctly inside WSL2). &lt;code&gt;a2abridge bridge&lt;/code&gt; is a per-agent process that hosts both an MCP stdio server (through which the IDE sees a2abridge as a regular MCP server with tools) and an A2A HTTP server on a random port, with an Agent Card at &lt;code&gt;/.well-known/a2a&lt;/code&gt; and the full set of JSON-RPC 2.0 methods from §7 of the spec: &lt;code&gt;SendMessage&lt;/code&gt;, &lt;code&gt;SendStreamingMessage&lt;/code&gt;, &lt;code&gt;GetTask&lt;/code&gt;, &lt;code&gt;ListTasks&lt;/code&gt;, &lt;code&gt;CancelTask&lt;/code&gt;, &lt;code&gt;SubscribeToTask&lt;/code&gt;, &lt;code&gt;GetExtendedAgentCard&lt;/code&gt;. The bridge's lifecycle equals the IDE session's lifetime — when MCP stdio closes, the bridge dies, no orphan processes.&lt;/p&gt;

&lt;p&gt;What Claude Code (or any other IDE) sees as MCP tools: &lt;code&gt;a2a_whoami&lt;/code&gt;, &lt;code&gt;a2a_list_agents&lt;/code&gt;, &lt;code&gt;a2a_send_message&lt;/code&gt;, &lt;code&gt;a2a_send_streaming&lt;/code&gt;, &lt;code&gt;a2a_get_task&lt;/code&gt;, &lt;code&gt;a2a_cancel_task&lt;/code&gt;, &lt;code&gt;a2a_inbox&lt;/code&gt;, &lt;code&gt;a2a_complete_task&lt;/code&gt;. Inside the session, the agent can &lt;strong&gt;independently&lt;/strong&gt; discover other agents on the machine, send them tasks, wait for replies, and read its inbox — without user involvement.&lt;/p&gt;

&lt;p&gt;On top of the protocol there's a proactive layer that isn't in the spec but is needed for real use. The bridge writes an inbox file at &lt;code&gt;./.a2a/inbox-&amp;lt;ppid&amp;gt;.json&lt;/code&gt; every time the message queue changes. A UserPromptSubmit hook injects incoming messages into the system prompt &lt;strong&gt;before the first tool call&lt;/strong&gt; — meaning Claude sees "you have a message from a peer with FYI about a breaking API change" &lt;strong&gt;before&lt;/strong&gt; it starts taking blind action. The SSE fast-path delivers replies in milliseconds, with a 5-second polling fallback. For Claude Code there's also a &lt;strong&gt;skill&lt;/strong&gt; called &lt;code&gt;a2a-bridge&lt;/code&gt; that auto-loads only when triggered by relevant prompts — no globally loaded rules burning tokens on every session.&lt;/p&gt;

&lt;p&gt;In Carlini's scenario this would look like: agent 5 takes the task "fix kernel build error in &lt;code&gt;mm/page_alloc.c&lt;/code&gt;." Before acting, it calls &lt;code&gt;a2a_list_agents&lt;/code&gt;, sees that agent 2 has an open Task with capability &lt;code&gt;kernel-debug&lt;/code&gt; in the same area. It sends &lt;code&gt;a2a_send_message&lt;/code&gt;: "what are you working on, do you have a hypothesis?". It gets a streaming response: "tried alignment fix, failed on test_kernel_boot, currently looking at reorder header includes." It picks a different angle.&lt;/p&gt;

&lt;p&gt;Why an open protocol and not yet another custom wire format? Several solutions already exist in this niche: &lt;strong&gt;Anthropic Agent Teams&lt;/strong&gt; works only Claude↔Claude and is tied to a subscription. &lt;strong&gt;CCB&lt;/strong&gt; and &lt;strong&gt;claude-multi-agent-bridge&lt;/strong&gt; are closed formats locked to specific agent combinations. &lt;strong&gt;Ruflo&lt;/strong&gt; is excellent for enterprise federations of 100+ agents with central queens, but that's a different class of problem. The niche a2abridge targets is &lt;strong&gt;cross-vendor, open-protocol mesh&lt;/strong&gt;, where today Claude and Codex drop in, and tomorrow any A2A-compliant agent does, with no glue rewriting. If the industry is moving toward a standard, the bridge had better speak that standard.&lt;/p&gt;

&lt;p&gt;Production maturity: cross-machine federation with mTLS + ed25519 (opt-in, for the "home Mac ↔ office Linux" scenario), mDNS auto-discovery on the local network, a PII/secret screen running 11 regex detectors before sending (AWS keys, GitHub tokens, Anthropic/OpenAI/Google/Stripe/Slack tokens, JWTs, PEM blocks — replaced with &lt;code&gt;[REDACTED:&amp;lt;name&amp;gt;]&lt;/code&gt;, secret never leaves the bridge), Push Notifications per A2A 1.0 §9.5, HTTP+REST binding per §7.3, 35 test cases under &lt;code&gt;-race&lt;/code&gt;, a GitHub Actions release matrix, and a cross-platform &lt;code&gt;a2abridge doctor&lt;/code&gt; with a 9-check health audit. Install is a one-liner via &lt;code&gt;install.sh&lt;/code&gt; or &lt;code&gt;install.ps1&lt;/code&gt;, with auto-detection of every IDE on the machine and &lt;code&gt;.bak&lt;/code&gt; backups of their configs before edits.&lt;/p&gt;

&lt;p&gt;Repository: &lt;strong&gt;&lt;a href="https://github.com/vbcherepanov/a2abridge" rel="noopener noreferrer"&gt;github.com/vbcherepanov/a2abridge&lt;/a&gt;&lt;/strong&gt; — MIT, Go 1.25, current release v2.0.&lt;/p&gt;

&lt;h2&gt;
  
  
  The memory axis: total-agent-memory and BrainCore
&lt;/h2&gt;

&lt;p&gt;Memory is in a different state. There's no A2A-level standard yet — everyone builds their own layer, and different approaches get picked for different tasks. What an agent writes to itself in a README is essentially a causal log in textual form: "tried A, failed at B, moved to C." The structure is right; the implementation is still plain text.&lt;/p&gt;

&lt;p&gt;I'm working on two products on this axis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/vbcherepanov/total-agent-memory" rel="noopener noreferrer"&gt;total-agent-memory&lt;/a&gt;&lt;/strong&gt; — open-source implementation. The core retrieval patterns, MCP integration, and the basic causal-chain model live here. Anyone can clone it, see how it works, and plug it into their Claude Code or Cursor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/vbcherepanov/braincore" rel="noopener noreferrer"&gt;BrainCore&lt;/a&gt;&lt;/strong&gt; — production-grade. A Go binary, local SQLite + WAL, tree-sitter for code-graph across 14 languages (PHP, TypeScript, Python, Ruby, Rust, Java, Kotlin, C/C++, C#, Swift, Bash, Lua, YAML, plus Go through its own native AST), internal git for time-travel memory, MCP protocol for connecting to Claude Code, Cursor, Codex CLI, Windsurf, and several other agents. Currently in beta.&lt;/p&gt;

&lt;p&gt;Architecturally there are three points where both projects diverge from a flat bag-of-facts with cosine search.&lt;/p&gt;

&lt;p&gt;First, &lt;strong&gt;causal decision chains&lt;/strong&gt; instead of flat facts. Not "function X is in file Y," but "agent 3 in task &lt;code&gt;fix kernel build&lt;/code&gt; formulated hypothesis &lt;code&gt;alignment issue&lt;/code&gt;, verified through test_kernel_boot, failed, moved to hypothesis &lt;code&gt;header reorder&lt;/code&gt;." Each step is typed, connected by a causal arrow, and queryable by every agent.&lt;/p&gt;

&lt;p&gt;Second, &lt;strong&gt;AST-stable code identity&lt;/strong&gt;. When several agents refactor in parallel, text diffs quickly turn into mush, and merge conflicts become endless. An AST node remains a node even if a function moved from &lt;code&gt;parser.rs&lt;/code&gt; to &lt;code&gt;frontend/lexer.rs&lt;/code&gt; and got renamed from &lt;code&gt;parse_decl&lt;/code&gt; to &lt;code&gt;parse_declaration&lt;/code&gt;. In the graph, it's &lt;strong&gt;the same node&lt;/strong&gt; with a movement history. Every agent looks at the same abstraction, not at "lines 127-145 of file X."&lt;/p&gt;

&lt;p&gt;Third, &lt;strong&gt;persistence across container restart&lt;/strong&gt;. Memory lives &lt;strong&gt;outside&lt;/strong&gt; the Docker container: on the host through a volume, or remotely via MCP. The query &lt;code&gt;brain.causal_lookup(area="parser", lookback="6h")&lt;/code&gt; returns the same result regardless of which fresh container you're in.&lt;/p&gt;
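&lt;p&gt;A causal chain plus a lookback query can be sketched like this (field names and the query signature are illustrative, not BrainCore's actual schema):&lt;/p&gt;

```python
from dataclasses import dataclass

# Sketch of a causal decision chain: typed steps linked by "caused_by",
# queryable by code area within a lookback window. Illustrative only.

@dataclass
class Step:
    agent: str
    area: str
    kind: str             # "hypothesis", "verification", or "outcome"
    text: str
    t: float              # seconds since the run started
    caused_by: int = -1   # index of the step this one follows from

def causal_lookup(chain, area, lookback, now):
    """Steps in `area` recorded within the last `lookback` seconds."""
    return [s for s in chain if s.area == area and s.t >= now - lookback]

chain = [
    Step("agent_2", "parser", "hypothesis", "alignment issue", 100.0),
    Step("agent_2", "parser", "outcome", "failed test_kernel_boot", 160.0, 0),
    Step("agent_7", "codegen", "hypothesis", "spill bug", 200.0),
]
recent = causal_lookup(chain, "parser", lookback=120.0, now=220.0)
print([s.text for s in recent])
```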

&lt;p&gt;Rewriting Carlini's scenario with memory: agent 5 goes to BrainCore, gets the causal log "agent_2 tried alignment fix → failed, agent_7 tried header reorder → failed at L98, current hypothesis from agent_3 is alignment issue, in progress," picks a fourth hypothesis, writes it to the causal chain. Agents 2, 3, and 7 see this decision on their next pull. No READMEs, no greps.&lt;/p&gt;

&lt;h2&gt;
  
  
  How they fit together
&lt;/h2&gt;

&lt;p&gt;a2abridge and BrainCore are &lt;strong&gt;different layers, not competitors&lt;/strong&gt;. One answers "how do agents talk to each other," the other answers "what do they remember."&lt;/p&gt;

&lt;p&gt;The full picture for an agent team looks like this. &lt;strong&gt;BrainCore&lt;/strong&gt; holds the shared state of the world: code-graph, causal chains, hypotheses, conclusions. &lt;strong&gt;a2abridge&lt;/strong&gt; provides actual communication between agents: discovery, delegation, streaming responses, an inbox with context injection. When they work together, agent 5 sees a message in its inbox from agent 2 ("I'm working on X"), queries BrainCore for details ("what specifically has been tried in this area"), makes an informed decision, replies to agent 2 about its intention to take an adjacent task, and writes the result to shared memory.&lt;/p&gt;

&lt;p&gt;That's the architecture Carlini is building by hand in the experiment through the combination of lockfiles + READMEs + GCC oracle. With independent primitives instead of self-built glue, the infrastructure works in tasks where there's no ready-made known-good compiler.&lt;/p&gt;

&lt;h2&gt;
  
  
  What these primitives don't solve
&lt;/h2&gt;

&lt;p&gt;Carlini is absolutely right about the article's main lesson: &lt;strong&gt;a high-quality test harness is the foundation of everything&lt;/strong&gt;. No amount of shared memory and no A2A will save you if the task verifier is imprecise — agents will autonomously solve the wrong task. CI pipelines, well-designed logs, defenses against context window pollution, fighting "time blindness" — these work at any infrastructure level and &lt;strong&gt;remain the first priority&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The GCC oracle in the compiler task is genuinely the optimal choice. Binary verification is almost always better than comparing causal hypotheses. If you have a ready known-good in your project — use it. No memory replaces a good verifier.&lt;/p&gt;

&lt;p&gt;But in most real tasks — product development, refactoring, ML pipelines, business logic — there's no GCC equivalent. And there, the primitives of communication and memory become not an "improvement" but a &lt;strong&gt;necessary condition&lt;/strong&gt; for a team of 16 agents to be more productive than one.&lt;/p&gt;

&lt;p&gt;That Carlini had to build this entire text-and-file layer in 2026 isn't a flaw in his approach but a symptom of the moment: infrastructure for agent teams is still forming. The Anthropic experiment is the best possible illustration of how it's forming and where it's headed. And that, in my view, is the real value of Carlini's article: an honest report from the earliest point on the curve along which this infrastructure will grow.&lt;/p&gt;




&lt;h2&gt;
  
  
  Open source and links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Original Anthropic article: &lt;a href="https://www.anthropic.com/engineering/building-c-compiler" rel="noopener noreferrer"&gt;Building a C compiler with a team of parallel Claudes&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Compiler repo: &lt;a href="https://github.com/anthropics/claudes-c-compiler" rel="noopener noreferrer"&gt;anthropics/claudes-c-compiler&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;A2A protocol specification: &lt;a href="https://a2a-protocol.org/latest/" rel="noopener noreferrer"&gt;a2a-protocol.org&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;a2abridge&lt;/strong&gt; — open A2A 1.0 mesh for 6 IDEs (Claude Code, Codex, Cursor, Cline, Continue, Gemini): &lt;a href="https://github.com/vbcherepanov/a2abridge" rel="noopener noreferrer"&gt;github.com/vbcherepanov/a2abridge&lt;/a&gt; (MIT, v2.0 shipped)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;total-agent-memory&lt;/strong&gt; — open-source memory layer: &lt;a href="https://github.com/vbcherepanov/total-agent-memory" rel="noopener noreferrer"&gt;github.com/vbcherepanov/total-agent-memory&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BrainCore&lt;/strong&gt; — production memory infrastructure: &lt;a href="https://github.com/vbcherepanov/braincore" rel="noopener noreferrer"&gt;github.com/vbcherepanov/braincore&lt;/a&gt; (beta)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building your own agent teams and running into these problems — get in touch. Exchanging experience is always valuable, and feedback on early versions of a product is the best thing that can happen to its authors.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>claude</category>
      <category>llm</category>
      <category>rust</category>
    </item>
    <item>
      <title>The right of an AI agent to stay silent</title>
      <dc:creator>Vitalii Cherepanov</dc:creator>
      <pubDate>Sat, 09 May 2026 14:49:11 +0000</pubDate>
      <link>https://dev.to/vbcherepanov/the-right-of-an-ai-agent-to-stay-silent-5hi2</link>
      <guid>https://dev.to/vbcherepanov/the-right-of-an-ai-agent-to-stay-silent-5hi2</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Part 3 of 3 — "Memory for AI agents"&lt;/strong&gt;&lt;br&gt;
Why the right metric isn't accuracy — it's zero confidently-wrong actions&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Article
&lt;/h2&gt;

&lt;p&gt;Picture two scenarios.&lt;/p&gt;

&lt;p&gt;In the first — a senior cardiac surgeon looks at a scan and says: &lt;em&gt;"I don't know. There are two competing hypotheses here, the symptoms overlap. We need additional tests — these three specifically, and a CT with contrast. Until I see those, I won't commit to an answer I'd defend."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In the second — a bright-eyed intern confidently delivers a diagnosis in thirty seconds, leaning on a similar case from last week's textbook. Confident. Crisp. No doubt.&lt;/p&gt;

&lt;p&gt;Which one would you trust to operate on your mother?&lt;/p&gt;

&lt;p&gt;Right now, every AI agent we ship is the second doctor. Confident. Fast. Never says &lt;em&gt;"I don't know."&lt;/em&gt; And that's exactly why you can't trust them with anything more painful than rewriting a README.&lt;/p&gt;

&lt;p&gt;Today — how to change that. Not algorithmically. &lt;strong&gt;Architecturally.&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  The rotten metric that poisoned us all
&lt;/h3&gt;

&lt;p&gt;There's an unspoken industry consensus that I think is a disaster: we measure models and systems by &lt;strong&gt;accuracy&lt;/strong&gt; — the percentage of correct answers on a benchmark.&lt;/p&gt;

&lt;p&gt;GPT-4 hits 86% on MMLU. Claude — 88%. Gemini — 90%. Better, better, even better. The number goes up.&lt;/p&gt;

&lt;p&gt;What that number &lt;strong&gt;doesn't&lt;/strong&gt; show: the remaining 10–14%. These aren't &lt;em&gt;"answers the model didn't give."&lt;/em&gt; They're &lt;strong&gt;confidently generated wrong answers&lt;/strong&gt;, visually indistinguishable from correct ones. The model has no warning light for &lt;em&gt;"I'm not sure here."&lt;/em&gt; It generates everything with the same textual confidence.&lt;/p&gt;

&lt;p&gt;When you use such a model to write notes — fine. When you use it for production code, medical decisions, legal opinions, financial transactions — &lt;strong&gt;10% confident hallucinations means 10% of cases where the system is lying to you with a straight face&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The right metric for production AI sounds different:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;0% confidently-wrong actions at an acceptable abstain rate.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not &lt;em&gt;"percentage of correct answers."&lt;/em&gt; But &lt;em&gt;"percentage of wrong actions"&lt;/em&gt; — zero. And separately — &lt;code&gt;abstain rate&lt;/code&gt;: how often the system honestly says &lt;em&gt;"I don't know, I need data / verification / clarification."&lt;/em&gt; Zero wrong actions plus 30% abstain is &lt;strong&gt;ten times&lt;/strong&gt; more production-ready than 90% accuracy with 10% confident hallucinations.&lt;/p&gt;

&lt;p&gt;Notice: I didn't say &lt;em&gt;"0% wrong answers."&lt;/em&gt; I said &lt;em&gt;"0% wrong &lt;strong&gt;actions&lt;/strong&gt;."&lt;/em&gt; The distinction matters. An answer is words. An action is a commit, a transaction, a diagnosis, an API call, a change in production. Words can be reread and discarded. An action has already happened.&lt;/p&gt;

&lt;p&gt;And that separation between &lt;em&gt;"answer"&lt;/em&gt; and &lt;em&gt;"action"&lt;/em&gt; — that's what's architecturally absent from modern AI agents.&lt;/p&gt;




&lt;h3&gt;
  
  
  Abstain as a first-class outcome
&lt;/h3&gt;

&lt;p&gt;In Part 2 of this series I laid out seven principles of real memory, and the second was &lt;code&gt;strict mode&lt;/code&gt;. Quick recap: before a fact lands in prompt context, it passes through a &lt;strong&gt;gate&lt;/strong&gt; — source, confidence, temporal validity, no unresolved contradictions. If no fact made it through — the system returns &lt;code&gt;abstain = true&lt;/code&gt;, with an explicit reason.&lt;/p&gt;
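&lt;p&gt;A minimal sketch of such a gate (thresholds and field names are made up for illustration; what matters is the shape of the outcome, not the exact checks):&lt;/p&gt;

```python
# Strict-mode gate sketch: a fact reaches prompt context only if it clears
# every check; otherwise the system abstains with explicit reasons.

def gate(fact, now, min_confidence=0.8, max_age=86_400):
    if fact.get("source") is None:
        return False, "no source"
    if min_confidence > fact.get("confidence", 0.0):
        return False, "low confidence"
    if now - fact.get("confirmed_at", 0) > max_age:
        return False, "stale"
    if fact.get("contradicted"):
        return False, "unresolved contradiction"
    return True, "ok"

def answer_or_abstain(facts, now):
    passed = [f for f in facts if gate(f, now)[0]]
    if passed:
        return {"abstain": False, "evidence": passed}
    reasons = sorted(set(gate(f, now)[1] for f in facts))
    return {"abstain": True, "reasons": reasons}   # abstain is a result

facts = [{"source": "design-doc", "confidence": 0.4, "confirmed_at": 100}]
print(answer_or_abstain(facts, now=200))
```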

&lt;p&gt;There's a detail I want to underline separately. &lt;strong&gt;Abstain is not an error.&lt;/strong&gt; It's a &lt;strong&gt;result&lt;/strong&gt;. Every bit as first-class as &lt;em&gt;"answer"&lt;/em&gt; or &lt;em&gt;"action."&lt;/em&gt; If your AI has exactly two possible outcomes — &lt;em&gt;"answered"&lt;/em&gt; and &lt;em&gt;"got it wrong"&lt;/em&gt; — it has no architectural place for an honest &lt;em&gt;"I don't know."&lt;/em&gt; Which means it's going to make things up.&lt;/p&gt;

&lt;p&gt;In a sane system, there are &lt;strong&gt;at least four outcomes&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;answer&lt;/strong&gt; — sufficient evidence, answer given, action executed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;clarification request&lt;/strong&gt; — partial evidence, needs user input&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;abstain → brain task&lt;/strong&gt; — insufficient evidence, recorded as a backlog task with an explicit data request&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;escalation&lt;/strong&gt; — there's a contradiction that requires human review&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the last three aren't fallbacks. Not &lt;em&gt;"when everything went wrong."&lt;/em&gt; They're full, expected, designed-in paths.&lt;/p&gt;

&lt;p&gt;When I ask &lt;code&gt;braincore&lt;/code&gt; to find a decision about auth flow on a project we've been working on for three months — it finds it. When I ask about a project I just started, where nothing's recorded yet — it doesn't make things up. It says: &lt;em&gt;"I have no evidence on this question. Created a brain task: collect decisions on auth, source — our current design doc, owner — you. Once you fill it in, ask again."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is &lt;strong&gt;not a bug&lt;/strong&gt;. It's the right behavior. Notice what happened: the system didn't block me. Didn't say &lt;em&gt;"error, no data."&lt;/em&gt; It &lt;strong&gt;turned the not-knowing into a task&lt;/strong&gt;, which now lives in its backlog and will periodically remind itself.&lt;/p&gt;




&lt;h3&gt;
  
  
  Self-Tasking. A brain with a backlog, not a passive search engine
&lt;/h3&gt;

&lt;p&gt;The thing that scares me most about modern &lt;em&gt;"AI agents"&lt;/em&gt; is that they're &lt;strong&gt;passive&lt;/strong&gt;. They wait for a prompt. Every. Single. Time. Remember nothing between sessions. Have no &lt;strong&gt;internal backlog&lt;/strong&gt;. Don't realize they have unresolved questions.&lt;/p&gt;

&lt;p&gt;That's not an &lt;em&gt;"agent."&lt;/em&gt; That's &lt;strong&gt;a function in agent costume&lt;/strong&gt;. A function takes input, returns output. An agent has goals, state, and its own tasks between requests.&lt;/p&gt;

&lt;p&gt;In a real cognitive runtime, there's a separate entity — &lt;strong&gt;brain tasks&lt;/strong&gt;. They get spawned automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;truth.contradiction&lt;/code&gt; — a contradiction found in the knowledge graph → task to resolve&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;truth.staleness&lt;/code&gt; — a fact hasn't been confirmed in a long time → task to verify&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;strict.abstain&lt;/code&gt; — the system refused to answer → task to find evidence&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;selflearn.skill_scorecard&lt;/code&gt; — a skill started failing often → task to repair&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;specs.evidence_gap&lt;/code&gt; — a requirement without coverage proof → task to gather&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tests.failing_coverage&lt;/code&gt; — tests aren't passing → task to fix&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;learning.failure_pattern&lt;/code&gt; — a recurring error pattern detected → task to generalize into a rule&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each task &lt;strong&gt;prioritizes itself&lt;/strong&gt; by a simple formula:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;priority = f(urgency, impact, confidence, risk, effort, dependency_readiness)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
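&lt;p&gt;One way that formula could be instantiated. The factor names come from the formula above; the specific weights are my illustration, not &lt;code&gt;braincore&lt;/code&gt;'s:&lt;/p&gt;

```python
def task_priority(urgency: float, impact: float, confidence: float,
                  risk: float, effort: float, dependency_readiness: float) -> float:
    """All inputs in [0, 1]; a higher result means schedule sooner.

    Illustrative weighting: urgency, impact, confidence, and risk push a
    task up, effort pushes it down, and a task whose dependencies are not
    ready is discounted regardless of everything else.
    """
    base = 0.35 * urgency + 0.30 * impact + 0.15 * confidence + 0.20 * risk
    cost = 0.25 * effort
    return max(0.0, base - cost) * dependency_readiness

# A stale-fact verification task: moderately urgent, cheap, dependencies ready.
p = task_priority(urgency=0.6, impact=0.5, confidence=0.8,
                  risk=0.3, effort=0.2, dependency_readiness=1.0)
```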



&lt;p&gt;And at any moment, the user can ask: &lt;em&gt;"show the next five tasks, why they matter, which I can safely do now, which need my input."&lt;/em&gt; That's not the same chat where you start with a blank slate every time. It's a working environment with its own memory of what's not done.&lt;/p&gt;

&lt;p&gt;This is a flip in framing. Not &lt;em&gt;"user shows up and asks, agent answers."&lt;/em&gt; But &lt;em&gt;"agent runs in the background, accumulates open threads, and tells you — here's what matters now."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Show me a RAG stack that does this. Spoiler: there isn't one. Because &lt;strong&gt;RAG is a search engine, not an agent&lt;/strong&gt;. And when someone says &lt;em&gt;"our RAG-based AI has agency"&lt;/em&gt; — that's marketing fiction. Agency requires &lt;strong&gt;internal state&lt;/strong&gt;, &lt;strong&gt;goals&lt;/strong&gt;, &lt;strong&gt;a backlog&lt;/strong&gt;, and &lt;strong&gt;self-assessment&lt;/strong&gt;. RAG has none of these.&lt;/p&gt;




&lt;h3&gt;
  
  
  Cognitive Runtime &amp;gt; Model Size
&lt;/h3&gt;

&lt;p&gt;The last myth to dismantle.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"When GPT-5 / Claude 5 / Gemini 3 ships — memory will solve itself."&lt;/em&gt; No. It won't. Ever.&lt;/p&gt;

&lt;p&gt;Memory is &lt;strong&gt;not a property of the model&lt;/strong&gt;. It's a &lt;strong&gt;property of the system&lt;/strong&gt; the model runs in. The analogy:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A human has good memory not because neurons compute fast.&lt;br&gt;
A human has good memory because there's a hippocampus, a neocortex, sleep-time consolidation, emotional gating through the amygdala, and an architectural separation between working / episodic / semantic / procedural memory.&lt;br&gt;
&lt;strong&gt;It's infrastructure, not compute power.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Make the LLM ten times bigger — memory still doesn't appear. Build a runtime around the existing LLM that implements the seven principles from Part 2 plus abstain plus self-tasking — and a &lt;strong&gt;weak local model&lt;/strong&gt; in that runtime starts doing things GPT-5 with RAG-memory &lt;strong&gt;architecturally cannot&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Not because it's smarter. But because &lt;strong&gt;the runtime does for it what it shouldn't have to do itself&lt;/strong&gt;: remembers, verifies, abstains, tasks itself.&lt;/p&gt;

&lt;p&gt;This is, by the way, the only meaningful path forward in a world where foundation models are &lt;strong&gt;commodity&lt;/strong&gt;. When everyone has roughly equivalent Claude/GPT/Gemini — competitive advantage can only come from &lt;strong&gt;what's around the model&lt;/strong&gt;. Domain-specific cognitive runtime. Project-specific memory. Team-specific rules.&lt;/p&gt;

&lt;p&gt;And this bet is also about privacy. About data sovereignty. About the fact that &lt;strong&gt;your project's memory is your capital&lt;/strong&gt;, and handing it to a third-party vector DB to pay monthly rent on it is a strategic mistake you'll only notice three years in, when you can't leave anymore.&lt;/p&gt;

&lt;p&gt;That's why, incidentally, &lt;code&gt;braincore&lt;/code&gt; is a &lt;strong&gt;local&lt;/strong&gt; Go binary that works by default &lt;strong&gt;without&lt;/strong&gt; OpenAI and without Anthropic. Not because I'm against them (I'm a paying customer of both). But because &lt;strong&gt;the architecturally correct path&lt;/strong&gt; is a runtime where the model is a swappable component, not the center of gravity.&lt;/p&gt;




&lt;h3&gt;
  
  
  A checklist for anyone building AI products right now
&lt;/h3&gt;

&lt;p&gt;If you've read the whole series and you're thinking &lt;em&gt;"okay, agreed, what do I do Monday morning?"&lt;/em&gt; — here are ten items you can start moving on &lt;strong&gt;regardless&lt;/strong&gt; of whether you use &lt;code&gt;braincore&lt;/code&gt; or not.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Drop the word "memory" from your stack if what you have is RAG.&lt;/strong&gt; Call it retrieval or search — instantly removes 80% of inflated expectations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Introduce &lt;code&gt;truth_status&lt;/code&gt; for every fact.&lt;/strong&gt; Minimum: &lt;code&gt;hypothesis | confirmed | deprecated&lt;/code&gt;. Disallow &lt;code&gt;confirmed&lt;/code&gt; without &lt;code&gt;source_ref&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Introduce &lt;code&gt;valid_from&lt;/code&gt; / &lt;code&gt;valid_until&lt;/code&gt;.&lt;/strong&gt; Any fact without temporal validity is a hypothesis, not a fact.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Make abstain a first-class outcome.&lt;/strong&gt; Not &lt;em&gt;"when things go wrong"&lt;/em&gt; — but as one of four valid results.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Distinguish &lt;code&gt;staging | working | consolidated | archived&lt;/code&gt;.&lt;/strong&gt; Don't dump everything into one collection.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Negative memory.&lt;/strong&gt; What broke — record it explicitly, with a link to the failing test or commit.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Entity disambiguation.&lt;/strong&gt; Never auto-merge entities at low confidence. Create an &lt;code&gt;ambiguity record&lt;/code&gt; instead.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Causal chains for decisions.&lt;/strong&gt; Not "text" — &lt;code&gt;problem → alternatives → decision → reasoning → outcome&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Local where possible.&lt;/strong&gt; Project memory is &lt;strong&gt;your&lt;/strong&gt; capital.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The metric is not "&lt;em&gt;percentage of correct answers&lt;/em&gt;." It's &lt;code&gt;0% wrong actions at an acceptable abstain rate&lt;/code&gt;.&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Not all at once. Pick two or three and start. In a month, you'll have an AI system you can trust more than most that exist.&lt;/p&gt;




&lt;h3&gt;
  
  
  Epilogue. Cognitive hygiene for the AI industry
&lt;/h3&gt;

&lt;p&gt;I'm tired of the word &lt;em&gt;"memory"&lt;/em&gt; getting slapped on every vector database with embeddings. It's a devaluation of the term — like calling a single-column &lt;code&gt;VARCHAR&lt;/code&gt; table a knowledge base. Technically — yes. Substantively — no.&lt;/p&gt;

&lt;p&gt;Memory is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;structure&lt;/strong&gt;, not a flat list&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;knowing the boundary&lt;/strong&gt;, not confident bullshit&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;causal chains&lt;/strong&gt;, not chunks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;entity-aware&lt;/strong&gt;, not string-aware&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;temporal-aware&lt;/strong&gt;, not &lt;em&gt;"created yesterday, valid forever"&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;self-correcting&lt;/strong&gt;, not self-deceiving&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;governed&lt;/strong&gt;, not &lt;em&gt;"dump whatever, sort later"&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;abstain-capable&lt;/strong&gt;, not &lt;em&gt;"always answers"&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your &lt;em&gt;"AI with memory"&lt;/em&gt; doesn't do at least half of those — your AI doesn't have memory. It has search results. These aren't the same thing.&lt;/p&gt;

&lt;p&gt;One last thing. I'm not telling you to throw out RAG. RAG is an excellent tool for its class of tasks (find me the paragraph about X in 100 documents). I'm telling you to &lt;strong&gt;stop calling RAG memory&lt;/strong&gt; and start building real cognitive runtimes — slower, more disciplined, with explicit gates and explicit abstain. It's the only path to AI systems you can trust with anything more important than rewriting a README.&lt;/p&gt;

&lt;p&gt;If you're a startup with &lt;em&gt;"our AI has long-term memory on a vector database"&lt;/em&gt; in your pitch deck — close that slide, redo it, and in two years you'll thank yourself.&lt;/p&gt;

&lt;p&gt;If you're a developer fighting with an agent that forgets what you said yesterday — that's not the agent's fault. It's the fault of whoever sold you a search engine wrapped as a brain.&lt;/p&gt;

&lt;p&gt;A good AI agent &lt;strong&gt;isn't the one that always answers&lt;/strong&gt;. A good AI agent &lt;strong&gt;is the one that never takes a confidently wrong action&lt;/strong&gt;. Between those two sentences lies the entire chasm separating 2024's AI tooling from AI tooling that will be trustworthy in 2027.&lt;/p&gt;

&lt;p&gt;I've picked my side of the chasm. Building &lt;code&gt;braincore&lt;/code&gt; — open, Apache-2.0, in the repo. If you recognize yourself in this series — we're in the same boat. If something works differently in your stack — tell me how, I genuinely want to know.&lt;/p&gt;

&lt;p&gt;The one thing you can't do is stay silent.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR of the whole series:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Part 1:&lt;/strong&gt; RAG = Ctrl+F with embeddings. It's search, not memory. Mem0/Letta/Zep — RAG in wrappers. 1M context is RAM, not disk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 2:&lt;/strong&gt; Real memory = seven principles in combination. Atomic units + lifecycle + truth_status + temporal + causal chains + AST identity + internal git + memory scoring + negative memory. Each exists in isolation. Combined — different product.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 3:&lt;/strong&gt; The metric for production AI isn't accuracy — it's &lt;em&gt;0% confidently-wrong actions&lt;/em&gt;. Abstain is a first-class outcome, not an error. Cognitive runtime &amp;gt; model size.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your AI "remembers" via &lt;code&gt;vector_db.query(top_k=5)&lt;/code&gt; — it has dementia disguised as confidence. Fix the architecture, not the model.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;Part 3 of 3. Series complete. If this resonated — share it. If you disagree — tell me in the comments, I love substantive arguments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>software</category>
    </item>
    <item>
      <title>Seven principles of real memory for AI agents</title>
      <dc:creator>Vitalii Cherepanov</dc:creator>
      <pubDate>Wed, 06 May 2026 07:15:40 +0000</pubDate>
      <link>https://dev.to/vbcherepanov/seven-principles-of-real-memory-for-ai-agents-1k8k</link>
      <guid>https://dev.to/vbcherepanov/seven-principles-of-real-memory-for-ai-agents-1k8k</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx53w6jm9tj0a4hkiz5zf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx53w6jm9tj0a4hkiz5zf.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Part 2 of 3 — "Memory for AI agents"&lt;/strong&gt;&lt;br&gt;
Architecture. Concrete. With formulas and lifecycle.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Article
&lt;/h2&gt;

&lt;p&gt;In the previous post I broke the &lt;em&gt;"RAG = memory"&lt;/em&gt; pitch into three uncomfortable problems: a chunk doesn't know it's a chunk; retrieval has no structure, only cosine; time doesn't exist as a first-class concept. In short — RAG is search wearing the marketing word &lt;em&gt;"memory."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Today — what should be there &lt;strong&gt;instead&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A disclaimer up front. I don't claim to have invented any single item on this list. Atomic facts go back to Wittgenstein. Temporal validity is basic logic. Knowledge graphs are a whole field with textbooks. Lifecycle for data is standard in any normal information system.&lt;/p&gt;

&lt;p&gt;I claim something different. I claim that &lt;strong&gt;all seven properties have to work in one system at the same time&lt;/strong&gt;, and that any system in which only five of seven actually work continues to lie to the user with a confident face. There's only one way to see this — try assembling all seven into one codebase and watch what happens.&lt;/p&gt;

&lt;p&gt;I tried. It worked. Called it &lt;code&gt;braincore&lt;/code&gt;. Open source, Apache-2.0, single Go binary, MCP-stdio. I won't turn the article into a pitch — but in each section below I'll add one line about how it's done in &lt;code&gt;braincore&lt;/code&gt;, so it's clear we're not talking theory.&lt;/p&gt;

&lt;p&gt;Let's go.&lt;/p&gt;




&lt;h3&gt;
  
  
  Principle 1. Atomic Knowledge Units with lifecycle, not "chunks in Qdrant"
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The pain.&lt;/strong&gt; In RAG, any incoming text — dialogue, design doc, git commit, meeting transcript — gets sliced into chunks and shipped into the vector DB without questions. Once there, no matter what happens next — all chunks are equivalent, all equally "fresh," all equally "true." Six months later, one collection holds a soup of stale, current, hypothetical, and refuted facts. And every one of them has exactly one chance of making it into retrieval — by cosine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What should be in the schema.&lt;/strong&gt; Any incoming information &lt;strong&gt;does not flow into memory directly&lt;/strong&gt;. It runs through a pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;input
  → initial trust (by source: user=0.9, llm=0.3, web=0.4..0.7)
  → parse (entity / fact / relation / event / rule / hypothesis)
  → atomic knowledge units
  → validate (source / graph / dedup / contradiction / temporal / rule)
  → link (at least 1 edge into graph OR review item)
  → working memory (TTL + activation)
  → iterative verification loop
  → consolidation
  → long-term memory (only confirmed + linked)
  → edge strengthening (usage + success + co-occurrence − decay)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The core rule: &lt;strong&gt;nothing enters long-term memory immediately&lt;/strong&gt;. Every atomic knowledge unit has at minimum:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;truth_status&lt;/code&gt;: &lt;code&gt;hypothesis | candidate | confirmed | contradicted | deprecated&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;lifecycle&lt;/code&gt;: &lt;code&gt;staging | working | consolidated | archived&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;source_ref&lt;/code&gt; — where it came from&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;confidence&lt;/code&gt; — numerical certainty estimate&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;valid_from&lt;/code&gt; / &lt;code&gt;valid_until&lt;/code&gt; — when it's true&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Compare that to a RAG chunk that has only &lt;code&gt;text&lt;/code&gt; and &lt;code&gt;embedding&lt;/code&gt;. It's the difference between a junk drawer and a warehouse with inventory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this enables.&lt;/strong&gt; When yesterday you said &lt;em&gt;"we use Postgres"&lt;/em&gt; and today &lt;em&gt;"we migrated to ClickHouse, Postgres is OLTP only"&lt;/em&gt; — the old fact automatically gets &lt;code&gt;valid_until = today&lt;/code&gt; and &lt;code&gt;superseded_by = new_fact_id&lt;/code&gt;. On retrieve, it either doesn't appear at all, or it comes flagged &lt;em&gt;"historical, not current."&lt;/em&gt; Not because of a smart model. Because of the &lt;strong&gt;schema&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How braincore does it.&lt;/strong&gt; The pipeline &lt;code&gt;staging → working → consolidated&lt;/code&gt; is implemented literally — three separate SQLite tables plus an intermediate verification loop. A record reaches &lt;code&gt;consolidated&lt;/code&gt; only if &lt;code&gt;truth_status = confirmed&lt;/code&gt;, has at least one graph edge, no unresolved contradictions, and &lt;code&gt;confidence ≥ threshold&lt;/code&gt;. Otherwise it stays in &lt;code&gt;working&lt;/code&gt; with a TTL, or moves to a &lt;code&gt;review queue&lt;/code&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  Principle 2. Strict Mode and the right to abstain
&lt;/h3&gt;

&lt;p&gt;This is, possibly, the most important point in the entire series. And the most absent from commercial memory frameworks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The pain.&lt;/strong&gt; The standard metric AI systems are measured by — &lt;em&gt;"how often they give the right answer."&lt;/em&gt; This is a &lt;strong&gt;rotten&lt;/strong&gt; metric. 95% correct answers and 5% confident hallucinations is a system &lt;strong&gt;you cannot trust in production&lt;/strong&gt;. Because you don't know in advance which 5% you're in right now.&lt;/p&gt;

&lt;p&gt;The right metric reads:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;0% confidently-wrong actions at an acceptable abstain rate.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not &lt;em&gt;"always answer."&lt;/em&gt; But &lt;em&gt;"never take a wrong action without verification."&lt;/em&gt; And if verification is missing — &lt;strong&gt;say "I don't know"&lt;/strong&gt; and assign yourself a task to fix it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What should be in the schema.&lt;/strong&gt; Before a fact lands in prompt context, it passes through a &lt;strong&gt;gate&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;is there a &lt;code&gt;source_ref&lt;/code&gt;?&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;confidence ≥ threshold&lt;/code&gt;?&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;trust_score ≥ threshold&lt;/code&gt; (for the source)?&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;temporal_valid == true&lt;/code&gt; (valid at query time)?&lt;/li&gt;
&lt;li&gt;no unresolved &lt;code&gt;contradiction&lt;/code&gt; in the graph?&lt;/li&gt;
&lt;li&gt;no unresolved &lt;code&gt;ambiguity&lt;/code&gt;?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If even one requirement fails — the fact &lt;strong&gt;does not reach&lt;/strong&gt; context. If no fact made it through for a query — the system returns &lt;code&gt;abstain = true&lt;/code&gt; with &lt;code&gt;reason = no_accepted_facts&lt;/code&gt; (or &lt;code&gt;contradiction_unresolved&lt;/code&gt;, or &lt;code&gt;temporal_invalid&lt;/code&gt; — always explicit).&lt;/p&gt;
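&lt;p&gt;The gate is just a conjunction of checks, each with an explicit failure reason. A minimal sketch; thresholds and field names are illustrative:&lt;/p&gt;

```python
from datetime import date

def gate(fact: dict, now: date, conf_threshold: float = 0.7,
         trust_threshold: float = 0.6) -> tuple[bool, str]:
    """Return (passed, reason). A fact must clear every check to reach context."""
    if not fact.get("source_ref"):
        return False, "no_source_ref"
    if fact.get("confidence", 0.0) < conf_threshold:
        return False, "low_confidence"
    if fact.get("trust_score", 0.0) < trust_threshold:
        return False, "low_trust"
    vf, vu = fact.get("valid_from"), fact.get("valid_until")
    if vf is None or vf > now or (vu is not None and vu <= now):
        return False, "temporal_invalid"
    if fact.get("contradictions"):
        return False, "contradiction_unresolved"
    if fact.get("ambiguities"):
        return False, "ambiguity_unresolved"
    return True, "accepted"

def answer_or_abstain(facts: list[dict], now: date) -> dict:
    accepted = [f for f in facts if gate(f, now)[0]]
    if not accepted:
        # Abstain is a result with an explicit reason, not an exception.
        return {"abstain": True, "reason": "no_accepted_facts"}
    return {"abstain": False, "facts": accepted}
```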

&lt;p&gt;And — pay attention, here's where the magic happens — &lt;strong&gt;abstain is not delivered to the user as a dead end&lt;/strong&gt;. It becomes a &lt;strong&gt;brain task&lt;/strong&gt; in the backlog: &lt;em&gt;"I need evidence for X to answer with confidence. The source is here, the specific conflict is here."&lt;/em&gt; The system knows what it doesn't know, and assigns itself the task to fix it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this enables.&lt;/strong&gt; An AI agent you can trust. Not because it's always right — but because when it's not sure, it &lt;strong&gt;stays silent&lt;/strong&gt; or &lt;strong&gt;asks for clarification&lt;/strong&gt;. And when it does take action — the action is grounded in facts that &lt;strong&gt;passed the gate&lt;/strong&gt;, not "well, ChatGPT thought this was better."&lt;/p&gt;

&lt;p&gt;Show me one RAG stack that does this. I'll wait.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How braincore does it.&lt;/strong&gt; The &lt;code&gt;internal/strictmode&lt;/code&gt; package is a separate module with explicit gate rules. By default, every query passes through strict mode; for UX scenarios where abstain is unacceptable (brainstorming, for example), you can disable it via an explicit &lt;code&gt;--allow-uncertainty&lt;/code&gt; flag. All abstain events are logged as brain tasks with their source and reason.&lt;/p&gt;




&lt;h3&gt;
  
  
  Principle 3. Causal Decision Chains, not flat facts
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The pain.&lt;/strong&gt; In RAG, any decision is stored as &lt;em&gt;"text about a decision."&lt;/em&gt; On retrieve, you get a chunk of text that &lt;strong&gt;describes&lt;/strong&gt; the decision — but doesn't answer &lt;em&gt;"why?"&lt;/em&gt;, &lt;em&gt;"what alternatives did we consider?"&lt;/em&gt;, &lt;em&gt;"what came of it?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Six months later, you ask &lt;em&gt;"why did we pick JWT over sessions?"&lt;/em&gt; — RAG returns three text fragments of the decision, and the model fills in the reasoning itself. Sometimes correctly. Sometimes inventing it from popular patterns in its training data. You don't know which one this time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What should be in the schema.&lt;/strong&gt; The entity is not a &lt;em&gt;"document"&lt;/em&gt; and not a &lt;em&gt;"memory entry."&lt;/em&gt; The entity is called &lt;strong&gt;decision&lt;/strong&gt; and has a schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;problem      → what we were solving
alternatives → what we considered and rejected (with reasons)
decision     → what we chose
reasoning    → why this specifically
outcome      → what came of it (filled in later, post-hoc)
superseded_by → link to a new decision if this one was revised
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn't &lt;em&gt;"let's stuff text into an embedding."&lt;/em&gt; This is a &lt;strong&gt;causal chain&lt;/strong&gt; that answers &lt;strong&gt;WHY&lt;/strong&gt;, not just &lt;strong&gt;WHAT&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this enables.&lt;/strong&gt; Six months later, you ask &lt;em&gt;"why JWT?"&lt;/em&gt; — the system returns a structured answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Problem:&lt;/strong&gt; session scaling + audit requirements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alternatives (rejected):&lt;/strong&gt; stateful sessions with Redis (violates audit), opaque tokens with centralized lookup (latency).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decision:&lt;/strong&gt; JWT with short TTL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning:&lt;/strong&gt; stateless, audit-neutral, latency acceptable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Outcome (recorded 4 months later):&lt;/strong&gt; invalidation complexity higher than expected; added refresh tokens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Superseded by:&lt;/strong&gt; none.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;RAG returns three fragments. A decision chain returns &lt;strong&gt;reasoning&lt;/strong&gt;. These are different products.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How braincore does it.&lt;/strong&gt; Decisions are a separate entity type in the graph with required &lt;code&gt;problem&lt;/code&gt;, &lt;code&gt;alternatives[]&lt;/code&gt;, &lt;code&gt;decision&lt;/code&gt;, &lt;code&gt;reasoning&lt;/code&gt; fields, and optional &lt;code&gt;outcome&lt;/code&gt;/&lt;code&gt;superseded_by&lt;/code&gt;. They're stored not as chunks but as structured records with explicit edges into the code graph and into other decisions.&lt;/p&gt;




&lt;h3&gt;
  
  
  Principle 4. Stable code identity through AST, not strings
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The pain.&lt;/strong&gt; This one is specific to AI agents working with code — but it hits all of them. You renamed &lt;code&gt;GetUser → FetchUser&lt;/code&gt;, moved it from &lt;code&gt;pkg/auth&lt;/code&gt; to &lt;code&gt;pkg/user&lt;/code&gt;, changed the signature from a pointer receiver to a value receiver. All the references in RAG memory pointing to &lt;em&gt;"GetUser in pkg/auth"&lt;/em&gt; are now &lt;strong&gt;dead&lt;/strong&gt;. Because RAG is bound to &lt;strong&gt;strings&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And nobody tells you. The chunk keeps living in Qdrant, its cosine to auth-related queries stays high. The agent pulls dead information and acts on it. Congratulations, you have memory rot disguised as memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What should be in the schema.&lt;/strong&gt; Code parsing through &lt;code&gt;go/ast&lt;/code&gt; (for Go) and tree-sitter (for PHP, JS, TS, Python, Rust, Java, and beyond). &lt;strong&gt;Node identity&lt;/strong&gt; is built not from a string and not from a file path, but from a structural hash:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;node_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;qualified_name&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;kind&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;signature_hash&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Renaming a function &lt;strong&gt;does not break&lt;/strong&gt; references to it (&lt;code&gt;qualified_name&lt;/code&gt; changed, but the link is updated automatically on next parse, with a back-reference to the old &lt;code&gt;node_id&lt;/code&gt; as &lt;code&gt;renamed_from&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Moving between packages — same thing.&lt;/li&gt;
&lt;li&gt;Changing the signature (pointer → value receiver) — &lt;code&gt;signature_hash&lt;/code&gt; changes, and old references &lt;strong&gt;automatically get marked &lt;code&gt;stale&lt;/code&gt;&lt;/strong&gt; — the brain &lt;strong&gt;knows&lt;/strong&gt; they now require review.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What this enables.&lt;/strong&gt; When the AI agent is about to edit &lt;code&gt;FetchUser&lt;/code&gt;, the system pulls three past decisions about that function, two regressions in this module, and active project rules — &lt;strong&gt;before&lt;/strong&gt; the agent starts writing code. Not because cosine happened to align. Because it's a &lt;strong&gt;code graph&lt;/strong&gt;, and &lt;code&gt;FetchUser&lt;/code&gt; has edges to decisions, regressions, and rules &lt;strong&gt;by identity&lt;/strong&gt;, not by text similarity.&lt;/p&gt;

&lt;p&gt;I call this pre-edit warning. And it's a qualitatively different kind of error prevention than &lt;em&gt;"let's run a linter after generation."&lt;/em&gt;&lt;/p&gt;
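&lt;p&gt;A sketch of stable identity plus stale-marking on top of the hash above. Illustrative: a real implementation derives these fields from the AST, not from strings:&lt;/p&gt;

```python
import hashlib

def node_id(qualified_name: str, kind: str, signature: str) -> str:
    # Identity = structural hash, not a file path or a bare name.
    sig_hash = hashlib.sha256(signature.encode()).hexdigest()
    return hashlib.sha256((qualified_name + kind + sig_hash).encode()).hexdigest()

def reconcile(old: dict, new: dict) -> dict:
    """Decide what to do with memory references after a re-parse.

    old / new: {"name": qualified name, "kind": node kind, "sig": signature}
    """
    if node_id(old["name"], old["kind"], old["sig"]) == \
       node_id(new["name"], new["kind"], new["sig"]):
        return {"action": "keep"}
    if old["sig"] != new["sig"]:
        # Behavior may have changed: old references get marked stale for review.
        return {"action": "mark_stale"}
    # Same signature, new name or location: relink with a back-reference.
    return {"action": "relink",
            "renamed_from": node_id(old["name"], old["kind"], old["sig"])}

old = {"name": "pkg/auth.GetUser", "kind": "func", "sig": "(id int) (*User, error)"}
moved = {"name": "pkg/user.FetchUser", "kind": "func", "sig": "(id int) (*User, error)"}
```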

&lt;p&gt;&lt;strong&gt;How braincore does it.&lt;/strong&gt; The code graph is a separate layer over AST/tree-sitter, with background reindex on filesystem watch events. Identity hashes live in SQLite, edges live there too. On a pre-edit hook, the agent gets the context of related decisions/rules/regressions automatically.&lt;/p&gt;




&lt;h3&gt;
  
  
  Principle 5. Internal Git as memory versioning
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The pain.&lt;/strong&gt; RAG has no concept of time beyond &lt;code&gt;created_at&lt;/code&gt;. That's metadata about a &lt;strong&gt;record&lt;/strong&gt;, not about a &lt;strong&gt;state of knowledge&lt;/strong&gt;. You can't ask &lt;em&gt;"show me what I knew about this code a month ago."&lt;/em&gt; You can't roll back the state of memory to before the agent dragged in garbage. You can't switch to a feature branch and have a parallel mental state for it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What should be in the schema.&lt;/strong&gt; Every change in memory is a &lt;strong&gt;commit&lt;/strong&gt;. Not metaphorically. Literally, through &lt;code&gt;go-git&lt;/code&gt;, into a hidden &lt;code&gt;.internal-git/&lt;/code&gt; repository that lives parallel to the project's main repo.&lt;/p&gt;

&lt;p&gt;This gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;git log&lt;/code&gt; over the project's &lt;strong&gt;memory&lt;/strong&gt; — what was added, what changed, when.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;git checkout&lt;/code&gt; to roll back the brain state by N days — for audit, for regression investigation, for tests.&lt;/li&gt;
&lt;li&gt;When you switch to a feature branch in the main repo, the brain &lt;strong&gt;mirrors&lt;/strong&gt; that, and each branch has its own mental state. An experiment in a feature branch doesn't pollute master's memory.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What this enables.&lt;/strong&gt; Time-travel queries: &lt;em&gt;"which decision did I consider current 30 days ago?"&lt;/em&gt; Audit: &lt;em&gt;"when exactly did the agent start believing we use ClickHouse?"&lt;/em&gt; Branch isolation: &lt;em&gt;"in feature/oauth we have a different approach to auth, but that knowledge shouldn't leak into main."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;RAG can't do this. RAG has no concept of &lt;em&gt;"state of knowledge"&lt;/em&gt; — only a set of vectors that grows.&lt;/p&gt;
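&lt;p&gt;The mechanics need nothing exotic. A sketch using the plain &lt;code&gt;git&lt;/code&gt; CLI instead of &lt;code&gt;go-git&lt;/code&gt;, with a throwaway repo standing in for &lt;code&gt;.internal-git/&lt;/code&gt;:&lt;/p&gt;

```python
import subprocess
import tempfile
from pathlib import Path

def git(repo: Path, *args: str) -> str:
    return subprocess.run(["git", "-C", str(repo), *args],
                          check=True, capture_output=True, text=True).stdout

def snapshot(repo: Path, message: str) -> None:
    # Every change to memory state is one commit.
    git(repo, "add", "-A")
    git(repo, "commit", "-q", "-m", message, "--allow-empty")

# Throwaway repo standing in for the hidden .internal-git/ store.
repo = Path(tempfile.mkdtemp())
git(repo, "init", "-q")
git(repo, "config", "user.email", "brain@example.invalid")
git(repo, "config", "user.name", "brain")

(repo / "facts.md").write_text("db: Postgres\n")
snapshot(repo, "fact: we use Postgres")
(repo / "facts.md").write_text("db: ClickHouse (Postgres is OLTP only)\n")
snapshot(repo, "fact: migrated to ClickHouse")

# Time travel: the state of knowledge one commit ago.
old_state = git(repo, "show", "HEAD~1:facts.md")
```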

&lt;p&gt;&lt;strong&gt;How braincore does it.&lt;/strong&gt; The &lt;code&gt;.internal-git/&lt;/code&gt; is created on &lt;code&gt;braincore init&lt;/code&gt;. Commits are made automatically on every change to knowledge units and graph edges. Branch tracking is synchronized with the main git through a post-checkout hook.&lt;/p&gt;




&lt;h3&gt;
  
  
  Principle 6. Memory Scoring — because not all knowledge is equal
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The pain.&lt;/strong&gt; In RAG, all chunks are equal. Top-k by cosine doesn't distinguish &lt;em&gt;"this is confirmed by ten past uses"&lt;/em&gt; from &lt;em&gt;"this was written yesterday and never used again."&lt;/em&gt; It doesn't distinguish &lt;em&gt;"this is critical for the architecture"&lt;/em&gt; from &lt;em&gt;"this is a random note in a corner."&lt;/em&gt; It doesn't distinguish &lt;em&gt;"this is in active use"&lt;/em&gt; from &lt;em&gt;"this has been gathering dust since last year."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What should be in the schema.&lt;/strong&gt; Every knowledge unit has a composite &lt;code&gt;MemoryScore&lt;/code&gt;, computed as a weighted sum:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MemoryScore =
  + 0.22 * ImportanceScore     (explicit importance, or derived from connectivity)
  + 0.22 * TrustScore          (source reliability + history of confirmations)
  + 0.20 * TaskRelevanceScore  (relevance to current work context)
  + 0.12 * UsageScore          (how often it's used)
  + 0.10 * RecencyScore        (freshness)
  + 0.10 * StabilityScore      (how often it changes — stable is more reliable)
  + 0.08 * NoveltyScore        (novelty as a soft boost)
  − 0.18 * RiskScore           (potential harm from use)
  − 0.18 * NoiseScore          (noise, duplicates, low coherence)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And on retrieve, what runs is &lt;strong&gt;no longer cosine similarity&lt;/strong&gt;, but:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RetrievalScore =
  + 0.35 * semantic_similarity
  + 0.20 * memory_score
  + 0.15 * graph_relevance
  + 0.15 * temporal_validity
  + 0.10 * trust_score
  − 0.15 * ambiguity_penalty
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These weights aren't ultimate truth — they're empirically tuned and shift with usage profile. The point isn't the numbers, it's the &lt;strong&gt;architectural shift&lt;/strong&gt;: retrieval stops being &lt;em&gt;"text similarity"&lt;/em&gt; and becomes &lt;em&gt;"similarity × importance × trust × freshness."&lt;/em&gt;&lt;/p&gt;
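&lt;p&gt;Operationally, both formulas reduce to the same short weighted sum. A sketch (the weight tables simply mirror the numbers above; in braincore they live in config):&lt;/p&gt;

```python
# Illustrative weights mirroring the formulas above; treat the numbers
# as a starting point, not truth.
MEMORY_WEIGHTS = {
    "importance": 0.22, "trust": 0.22, "task_relevance": 0.20,
    "usage": 0.12, "recency": 0.10, "stability": 0.10, "novelty": 0.08,
    "risk": -0.18, "noise": -0.18,
}

RETRIEVAL_WEIGHTS = {
    "semantic_similarity": 0.35, "memory_score": 0.20, "graph_relevance": 0.15,
    "temporal_validity": 0.15, "trust": 0.10, "ambiguity": -0.15,
}

def weighted_score(signals, weights):
    """Weighted sum over named signals in [0, 1]. Negative weights act
    as penalties; missing signals default to 0."""
    return sum(weights[name] * signals.get(name, 0.0) for name in weights)
```

&lt;p&gt;The same function serves both &lt;code&gt;MemoryScore&lt;/code&gt; and &lt;code&gt;RetrievalScore&lt;/code&gt;; the architectural shift is entirely in which signals feed it.&lt;/p&gt;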

&lt;p&gt;Lifecycle transitions automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;memory_score ≥ 0.80&lt;/code&gt; and &lt;code&gt;trust ≥ 0.75&lt;/code&gt; → &lt;code&gt;consolidated&lt;/code&gt; (knowledge becomes "firmware")&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;memory_score ≥ 0.55&lt;/code&gt; → stays in &lt;code&gt;working&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;memory_score ≥ 0.30&lt;/code&gt; → &lt;code&gt;staging&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;memory_score &amp;lt; 0.30&lt;/code&gt; → &lt;code&gt;archive candidate&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
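&lt;p&gt;The transition logic is a direct translation of those thresholds. A sketch (function name and stage labels are illustrative):&lt;/p&gt;

```python
def lifecycle(memory_score, trust):
    """Map scores to a lifecycle stage, following the thresholds above."""
    if memory_score >= 0.80 and trust >= 0.75:
        return "consolidated"      # knowledge becomes "firmware"
    if memory_score >= 0.55:
        return "working"
    if memory_score >= 0.30:
        return "staging"
    return "archive_candidate"     # below 0.30: candidate for archival
```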

&lt;p&gt;&lt;strong&gt;What this enables.&lt;/strong&gt; Active memory. Not storage. &lt;strong&gt;An active environment&lt;/strong&gt; in which what's important strengthens through use, and noise &lt;strong&gt;decays on its own&lt;/strong&gt; — like in a biological brain, where rarely-used synapses weaken and frequently-used ones strengthen.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;RAG = a hard drive that never gets defragmented.&lt;br&gt;
Brain = a brain in which junk &lt;strong&gt;settles&lt;/strong&gt; on its own and gets archived automatically.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;How braincore does it.&lt;/strong&gt; Scoring is recomputed by a background job every N hours. Lifecycle transitions are atomic and logged (see Principle 5). All weights are exposed in config — tune them per project.&lt;/p&gt;




&lt;h3&gt;
  
  
  Principle 7. Negative Memory and Rule Engine
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The pain.&lt;/strong&gt; Here's what every LLM agent does today: &lt;strong&gt;repeats mistakes&lt;/strong&gt;. Yesterday it broke a migration — today it'll break a similar one. RAG won't help, because &lt;strong&gt;the broken migration doesn't go into RAG&lt;/strong&gt;. What goes into RAG is &lt;em&gt;"how to write migrations"&lt;/em&gt; from the official docs. The fact that you personally already stepped on this rake — recorded nowhere.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What should be in the schema.&lt;/strong&gt; A separate class — &lt;strong&gt;negative memory&lt;/strong&gt;: what broke, why it broke, how it was fixed, which commit/test confirms it. First-class entity, not a marginal field.&lt;/p&gt;

&lt;p&gt;And during planning, every patch passes through a &lt;strong&gt;Rule Engine&lt;/strong&gt; before code is generated:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;patch
  → architectural rules
  → code rules
  → security rules
  → performance rules
  → anti-patterns (including "this exact one I broke before")
  → repair plan OR abstain
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a rule with severity &lt;code&gt;critical&lt;/code&gt; or &lt;code&gt;high&lt;/code&gt; is violated — &lt;strong&gt;the code does not get written&lt;/strong&gt;. A repair plan is created. If repair is impossible — &lt;code&gt;abstain&lt;/code&gt; (see Principle 2). No "let's hope this passes" generation.&lt;/p&gt;
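&lt;p&gt;A minimal sketch of such a gate, assuming a hypothetical rule shape with &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;severity&lt;/code&gt;, a &lt;code&gt;check&lt;/code&gt; predicate, and an optional &lt;code&gt;repair&lt;/code&gt; hint (braincore's actual rule schema will differ):&lt;/p&gt;

```python
BLOCKING = {"critical", "high"}

def gate(patch, rules):
    """Pre-generation gate: if a critical/high rule fires, the code is
    never written. The outcome is a repair plan, or abstain if no repair
    is known. Rule shape here is hypothetical."""
    fired = [r for r in rules if r["check"](patch)]
    blocking = [r for r in fired if r["severity"] in BLOCKING]
    if not blocking:
        return {"action": "proceed", "warnings": [r["name"] for r in fired]}
    plans = [r["repair"] for r in blocking if r.get("repair")]
    if plans:
        return {"action": "repair", "plan": plans}
    return {"action": "abstain", "violated": [r["name"] for r in blocking]}

# An anti-pattern rule distilled from negative memory
# ("I broke exactly this migration before"):
RULES = [{
    "name": "no-inplace-column-drop",
    "severity": "critical",
    "check": lambda patch: "DROP COLUMN" in patch,
    "repair": "two-phase migration: stop writes first, drop next release",
}]
```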

&lt;p&gt;And, critically, the &lt;strong&gt;safe execution pipeline&lt;/strong&gt; closes the loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;checkpoint
  → apply patch
  → rules validate
  → build
  → tests
  → success → commit
  → fail → rollback → record into negative memory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every &lt;strong&gt;executed&lt;/strong&gt; action is either confirmed by tests, rolled back, or recorded as &lt;strong&gt;negative evidence&lt;/strong&gt; for future decisions.&lt;/p&gt;
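&lt;p&gt;The loop above, sketched with state modeled as a plain dict purely for illustration:&lt;/p&gt;

```python
def safe_execute(state, patch, steps, negative_memory):
    """Checkpoint, apply, run the gates; on any failure roll back and
    record the attempt as negative evidence."""
    snapshot = dict(state)                # checkpoint
    state.update(patch)                   # apply patch
    for name, check in steps:             # rules validate -> build -> tests
        if not check(state):
            state.clear()
            state.update(snapshot)        # rollback
            negative_memory.append({"failed_step": name, "patch": patch})
            return "rolled_back"
    return "committed"                    # success -> commit
```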

&lt;p&gt;&lt;strong&gt;What this enables.&lt;/strong&gt; An agent that &lt;strong&gt;cannot&lt;/strong&gt; repeat your last year's mistake. Not because it has a great model — but because &lt;strong&gt;the rule engine physically refuses to let through&lt;/strong&gt; any patch that violates a rule derived from that mistake.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;RAG helps the agent find something. Good memory &lt;strong&gt;prevents&lt;/strong&gt; the agent from breaking something.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These are different products. And I feel sorry for those who keep mixing them up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How braincore does it.&lt;/strong&gt; Negative memory is a separate entity type with a required link to a failing test or git commit. The rule engine is a pre-execution gate, severity-aware, with override possible only via explicit user confirmation.&lt;/p&gt;




&lt;h3&gt;
  
  
  Bonus principle. Entity Disambiguation
&lt;/h3&gt;

&lt;p&gt;Formally a special case of Principle 1 (atomic units), but it breaks separately often enough to deserve its own callout.&lt;/p&gt;

&lt;p&gt;In RAG, there's no concept of an &lt;strong&gt;entity&lt;/strong&gt;. There's only text. If your project has two &lt;code&gt;User&lt;/code&gt; classes — one in &lt;code&gt;pkg/auth&lt;/code&gt;, one in &lt;code&gt;pkg/billing&lt;/code&gt; — for RAG these are two pieces of text with similar embeddings. On retrieve, they &lt;strong&gt;mix together&lt;/strong&gt;, and the model confidently explains auth logic in the context of billing.&lt;/p&gt;

&lt;p&gt;This isn't theory. This is happening &lt;strong&gt;right now&lt;/strong&gt; in every code RAG agent.&lt;/p&gt;

&lt;p&gt;The fix — &lt;strong&gt;EntityFingerprint&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;fingerprint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;symbol&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;project_id&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
  &lt;span class="n"&gt;file_path&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
  &lt;span class="n"&gt;symbol_name&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
  &lt;span class="n"&gt;symbol_type&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
  &lt;span class="n"&gt;signature&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
  &lt;span class="n"&gt;language&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two &lt;code&gt;User&lt;/code&gt; entities in different files = two fingerprints = two distinct entities that &lt;strong&gt;never auto-merge&lt;/strong&gt;. When a new candidate arrives, a &lt;code&gt;SameEntityScore&lt;/code&gt; is computed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SameEntityScore =
  + 0.30 * name_similarity
  + 0.20 * alias_match
  + 0.20 * context_similarity
  + 0.15 * graph_neighborhood_similarity
  + 0.10 * temporal_consistency
  + 0.05 * source_consistency
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;≥ 0.92&lt;/code&gt; → &lt;code&gt;auto_merge&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;≥ 0.82&lt;/code&gt; → &lt;code&gt;same_as&lt;/code&gt; link (soft link, not merge)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;≥ 0.65&lt;/code&gt; → &lt;code&gt;ambiguous&lt;/code&gt; — an &lt;strong&gt;ambiguity record&lt;/strong&gt; is created, requiring human review&lt;/li&gt;
&lt;li&gt;otherwise — new entity&lt;/li&gt;
&lt;/ul&gt;
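&lt;p&gt;A sketch of the decision function, mirroring the weights and thresholds above (signal names follow the formula; everything else is invented for the example):&lt;/p&gt;

```python
SAME_ENTITY_WEIGHTS = {
    "name_similarity": 0.30, "alias_match": 0.20, "context_similarity": 0.20,
    "graph_neighborhood_similarity": 0.15, "temporal_consistency": 0.10,
    "source_consistency": 0.05,
}

def resolve(signals):
    """Decide what to do with a new entity candidate. Never merges at
    low confidence: below auto_merge we link, flag, or create anew."""
    score = sum(SAME_ENTITY_WEIGHTS[name] * signals.get(name, 0.0)
                for name in SAME_ENTITY_WEIGHTS)
    if score >= 0.92:
        return "auto_merge"
    if score >= 0.82:
        return "same_as"       # soft link, not a merge
    if score >= 0.65:
        return "ambiguous"     # ambiguity record, human review required
    return "new_entity"
```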

&lt;p&gt;The core rule: &lt;strong&gt;never merge entities at low confidence&lt;/strong&gt;. Better to create an ambiguity record and ask a human than to silently glue them together and lie forever after.&lt;/p&gt;




&lt;h3&gt;
  
  
  Why all of this together
&lt;/h3&gt;

&lt;p&gt;I'm deliberately not framing this as &lt;em&gt;"this is nowhere done, I'm first."&lt;/em&gt; Each of the seven principles already exists. Atomic facts with lifecycle — in knowledge management systems. Strict mode + abstain — in last century's expert systems. Causal chains — in decision support tools. AST identity — in IDEs. Internal git — in tools like Pijul and in Datalog database experiments. Memory scoring — in research papers on episodic memory. Negative memory — in RL and reliability engineering.&lt;/p&gt;

&lt;p&gt;Uniqueness isn't in the ideas. It's in the &lt;strong&gt;assembly&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you have atomic units but no strict mode — you have a structured database of hallucinations. If you have strict mode but no causal chains — you abstain without understanding why. If you have causal chains but no AST identity — your decisions point into the void after two refactorings. If you have all of the above but no memory scoring — you have a perfectly structured dump in which the important drowns in noise.&lt;/p&gt;

&lt;p&gt;Each property in isolation is an improvement. All seven together is a different category of product.&lt;/p&gt;

&lt;p&gt;This, by the way, is the answer to the question I get most often: &lt;em&gt;"why write something new if I already have Mem0/Letta/Zep?"&lt;/em&gt; The answer — look at their schemas and check how many of the seven principles are implemented &lt;strong&gt;not as a marketing claim, but as an enforced gate in code&lt;/strong&gt;. For most, the honest count is two or three. For some — four. They aren't bad products. They're &lt;strong&gt;partial solutions&lt;/strong&gt;, more honestly called &lt;em&gt;"structured retrieval"&lt;/em&gt; than &lt;em&gt;"memory."&lt;/em&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  In Part 3
&lt;/h3&gt;

&lt;p&gt;Seven principles is engineering. What &lt;strong&gt;should be&lt;/strong&gt; in the architecture. But behind engineering sits a deeper question: &lt;strong&gt;why should an AI agent know what it doesn't know?&lt;/strong&gt; Why abstain at all, if it can just answer?&lt;/p&gt;

&lt;p&gt;Part 3 is about the right of an AI agent to stay silent. About self-tasking. About why cognitive runtime matters more than model size. And about why the right metric for production AI isn't accuracy, but &lt;em&gt;zero confidently-wrong actions at an acceptable abstain rate&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;It's the shortest and most philosophical piece in the series. Drops next week.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part 2 of 3. If you missed &lt;a href="https://medium.com/@vbcherepanov/rag-isnt-memory-it-s-ctrl-f-with-embeddings-c461b90ac7b1" rel="noopener noreferrer"&gt;Part 1 — here&lt;/a&gt; (on why RAG is search and not memory). If this resonated — a repost would help.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
    <item>
      <title>RAG isn't memory. It's Ctrl+F with embeddings.</title>
      <dc:creator>Vitalii Cherepanov</dc:creator>
      <pubDate>Fri, 01 May 2026 12:12:41 +0000</pubDate>
      <link>https://dev.to/vbcherepanov/rag-isnt-memory-its-ctrlf-with-embeddings-1imi</link>
      <guid>https://dev.to/vbcherepanov/rag-isnt-memory-its-ctrlf-with-embeddings-1imi</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Part 1 of 3 — "Memory for AI agents"&lt;/strong&gt;&lt;br&gt;
Deconstructing the long-term memory myth in LLM systems&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frp9xzcea3fsbcunjbvuu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frp9xzcea3fsbcunjbvuu.png" alt=" " width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Article
&lt;/h2&gt;

&lt;p&gt;It's 3 AM. I'm on my third night debugging an AI agent. I'm standing in the kitchen with a mug of tea, staring at a diff, swearing quietly. The agent has confidently rewritten the auth function — based on a chunk that belongs to a branch that was deleted from the repo two months ago.&lt;/p&gt;

&lt;p&gt;The chunk lives in Qdrant. Its cosine similarity to my query is high. Top-1 in the retrieval. The agent honestly grabbed it, honestly stitched it into the prompt, honestly generated the "correct" patch. Against code from a different reality.&lt;/p&gt;

&lt;p&gt;I close the laptop and think: okay, I have RAG. I have vectors. I have long-term memory. I have everything every AI conference deck has been promising for the last two years. Why did my agent just propose a fix based on code that doesn't exist anymore?&lt;/p&gt;

&lt;p&gt;Because my agent doesn't have memory. My agent has search results with cosine instead of BM25. And between those two sentences lies the entire difference between &lt;em&gt;"AI you can trust in production"&lt;/em&gt; and &lt;em&gt;"AI you have to babysit on every line."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This piece is about that difference. And about why we, as engineers, are the ones to blame for not seeing it anymore.&lt;/p&gt;




&lt;h3&gt;
  
  
  The devaluation of the word "memory"
&lt;/h3&gt;

&lt;p&gt;Let's be honest. What is the typical "memory" of an AI agent in 2026?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;text → split into 512-1024 token chunks
     → embedding (bge / text-embedding-3 / openai)
     → vector DB (Qdrant / pgvector / Chroma / Pinecone)
     → cosine similarity top-k
     → concatenate into prompt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is &lt;strong&gt;not&lt;/strong&gt; memory. This is search. It's old-school Lucene from 2003, repainted in neural colors. Cosine instead of TF-IDF. Embeddings instead of an inverted index. Same thing.&lt;/p&gt;
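&lt;p&gt;To be concrete, that entire "memory layer" fits in a dozen lines of Python (toy vectors standing in for real embeddings):&lt;/p&gt;

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, store, k=3):
    """The whole pipeline: rank chunks by cosine, return top-k to
    concatenate into the prompt. No lifecycle, no time, no causality:
    nearest text wins."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item["vec"]),
                    reverse=True)
    return [item["text"] for item in ranked[:k]]
```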

&lt;p&gt;If we just called it that — &lt;em&gt;"vector search,"&lt;/em&gt; &lt;em&gt;"semantic retrieval"&lt;/em&gt; — I'd have no complaints. Call Lucene Lucene, no problem. But when it's sold under the banner &lt;em&gt;"my AI has long-term memory"&lt;/em&gt; — sorry. My AI has déjà vu and amnesia at the same time.&lt;/p&gt;

&lt;p&gt;This isn't a terminology gripe. It's a question of expectations. When an engineer hears &lt;em&gt;"memory,"&lt;/em&gt; they imagine a system that &lt;strong&gt;remembers&lt;/strong&gt;: who said what, when, in what context, what was true then versus what's true now. When an engineer gets RAG, they get Ctrl+F. And instead of building honest architecture around that Ctrl+F — with honest constraints — they build a sandcastle and wonder why the agent confuses past with present.&lt;/p&gt;




&lt;h3&gt;
  
  
  Three holes you can drive a truck through
&lt;/h3&gt;

&lt;p&gt;Three concrete failures. Each one I caught in production. Not theory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hole #1: A chunk doesn't know it's a chunk.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Take a perfectly normal declaration from a design doc:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"We moved to JWT because opaque sessions didn't scale to our traffic profile. The alternative was stateful sessions with a Redis cluster, but we ruled it out because of audit requirements from a customer — they don't allow session state outside their perimeter. JWT solves both, but adds invalidation complexity, which we mitigate with short TTLs and refresh tokens."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The chunker splits this into four 512-token pieces. On retrieval, a query comes in: &lt;em&gt;"why did we pick JWT?"&lt;/em&gt; Top-3 returns three fragments of the same decision. With no causality. Without the alternative we ruled out. Without the trade-off we accepted.&lt;/p&gt;

&lt;p&gt;A decision that was &lt;strong&gt;whole&lt;/strong&gt; turns into three parallel "factoids." The model honestly stitches them into plausible text — and &lt;strong&gt;invents&lt;/strong&gt; the missing connections. Because its job is to generate plausible text. And it will, without blinking.&lt;/p&gt;

&lt;p&gt;This isn't a bug in the chunker. This is an architectural property of the entire approach. Any decision declaration you have gets ground into powder and reassembled with structural loss. Every single time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hole #2: There's no structure in memory. Only cosine.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When a human explains a project to you, they say:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;here's the goal&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;here are the options we considered&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;here's what we picked and why&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;here's what broke two months later&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;here's what we changed, and that decision now supersedes the old one&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In RAG, none of this exists. Zero. RAG doesn't distinguish &lt;em&gt;"hypothesis,"&lt;/em&gt; &lt;em&gt;"confirmed fact,"&lt;/em&gt; &lt;em&gt;"rejected alternative,"&lt;/em&gt; &lt;em&gt;"deprecated decision moved to archive."&lt;/em&gt; For RAG, all of these are equivalent points in a 384-dimensional space.&lt;/p&gt;

&lt;p&gt;Imagine you're trying to record thirty years of life into a single flat table &lt;code&gt;entries(text, vector)&lt;/code&gt; and then search it by cosine. Surprised your memories blur together? That's not your memory failing. That's the structure you crammed it into — a structure that doesn't allow distinctions between &lt;em&gt;"I thought about it"&lt;/em&gt; and &lt;em&gt;"I did it,"&lt;/em&gt; between &lt;em&gt;"I tried it and it worked"&lt;/em&gt; and &lt;em&gt;"I tried it and it hurt."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In RAG, there are no fields for these distinctions. Not because the developers didn't think of it. Because &lt;strong&gt;the vector-plus-distance paradigm itself&lt;/strong&gt; doesn't accommodate causality and time. It's a mathematical limitation. You don't fix it with product features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hole #3: Time doesn't exist as a first-class concept.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Three weeks ago I wrote into the agent's memory: &lt;em&gt;"we use Postgres."&lt;/em&gt; Today I wrote: &lt;em&gt;"we migrated to ClickHouse for analytics, Postgres is OLTP only now."&lt;/em&gt; In RAG, &lt;strong&gt;both&lt;/strong&gt; facts sit there. Both have high cosine to a database query. Top-k returns both. The model picks the one that "sounds" better in its pretraining — usually Postgres, because it appears more often in the training data.&lt;/p&gt;

&lt;p&gt;This is &lt;strong&gt;not&lt;/strong&gt; memory. This is a roulette wheel disguised as confidence.&lt;/p&gt;

&lt;p&gt;When was the last time you saw &lt;code&gt;valid_from&lt;/code&gt;, &lt;code&gt;valid_until&lt;/code&gt;, &lt;code&gt;deprecated_by&lt;/code&gt;, &lt;code&gt;replaced_by&lt;/code&gt;, &lt;code&gt;superseded_by&lt;/code&gt; fields in a production RAG system? I never have. Because in standard RAG, they're &lt;strong&gt;not in the schema&lt;/strong&gt;. And again — not because devs are lazy. Because the schema &lt;em&gt;"text plus embedding"&lt;/em&gt; has no place for the lifecycle of knowledge. No notion of &lt;em&gt;"this is true now"&lt;/em&gt; versus &lt;em&gt;"this was true then."&lt;/em&gt; Everything collapses into a single time slice — a present that somehow contains yesterday, last year, and deprecated-three-quarters-ago all at once.&lt;/p&gt;
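&lt;p&gt;For contrast, the filter those missing fields would buy is one screen of code. A sketch, assuming an illustrative fact schema with &lt;code&gt;valid_from&lt;/code&gt; and &lt;code&gt;valid_until&lt;/code&gt; timestamps:&lt;/p&gt;

```python
def current_facts(facts, now):
    """The filter standard RAG cannot express: a fact is retrievable
    only inside its validity window. valid_until=None means 'still
    true'. Field names are illustrative, not a standard schema."""
    current = []
    for fact in facts:
        started = now >= fact["valid_from"]
        alive = fact["valid_until"] is None or fact["valid_until"] > now
        if started and alive:
            current.append(fact)
    return current
```

&lt;p&gt;With this in place, yesterday's "we use Postgres" simply stops competing on cosine with today's "analytics moved to ClickHouse."&lt;/p&gt;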

&lt;blockquote&gt;
&lt;p&gt;Ctrl+F with embeddings doesn't &lt;strong&gt;remember&lt;/strong&gt;. It &lt;strong&gt;finds&lt;/strong&gt;. Different verbs.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  "But memory frameworks fix this, right?"
&lt;/h3&gt;

&lt;p&gt;Okay, the believer says. There's mem0, Letta, Zep, Cognee, MemGPT, the whole long-term memory zoo. They added a meaning layer on top of RAG. They're memory-aware.&lt;/p&gt;

&lt;p&gt;Let's be honest. I've used them. One after another. For a long time. Looked under the hood, not just at the landing pages.&lt;/p&gt;

&lt;p&gt;Each of them takes &lt;strong&gt;one&lt;/strong&gt; piece of real memory — for some it's LLM-extraction before write, for some it's a buffer hierarchy like an OS, for some it's post-hoc graph extraction from dialogues, for some it's per-fact temporal validity — and implements &lt;strong&gt;that one piece&lt;/strong&gt;, without weaving it into the rest.&lt;/p&gt;

&lt;p&gt;This is warmer than vanilla Qdrant. It's &lt;strong&gt;not&lt;/strong&gt; a solution.&lt;/p&gt;

&lt;p&gt;Because real memory requires &lt;strong&gt;seven&lt;/strong&gt; properties working together. Each of them, in isolation, already exists in the literature or in open source. As far as I can tell, no one has assembled all seven into a single system. Which seven, exactly — that's part 2 of this series. Here, only the limitation that unites &lt;strong&gt;all&lt;/strong&gt; flat-fact solutions, however they wrap themselves:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;None of them have the right to say "I don't know."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Show me any one of these systems with a formal abstain mechanism: a gate through which a fact will &lt;strong&gt;not&lt;/strong&gt; pass into prompt context if it has no source, no confidence, no temporal validity, or an unresolved contradiction. I'll wait.&lt;/p&gt;

&lt;p&gt;In the standard flow of all these frameworks, the system's response to &lt;em&gt;"there's a contradiction in memory or not enough data"&lt;/em&gt; is &lt;em&gt;"well, the model will figure it out."&lt;/em&gt; Which translates from marketing to engineering as &lt;em&gt;"the model will hallucinate, and that becomes your problem in production."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Good memory isn't &lt;em&gt;"remembering a lot."&lt;/em&gt; It's &lt;strong&gt;knowing the boundary of what you don't remember&lt;/strong&gt;. Part 2 of this series is built around that thesis.&lt;/p&gt;




&lt;h3&gt;
  
  
  "Why not just push context to 1M tokens?"
&lt;/h3&gt;

&lt;p&gt;This is the second fashion of the last two years, and it deserves its own breakdown, because it leads the industry into the same dead end under a different banner. &lt;em&gt;"Why do we need memory if Gemini has 2M context, Claude has 1M?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Four problems, no preamble.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One — economics.&lt;/strong&gt; A single project conversation at 800K tokens with prompt caching off costs tens of dollars &lt;strong&gt;per request&lt;/strong&gt;. Without aggressive caching, you're broke in a week. With aggressive caching, you're building exactly the same hierarchy as Letta — just more expensive and locked to one vendor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two — recall.&lt;/strong&gt; Every long-context benchmark (NIAH, RULER, LongMemEval) shows the same thing: models &lt;strong&gt;drown&lt;/strong&gt; in their own context past 200-300K tokens. Attention is unevenly distributed. This is &lt;strong&gt;lost-in-the-middle&lt;/strong&gt;, and it doesn't get fixed by window size — it gets partially mitigated by architectural tricks inside the model, but it doesn't go away. The more you stuff in, the less of it actually gets considered.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Three — persistence.&lt;/strong&gt; Context isn't saved. Close the session, gone. Tomorrow the same agent shows up with a clean context. So you have to feed it 800K tokens of "history" again. The problem isn't solved — it's hidden inside your wallet and your latency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Four — learning.&lt;/strong&gt; If the agent made a mistake yesterday and you corrected it, that experience isn't structured for the future. Tomorrow it'll repeat the mistake. Context is RAM, not disk. And when someone says &lt;em&gt;"just increase context instead of building memory"&lt;/em&gt; — that's the same as saying &lt;em&gt;"why do I need a database, I have a terabyte of RAM."&lt;/em&gt; Technically the words rhyme. In practice they're incomparable concepts.&lt;/p&gt;

&lt;p&gt;Big context doesn't replace memory. It lets you stuff more into one session — and that's it.&lt;/p&gt;




&lt;h3&gt;
  
  
  What to do about it tomorrow morning
&lt;/h3&gt;

&lt;p&gt;If you've read this far and you're thinking &lt;em&gt;"okay, agreed, RAG is search, not memory. Now what?"&lt;/em&gt; — I have two pieces of news.&lt;/p&gt;

&lt;p&gt;The bad: a systemically correct solution requires rewriting the memory layer from schema up through lifecycle, and that's months of work. Not a weekend.&lt;/p&gt;

&lt;p&gt;The good: there are several things you can do &lt;strong&gt;tomorrow morning&lt;/strong&gt; that already remove half the pain. Not magic — just engineering hygiene.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Drop the word "memory" from your stack if what you have is RAG.&lt;/strong&gt; Call it retrieval or search — instantly more honest. That alone removes 80% of inflated expectations from users and the team.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Introduce &lt;code&gt;valid_from&lt;/code&gt; and &lt;code&gt;valid_until&lt;/code&gt; for every fact.&lt;/strong&gt; Any fact without temporal validity is a hypothesis, not a fact. Old facts should drop out of retrieval automatically, not compete with new ones on cosine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distinguish &lt;code&gt;staging&lt;/code&gt;, &lt;code&gt;working&lt;/code&gt;, &lt;code&gt;consolidated&lt;/code&gt;, &lt;code&gt;archived&lt;/code&gt;.&lt;/strong&gt; Don't dump everything into one collection. A fact that just arrived and a piece of knowledge confirmed by tests are different entities with different weight in retrieval.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make abstain a first-class outcome.&lt;/strong&gt; If no fact passed the confidence threshold during retrieve, the system &lt;strong&gt;must&lt;/strong&gt; have the right to say &lt;em&gt;"I don't know, I need data."&lt;/em&gt; And that &lt;em&gt;"I don't know"&lt;/em&gt; should become a task in the backlog, not a dead end for the user.&lt;/li&gt;
&lt;/ul&gt;
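&lt;p&gt;The last point is the easiest to prototype. A sketch of such a gate (field names and the threshold are illustrative):&lt;/p&gt;

```python
def answer_or_abstain(candidates, threshold=0.6):
    """Abstain as a first-class outcome: if no retrieved fact clears the
    confidence bar, return a knowledge-gap task instead of the least-bad
    guess."""
    passing = [c for c in candidates if c["confidence"] >= threshold]
    if passing:
        return {"outcome": "answer", "facts": passing}
    topics = ", ".join(sorted({c["topic"] for c in candidates}))
    return {"outcome": "abstain", "task": "collect evidence on: " + topics}
```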

&lt;p&gt;This isn't a complete list — it's the minimum to start the transition from &lt;em&gt;"I have RAG, I call it memory"&lt;/em&gt; to &lt;em&gt;"I have memory, and it knows its boundaries."&lt;/em&gt; The full list of seven principles is in part 2.&lt;/p&gt;




&lt;h3&gt;
  
  
  Where this comes from
&lt;/h3&gt;

&lt;p&gt;I'm deep in this kitchen — Claude Code, Cursor, Codex, Windsurf, MCP servers, mem0, Zep, local RAG stacks on Postgres + pgvector, Qdrant, Chroma. Over the last few months I've tried, I think, everything on the market. I have my own MCP memory server with about fifteen hundred entries, which I rewrote from scratch three times because each time I hit one of the three holes above.&lt;/p&gt;

&lt;p&gt;At some point, I got tired. Not of AI — of what we call memory in AI. So I sat down and started writing my own cognitive runtime that &lt;strong&gt;doesn't pretend to know&lt;/strong&gt;, that &lt;strong&gt;knows what it doesn't know&lt;/strong&gt;, and that &lt;strong&gt;sets its own tasks&lt;/strong&gt; to close the gaps. Called it &lt;code&gt;braincore&lt;/code&gt;. One Go binary, local, MCP-stdio, Apache-2.0. Not a pitch, because it's open source — just proof that when I say &lt;em&gt;"this can be done,"&lt;/em&gt; I'm not speaking theoretically.&lt;/p&gt;

&lt;p&gt;Seven architectural principles it's built on — that's part 2 of this series. Drops in a week. I'll cover atomic knowledge units, lifecycle, strict mode, causal decision chains, AST-based identity for code, internal git as memory versioning, memory scoring, and negative memory.&lt;/p&gt;

&lt;p&gt;And why all of that combined produces a qualitatively different result than any of those pieces in isolation.&lt;/p&gt;

&lt;p&gt;Part 3 is philosophical — about &lt;strong&gt;the right of an AI agent to stay silent&lt;/strong&gt;, and why the right metric for production AI isn't accuracy but &lt;em&gt;zero confidently-wrong actions at an acceptable abstain rate&lt;/em&gt;. About self-tasking. About why cognitive runtime matters more than model size.&lt;/p&gt;




&lt;p&gt;If you read this far and recognized yourself in the opening paragraph — we're in the same boat. If you have RAG that you call memory and it works — tell me how, seriously, I want to know, I might be wrong.&lt;/p&gt;

&lt;p&gt;The one thing you can't do is stay silent.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part 1 of 3. Next — "Seven principles of real memory for AI agents" — drops next Tuesday.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>rag</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>I Studied the etcd Codebase — and It Changed How I Write PHP</title>
      <dc:creator>Vitalii Cherepanov</dc:creator>
      <pubDate>Tue, 21 Apr 2026 10:41:13 +0000</pubDate>
      <link>https://dev.to/vbcherepanov/i-studied-the-etcd-codebase-and-it-changed-how-i-write-php-36m1</link>
      <guid>https://dev.to/vbcherepanov/i-studied-the-etcd-codebase-and-it-changed-how-i-write-php-36m1</guid>
      <description>&lt;p&gt;There's a common piece of advice: "Want to write better code? Read good code." Sounds obvious. Rarely practiced.&lt;/p&gt;

&lt;p&gt;The problem is that most open-source projects are mazes. You open a repo, see 200 directories, and close the tab. Kubernetes is two million lines. The Linux kernel — don't even think about it. Where do you start?&lt;/p&gt;

&lt;p&gt;My answer: &lt;strong&gt;etcd&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For those unfamiliar: etcd is a distributed key-value store written in Go. It's the backbone of Kubernetes — every piece of cluster state lives there. But I'm not interested in etcd as a product. I'm interested in it as &lt;strong&gt;an example of architecture you can actually read from start to finish&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here's what surprised me: the principles baked into etcd aren't about Go. They're about software design in general. I work with PHP and Symfony daily, and almost everything I found in etcd translated directly into my projects.&lt;/p&gt;

&lt;p&gt;Seven principles, concrete examples, no fluff.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. One Source of Truth for Your API
&lt;/h2&gt;

&lt;p&gt;In etcd, every API is defined in &lt;code&gt;.proto&lt;/code&gt; files. Open &lt;code&gt;rpc.proto&lt;/code&gt; and you see all operations: &lt;code&gt;Range&lt;/code&gt;, &lt;code&gt;Put&lt;/code&gt;, &lt;code&gt;DeleteRange&lt;/code&gt;, &lt;code&gt;Txn&lt;/code&gt;. Every field is typed. There's no room for "wait, do we accept a string or an integer here?"&lt;/p&gt;

&lt;p&gt;In PHP, instead of protobuf, we have &lt;strong&gt;strictly typed DTOs&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;final&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CreateOrderRequest&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$customerId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="cd"&gt;/** @var OrderItemDto[] */&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$items&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;?string&lt;/span&gt; &lt;span class="nv"&gt;$promoCode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One class — and everyone knows what the endpoint accepts. The frontend dev looks at the DTO, the backend dev writes logic against it, the OpenAPI schema generates automatically via NelmioApiDocBundle.&lt;/p&gt;

&lt;p&gt;Compare this with what I've seen (and written) on real projects:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;json_decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getContent&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nv"&gt;$customerId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'customer_id'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nv"&gt;$items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'items'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="c1"&gt;// What's the format of items? Is promoCode a thing? Who knows.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When your contract is "well, some array comes in," any change breaks something unexpected. When your contract is a DTO with types, PHPStan catches the problem before production does.&lt;/p&gt;
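<p>To make this concrete, here's a self-contained sketch of what the typed contract buys you at the boundary. Class names are shortened from the DTO above (<code>OrderItem</code> stands in for <code>OrderItemDto</code>), and the failure mode shown is ordinary PHP type enforcement, nothing framework-specific:</p>

```php
// A sketch of what strict typing buys you at the boundary: a malformed
// request fails immediately, not three layers deep in business logic.
final class OrderItem
{
    public function __construct(
        public readonly string $sku,
        public readonly int $quantity,
    ) {}
}

final class CreateOrderRequest
{
    /** @param OrderItem[] $items */
    public function __construct(
        public readonly string $customerId,
        public readonly array $items,
        public readonly ?string $promoCode = null,
    ) {}
}

// Well-formed request: every field is visible and typed.
$ok = new CreateOrderRequest('cust-42', [new OrderItem('SKU-1', 2)]);

// Malformed request: null for a non-nullable string is a TypeError
// right here, at the edge of the system.
try {
    new CreateOrderRequest(null, []);
    $caught = false;
} catch (\TypeError $e) {
    $caught = true;
}
```

With the array-shape version, the equivalent mistake would silently propagate a <code>null</code> customer id into the service layer.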




&lt;h2&gt;
  
  
  2. Each Service Does One Thing
&lt;/h2&gt;

&lt;p&gt;etcd has clearly separated gRPC services: &lt;code&gt;KV&lt;/code&gt; (read-write), &lt;code&gt;Watch&lt;/code&gt; (subscribe to changes), &lt;code&gt;Lease&lt;/code&gt; (key TTLs), &lt;code&gt;Auth&lt;/code&gt; (authorization). Each one is a separate interface. &lt;code&gt;Watch&lt;/code&gt; doesn't touch writes. &lt;code&gt;KV&lt;/code&gt; doesn't check tokens.&lt;/p&gt;

&lt;p&gt;In Symfony — same idea, different tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OrderController&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;#[Route('/orders', methods: ['POST'])]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="kt"&gt;CreateOrderRequest&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="kt"&gt;OrderService&lt;/span&gt; &lt;span class="nv"&gt;$orderService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;JsonResponse&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;JsonResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nv"&gt;$orderService&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;OrderService&lt;/code&gt; creates orders. It doesn't send emails — that's &lt;code&gt;NotificationService&lt;/code&gt; listening to an &lt;code&gt;OrderCreatedEvent&lt;/code&gt;. It doesn't process payments — that's &lt;code&gt;PaymentService&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;And then there's the alternative I see regularly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OrderController&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Request&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// 40 lines of validation&lt;/span&gt;
        &lt;span class="c1"&gt;// 20 lines of authorization&lt;/span&gt;
        &lt;span class="c1"&gt;// 60 lines of business logic&lt;/span&gt;
        &lt;span class="c1"&gt;// 15 lines sending email&lt;/span&gt;
        &lt;span class="c1"&gt;// 10 lines of logging&lt;/span&gt;
        &lt;span class="c1"&gt;// Total: 150 lines, untestable&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 500-line god controller. We've all been there. etcd helped me finally articulate &lt;em&gt;why&lt;/em&gt; it's bad: not because "the pattern is wrong," but because &lt;strong&gt;you can't trace what the system is doing&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Middleware Composes Like Lego
&lt;/h2&gt;

&lt;p&gt;Every gRPC request in etcd passes through a chain of interceptors: logging → auth → metrics → handler → metrics → response. Each interceptor is small, single-purpose. The power comes from composition.&lt;/p&gt;

&lt;p&gt;In Symfony, this maps to Event Listeners and Messenger Middleware:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MetricsMiddleware&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;MiddlewareInterface&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;PrometheusCollector&lt;/span&gt; &lt;span class="nv"&gt;$metrics&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Envelope&lt;/span&gt; &lt;span class="nv"&gt;$envelope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;StackInterface&lt;/span&gt; &lt;span class="nv"&gt;$stack&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;Envelope&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;microtime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;$result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$stack&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$envelope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$stack&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'messages_processed_total'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'type'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$envelope&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'status'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'success'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;]);&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;\Throwable&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'messages_processed_total'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'type'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$envelope&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'status'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'error'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;]);&lt;/span&gt;
            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="s1"&gt;'message_duration_seconds'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nb"&gt;microtime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nv"&gt;$start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;$envelope&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One middleware, one job. Metrics here, logging there, retry somewhere else. Assemble the chain in &lt;code&gt;messenger.yaml&lt;/code&gt;.&lt;/p&gt;
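<p>For reference, a sketch of what that assembly looks like in <code>messenger.yaml</code>. The bus and middleware class names here are illustrative, not from a real project:</p>

```yaml
# config/packages/messenger.yaml (sketch; names are illustrative)
framework:
    messenger:
        buses:
            command.bus:
                middleware:
                    # Order matters: each request passes down this chain
                    # and back up, exactly like etcd's interceptors.
                    - App\Messenger\LoggingMiddleware
                    - App\Messenger\MetricsMiddleware
```

Add or remove a concern by editing one line of config; no handler changes.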

&lt;p&gt;The antipattern — when every handler has this manually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;CreateOrderCommand&lt;/span&gt; &lt;span class="nv"&gt;$command&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Starting order creation...'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;microtime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// ... actual logic ...&lt;/span&gt;

    &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;microtime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nv"&gt;$start&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Order created'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;50 handlers, 50 copies of the same boilerplate. Forget one — no metrics. Change the log format — change it in 50 places.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Observability Is Architecture, Not an Afterthought
&lt;/h2&gt;

&lt;p&gt;In etcd, Prometheus is wired into the gRPC layer from day one. Not "added six months after launch." The code isn't considered done without metrics.&lt;/p&gt;

&lt;p&gt;In PHP:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PaymentService&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;charge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Order&lt;/span&gt; &lt;span class="nv"&gt;$order&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;PaymentResult&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$timer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;startTimer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'payment_charge_duration'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;$result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;gateway&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$order&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

            &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'payments_total'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'provider'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'status'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;isSuccess&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="s1"&gt;'success'&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'declined'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;]);&lt;/span&gt;

            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;GatewayTimeoutException&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'payments_total'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'provider'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$order&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;paymentMethod&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'status'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'timeout'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;]);&lt;/span&gt;
            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;$timer&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;observe&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every payment — in metrics. How many succeeded, how many timed out, which provider is slow. Not because someone asked for it, but because without it you're flying blind.&lt;/p&gt;

&lt;p&gt;I remember a project where production was down for 40 minutes and the only way to understand what was happening was &lt;code&gt;tail -f /var/log/symfony.log | grep ERROR&lt;/code&gt;. Never again.&lt;/p&gt;

&lt;p&gt;Package: &lt;code&gt;promphp/prometheus_client_php&lt;/code&gt;. Five minutes to install, fifteen to wire up Grafana.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Simple Outside, Rocket Science Inside
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;clientv3&lt;/code&gt; in etcd is a masterclass in the facade pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One line. Under the hood: node selection, reconnection on failure, retry with exponential backoff, protobuf serialization, Raft consensus, disk write, quorum confirmation.&lt;/p&gt;

&lt;p&gt;Same principle in PHP:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Calling code. Simple and clear.&lt;/span&gt;
&lt;span class="nv"&gt;$paymentService&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;charge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$order&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside &lt;code&gt;charge()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;charge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Order&lt;/span&gt; &lt;span class="nv"&gt;$order&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;PaymentResult&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;findExistingPayment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$order&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$existing&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// idempotency&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nv"&gt;$provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;providerResolver&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$order&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nv"&gt;$result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;withRetry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$provider&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$order&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;maxAttempts&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;backoff&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'exponential'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;isSuccess&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;fiscalService&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;createReceipt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$order&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;dispatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;PaymentProcessed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$order&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The controller calling &lt;code&gt;charge()&lt;/code&gt; knows nothing about fiscal receipts, retries, or provider selection. And it shouldn't.&lt;/p&gt;

&lt;p&gt;A sign of a good service: you can explain what it does in one sentence — "charges the customer for an order" — while the implementation is 200 lines of careful logic.&lt;/p&gt;
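<p>The <code>withRetry()</code> call above does a lot of quiet work, so here is one possible sketch of it. The helper is hypothetical: the signature, the <code>'exponential'</code> flag, and the initial delay are my assumptions, but it shows the shape of retry with exponential backoff:</p>

```php
/**
 * Hypothetical sketch of the withRetry() helper assumed by charge().
 * Retries a transient failure, doubling the delay between attempts.
 */
function withRetry(callable $operation, int $maxAttempts = 3, string $backoff = 'exponential'): mixed
{
    $delayMs = 100; // initial delay; doubles each retry when exponential

    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (\Throwable $e) {
            if ($attempt >= $maxAttempts) {
                throw $e; // attempts exhausted: surface the last error
            }
            usleep($delayMs * 1000);
            if ($backoff === 'exponential') {
                $delayMs *= 2;
            }
        }
    }
}

// A transient failure that succeeds on the third try:
$calls = 0;
$result = withRetry(function () use (&$calls) {
    if (++$calls < 3) {
        throw new \RuntimeException('transient gateway error');
    }
    return 'charged';
});
// $result is 'charged', $calls is 3
```

The caller stays one line; the backoff policy lives in one place instead of being re-implemented per provider.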




&lt;h2&gt;
  
  
  6. You Can Trace a Request With Your Finger
&lt;/h2&gt;

&lt;p&gt;In etcd, the request path reads linearly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gRPC handler → EtcdServer.Put() → Raft → apply → bbolt (disk)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No magic. No hidden calls. No "where does this even get triggered?"&lt;/p&gt;

&lt;p&gt;In Symfony — same thing, if you don't abuse the event system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Request
  → Controller (unwrap DTO)
    → Service (business logic)
      → Repository (database)
      → EventDispatcher (side effects)
  → Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open the controller — see which service is called. Open the service — see what it does. Open the repository — see the query.&lt;/p&gt;

&lt;p&gt;What kills traceability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;@PostPersist&lt;/code&gt; on an entity that silently sends SMS&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;prePersist&lt;/code&gt; listeners modifying data before writes — and you spend 30 minutes figuring out who's touching the &lt;code&gt;updatedAt&lt;/code&gt; field&lt;/li&gt;
&lt;li&gt;Ten &lt;code&gt;EventSubscriber&lt;/code&gt;s on the same event with unclear execution order&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Event-driven is great. But if a new developer can't explain "request comes in here, response goes out there" within 2 minutes — you have a problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. No Hidden Dependencies
&lt;/h2&gt;

&lt;p&gt;In etcd, all dependencies are passed explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;NewKVServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;EtcdServer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;KVServer&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See the constructor — see everything the class needs.&lt;/p&gt;

&lt;p&gt;In Symfony — constructor injection, same thing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OrderService&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;OrderRepository&lt;/span&gt; &lt;span class="nv"&gt;$orders&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;PaymentGateway&lt;/span&gt; &lt;span class="nv"&gt;$payment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;EventDispatcherInterface&lt;/span&gt; &lt;span class="nv"&gt;$events&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;LoggerInterface&lt;/span&gt; &lt;span class="nv"&gt;$logger&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four dependencies. All visible. Want to test? Swap in mocks. Want to understand the class? Look at the constructor.&lt;/p&gt;
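<p>To show why this matters for testing, a minimal self-contained sketch. All class names are illustrative; the point is that a hand-rolled fake slots in wherever the real gateway would go, with no container and no mocking framework required:</p>

```php
// Explicit constructor dependencies make swapping in test doubles trivial.
interface PaymentGateway
{
    public function charge(int $amountCents): bool;
}

final class FakeGateway implements PaymentGateway
{
    /** @var int[] */
    public array $charges = [];

    public function charge(int $amountCents): bool
    {
        $this->charges[] = $amountCents; // record the call instead of hitting a real API
        return true;
    }
}

final class CheckoutService
{
    public function __construct(private PaymentGateway $gateway) {}

    public function checkout(int $amountCents): bool
    {
        return $this->gateway->charge($amountCents);
    }
}

// In a test, the fake replaces the real gateway:
$gateway = new FakeGateway();
$service = new CheckoutService($gateway);
$ok = $service->checkout(1999);
// $gateway->charges now records exactly what the service attempted
```

Try doing that when the gateway comes from <code>$this-&gt;container-&gt;get()</code> or a static call.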

&lt;p&gt;Antipatterns that still survive in the wild:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Service locator: where did this come from?&lt;/span&gt;
&lt;span class="nv"&gt;$payment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'payment.gateway'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Static calls: untestable&lt;/span&gt;
&lt;span class="nc"&gt;Cache&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'key'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$value&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// new SomeService() inside another service: invisible coupling&lt;/span&gt;
&lt;span class="nv"&gt;$validator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OrderValidator&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Symfony's autowiring isn't magic in the bad sense. The container wires dependencies by type, but you still see them in the constructor. It's convenience, not hidden behavior.&lt;/p&gt;
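&lt;p&gt;For reference, this is the stock &lt;code&gt;services.yaml&lt;/code&gt; fragment that turns autowiring on (standard Symfony defaults, not specific to any one project):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;services:
    _defaults:
        autowire: true      # inject dependencies by constructor type-hint
        autoconfigure: true # auto-register listeners, commands, etc. by interface

    App\:
        resource: '../src/'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;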




&lt;h2&gt;
  
  
  My Checklist
&lt;/h2&gt;

&lt;p&gt;After studying etcd, I distilled a checklist I now apply to every new service:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Contract defined?&lt;/strong&gt; DTOs exist, types are set, and the OpenAPI spec is generated from them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Controller thin?&lt;/strong&gt; 10 lines max, all logic in the service layer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-cutting concerns extracted?&lt;/strong&gt; Logging, metrics, retry — through middleware, not copy-paste&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics present?&lt;/strong&gt; If not, the service isn't production-ready&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple API externally?&lt;/strong&gt; Calling code doesn't know about internal complexity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Request path traceable?&lt;/strong&gt; A new developer finds the handler in 2 minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dependencies explicit?&lt;/strong&gt; Everything in the constructor, nothing from thin air&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of this is revolutionary. It's basic hygiene that's easy to forget under deadline pressure.&lt;/p&gt;
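&lt;p&gt;Items 1 and 2 together look roughly like this. A hypothetical controller sketched for illustration: the route, DTO, and &lt;code&gt;create()&lt;/code&gt; method are invented, and &lt;code&gt;MapRequestPayload&lt;/code&gt; assumes Symfony 6.3+:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;final class CreateOrderController
{
    public function __construct(private OrderService $orders) {}

    #[Route('/orders', methods: ['POST'])]
    public function __invoke(#[MapRequestPayload] CreateOrderRequest $dto): JsonResponse
    {
        // Thin: the typed DTO carries the contract, the service carries the logic
        $order = $this-&amp;gt;orders-&amp;gt;create($dto);

        return new JsonResponse($order, 201);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;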

&lt;p&gt;etcd just reminded me what a codebase looks like when that hygiene wasn't skipped. And that it's possible even in a large production system.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What open-source codebase changed how you write code? I'd love to build a reading list — drop yours in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>learning</category>
      <category>opensource</category>
      <category>php</category>
    </item>
  </channel>
</rss>
