<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: OceanData4AI</title>
    <description>The latest articles on DEV Community by OceanData4AI (oceandata4ai).</description>
    <link>https://dev.to/oceandata4ai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F12456%2Fd4ff17ac-8745-44c3-9f58-5fe52ae9aee1.png</url>
      <title>DEV Community: OceanData4AI</title>
      <link>https://dev.to/oceandata4ai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/oceandata4ai"/>
    <language>en</language>
    <item>
      <title>Your OpenClaw Bill Is Bleeding Tokens. Here’s What We Measured — and How to Fix It</title>
      <dc:creator>Charles Wu</dc:creator>
      <pubDate>Sun, 28 Jun 2026 13:57:43 +0000</pubDate>
      <link>https://dev.to/oceandata4ai/your-openclaw-bill-is-bleeding-tokens-heres-what-we-measured-and-how-to-fix-it-1mee</link>
      <guid>https://dev.to/oceandata4ai/your-openclaw-bill-is-bleeding-tokens-heres-what-we-measured-and-how-to-fix-it-1mee</guid>
      <description>&lt;p&gt;&lt;em&gt;Memory bloat, compaction loss, and a retrieval-first path: ~32% less token spend on the AppWorld dev split — without dumbing the agent down.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F83qnbjydq1wdqktebtj3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F83qnbjydq1wdqktebtj3.webp" alt=" " width="800" height="533"&gt;&lt;/a&gt;Photo by &lt;a href="https://unsplash.com/@micheile" rel="noopener noreferrer"&gt;micheile henderson&lt;/a&gt; on &lt;a href="https://unsplash.com/" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Developers who actually ship with LLMs know one truth by heart: the context window is not free. Every extra thousand tokens nudges the invoice up and the latency out.&lt;/p&gt;

&lt;p&gt;If you run OpenClaw (an agent stack that leans hard on long-horizon sessions), that anxiety gets concrete fast. Picture this: last week you spent two hours with your agent debugging production (logs, configs, experiments) and burned through 30k tokens of back-and-forth. This week you pick up where you left off, and the agent answers: "Hi! Which refactor are we talking about?"&lt;/p&gt;

&lt;p&gt;So you spend a few thousand tokens re-explaining context. The model spends a few thousand more re-understanding. And you still might not land the same mental model you had last Tuesday.&lt;/p&gt;

&lt;p&gt;Those 30k tokens? Mostly gone.&lt;/p&gt;

&lt;p&gt;That is not a one-off glitch. OpenClaw’s default memory story quietly feeds two token black holes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two black holes that blow up your token budget
&lt;/h2&gt;

&lt;h2&gt;
  
  
  1) The more you remember, the more you pay
&lt;/h2&gt;

&lt;p&gt;OpenClaw’s agent writes important state into MEMORY.md, and that file gets fully injected into the system prompt on every request. The longer you use the setup, the larger MEMORY.md grows—and every API call pays for the whole thing as input tokens.&lt;/p&gt;

&lt;p&gt;Bootstrap caps exist (for example, a 20k-character default per file, 150k total), but long before you hit the ceiling, a bloated prompt starts crowding the model’s working space. OpenClaw’s agent knows information can get lost — so it writes even more aggressively into MEMORY.md, which accelerates the bloat.&lt;/p&gt;

&lt;h2&gt;
  
  
  2) The more you forget, the more you burn tokens fixing mistakes
&lt;/h2&gt;

&lt;p&gt;When sessions get long, OpenClaw leans on two mechanisms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Compaction: OpenClaw asks an LLM to summarize older conversation chunks to free context.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Memory flush: before compaction, OpenClaw spins up an embedded agent to decide what to persist into memory/YYYY-MM-DD.md.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But compaction is lossy compression by design, and OpenClaw’s retrieval-side slicing hard-cuts along line and character budgets (by default, 400 tokens per chunk) without respecting semantic boundaries. Important context can get cut mid-thought, recall quality drops, your agent makes mistakes, you rework, rework creates more chat, and you trigger compaction again sooner.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool calls are an accelerant
&lt;/h2&gt;

&lt;p&gt;Tool outputs (web_fetch pages, exec dumps) can be huge: up to 400k characters per tool result in the worst case. That fills sessions fast. Those intermediates usually should not land in MEMORY.md, but they can still contain value you do not want to discard. Either way, tool-heavy runs tighten the doom loop.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The uncomfortable tradeoff: remembering everything gets expensive; forgetting costs correctness. You need a third path.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  A third path: cloud memory that steers tokens instead of hoarding them
&lt;/h2&gt;

&lt;p&gt;seekdb M0 is a cloud memory plugin for OpenClaw. The idea in one sentence:&lt;/p&gt;

&lt;p&gt;Do not dump all memory into the system prompt. Before each turn, retrieve only the memory slices that match the current topic — and inject just those.&lt;/p&gt;

&lt;p&gt;Unlike loading the full MEMORY.md on every request, M0 stores memory as discrete facts in a cloud database, with vector embeddings and full-text indexes. At conversation start, M0 runs hybrid retrieval (BM25 keyword scoring + vector similarity) and injects the top relevant facts. After each chat, M0 extracts new facts from the dialogue, compares them to what already exists, and decides whether to add, update, or skip.&lt;/p&gt;

&lt;p&gt;What that buys you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;MEMORY.md stops ballooning—durable memory lives outside the always-on system prompt, so input tokens drop.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Session resets stop being catastrophic — memory persists and rehydrates without you paying again to restate context you already gave.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cross-device continuity — your memory is not trapped on one laptop.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most users, this is meant to feel invisible: you talk; M0 manages memory in the background.&lt;/p&gt;

&lt;p&gt;OpenClaw’s native persistence tends to route through compaction over the full session (including tool outputs) and a flush agent that decides what to write — both are comparatively heavy and lossy. M0 splits what to store from how to store it into two phases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 1: fact extraction
&lt;/h2&gt;

&lt;p&gt;After a conversation, M0 extracts facts from user ↔ assistant text only — not from tool-call intermediates — and uses an LLM to produce atomic facts.&lt;/p&gt;

&lt;p&gt;Example: "The user is Alex, a database engineer based in Austin." becomes three independent facts: the name is Alex, the user is a database engineer, and the user is based in Austin.&lt;/p&gt;

&lt;p&gt;Hard rules we enforce during extraction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Preserve time information (do not collapse "went to Hawaii last year" into a timeless "went to Hawaii").&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Keep the original language (no automatic translation during extraction).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Do not extract sensitive information.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Phase 2: memory decisions
&lt;/h2&gt;

&lt;p&gt;M0 does not blindly insert facts. M0 retrieves similar existing memories, then asks an LLM whether the new fact should be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;ADD&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;UPDATE&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DELETE (contradictions)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;NONE (already covered)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, M0 treats DELETE conservatively as NONE for auto-capture: it only adds new memories and updates existing ones, and does not proactively delete them, to reduce accidental erasure.&lt;/p&gt;

&lt;p&gt;Example decisions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;New fact: "Went to Hawaii last May."
Existing memory: "Has been to Hawaii."
→ UPDATE (time detail added)

New fact: "Doesn't like pizza anymore."
Existing memory: "Likes pizza."
→ UPDATE (preference changed)

New fact: "Is a database engineer."
Existing memories: "Name is Alex" + "Is a database engineer."
→ NONE (already covered)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Implementation detail worth noting: in the memory-decision LLM call, each existing memory’s original ID is replaced with a short temporary index (0, 1, 2, …) so the decision model is less likely to hallucinate or garble long integer IDs. If the decision model returns an index that cannot be mapped back, M0 gracefully falls back to treating the fact as new.&lt;/p&gt;
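&lt;p&gt;A minimal sketch of that index-swapping idea, assuming the decision model returns an action plus a temporary index. The function names and the exact fallback rule here are hypothetical illustrations, not M0’s actual code:&lt;/p&gt;

```python
from typing import Dict, List, Optional, Tuple


def build_index_map(memory_ids: List[int]) -> Dict[int, int]:
    """Map short temporary indexes (0, 1, 2, ...) to the real memory IDs."""
    return {temp: real for temp, real in enumerate(memory_ids)}


def resolve_decision(
    action: str, temp_index: Optional[int], index_map: Dict[int, int]
) -> Tuple[str, Optional[int]]:
    """Translate the model's (action, temporary index) back to a real memory ID.

    Assumed fallback: an UPDATE whose index cannot be mapped back is treated
    as a brand-new fact (ADD), mirroring the behavior described above.
    """
    if action == "UPDATE":
        real_id = index_map.get(temp_index) if temp_index is not None else None
        if real_id is None:
            return ("ADD", None)
        return ("UPDATE", real_id)
    if action == "DELETE":
        # Auto-capture treats DELETE conservatively as NONE.
        return ("NONE", None)
    return (action, None)


# Hypothetical long integer IDs that a model could easily garble.
index_map = build_index_map([9007199254740993, 9007199254740994])
print(resolve_decision("UPDATE", 1, index_map))  # maps back to the second real ID
print(resolve_decision("UPDATE", 7, index_map))  # unmappable index, falls back to ADD
```

&lt;p&gt;The decision prompt only ever shows 0 and 1, so the model has no long number to copy incorrectly.&lt;/p&gt;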

&lt;p&gt;Why this matters for tokens: M0’s fact-extraction stage ignores tool transcripts, so you avoid paying an LLM to read 400k-character blobs just to mint memories.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool-result compression: deterministic, zero LLM spend
&lt;/h2&gt;

&lt;p&gt;M0 also attacks session inflation at persistence time. When OpenClaw persists tool results to session history, M0’s tool_result_persist hook replaces raw output with a structured summary—rule-based, no LLM tokens.&lt;/p&gt;

&lt;p&gt;Illustrative shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Raw&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;curl returned a 3,000-line JSON payload&lt;/span&gt;

&lt;span class="na"&gt;Compressed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;tool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;web_fetch&lt;/span&gt;
  &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;success&lt;/span&gt;
  &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3,000 lines / 48K characters&lt;/span&gt;
  &lt;span class="na"&gt;preview&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;users"&lt;/span&gt;&lt;span class="pi"&gt;:[{&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id"&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;1&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name"&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alice"&lt;/span&gt;&lt;span class="nv"&gt;... (300 chars)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;M0’s summaries are not about perfect fidelity. They aim for high compression while preserving what happened, whether the tool succeeded, and a short preview.&lt;/p&gt;
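&lt;p&gt;To make the rule-based idea concrete, here is a rough sketch of a deterministic summarizer that produces the same fields as the illustrative shape above. The function name, signature, and 300-character preview default are assumptions for illustration, not the real tool_result_persist hook API:&lt;/p&gt;

```python
from typing import Dict


def summarize_tool_result(
    tool: str, ok: bool, raw: str, preview_chars: int = 300
) -> Dict[str, str]:
    """Rule-based summary of a tool result: deterministic, no LLM call."""
    line_count = raw.count("\n") + 1 if raw else 0
    preview = raw[:preview_chars]
    if len(raw) > preview_chars:
        preview += "..."  # mark that the preview was truncated
    return {
        "tool": tool,
        "status": "success" if ok else "error",
        "output": f"{line_count} lines / {len(raw)} characters",
        "preview": preview,
    }


# Synthetic 3,000-line payload standing in for a large fetch result.
raw_output = "\n".join('{"id": %d}' % i for i in range(3000))
summary = summarize_tool_result("web_fetch", True, raw_output)
print(summary["output"])
print(len(summary["preview"]))
```

&lt;p&gt;Because the summary is built from string length and line counts alone, the cost of persisting a tool result stays constant no matter how large the raw output was.&lt;/p&gt;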

&lt;p&gt;Compared with OpenClaw’s native compaction, which feeds the entire session (including tool dumps) into a summarizer, M0’s hook-based compression is closer to upstream budgeting: you control what enters the LLM pipeline, instead of waiting until you overflow and then compressing reactively.&lt;/p&gt;

&lt;h2&gt;
  
  
  Experience + Skill: spend tokens on the right kind of reuse
&lt;/h2&gt;

&lt;p&gt;M0’s memory layer answers who this user is and what they care about. Another common waste pattern in agent stacks is different:&lt;/p&gt;

&lt;p&gt;Your OpenClaw agent may have skills, but not durable, reusable know-how distilled from real runs — so every similar task becomes another expensive exploration loop.&lt;/p&gt;

&lt;p&gt;M0 splits playbooks into two layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Experience (strategy layer): a tight summary of approach + key cautions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Skill (operations layer): structured steps, prerequisites, and pitfalls.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The two layers link by reference: your OpenClaw agent can pull strategy first, then expand operational detail only when needed — which helps keep the active prompt compact.&lt;/p&gt;

&lt;p&gt;Under the hood, M0 stores these in OceanBase (a distributed SQL database) with separate tables for Experience and Skill, indexing title and description with both vector and full-text indexes. Retrieval runs four parallel signals — title vector, description vector, title full-text, description full-text — then merges with RRF (Reciprocal Rank Fusion).&lt;/p&gt;

&lt;p&gt;Why four channels? In M0’s retrieval stack, title matching helps lock onto the right name, description matching helps lock onto the right content, vectors help with semantic equivalence (for example, "build a playlist" vs "create a playlist"), and full-text tends to win on exact strings like API names and error codes. That complementary mix is meant to make retrieval both accurate and broad: your OpenClaw agent should not need ten mid-confidence hits (think ~0.6 relevance) just to be safe when three high-confidence items (~0.9) are enough to execute, and that gap maps straight to fewer tokens in the prompt.&lt;/p&gt;
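&lt;p&gt;The four-signal merge can be sketched with textbook Reciprocal Rank Fusion, where each item scores the sum of 1 / (k + rank) across the ranked lists. The constant k = 60 is the value commonly used in the RRF literature; whether M0 uses the same constant is an assumption, and the skill IDs below are made up:&lt;/p&gt;

```python
from collections import defaultdict
from typing import Dict, List


def rrf_merge(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Reciprocal Rank Fusion: score(d) = sum of 1 / (k + rank) over all rankings."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical ranked results from the four retrieval signals.
title_vector = ["skill_playlist", "skill_queue", "skill_like"]
description_vector = ["skill_playlist", "skill_like"]
title_fulltext = ["skill_queue", "skill_playlist"]
description_fulltext = ["skill_playlist"]

merged = rrf_merge(
    [title_vector, description_vector, title_fulltext, description_fulltext]
)
print(merged)
```

&lt;p&gt;RRF only looks at rank positions, so BM25 scores and vector similarities never have to be normalized onto a shared scale before merging.&lt;/p&gt;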

&lt;p&gt;M0 also stages knowledge ingestion: M0’s pipeline detects a procedure in traces → structures a Skill (steps / pitfalls / prerequisites) → dedupes (for example, vector similarity &amp;gt; 0.75 merges) → runs moderation → stores. When M0 extracts Experience records, M0’s extractor can see stored skills and reference skill IDs, which keeps links generated rather than hand-maintained.&lt;/p&gt;
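&lt;p&gt;The dedupe step in that pipeline can be illustrated with a plain cosine-similarity check against the 0.75 threshold mentioned above. This is a sketch under assumptions: the helper names are hypothetical, the toy vectors stand in for real embeddings, and the actual merge in M0 would happen after this check:&lt;/p&gt;

```python
import math
from typing import List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_merge_target(
    candidate: Sequence[float],
    stored: List[Sequence[float]],
    threshold: float = 0.75,
) -> Optional[int]:
    """Return the index of the most similar stored skill above the threshold, else None."""
    best_index: Optional[int] = None
    best_score = threshold
    for i, vector in enumerate(stored):
        score = cosine(candidate, vector)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# Toy 2-D embeddings; real embeddings would come from an embedding model.
stored_skills = [[1.0, 0.0], [0.0, 1.0]]
print(find_merge_target([0.9, 0.1], stored_skills))  # close to the first skill
print(find_merge_target([1.0, 1.0], stored_skills))  # not similar enough to either
```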

&lt;h2&gt;
  
  
  AppWorld numbers: how much did we actually save?
&lt;/h2&gt;

&lt;p&gt;Early on, we used LoCoMo to probe memory behavior, but found it skews toward chit-chat agents rather than work agents like OpenClaw — where evaluation is harder (skills, multi-step reasoning, structured API payloads).&lt;/p&gt;

&lt;p&gt;For a fairer workload, we switched to the AppWorld benchmark — a suite of 750 autonomous agent tasks framed as realistic, stateful challenges. In short, AppWorld’s evaluation is built around state-based unit tests: an agent can complete tasks in different ways, and AppWorld’s harness still checks for unintended harm during the run.&lt;/p&gt;

&lt;p&gt;The AppWorld benchmark paper (ACL 2024 resource paper, &lt;a href="https://arxiv.org/abs/2407.18901" rel="noopener noreferrer"&gt;arXiv:2407.18901&lt;/a&gt;) states in the abstract:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The state-of-the-art LLM, GPT4O, solves only ~49% of our ‘normal’ tasks and ~30% of ‘challenge’ tasks, while other models solve at least 16% fewer.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The AppWorld blog (appworld.dev) puts it plainly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Even the best LLM, GPT-4o, performs quite poorly. E.g., it completes only ~30% of the tasks in the challenge test set correctly.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In our controlled setup on AppWorld dev (54 tasks, 15-step cap, no pre-loaded distilled skills), GPT-4o’s baseline was ~24% (13/54 solved) — below the headline pass rates quoted in AppWorld’s public materials, which reflect a different task mix and evaluation harness than this stripped-down run.&lt;/p&gt;

&lt;h2&gt;
  
  
  Controlled comparison on AppWorld dev (54 tasks, 15-step cap)
&lt;/h2&gt;

&lt;p&gt;Our setup: we ran traces with Hermes + Qwen 3.6-plus (34/54 solved, 63%), kept all 54 trajectories, then distilled into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;M0 path: 85 experiences (with skill_refs)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hermes path: 44 SKILL.md files&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then we evaluated GPT-4o on each distilled knowledge base. Only two knobs differ: distillation + storage/retrieval.&lt;/p&gt;

&lt;p&gt;Results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F4dx69adxloon0j8ioben.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F4dx69adxloon0j8ioben.webp" alt=" " width="800" height="330"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: pp = percentage points (absolute change in pass rate, not relative % change).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Headline takeaways:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;M0 net +8 tasks (examples mentioned: Spotify-style flows, cross-app tasks, Venmo-style flows), with some wins traded for losses.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hermes net -1 on GPT-4o in this setup — no positive gain versus baseline.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why M0 beat file-skill matching in our analysis:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Retrieval precision: M0’s vector search can match the task description semantically; Hermes’ filename/tag matching does not understand semantics the same way, so Hermes misses paraphrases. Example (localized for a global audience): "Create a Beyoncé playlist" vs "Bundle twenty Taylor Swift tracks together" should route to the same underlying skill; M0’s vectors tolerate wording drift better than brittle naming.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Context hygiene: M0’s Experience records stay light (title-line scale); Hermes’ SKILL.md files can read like full manuals and crowd the model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;On-demand expansion + dedupe: M0 uses skill_refs to load operational detail only when needed, and M0 performs semantic deduplication by pairing vector-similarity checks with an LLM merge so near-duplicate skills fold together instead of piling up. Hermes may inject all matching skills at once, and collisions among Hermes’ SKILL.md filenames can overwrite useful variants.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Efficiency (same GPT-4o runs as the table): average steps 9.5 → 6.2 (-35%), tokens 2.56M → 1.74M (-32%). Even failures become cheaper failures — less thrash, less exploration tax.&lt;/p&gt;

&lt;h2&gt;
  
  
  Teach once with a strong model, run forever with a cheaper one
&lt;/h2&gt;

&lt;p&gt;Rough cost sketch (our pricing assumptions — not a live vendor quote):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;GPT-5.4 one full pass: ~$57.6 at $22.5 / 1M tokens&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;GPT-4o baseline: 2.56M tokens → ~$25.6 at $10 / 1M&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;GPT-4o + M0 distilled experience: 1.74M tokens → ~$17.4 at $10 / 1M&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Note (GPT-5.4 line, illustrative): Blended $/M on ~2.56M tokens in our draft; not a literal line item on OpenAI’s price list. Recompute from your own traces, then confirm current rates on the OpenAI API pricing page before you budget.&lt;/p&gt;
&lt;/blockquote&gt;
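&lt;p&gt;The arithmetic behind those three lines is simple enough to recompute against your own traces. A small sketch using the token counts and assumed blended rates quoted above (the helper name is ours):&lt;/p&gt;

```python
def run_cost(tokens_millions: float, usd_per_million: float) -> float:
    """Cost in USD for a run, given total tokens (in millions) and a blended rate."""
    return tokens_millions * usd_per_million


baseline = run_cost(2.56, 10.0)  # GPT-4o baseline
with_m0 = run_cost(1.74, 10.0)   # GPT-4o + M0 distilled experience
teacher = run_cost(2.56, 22.5)   # one GPT-5.4 pass at the assumed blended rate

saving_pct = (baseline - with_m0) / baseline * 100
print(f"baseline ${baseline:.1f}, with M0 ${with_m0:.1f}, teacher pass ${teacher:.1f}")
print(f"token-cost saving: {saving_pct:.0f}%")
```

&lt;p&gt;Swap in your own token totals and current vendor rates; the percentage saving depends only on the token ratio when both runs use the same model.&lt;/p&gt;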

&lt;p&gt;Our playbook: let GPT-5.4 or Claude Sonnet 4.6 solve the hard version once; M0 distills traces into Experience + Skill; then route repeat work to GPT-4o (or cheaper) with higher pass rates, fewer steps, and a smaller bill than the old naive rerun.&lt;/p&gt;

&lt;p&gt;The production takeaway: in a typical agent product, most requests are repetitive patterns, so you do not need the most expensive model on every call. Either let a strong model teach the task once, or have a human guide a weaker model through one clean run. Later runs can then finish on their own, grounded in distilled experience.&lt;/p&gt;

&lt;p&gt;Beyond one user’s workspace: once an Experience picks up enough positive feedback, M0 can publish it to a shared space where any other M0-connected agent can retrieve it — your solved mistakes stop being only yours. M0’s vector dedupe folds overlapping discoveries together, contributor metadata accrues, and that crowd knowledge is meant to grow out of distillation itself — not through a separate manual editorial pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  One-sentence install
&lt;/h2&gt;

&lt;p&gt;OpenClaw is built around the idea that the assistant should do the heavy lifting, not a human babysitting every step — and seekdb M0’s install path is written the same way: you send your OpenClaw assistant a single line, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read https://m0.seekdb.ai/SKILL.md and install and configure M0 per the instructions.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, the agent is expected to check the installed OpenClaw version, obtain an Access Key, install the m0 plugin, apply the openclaw.json / gateway settings in one shot, and restart the gateway—without you clicking through a setup wizard.&lt;/p&gt;

&lt;p&gt;Humans can still sanity-check the service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# health check&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; https://m0.seekdb.ai/health

&lt;span class="c"&gt;# create a memory instance&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://m0.seekdb.ai/api/instances/ &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"name": "my-memory"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The returned ak field is your Access Key for authenticated memory operations.&lt;/p&gt;

&lt;p&gt;Try it: wire up M0, then tell your OpenClaw agent a handful of real details about you — seekdb M0 will usually auto-extract about five or six facts, run them through the memory-decision step, and persist them in the cloud. On later chats it should pull your technical preferences back in instead of cold-starting the interview from zero.&lt;/p&gt;

&lt;p&gt;At that point it already knows who you are — so you should not have to spend tokens re-introducing yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;So why does OpenClaw token usage spike? Because the default memory path leans on MEMORY.md full-load plus reactive compaction and file-scattered recall. The prompt gets crowded; history gets summarized away; OpenClaw’s agent may not even know what to search for. You pay for remembering, you pay again for forgetting, and you pay a third time for re-discovery.&lt;/p&gt;

&lt;p&gt;M0’s bet is simpler to state than it is to build:&lt;/p&gt;

&lt;p&gt;Free memory from the always-on context — store independently, retrieve on relevance, persist across sessions.&lt;/p&gt;

&lt;p&gt;More crucially: distill execution into reusable Experience + Skill, then retrieve sharply. M0-style high-precision recall beats padding the prompt with "maybe relevant" bulk.&lt;/p&gt;

&lt;p&gt;Our AppWorld comparison is the punchline: same model, same tasks, swap the knowledge system, and you move from 2.56M → 1.74M tokens while pass rate climbs ~15 pp in our reported setup.&lt;/p&gt;

&lt;p&gt;Spend tokens on thinking — not on re-learning what you already solved.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;OpenClaw: &lt;a href="https://openclaw.ai/" rel="noopener noreferrer"&gt;https://openclaw.ai/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;seekdb M0: &lt;a href="https://m0.seekdb.ai/" rel="noopener noreferrer"&gt;https://m0.seekdb.ai/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PowerMem (open source): &lt;a href="https://github.com/oceanbase/powermem" rel="noopener noreferrer"&gt;https://github.com/oceanbase/powermem&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AppWorld: &lt;a href="https://appworld.dev/" rel="noopener noreferrer"&gt;https://appworld.dev/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;seekdb D0: &lt;a href="https://d0.seekdb.ai/" rel="noopener noreferrer"&gt;https://d0.seekdb.ai/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Existing M0 users: this upgrade applies automatically — Experience and Skill records accumulate in M0 during normal agent use, with no extra configuration.&lt;/p&gt;

&lt;p&gt;New users: send the one-liner install prompt to your OpenClaw agent and let it walk the setup.&lt;/p&gt;

&lt;p&gt;Once you have paid tuition on a mistake, you should not have to pay full tuition again.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>openclaw</category>
      <category>agents</category>
      <category>vectordatabase</category>
    </item>
    <item>
      <title>Why Your Vector Database Benchmark Is Wrong for AI Agents</title>
      <dc:creator>Charles Wu</dc:creator>
      <pubDate>Sun, 28 Jun 2026 13:42:29 +0000</pubDate>
      <link>https://dev.to/oceandata4ai/why-your-vector-database-benchmark-is-wrong-for-ai-agents-53b1</link>
      <guid>https://dev.to/oceandata4ai/why-your-vector-database-benchmark-is-wrong-for-ai-agents-53b1</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F4c3lreu08yzgwp8cknpr.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F4c3lreu08yzgwp8cknpr.webp" alt=" " width="800" height="552"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Takeaways&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Most vector database benchmarks (ann-benchmarks, vendor pages) test bulk-load + read-only — but AI agents actually run streaming workloads: concurrent writes and reads at millisecond latency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Under streaming load, P99 latency under concurrency — not QPS or serial latency — determines whether your agent’s SLA holds. Across 6 vector databases, P99 jitter ranged from 1.1× to 10.3× when concurrency was added.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The root cause is architectural: most engines accumulate index segments under streaming writes, so concurrent queries fan out across more segments and contend on CPU. We rebuilt seekdb v1.3.0 around two fixed indexes (delta + snapshot HNSW) to avoid this, and saw 22× QPS and 19× P99 improvement over our own v1.2.0.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re picking a vector database for an AI agent, you’re probably looking at &lt;a href="https://github.com/erikbern/ann-benchmarks" rel="noopener noreferrer"&gt;ann-benchmarks&lt;/a&gt; or vendor performance pages. Those benchmarks all run the same shape of test: bulk-load the dataset, build the index, then run read-only queries.&lt;/p&gt;

&lt;p&gt;That is not what an agent does.&lt;/p&gt;

&lt;p&gt;An agent’s real workload looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;observation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;# continuous writes
&lt;/span&gt;    &lt;span class="n"&gt;relevant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# millisecond-later reads
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Writes and reads happen together. They happen concurrently. The interval between them is milliseconds, not minutes. This pattern has a name — &lt;em&gt;streaming workload&lt;/em&gt; — and &lt;a href="https://github.com/zilliztech/VectorDBBench" rel="noopener noreferrer"&gt;VectorDBBench&lt;/a&gt; has a test case designed specifically for it: StreamingPerformanceCase. Sustained writes at a fixed rate plus concurrent queries. The same shape your agent runs in production.&lt;/p&gt;

&lt;p&gt;VectorDBBench is maintained by Zilliz (the company behind Milvus), so from seekdb’s side it is a third-party open-source benchmark.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Metric Everyone Skips: How Much Does Your P99 Move Under Concurrency?
&lt;/h2&gt;

&lt;p&gt;Test setup: Cohere 10M dataset (768-dim), 16 vCPU / 64 GiB, identical HNSW index parameters across all systems (M=16, ef_construction=256, ef_search=200), sustained write rate of 500 rows/sec.&lt;/p&gt;

&lt;p&gt;A note on the two seekdb rows. You’ll notice seekdb appears twice in the chart — v1.2.0 and v1.3.0. v1.3.0 is the latest release, and it’s a deliberate architectural rewrite specifically targeting streaming workloads (we’ll explain what changed in the next section). We kept v1.2.0 in the comparison on purpose: it lets you see, on the same hardware and the same dataset, how a conventional vector-database design behaves under streaming load versus the redesigned one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fl9w9yjob9dcv99x813pn.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fl9w9yjob9dcv99x813pn.webp" alt=" " width="800" height="871"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most people look at benchmark charts and read off two numbers: QPS and serial latency. But your agent doesn’t run single-threaded in production. What actually determines whether your SLA holds is concurrent P99 — and how much it inflates when you add concurrency.&lt;/p&gt;

&lt;p&gt;Look at the “P99 Jitter” group in the chart:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Elasticsearch: 10.3× — Serial P99 of 5.2ms (faster than seekdb on the cold path), but the moment you add concurrency it climbs to 53.6ms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Milvus: 9.7× — Serial 15.9ms, concurrent 153.6ms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;seekdb: 1.1× — From 19.7ms to 21.7ms. Barely moves.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
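&lt;p&gt;The jitter figure is just concurrent P99 divided by serial P99. A short sketch that recomputes the three ratios from the latencies quoted above (the function name is ours, not part of VectorDBBench):&lt;/p&gt;

```python
def p99_jitter(serial_ms: float, concurrent_ms: float) -> float:
    """Ratio of concurrent P99 latency to serial P99 latency."""
    return concurrent_ms / serial_ms


# (serial P99 ms, concurrent P99 ms) as quoted in the list above.
measurements = {
    "Elasticsearch": (5.2, 53.6),
    "Milvus": (15.9, 153.6),
    "seekdb": (19.7, 21.7),
}

for name, (serial, concurrent) in measurements.items():
    print(f"{name}: {p99_jitter(serial, concurrent):.1f}x")
```

&lt;p&gt;Running the same calculation on your own serial and concurrent measurements is a quick way to check whether a low single-threaded latency will survive production concurrency.&lt;/p&gt;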

&lt;p&gt;This is not a tuning problem. It’s an architecture problem. The next section explains why.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Full benchmark scripts and configs: github.com/oceanbase/vdb-streambench. PRs welcome to add more systems.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why Streaming Workloads Blow Up P99
&lt;/h2&gt;

&lt;p&gt;Milvus, Elasticsearch, and Qdrant all perform well in the workloads they were designed for: bulk ingestion followed by read-only queries. They were built around that shape, and they are good at it.&lt;/p&gt;

&lt;p&gt;But streaming writes expose a structural assumption baked into all of them: every batch of new data produces a new index segment. At query time, the engine has to fan the request out to N segments, run a k-NN search against each one, and merge the results. With a single query thread that’s manageable. But once you run M concurrent query threads against N segments, you have N×M units of work contending for CPU, and P99 explodes.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Most vector databases let their segment count grow with streaming writes, so concurrent queries fight harder and harder for CPU. seekdb keeps the segment count fixed at exactly two — so it doesn’t.&lt;/p&gt;
&lt;/blockquote&gt;
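
&lt;p&gt;The N×M argument can be made concrete with a back-of-the-envelope model. This is a deliberately simplified sketch: it assumes one segment per write batch with no background compaction, and the thread count of 16 is an arbitrary illustration rather than a benchmark setting.&lt;/p&gt;

```python
def knn_searches_in_flight(segments, query_threads):
    """Per-segment k-NN searches competing for CPU when every thread fans out."""
    return segments * query_threads


QUERY_THREADS = 16  # assumed concurrency level, not taken from the benchmark

for batches_written in (10, 100, 1000):
    # Simplification: every write batch leaves behind one more segment.
    growing = knn_searches_in_flight(batches_written, QUERY_THREADS)
    # Fixed fan-out: one delta index plus one snapshot index.
    fixed = knn_searches_in_flight(2, QUERY_THREADS)
    print(f"{batches_written:>5} batches: growing={growing:>6}  fixed={fixed}")
```

&lt;p&gt;Real engines compact segments in the background, so the growing case is an upper bound. The point is only that one design scales its per-query work with write volume and the other does not.&lt;/p&gt;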

&lt;p&gt;Concretely, seekdb v1.3.0 introduced two mechanisms specifically for streaming:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The write path never touches the index. When a transaction commits, all that happens synchronously is a write to the redo log. A separate Change Stream pipeline asynchronously consumes the redo log in the background and applies vectors to an in-memory delta HNSW index. Writes and index construction are physically decoupled — writes never block on indexing, and indexing never blocks on writes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The query path always hits exactly two indexes. seekdb maintains a delta HNSW (the incremental layer that absorbs new writes) and a snapshot HNSW (the steady-state main index), modeled after the LSM-tree pattern from KV stores. A query runs k-NN against both indexes and merges the results. The number of indexes is fixed regardless of how much data you’ve written, so concurrent queries don’t contend on a growing fan-out.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
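
&lt;p&gt;The two-index query path can be sketched in a few lines. This is an illustration of the pattern only, not seekdb's implementation: a brute-force exact search stands in for HNSW, plain dicts stand in for the two indexes, and the rule that a re-written id in the delta layer shadows the snapshot copy is our assumption, borrowed from how LSM-trees usually behave.&lt;/p&gt;

```python
import heapq
import math


def knn(index, query, k):
    """Exact k-NN over a dict of id -> vector (brute-force stand-in for HNSW)."""
    scored = ((math.dist(vec, query), doc_id) for doc_id, vec in index.items())
    return heapq.nsmallest(k, scored)


def query_two_indexes(snapshot, delta, query, k):
    """Search both layers and merge; the fan-out is always exactly two."""
    delta_hits = knn(delta, query, k)
    # Over-fetch from the snapshot so k live hits remain after stale ids are dropped.
    snapshot_hits = [
        hit for hit in knn(snapshot, query, k + len(delta)) if hit[1] not in delta
    ]
    return heapq.nsmallest(k, delta_hits + snapshot_hits)


snapshot = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 5.0)}
delta = {"d": (0.1, 0.0), "b": (9.0, 9.0)}  # "b" was re-written after the snapshot

hits = query_two_indexes(snapshot, delta, query=(0.0, 0.0), k=2)
print([doc_id for _, doc_id in hits])
```

&lt;p&gt;However many batches have been written, the function runs two searches and one merge, which is the property the list above describes.&lt;/p&gt;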

&lt;p&gt;We learned this the hard way. The seekdb v1.2.0 row in the chart (69 QPS, concurrent P99 of 410ms) is what we shipped before the rewrite. The old write path built indexes synchronously, so we hit exactly the architectural problem described above. The 22× QPS gain and the 19× reduction in concurrent P99 in v1.3.0 came entirely from these two changes. Same product, same dataset, same hardware.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agents Need More Than Speed — They Need an Undo Button
&lt;/h2&gt;

&lt;p&gt;Performance is one half of the agent problem. The other half is something most benchmarks don’t even try to measure: agents need to make speculative changes to their data.&lt;/p&gt;

&lt;p&gt;An agent might:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Update memory with a hypothesis it isn’t sure about yet&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run an A/B experiment on its own state&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Try a tool call that could write garbage rows&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don’t want any of that touching production state directly. You need a sandbox, and you need a clean way to roll back.&lt;/p&gt;

&lt;p&gt;Most vector databases don’t have a primitive for this. seekdb implements Copy-on-Write directly in the storage engine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Snapshot in seconds, no data copy&lt;/span&gt;
&lt;span class="n"&gt;FORK&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;agent_state&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;sandbox_42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Agent does whatever it wants in the sandbox&lt;/span&gt;
&lt;span class="n"&gt;USE&lt;/span&gt; &lt;span class="n"&gt;sandbox_42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[0.1,...]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'new observation'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Speculation succeeded → merge back&lt;/span&gt;
&lt;span class="n"&gt;MERGE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;sandbox_42&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;agent_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;
       &lt;span class="n"&gt;STRATEGY&lt;/span&gt; &lt;span class="n"&gt;THEIRS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Speculation failed → drop it, mainline is unaffected&lt;/span&gt;
&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;sandbox_42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is kernel-level COW, not application-layer snapshot/restore. The fork is instant, no data is copied, and each sandbox is a fully writable database — schemas, vector indexes, auto-increment columns all behave normally. Three conflict resolution strategies (FAIL, THEIRS, OURS) let you decide exactly how much of an agent's writes you trust. Both FORK DATABASE and FORK TABLE are supported, so you can branch at whichever granularity matches your use case.&lt;/p&gt;
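
&lt;p&gt;To build intuition for the fork, merge, and drop lifecycle, here is a toy in-memory model. It is not how seekdb implements COW in its storage engine, and the meaning given to FAIL, THEIRS, and OURS below is our guess for illustration (THEIRS lets the sandbox row win, OURS keeps the mainline row), not documented semantics. Plain dicts stand in for tables.&lt;/p&gt;

```python
from collections import ChainMap


def fork(mainline):
    """Writable view of mainline: writes go to a private overlay, reads fall through."""
    return ChainMap({}, mainline)


def merge(sandbox, mainline, strategy="FAIL"):
    """Fold the sandbox overlay into mainline under an assumed conflict policy."""
    if strategy not in ("FAIL", "THEIRS", "OURS"):
        raise ValueError(f"unknown strategy: {strategy}")
    overlay = sandbox.maps[0]
    conflicts = {k for k in overlay if k in mainline and mainline[k] != overlay[k]}
    if conflicts and strategy == "FAIL":
        raise ValueError(f"conflicting keys: {sorted(conflicts)}")
    for key, value in overlay.items():
        if key in conflicts and strategy == "OURS":
            continue  # assumed meaning: keep the mainline row
        mainline[key] = value  # new row, or THEIRS: the sandbox row wins
    return mainline


agent_state = {1: "user prefers dark mode"}
sandbox_42 = fork(agent_state)              # no rows are copied
sandbox_42[1] = "user prefers light mode"   # speculative overwrite
sandbox_42[2] = "new observation"           # speculative insert
untouched = dict(agent_state)               # mainline is unaffected so far

# Merge into copies so both strategies can be compared side by side.
merged_theirs = merge(sandbox_42, dict(agent_state), strategy="THEIRS")
merged_ours = merge(sandbox_42, dict(agent_state), strategy="OURS")
```

&lt;p&gt;Dropping the sandbox in this model is just discarding the overlay; the mainline dict never saw the speculative writes.&lt;/p&gt;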

&lt;h2&gt;
  
  
  Hybrid Retrieval in a Single SQL Query
&lt;/h2&gt;

&lt;p&gt;Agent retrieval is rarely pure vector similarity. You usually want to combine vector distance with structured filters and sometimes full-text matching. A typical request: the top 10 documents authored by user 42 since January that match “quarterly report”, ranked by similarity to this embedding.&lt;/p&gt;

&lt;p&gt;In seekdb, that’s one SQL statement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;l2_distance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;emb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'[0.12,0.34,...]'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dist&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;AGAINST&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'quarterly report'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;author_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'2026-01-01'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;dist&lt;/span&gt; &lt;span class="n"&gt;APPROXIMATE&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Vector, full-text, and scalar filters are pushed down into a single execution plan — no client-side merging of multiple round-trip results. Full MySQL wire protocol compatibility means LangChain, LlamaIndex, Dify, and any MySQL client connect with no adapter.&lt;/p&gt;

&lt;h2&gt;
  
  
  30 Seconds to Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; pyseekdb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pyseekdb&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pyseekdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./agent_state.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_or_create_collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;episodic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Round 1: write agent observations
&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upsert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user prefers dark mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user speaks English and Chinese&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user timezone is UTC+8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;refresh_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_texts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ui preferences?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="c1"&gt;# -&amp;gt; [['user prefers dark mode']]
# Round 2: write a new observation, refresh, query immediately
&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upsert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
              &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user saw pricing page 3 times today&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;refresh_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_texts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;purchase intent signals&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="c1"&gt;# -&amp;gt; [['user saw pricing page 3 times today']]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is no server to run and no schema migration: embedded mode runs in-process. Writes go through the same async indexing pipeline as the server build, so when you need a write to be queryable immediately, call refresh_index() once. Switching to server or distributed mode is a one-line connection-string change. There's also a &lt;a href="https://d0.seekdb.ai/" rel="noopener noreferrer"&gt;hosted Cloud trial&lt;/a&gt;: no signup, free for 7 days, single curl command.&lt;/p&gt;

&lt;h2&gt;
  
  
  About seekdb
&lt;/h2&gt;

&lt;p&gt;seekdb is fully open source under Apache 2.0, built by the &lt;a href="https://en.oceanbase.com/" rel="noopener noreferrer"&gt;OceanBase&lt;/a&gt; team. You’re probably already running on OceanBase indirectly — it’s in production at Alipay, Taobao, DiDi, and Xiaomi, among others. seekdb inherits the same storage engine and SQL executor, focused specifically on the hybrid vector + relational workloads that agents need. Six months in, the project has 2,500+ GitHub stars and is integrated with LangChain, LlamaIndex, Dify, and Coze.&lt;/p&gt;

&lt;p&gt;If you’re picking a database for an agent, take 30 seconds and run the demo above. If your current vector database has a StreamingPerformanceCase number you're proud of, we'd love to add it to the comparison.&lt;/p&gt;

&lt;p&gt;⭐ github.com/oceanbase/seekdb — a star helps more people find the project, and gives us reason to keep investing in it.&lt;/p&gt;

&lt;p&gt;Questions or want to discuss your agent workload: &lt;a href="https://github.com/oceanbase/seekdb/issues" rel="noopener noreferrer"&gt;GitHub Issues&lt;/a&gt; · &lt;a href="https://github.com/oceanbase/seekdb/discussions" rel="noopener noreferrer"&gt;GitHub Discussions&lt;/a&gt;&lt;/p&gt;

</description>
      <category>bigdata</category>
      <category>benchmark</category>
      <category>database</category>
    </item>
    <item>
      <title>Why an All-in-One Data Foundation Matters: Harness, Tape, and a Database-Native Path</title>
      <dc:creator>Charles Wu</dc:creator>
      <pubDate>Sun, 28 Jun 2026 13:31:22 +0000</pubDate>
      <link>https://dev.to/oceandata4ai/why-an-all-in-one-data-foundation-matters-harness-tape-and-a-database-native-path-3o9j</link>
      <guid>https://dev.to/oceandata4ai/why-an-all-in-one-data-foundation-matters-harness-tape-and-a-database-native-path-3o9j</guid>
      <description>&lt;p&gt;&lt;em&gt;From model + Harness to Tape and one data foundation — why agent runtime data should live in the database from the first line of code.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fhsyqni8ssi8k4n2nu4ns.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fhsyqni8ssi8k4n2nu4ns.webp" alt=" " width="800" height="600"&gt;&lt;/a&gt;Photo by &lt;a href="https://unsplash.com/.within.website?redir=%2F%40brett_jordan" rel="noopener noreferrer"&gt;Brett Jordan&lt;/a&gt; on &lt;a href="https://unsplash.com/" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The agent race is quietly shifting from the model layer to the data layer. When agents run, they produce vast volumes of semi-structured trace data — high-frequency writes, long lifecycles — and traditional database architectures struggle to keep up. Shuttling that data back and forth across observability platforms, vector stores, and caches further erodes the efficiency of the record → distill → feed back loop.&lt;/p&gt;

&lt;p&gt;This reveals a critical divide:&lt;/p&gt;

&lt;p&gt;Building a database downward from an agent framework is not the same as extending a mature database upward to connect with agent frameworks. The starting points differ, and so do the cost structures.&lt;/p&gt;

&lt;p&gt;On the latter path, data is a first-class citizen from the first line of code. Run, record, distill, evaluate, and feed back — all within one foundation, without the overhead of cross-system data movement.&lt;/p&gt;

&lt;p&gt;That is where an all-in-one data foundation matters: it turns the agent data loop into an internal cycle, not a fragmented patchwork of engineering pieces.&lt;/p&gt;

&lt;p&gt;Starting from the definition of Harness, this article draws on the open-source Bub project, discusses layered agent architecture, and lands on a database-native Harness approach — including OceanBase’s exploration and value in this space.&lt;/p&gt;

&lt;h2&gt;
  
  
  I. Understanding Agents, Harnesses, and How They Relate
&lt;/h2&gt;

&lt;p&gt;A complete agent can be expressed as model + Harness.&lt;/p&gt;

&lt;p&gt;Harness covers every engineering component outside the model. Like tack on a horse, Harness is the full toolkit for steering a model toward its destination — reins, saddle, route — which at the engineering level maps to feedback mechanisms, logging systems, and training methods.&lt;/p&gt;

&lt;p&gt;A harness has a clear layered structure. Layer 1 is provided by the coding-agent builder or SDK vendor, including base tools and external interfaces. Layer 2 is extended on the user side with the components they need — business logic such as RAG systems, memory systems, and BI pipelines.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fiq4qnxcp9z3qc1ywk7nm.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fiq4qnxcp9z3qc1ywk7nm.webp" alt=" " width="800" height="469"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In agent scenarios, the model itself is not a continuously stateful system — it returns responses to requests without awareness of concrete business state. What lets agents work reliably in products and teams are the context management, tool invocation, state recording, run-trace tracking, effectiveness evaluation, and data flow responsibilities that Harness takes on.&lt;/p&gt;

&lt;p&gt;Along the way, key elements are gradually identified and abstracted into primitives. System prompts, Skills, task-completion methodologies, and multi-agent communication mechanisms are all important primitives that emerge from practice. Standardizing these primitives and incorporating them into Harness has two effects: it improves business performance and expands capability, and it gradually turns Harness itself into a product.&lt;/p&gt;

&lt;p&gt;Data collected from Harness is equally vital. It evaluates workflow effectiveness and, after de-identification, can form standard datasets for training the next generation of models. As models improve, they feed back into primitive discovery and refinement within Harness — even correcting past behavior — forming a continuously improving flywheel. The diagram below (from &lt;a href="https://www.langchain.com/blog/the-anatomy-of-an-agent-harness" rel="noopener noreferrer"&gt;the LangChain blog&lt;/a&gt;) illustrates this loop clearly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fbzx3zdc7fecoepmy2be2.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fbzx3zdc7fecoepmy2be2.webp" alt=" " width="799" height="515"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  II. Building Extensible Agents: The Bub Project
&lt;/h2&gt;

&lt;p&gt;Bub is an open-source Python agent project on GitHub. Its design reflects a key approach to controlling agent complexity: balancing stability and flexibility through a lean kernel and plugin-based extension.&lt;/p&gt;

&lt;p&gt;Mainstream agent products such as ChatGPT, Qwen (Alibaba’s conversational AI), ModelScope services, and low-code platforms like Dify and Flowise already ship a built-in agent loop. A core problem remains: agent capability must match the business scenario precisely. Skills and tools can extend capability, but to complete tasks efficiently you still need to assemble a toolset for each specific scenario.&lt;/p&gt;

&lt;p&gt;Products such as OpenClaw, Nanobot, and Hermes Agent bundle too many features together. That creates problems on both sides. For users, features interfere with one another and add cognitive load. For developers, system complexity is high and maintenance is difficult (for example, OpenClaw upgrades often break many features across the product). Such tightly coupled designs are hard to use as-is in production, so many vendors repackage a specific version or build entirely in-house.&lt;/p&gt;

&lt;p&gt;Bub takes a different architectural strategy: maintain only a carefully designed lean kernel that implements a stable agent loop, and move everything else into plugins. Required business capabilities are introduced step by step through feature plugins. Users need only verify that plugins are working correctly; if a plugin fails, they remove it and restore service, which greatly improves maintainability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fars7gx1xwqzd3ubf6hj7.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fars7gx1xwqzd3ubf6hj7.webp" alt=" " width="800" height="660"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bub’s core design philosophy is not about how powerful a single agent is, but about how stages are divided within a single interaction. Whether it is Bub’s built-in agent or externally integrated Codex or LangChain, either can get the work done. Bub breaks each interaction into explicit stages — conversation state construction, prompt assembly, channel input/output definitions, and more. This staged breakdown makes flow control possible: hooks expose integration points at each stage, rather than piling all logic into a single agent.&lt;/p&gt;
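
&lt;p&gt;A lean kernel with staged hooks might look roughly like the sketch below. To be clear, this is not Bub's actual API: the stage names, the Kernel class, and the three example plugins are all hypothetical, written only to show how explicit stages give plugins a place to attach and a clean way to be removed.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical stage names; Bub's real stages may be named and ordered differently.
STAGES = ("build_state", "assemble_prompt", "run_model", "emit_output")


class Kernel:
    """Minimal agent loop: walk the stages in order and call each registered hook."""

    def __init__(self):
        self.hooks = defaultdict(list)

    def register(self, stage, hook):
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.hooks[stage].append(hook)

    def unregister(self, stage, hook):
        self.hooks[stage].remove(hook)  # a failing plugin is simply detached

    def handle(self, message):
        ctx = {"input": message, "prompt": "", "reply": None}
        for stage in STAGES:
            for hook in self.hooks[stage]:
                hook(ctx)
        return ctx["reply"]  # None means the agent deliberately stays silent


def assemble_prompt(ctx):
    ctx["prompt"] = ctx["input"].strip()


def echo_model(ctx):
    ctx["reply"] = "echo: " + ctx["prompt"]  # stand-in for a real model call


def stay_silent_unless_addressed(ctx):
    if not ctx["input"].startswith("@agent"):
        ctx["reply"] = None


kernel = Kernel()
kernel.register("assemble_prompt", assemble_prompt)
kernel.register("run_model", echo_model)
kernel.register("emit_output", stay_silent_unless_addressed)

addressed = kernel.handle("@agent status?")
overheard = kernel.handle("lunch anyone?")
```

&lt;p&gt;Because output is just another stage, a plugin can decide that no reply is the right reply without the kernel needing to know why.&lt;/p&gt;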

&lt;p&gt;A key design is removing mandatory binding on output. Traditional systems bind message replies strictly to the input channel. Bub allows an agent to stay silent in certain scenarios — returning no message. That looks like a flaw in a personal-assistant setting, but in multi-user or multi-agent collaboration, silence that avoids noise is a friendly trait.&lt;/p&gt;

&lt;p&gt;The community is now seeing a wave of approaches that standardize and modularize agent design, for example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Agents.md — inject system- and task-related prompts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Skills — package general SOPs (documentation, code review) as distributable assets without hard-coding them into the agent loop.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MCP (Model Context Protocol) — through plugins, provide IM channel adapters, scheduled tasks, AG-UI visualization, and more.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the direction mainstream agent frameworks are moving in 2026. Bub puts this idea into practice: with only a few hundred lines of core interface code, it builds flexible infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  III. From Context to Data Loop: Tape and Database-Native Harness
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Building the data loop around Tape
&lt;/h3&gt;

&lt;p&gt;Tape (a core concept in Bub and in AgentSeek, which we are building) is not simple chat history. In some ways it resembles a trace, recording key facts from a single agent run.&lt;/p&gt;

&lt;p&gt;Unlike traces in OpenTelemetry and similar observability systems, Tape takes a simpler view: it covers related ground but does not dwell on low-level detail. Its distinctive value lies in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Both observability data and a context model — Tape carries observability for critical tasks and serves as the agent’s runtime context model. That means humans and AI can collaborate on the same data view. The agent can read its own Tape to review past behavior.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enabling agent self-reflection and diagnosis — Traditionally, when an agent fails, engineers troubleshoot through an observability platform. With Tape, users can talk directly to the agent and ask, “Why did that fail just now?” Engineering investigation also becomes a natural conversation with the agent, because root-cause information is already built into its context.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Supporting automated evaluation and analysis — From Tape records, an agent can compare different models, or the same model across different tasks, enabling automated comparative evaluation without relying on human-facing dashboards.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Serving model training — Through de-identified, formatted export, Tape can readily become task-specific datasets for model training and fine-tuning — truly connecting context and observability to model training in a closed data loop.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Why a database-native Harness is needed
&lt;/h3&gt;

&lt;p&gt;Agent systems such as OpenClaw rely heavily on the filesystem (for example, various .md files). That is easy for humans and agents to read, but poor for processing and analyzing data. Modern context engineering needs a Memory layer above raw task trajectories that serves as both a summary of the trajectory and an index into it. Plugins such as lossless-claw in the OpenClaw community later began using databases like SQLite to connect call chains and memory, which suggests that a database is needed at this layer.&lt;/p&gt;

&lt;p&gt;Using a database as the foundation of Harness means all agent runtime data is natively a first-class citizen in the database. Observability, data extraction, and archival analysis can use native database capabilities without maintaining a complex heterogeneous data stack (such as MySQL + Elasticsearch + Redis). That provides a unified data foundation, simplifies architecture, and lowers operational cost.&lt;/p&gt;
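
&lt;p&gt;As a rough illustration of what "runtime data as a first-class citizen" buys you, the sketch below stores Tape-like records in a table and answers two of the questions raised earlier with plain SQL. Everything here is assumed for the example: the table layout is hypothetical rather than Bub's or AgentSeek's actual Tape schema, and Python's built-in sqlite3 module stands in for OceanBase or seekdb.&lt;/p&gt;

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """CREATE TABLE tape (
           seq     INTEGER PRIMARY KEY AUTOINCREMENT,
           run_id  TEXT,
           model   TEXT,
           kind    TEXT,
           ok      INTEGER,
           payload TEXT)"""
)


def record(run_id, model, kind, ok, **payload):
    """Append one fact about an agent run; the tape is never updated in place."""
    db.execute(
        "INSERT INTO tape (run_id, model, kind, ok, payload) VALUES (?, ?, ?, ?, ?)",
        (run_id, model, kind, int(ok), json.dumps(payload)),
    )


record("run-1", "model-a", "tool_call", True, tool="search", query="q3 report")
record("run-1", "model-a", "tool_call", False, tool="sql", error="table not found")
record("run-2", "model-b", "tool_call", True, tool="sql", rows=12)

# "Why did that fail just now?" becomes a query over the same records.
failures = db.execute(
    "SELECT kind, payload FROM tape WHERE run_id = ? AND ok = 0 ORDER BY seq",
    ("run-1",),
).fetchall()

# Automated comparison: success rate per model, with no dashboard involved.
success_rate = dict(
    db.execute("SELECT model, AVG(ok) FROM tape GROUP BY model ORDER BY model").fetchall()
)
```

&lt;p&gt;The same rows serve the agent (as context), the engineer (as a trace), and the evaluation job (as a dataset), which is the argument for keeping them in one store.&lt;/p&gt;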

&lt;p&gt;OceanBase is a strong fit for this path. Why? Its core strengths include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;AI workload readiness — OceanBase and its derived tools provide vector search and hybrid retrieval optimized for AI agent workloads. SQL together with vector and full-text search are built-in capabilities, without maintaining multiple technology stacks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;HTAP capability — As a hybrid transactional/analytical processing database, it directly supports real-time queries and complex analysis on agent runtime data, supporting the data loop.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Unified storage with seamless scale-out — All kinds of data can be stored uniformly, supporting trace analysis, retrieval, and related workloads. From single-node deployment on the edge (such as OceanBase seekdb) it scales seamlessly to a distributed OceanBase cluster, offering a smooth upgrade path as the business grows.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  IV. AgentSeek: Exploring Database-Native Harness
&lt;/h2&gt;

&lt;p&gt;Through ongoing exploration of agent architecture, the OceanBase team is building AgentSeek — a Harness built entirely on database-native capabilities.&lt;/p&gt;

&lt;p&gt;AgentSeek’s core idea: make agent runtime data a first-class database citizen from day one, helping users build data-loop scenarios. The project combines OceanBase product capabilities with AgentSeek-specific wrappers and is under active development.&lt;/p&gt;

&lt;p&gt;Repository: github.com/ob-labs/agentseek&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;From the layered definition of Harness, to Bub’s plugin-based extensible architecture, to Tape’s integration of observability and context, to the database-native Harness technical path — agent infrastructure is evolving from feature stacking toward data-driven design. OceanBase’s work in this space is both a natural extension of its technical architecture and a response to data-foundation needs in the AI era.&lt;/p&gt;

&lt;p&gt;If you are building data-intensive agents today: where does runtime data land in your stack, and what still breaks when you try to close the loop? Share your setup in the comments — or dig into Bub and AgentSeek, and join Data4AI on LinkedIn to compare notes with other Data + AI practitioners.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Bub: &lt;a href="https://github.com/bubbuild/bub" rel="noopener noreferrer"&gt;https://github.com/bubbuild/bub&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AgentSeek: &lt;a href="https://github.com/ob-labs/agentseek" rel="noopener noreferrer"&gt;https://github.com/ob-labs/agentseek&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;LangChain — The Anatomy of an Agent Harness: &lt;a href="https://www.langchain.com/blog/the-anatomy-of-an-agent-harness" rel="noopener noreferrer"&gt;https://www.langchain.com/blog/the-anatomy-of-an-agent-harness&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>dataengineering</category>
      <category>machinelearning</category>
      <category>database</category>
    </item>
    <item>
      <title>How Agent Memory Forgets: An Engineering Walkthrough</title>
      <dc:creator>Charles Wu</dc:creator>
      <pubDate>Sun, 28 Jun 2026 13:14:31 +0000</pubDate>
      <link>https://dev.to/oceandata4ai/how-agent-memory-forgets-an-engineering-walkthrough-mn7</link>
      <guid>https://dev.to/oceandata4ai/how-agent-memory-forgets-an-engineering-walkthrough-mn7</guid>
      <description>&lt;p&gt;&lt;em&gt;Time coats memory in dust. Access is the only cloth that wipes it clean. When the dust grows thick and no one asks, that is forgetting.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F4y5b3qajm1lqtoxeo5h8.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F4y5b3qajm1lqtoxeo5h8.webp" alt=" " width="800" height="533"&gt;&lt;/a&gt;Photo by &lt;a href="https://unsplash.com/@markuswinkler" rel="noopener noreferrer"&gt;Markus Winkler&lt;/a&gt; on &lt;a href="https://unsplash.com/" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/oug/from-neurons-to-code-the-forgetting-design-behind-powermem-11n9"&gt;the previous post&lt;/a&gt;, we looked at forgetting from a cognitive-science angle — synaptic plasticity, the Ebbinghaus forgetting curve [1], spaced repetition, and desirable difficulty. Fascinating theory.&lt;/p&gt;

&lt;p&gt;This piece takes a different angle. We follow a single message through an agent memory system, from write to eviction, using PowerMem [2] (OceanBase’s open-source agent memory framework) as our running example, and watch how forgetting is engineered at every step. In short: decay starts the moment a message is written, but what actually decides a memory’s fate is whether it gets accessed again.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fvrd2i96v30xt1qnb4786.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fvrd2i96v30xt1qnb4786.webp" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Importance Scoring: Is This Worth Remembering?
&lt;/h2&gt;

&lt;p&gt;Picture this message landing in the system:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Next Friday at 3:00 PM — Q2 requirements review with the product team in Conference Room B.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Once the message enters memory, the first question is not how to store it, but whether it deserves storage. If every piece of information is written and retrieved at equal weight, two problems compound as volume grows: retrieval signal-to-noise keeps dropping, and storage cost becomes unbounded.&lt;/p&gt;

&lt;p&gt;Shannon’s information theory frames the same idea: high-probability events carry almost no information and are poor candidates for long-term retention; low-probability but critical events carry enormous information and should be persisted.&lt;/p&gt;

&lt;p&gt;Importance scoring is the filter. Each item gets a score that drives decay speed, review cadence, and eviction priority downstream.&lt;/p&gt;

&lt;p&gt;So how is importance assessed?&lt;/p&gt;

&lt;h3&gt;
  
  
  1.1 The six-dimension model
&lt;/h3&gt;

&lt;p&gt;In PowerMem, scoring a single message is more involved than it looks. The system uses a six-dimension model — six axes, weighted and summed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fb5yq6veqe21vm7f34pic.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fb5yq6veqe21vm7f34pic.webp" alt=" " width="748" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

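&lt;p&gt;For our meeting message: highly relevant to “Q2 work” (relevance ≈ 0.8), concrete time and place (factual ≈ 0.8), attendance is non-optional (actionable ≈ 0.9), emotionally neutral (emotional_impact ≈ 0.2), with novelty ≈ 0.5 and personal ≈ 0.6. Weighted:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0.3×0.8 + 0.2×0.5 + 0.15×0.2 + 0.15×0.9 + 0.1×0.8 + 0.1×0.6 = 0.645
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The hand-computed weighted sum is 0.645. The rest of this walkthrough uses an importance score of 0.72, the overall value the LLM path in Section 1.2 returns for this message; the model reports that value directly rather than recomputing it from the per-dimension scores.&lt;/p&gt;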
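
&lt;p&gt;A minimal sketch of the weighted sum, using the weights and per-dimension scores from the expression above. The pairing of each weight with a dimension name is inferred from the order of the terms, and WEIGHTS and weighted_importance are illustrative names, not PowerMem identifiers.&lt;/p&gt;

```python
# Weights in the order they appear in the weighted expression above.
# The weight-to-dimension pairing is an assumption inferred from that order.
WEIGHTS = {
    "relevance": 0.30,
    "novelty": 0.20,
    "emotional_impact": 0.15,
    "actionable": 0.15,
    "factual": 0.10,
    "personal": 0.10,
}


def weighted_importance(scores: dict) -> float:
    """Return the weighted sum of per-dimension scores, each in [0, 1]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Per-dimension scores for the meeting message.
meeting = {
    "relevance": 0.8,
    "novelty": 0.5,
    "emotional_impact": 0.2,
    "actionable": 0.9,
    "factual": 0.8,
    "personal": 0.6,
}

print(round(weighted_importance(meeting), 3))
```

&lt;p&gt;Because the weights sum to 1.0, the result always stays inside the same 0 to 1 range as the inputs.&lt;/p&gt;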

&lt;p&gt;The six-dimension model is the theoretical backbone of importance scoring.&lt;/p&gt;

&lt;h3&gt;
  
  
  1.2 Dual-path scoring: LLM vs. rule engine
&lt;/h3&gt;

&lt;p&gt;PowerMem runs two execution paths so the system always returns a score. Path one uses the six-dimension model; path two is a rule engine that kicks in when path one is unavailable — graceful degradation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Path one: LLM deep scoring (preferred)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When an LLM is available, the system asks it to analyze all six dimensions and return structured JSON, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"importance_score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.72&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Meeting schedule imposes a hard time constraint and clear action requirement"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"criteria_scores"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"relevance"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"novelty"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"emotional_impact"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"actionable"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"factual"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"personal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.6&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PowerMem reads importance_score from the response.&lt;/p&gt;

&lt;p&gt;Because LLM output is not perfectly predictable, parsing uses a &lt;strong&gt;three-level fallback&lt;/strong&gt;: try JSON first; if that fails, regex-extract a numeric score; if that fails too, default to 0.5.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Engineering note&lt;/strong&gt;: The per-dimension scores in the JSON are not fed back into the weighted formula. They exist to structure the model’s reasoning (chain-of-thought style) so the final score is more stable.&lt;/em&gt;&lt;/p&gt;
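
&lt;p&gt;The three-level fallback can be sketched roughly as follows. This is not PowerMem’s parser: parse_importance, the regex pattern, and the clamping to the 0 to 1 range are assumptions made for illustration; only the order of the levels and the 0.5 default come from the text.&lt;/p&gt;

```python
import json
import re

DEFAULT_SCORE = 0.5  # final fallback value stated in the text


def _clamp(value: float) -> float:
    """Keep a score inside [0, 1] (an assumption, not confirmed by the source)."""
    return max(0.0, min(1.0, value))


def parse_importance(raw: str) -> float:
    """Extract an importance score from raw LLM output (hypothetical helper)."""
    # Level 1: strict JSON carrying an importance_score field.
    try:
        return _clamp(float(json.loads(raw)["importance_score"]))
    except (ValueError, KeyError, TypeError):
        pass
    # Level 2: first number-like token in the text (assumed pattern).
    match = re.search(r"\d+(?:\.\d+)?", raw)
    if match:
        return _clamp(float(match.group()))
    # Level 3: neutral default.
    return DEFAULT_SCORE


print(parse_importance('{"importance_score": 0.72}'))
print(parse_importance("I would rate this 0.8 overall"))
print(parse_importance("unable to score"))
```

&lt;p&gt;Each level is strictly cheaper and less precise than the one before it, so a malformed response degrades the score quality without ever raising an exception into the write path.&lt;/p&gt;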

&lt;p&gt;&lt;strong&gt;Path two: Rule engine (fallback)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When the LLM is down, the rule engine takes over. Precision drops, but the system keeps running.&lt;/p&gt;

&lt;p&gt;Rules accumulate score from quantifiable signals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Content length &amp;gt; 100 characters: +0.1; &amp;gt; 50 characters: +0.05&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Keyword hit: +0.1 each&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Contains ? or !: +0.05 each&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Metadata priority high / medium: +0.2 / +0.1&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Score capped at 1.0&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
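
&lt;p&gt;The rule list above could be sketched like this. Several details are assumptions, since the article does not show them: the keyword list, a base score of 0.0, treating the two length bonuses as mutually exclusive, and counting ? and ! once per character type rather than per occurrence.&lt;/p&gt;

```python
from typing import Optional

# Hypothetical keyword list; the real rule set is not shown in the article.
KEYWORDS = ("deadline", "meeting", "urgent", "password")
PRIORITY_BONUS = {"high": 0.2, "medium": 0.1}


def rule_score(text: str, metadata: Optional[dict] = None, base: float = 0.0) -> float:
    """Accumulate a fallback importance score from simple signals."""
    score = base  # assumed starting point
    # Length bonus: only the larger of the two applies (assumption).
    if len(text) > 100:
        score += 0.1
    elif len(text) > 50:
        score += 0.05
    lowered = text.lower()
    # +0.1 for every keyword that appears at least once.
    score += 0.1 * sum(1 for kw in KEYWORDS if kw in lowered)
    # +0.05 for each of "?" and "!" that is present.
    score += 0.05 * sum(1 for mark in "?!" if mark in text)
    # Metadata priority bonus, if any.
    score += PRIORITY_BONUS.get((metadata or {}).get("priority"), 0.0)
    return min(score, 1.0)  # cap at 1.0


print(rule_score("Is the urgent deadline today?", {"priority": "high"}))
```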

&lt;p&gt;This is &lt;strong&gt;graceful degradation&lt;/strong&gt; in production: one external dependency failing should not stall the entire memory layer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fr98wjrc3fematcv86jip.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fr98wjrc3fematcv86jip.webp" alt=" " width="800" height="772"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Classification and Parameter Initialization
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2.1 Three-layer memory model
&lt;/h3&gt;

&lt;p&gt;Once importance is set, the next step is classification: which memory layer does this message belong to?&lt;/p&gt;

&lt;p&gt;If you read the previous post, you’ll recall how biological memory layers work: information enters the hippocampus (short-term buffer) with limited capacity, consolidates into neocortex (long-term storage), and only items that are repeatedly activated, richly linked to existing knowledge, or emotionally salient earn priority transfer.&lt;/p&gt;

&lt;p&gt;PowerMem maps that to three layers:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fcu1nl7aygfa0hbiobx8x.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fcu1nl7aygfa0hbiobx8x.webp" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F8ypcx143e26a9z27uvsz.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F8ypcx143e26a9z27uvsz.webp" alt=" " width="783" height="174"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Classification thresholds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;score ≥ 0.8 → long_term&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;score ≥ 0.6 → short_term&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;otherwise → working&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_algo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;long_term_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="c1"&gt;# 0.8
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;long_term&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_algo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;short_term_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# 0.6
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;short_term&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;working&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our meeting message scores 0.72 — above 0.6, below 0.8 — so it lands in short_term.&lt;/p&gt;

&lt;p&gt;Higher layers decay more slowly; memories live longer.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.2 Forgetting parameter initialization
&lt;/h3&gt;

&lt;p&gt;Classification answers where a memory lives. The sharper questions are: how fast should it fade? When should it be reviewed for consolidation?&lt;/p&gt;

&lt;p&gt;After classification, PowerMem builds a full lifecycle metadata profile — a card tracking strength, decay parameters, review schedule, and management flags.&lt;/p&gt;

&lt;p&gt;The profile splits into two blocks:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F0vldr53ic08gv06p415q.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F0vldr53ic08gv06p415q.webp" alt=" " width="800" height="182"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Below we walk through each parameter for the meeting example (importance = 0.72, short_term).&lt;/p&gt;

&lt;h4&gt;
  
  
  2.2.1 Initial retention
&lt;/h4&gt;

&lt;p&gt;Initial retention captures how “solid” a memory is at birth. More important content should start with higher retention; low-importance noise should be fragile from day one and yield bandwidth to stronger competitors.&lt;/p&gt;

&lt;p&gt;In PowerMem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;initial_retention&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;initial_retention&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;importance_score&lt;/span&gt;
&lt;span class="c1"&gt;# Meeting example: 1.0 × 0.72 = 0.72
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two fields are written:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;initial_retention — snapshot at creation (“how firmly it was encoded”)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;current_retention — live effective retention&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They match at creation; only current_retention changes as decay and review proceed.&lt;/p&gt;

&lt;h4&gt;
  
  
  2.2.2 Decay rate by layer
&lt;/h4&gt;

&lt;p&gt;working / short_term / long_term mirror working memory, hippocampus, and neocortex: closer to long-term storage, slower per-unit-time forgetting. Each layer gets its own decay coefficient:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"working"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;smallest&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;S&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;—&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;fades&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;fastest&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"short_term"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"long_term"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;largest&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;S&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;—&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;fades&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;slowest&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For our short_term meeting message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0.1 (global base decay) × 1.5 (short_term coefficient) = 0.15
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By comparison:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;working: 0.1 × 0.5 = 0.05&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;long_term: 0.1 × 2.0 = 0.20&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

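&lt;p&gt;Larger coefficient → slower forgetting. Despite its name, the decay rate acts as a time constant: Section 3.1 places it in the denominator of the exponent, so a larger value stretches the decay out.&lt;/p&gt;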

&lt;h4&gt;
  
  
  2.2.3 Review scheduling
&lt;/h4&gt;

&lt;p&gt;Spaced repetition’s core rule: review often at first, then stretch intervals. Hit the window where the memory is almost gone but still recoverable — not so early that review is wasted, not so late that it’s already lost.&lt;/p&gt;

&lt;p&gt;PowerMem schedules review timestamps at creation, not on demand.&lt;/p&gt;

&lt;p&gt;Step 1: Baseline intervals&lt;/p&gt;

&lt;p&gt;Five global baseline intervals (hours): [1, 6, 24, 72, 168], that is, 1 h, 6 h, 1 day, 3 days, and 7 days. They are shared by all memories.&lt;/p&gt;

&lt;p&gt;Step 2: Compress by importance&lt;/p&gt;

&lt;p&gt;Baseline intervals treat all memories equally, but a credential reminder should be nudged more often than small talk. Each interval is compressed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;adjusted_interval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;importance_score&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;adjustment_factor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;interval — baseline (e.g. 1 h, 6 h, 24 h)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;importance_score — 0.72 for our meeting&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;adjustment_factor — default 0.3 (compression strength)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Floor: 0.5 hours&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For the meeting (importance = 0.72):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1 - 0.72 × 0.3 = 0.784
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each baseline interval becomes 78.4% of its original length:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F9oonuty5e0jj6a1m8djc.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F9oonuty5e0jj6a1m8djc.webp" alt=" " width="800" height="246"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For a low-importance message (importance = 0.3):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1 - 0.3 × 0.3 = 0.91
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Intervals shrink only slightly — reviews drift later:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fq7kft2921wvzu5fe7j8a.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fq7kft2921wvzu5fe7j8a.webp" alt=" " width="800" height="249"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Higher importance → earlier review windows → more chances to re-consolidate.&lt;/p&gt;
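
&lt;p&gt;A small sketch of the schedule computation, under stated assumptions. The baseline intervals, the 0.3 adjustment factor, and the 0.5-hour floor come from the text; the function names are illustrative, and measuring every interval from the creation time (rather than from the previous review) is an assumption.&lt;/p&gt;

```python
from datetime import datetime, timedelta

BASE_INTERVALS_HOURS = [1, 6, 24, 72, 168]
ADJUSTMENT_FACTOR = 0.3   # compression strength (default from the text)
MIN_INTERVAL_HOURS = 0.5  # floor from the text


def adjusted_intervals(importance_score: float) -> list:
    """Compress each baseline interval by importance, respecting the floor."""
    scale = 1 - importance_score * ADJUSTMENT_FACTOR
    return [max(MIN_INTERVAL_HOURS, hours * scale) for hours in BASE_INTERVALS_HOURS]


def review_schedule(created_at: datetime, importance_score: float) -> list:
    """Return review timestamps, each offset from creation time (assumption)."""
    return [created_at + timedelta(hours=h) for h in adjusted_intervals(importance_score)]


created = datetime(2026, 6, 26, 9, 0)
for importance in (0.72, 0.3):
    print(importance, [round(h, 3) for h in adjusted_intervals(importance)])
print(review_schedule(created, 0.72)[0])
```

&lt;p&gt;With the default factor of 0.3, the scale never drops below 0.7, so the floor only matters if the factor is configured more aggressively.&lt;/p&gt;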

&lt;p&gt;After computing five timestamps, PowerMem stores the full schedule and initializes companion fields:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F87l7r79y1ue060fjdt9d.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F87l7r79y1ue060fjdt9d.webp" alt=" " width="800" height="207"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;next_review — when to review&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;review_count + last_reviewed — how much review has happened&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;reinforcement_factor — how much each review restores&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When next_review arrives and a review completes: review_count increments, last_reviewed updates, current_retention rises by reinforcement_factor, next_review advances.&lt;/p&gt;

&lt;p&gt;That closes the engineering loop of retrieval → reconsolidation.&lt;/p&gt;

&lt;h4&gt;
  
  
  2.2.4 Lifecycle state machine
&lt;/h4&gt;

&lt;p&gt;Beyond continuous numbers, memories need discrete lifecycle flags:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Should this memory be promoted to a higher layer?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Should it be evicted?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Should it leave the active retrieval pool?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At creation, PowerMem initializes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"should_promote"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"should_forget"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"should_archive"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"is_active"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;should_promote — e.g. working → short_term when a “scratch” memory is accessed repeatedly; slower decay upstairs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;should_forget — decay factor drops below threshold (0.3), or zero accesses in 7 days (silent forgetting).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;should_archive — move out of active search. Archive ≠ delete; data remains, but skips routine retrieval.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;is_active — participates in normal read/search paths.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An access_count counter is also initialized; promotion logic can require ≥ 3 accesses, among other rules.&lt;/p&gt;
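
&lt;p&gt;One way these flags might be evaluated is sketched below. The 0.3 threshold, the 7-day silence window, and the 3-access promotion rule are taken from the text; the LifecycleFlags class, the evaluate_flags function, and the choice to skip promotion for a memory already marked for forgetting are assumptions. The archive rule is not specified in the article, so the sketch leaves that flag untouched.&lt;/p&gt;

```python
from dataclasses import dataclass

FORGET_THRESHOLD = 0.3    # decay factor below this marks the memory for forgetting
SILENCE_DAYS = 7          # no access for this long also marks it
PROMOTE_MIN_ACCESSES = 3  # one promotion rule among possibly several


@dataclass
class LifecycleFlags:
    """Flag defaults mirror the initial values shown above."""
    should_promote: bool = False
    should_forget: bool = False
    should_archive: bool = False
    is_active: bool = True


def evaluate_flags(decay_factor: float, access_count: int,
                   days_since_last_access: float) -> LifecycleFlags:
    """Derive lifecycle flags from decay and access statistics (hypothetical helper)."""
    flags = LifecycleFlags()
    faded = FORGET_THRESHOLD > decay_factor
    silent = days_since_last_access >= SILENCE_DAYS
    flags.should_forget = faded or silent
    # Assumption: a memory marked for forgetting is not promoted.
    flags.should_promote = (not flags.should_forget) and access_count >= PROMOTE_MIN_ACCESSES
    return flags


print(evaluate_flags(decay_factor=0.6, access_count=4, days_since_last_access=0.5))
print(evaluate_flags(decay_factor=0.2, access_count=0, days_since_last_access=1.0))
```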

&lt;h4&gt;
  
  
  2.2.5 Persisting the metadata profile
&lt;/h4&gt;

&lt;p&gt;When all parameters are computed, the system packs them into a structured dict, merges into metadata, and writes alongside the message body.&lt;/p&gt;

&lt;p&gt;The clock starts ticking here.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Decay Calculation
&lt;/h2&gt;

&lt;p&gt;Time passes. Retention drifts downward.&lt;/p&gt;

&lt;p&gt;Recall the Ebbinghaus form: R(t) = e^(-λt) — exponential decay.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.1 PowerMem’s decay formula
&lt;/h3&gt;

&lt;p&gt;In code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decay_rate&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;decay_rate&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;decay_rate&lt;/span&gt;
&lt;span class="n"&gt;decay_factor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;hours_elapsed&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The denominator 24 × rate (hours) is the memory’s characteristic decay timescale. Call it S (Strength): S = 24 × rate. Larger S → slower decay → longer life.&lt;/p&gt;

&lt;p&gt;Cleaner form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;decay_factor = e^(-t / S)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Elegant property: after elapsed time t = S, retention falls to e^(-1) ≈ 37% of its prior value — regardless of S. Know S in hours, and you know the forgetting rhythm.&lt;/p&gt;

&lt;p&gt;For our short_term meeting memory, rate = 0.15:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;S = 24 × 0.15 = 3.6 hours
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Roughly every 3.6 hours, retention drops to ~37% of what it was.&lt;/p&gt;
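
&lt;p&gt;To make the timescale concrete, here is a minimal sketch that combines the per-layer coefficients from Section 2.2.2 with the decay formula. The constant and function names are illustrative, not PowerMem’s; the numbers (global rate 0.1 and coefficients 0.5, 1.5, 2.0) come from the text.&lt;/p&gt;

```python
import math

GLOBAL_DECAY_RATE = 0.1
LAYER_COEFFICIENT = {"working": 0.5, "short_term": 1.5, "long_term": 2.0}


def strength_hours(layer: str) -> float:
    """Characteristic timescale S = 24 * rate, in hours."""
    return 24 * GLOBAL_DECAY_RATE * LAYER_COEFFICIENT[layer]


def decay_factor(hours_elapsed: float, layer: str) -> float:
    """Exponential decay e^(-t / S) for the given layer."""
    return math.exp(-hours_elapsed / strength_hours(layer))


for layer in LAYER_COEFFICIENT:
    s = strength_hours(layer)
    print(layer, round(s, 2), round(decay_factor(s, layer), 3))
```

&lt;p&gt;Evaluating each layer at t = S gives the same factor for all three, which is the property described above: S alone sets the rhythm.&lt;/p&gt;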

&lt;p&gt;&lt;em&gt;&lt;strong&gt;One last engineering detail:&lt;/strong&gt; PowerMem includes a fallback path. The caller first computes and passes a memory-type-specific decay rate. If none is provided, the system falls back to the global default decay rate. This ensures decay calculation still works even when legacy metadata lacks type information.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3.2 A telling engineering trade-off
&lt;/h3&gt;

&lt;p&gt;PowerMem’s formula is equivalent to the classic Ebbinghaus write-up — different notation. Classic: R(t) = e^(-λt) where λ is the decay constant and λ = 1/S.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Feaxezdv7o2ugx3yjynzw.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Feaxezdv7o2ugx3yjynzw.webp" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Back-solving classic Ebbinghaus lab data gives λ ≈ 0.821. PowerMem’s default config (decay rate 0.1, so S = 24 × 0.1 = 2.4 hours) implies λ = 1/S ≈ 0.417 per hour, about half:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F2ksqdul5rurtgubpfmx5.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F2ksqdul5rurtgubpfmx5.webp" alt=" " width="751" height="246"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is intentional, not a bug. Ebbinghaus used meaningless syllables — the fastest decay humans show. PowerMem stores semantically linked agent memory; gentler decay matches that reality.&lt;/p&gt;

&lt;p&gt;Tune aggressiveness via INTELLIGENT_MEMORY_DECAY_RATE in .env:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fkvmnthpn46kxdg0czpol.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fkvmnthpn46kxdg0czpol.webp" alt=" " width="765" height="201"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Lower decay_rate → more aggressive forgetting.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.3 Decay timeline for the meeting memory
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fvs9o17e06h8obeozrdny.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fvs9o17e06h8obeozrdny.webp" alt=" " width="764" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A short_term memory crosses the 0.3 threshold in ~4.3 hours — but that does not mean immediate deletion. Eviction runs when the memory is accessed.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Access-Triggered Lifecycle
&lt;/h2&gt;

&lt;h3&gt;
  
  
  4.1 Defer forget decisions until access
&lt;/h3&gt;

&lt;p&gt;Decay accrues continuously as time passes, but nothing acts on it until the memory is touched: the forget / promote / archive decisions execute on access.&lt;/p&gt;

&lt;p&gt;Cognitive background: retrieving a consolidated memory temporarily reopens it to plasticity; reconsolidation strengthens the trace. Engineering translation: memory fate should not be time-only — re-evaluate on every touch.&lt;/p&gt;

&lt;p&gt;Access is the strongest feedback signal. Frequent access → retain or promote. Long silence → forget, even if it once looked important. Lazy evaluation avoids batch scans; no cron job walking every row.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.2 Four checkpoints
&lt;/h3&gt;

&lt;p&gt;On Memory.get() or Memory.search(), four stages run in order.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fpm392zqkanlzlsv5s0le.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fpm392zqkanlzlsv5s0le.webp" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Stage 1: Forget&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_resolve_decay_rate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;decay_factor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;calculate_decay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decay_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;decay_factor&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;working_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# 0.3
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;access_count&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;time_elapsed&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;days&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Either condition triggers forget. The second is blunt: never accessed in seven days is itself a strong forget signal. The caller performs deletion.&lt;/p&gt;

&lt;p&gt;Stage 2: Promote&lt;/p&gt;

&lt;p&gt;Any one condition promotes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;access_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;time_elapsed&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hours&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;importance&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;short_term_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# 0.6
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Effect: working → short_term or short_term → long_term — lower decay multiplier, longer life.&lt;/p&gt;

&lt;p&gt;Our meeting memory (importance = 0.72 ≥ 0.6) qualifies for promotion on first access. If still short_term, it upgrades to long_term — sticky note → durable knowledge. That mirrors biological consolidation.&lt;/p&gt;

&lt;p&gt;Stage 3: Archive&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;time_elapsed&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;days&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;importance&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;working_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# 0.3
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Archived memories are not deleted; they leave the active pool but remain reachable via archive APIs.&lt;/p&gt;

&lt;p&gt;Stage 4: Periodic reprocessing&lt;/p&gt;

&lt;p&gt;Whenever the access count hits a multiple of 5 (the 5th, 10th, 15th access, and so on), or when the memory type changes, the system recomputes all Ebbinghaus metadata. Parameters track evolving access patterns — spaced repetition stabilized in code.&lt;/p&gt;
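
&lt;p&gt;The four checkpoints above can be sketched as a single pass. This is a minimal illustration rather than PowerMem's code: MemoryState, evaluate_on_access, and the should_* helpers are hypothetical names, the thresholds are the ones quoted in the text, and the way the stage outcomes are combined (stop after a forget, otherwise collect the rest) is an assumption.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import timedelta

WORKING_THRESHOLD = 0.3      # forget / archive cutoff quoted in the text
SHORT_TERM_THRESHOLD = 0.6   # promotion cutoff quoted in the text


@dataclass
class MemoryState:
    """Hypothetical snapshot of one memory at access time."""
    decay_factor: float      # current retention in [0, 1]
    importance: float        # importance score in [0, 1]
    access_count: int        # accesses recorded so far
    time_elapsed: timedelta  # age of the memory


def should_forget(m):
    # Stage 1: retention fell below the threshold, or never used in 7 days.
    faded = WORKING_THRESHOLD > m.decay_factor
    never_used = m.access_count == 0 and m.time_elapsed > timedelta(days=7)
    return faded or never_used


def should_promote(m):
    # Stage 2: any one condition is enough.
    return (
        m.access_count >= 3
        or m.time_elapsed > timedelta(hours=24)
        or m.importance >= SHORT_TERM_THRESHOLD
    )


def should_archive(m):
    # Stage 3: old or low-importance memories leave the active pool.
    return m.time_elapsed > timedelta(days=30) or WORKING_THRESHOLD > m.importance


def should_reprocess(m, type_changed=False):
    # Stage 4: every 5th access, or whenever the memory type changed.
    return type_changed or (m.access_count > 0 and m.access_count % 5 == 0)


def evaluate_on_access(m, type_changed=False):
    """Run the stages in order and return the actions that apply."""
    if should_forget(m):
        return ["forget"]  # assumption: nothing else runs once forgotten
    actions = []
    if should_promote(m):
        actions.append("promote")
    if should_archive(m):
        actions.append("archive")
    if should_reprocess(m, type_changed):
        actions.append("reprocess")
    return actions


meeting = MemoryState(
    decay_factor=0.8, importance=0.72, access_count=1,
    time_elapsed=timedelta(hours=2),
)
print(evaluate_on_access(meeting))
```

&lt;p&gt;In this sketch the meeting memory is only promoted on its first access, because its importance of 0.72 clears the 0.6 cutoff while none of the forget or archive conditions hold. The decay_factor of 0.8 and the two-hour age are made-up inputs.&lt;/p&gt;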

&lt;h2&gt;
  
  
  5. Search Weighting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  5.1 Interference theory in retrieval
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;The hard part of memory is not storage — it’s retrieval. As volume grows, cross-interference explodes.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Search for “Q2 review.” Pure semantic ranking might surface a three-month-old meeting note at the top and bury yesterday’s schedule change. Best semantic match ≠ what the user needs right now.&lt;/p&gt;

&lt;p&gt;PowerMem injects time into ranking.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.2 Ranking formula
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Ffin4bre8lbjp0hihl8e4.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Ffin4bre8lbjp0hihl8e4.webp" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;final_score = relevance_score × decay_factor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;relevance_score — keyword / semantic match&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;decay_factor — time decay from Section 3&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This produces cross-ranking: a stale but near-perfect match can lose to a fresher, moderate match:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fncpxr91w7anvehthl9j8.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fncpxr91w7anvehthl9j8.webp" alt=" " width="760" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Recency has veto power. No matter how well it matches, a nearly decayed memory sinks.&lt;/p&gt;
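
&lt;p&gt;A small sketch of the re-ranking step may help. The rank helper and the candidate values are hypothetical and chosen only to show the effect; they are not taken from PowerMem.&lt;/p&gt;

```python
def rank(candidates):
    """Sort (label, relevance_score, decay_factor) tuples by final_score, best first."""
    scored = [(label, relevance * decay) for label, relevance, decay in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up values: a near-perfect but stale match versus a fresher, moderate one.
candidates = [
    ("three-month-old meeting note", 0.95, 0.10),
    ("yesterday's schedule change", 0.70, 0.90),
]

for label, final_score in rank(candidates):
    print(f"{label}: {final_score:.3f}")
```

&lt;p&gt;Because the two factors are multiplied rather than added, a decay_factor near zero pulls the product down regardless of relevance, which is the veto effect described above.&lt;/p&gt;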

&lt;p&gt;Search is also access: each hit triggers Memory.get() — batch lifecycle management for free.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Global Optimization
&lt;/h2&gt;

&lt;h3&gt;
  
  
  6.1 Why global passes matter
&lt;/h3&gt;

&lt;p&gt;Everything above is per-memory, online. At scale you still need periodic housekeeping — duplicates, redundancy, fragmentation. Analog: sleep-dependent consolidation — replay, transfer, dedupe, merge, strengthen important links.&lt;/p&gt;

&lt;p&gt;PowerMem offers three complementary strategies.&lt;/p&gt;

&lt;h3&gt;
  
  
  6.2 Three optimization strategies
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fj8phk28zmlcphwofutjg.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fj8phk28zmlcphwofutjg.webp" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(1) Exact deduplication&lt;/p&gt;

&lt;p&gt;Content-hash exact match. Maintain hash → [memories]; keep earliest per group, delete rest. Batch cap: 10,000 records. Identical duplicates → one row.&lt;/p&gt;
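
&lt;p&gt;The grouping step could look roughly like this. It is a sketch under stated assumptions: memories are plain dicts with hypothetical id, content, and created_at fields, MD5 via hashlib stands in for the content hash, and the 10,000-record batch cap is not modeled.&lt;/p&gt;

```python
import hashlib
from collections import defaultdict


def exact_dedup(memories):
    """Return (kept, removed): the earliest memory per content hash, and the rest."""
    groups = defaultdict(list)
    for memory in memories:
        digest = hashlib.md5(memory["content"].encode("utf-8")).hexdigest()
        groups[digest].append(memory)

    kept, removed = [], []
    for group in groups.values():
        group.sort(key=lambda memory: memory["created_at"])
        kept.append(group[0])      # earliest copy survives
        removed.extend(group[1:])  # later identical copies are dropped
    return kept, removed


memories = [
    {"id": 1, "content": "Q2 review moved to next Wednesday", "created_at": 2},
    {"id": 2, "content": "Q2 review moved to next Wednesday", "created_at": 1},
    {"id": 3, "content": "Budget approved", "created_at": 3},
]
kept, removed = exact_dedup(memories)
print([m["id"] for m in kept], [m["id"] for m in removed])
```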

&lt;p&gt;(2) Semantic deduplication&lt;/p&gt;

&lt;p&gt;Embedding cosine similarity finds near-duplicates with different wording. O(N×M) pairwise compare; default threshold 0.95. Delete newer duplicate; keep earliest.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;“Q2 review moved to next Wednesday”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;“Q2 review pushed to Wednesday next week”&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Same meaning, different surface form — semantic dedup catches it.&lt;/p&gt;
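
&lt;p&gt;A pairwise version of this check can be sketched in plain Python. The embedding field and the tiny two-dimensional vectors are hypothetical stand-ins for real model embeddings, and semantic_dedup is an illustrative name, not PowerMem's API.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_dedup(memories, threshold=0.95):
    """Walk memories oldest first; drop any that is too similar to one already kept."""
    kept, removed = [], []
    for memory in sorted(memories, key=lambda m: m["created_at"]):
        is_duplicate = any(
            cosine_similarity(memory["embedding"], earlier["embedding"]) >= threshold
            for earlier in kept
        )
        (removed if is_duplicate else kept).append(memory)
    return kept, removed


memories = [
    {"id": "a", "created_at": 1, "embedding": [1.0, 0.0]},
    {"id": "b", "created_at": 2, "embedding": [0.99, 0.05]},  # near-duplicate of "a"
    {"id": "c", "created_at": 3, "embedding": [0.0, 1.0]},
]
kept, removed = semantic_dedup(memories)
print([m["id"] for m in kept], [m["id"] for m in removed])
```

&lt;p&gt;Each candidate is compared against every memory kept so far, which is where the pairwise cost comes from.&lt;/p&gt;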

&lt;p&gt;(3) Memory compression&lt;/p&gt;

&lt;p&gt;For similar-but-not-identical clusters, an LLM summarizes into one synthetic memory:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Greedy clustering — group pairs above threshold (default 0.85)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;LLM summary — templated prompt → one replacement memory per cluster&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together with per-item decay, this spans micro (row-level fade) → macro (batch compress) — a full memory quality system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Quite a bit of ground. Our meeting message’s path through PowerMem:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F4ldvx1m0syaycqyo37a2.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F4ldvx1m0syaycqyo37a2.webp" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Q2 requirements review, Friday 3 PM, Conference Room B" enters the system
  ↓
Importance scoring → "How important?" → 0.72
  ↓
Classification → "Which layer?" → short_term
  ↓
Parameter init → "Set the decay clock" → decay rate + review schedule locked in
  ↓
Time passes → retention falls
  ↓
Accessed → "Still alive? Promote?" → passes; promoted to long_term
  ↓
Searched → "Where in results?" → recency × relevance
  ↓
Global optimization → "Duplicates? Compress?" → dedup and merge
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The through-line: use finite storage for the highest-value signal, and keep retrieval SNR high.&lt;/p&gt;

&lt;p&gt;Six mechanisms, six jobs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Importance scoring — what to remember&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Classification — how long to remember&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Decay — when to evict&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Access triggers — dynamic adjustment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Search weighting — how to find it&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Global optimization — how to stay lean&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What we find most interesting about PowerMem: forgetting is not a binary “delete or not” afterthought. Decay begins at write time — but decay is a continuous weight, not deletion. What actually decides fate is whether the memory is touched again. That’s uncomfortably close to how human memory behaves.&lt;/p&gt;

&lt;p&gt;Forgetting is not a patch on top of memory systems. It is a core design dimension across the entire lifecycle.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fo7tkk19p8m5hyiulhgbp.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fo7tkk19p8m5hyiulhgbp.webp" alt=" " width="800" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Based on PowerMem v1.1.1 source analysis; code references reflect the actual project.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;p&gt;[1] Ebbinghaus forgetting curve (PowerMem docs):&lt;br&gt;
&lt;a href="https://github.com/oceanbase/powermem/blob/main/docs/guides/0008-ebbinghaus_forgetting_curve.md" rel="noopener noreferrer"&gt;https://github.com/oceanbase/powermem/blob/main/docs/guides/0008-ebbinghaus_forgetting_curve.md&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] PowerMem:&lt;br&gt;
&lt;a href="https://github.com/oceanbase/powermem" rel="noopener noreferrer"&gt;https://github.com/oceanbase/powermem&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Building agent memory in production? What’s biting you hardest — decay tuning, retrieval noise, or promotion rules? Share what you’ve shipped (or what broke) in the comments.&lt;/p&gt;


</description>
      <category>ai</category>
      <category>agents</category>
      <category>softwareengineering</category>
      <category>opensource</category>
    </item>
    <item>
      <title>From Neurons to Code: The Forgetting Design Behind PowerMem</title>
      <dc:creator>Charles Wu</dc:creator>
      <pubDate>Sun, 28 Jun 2026 04:09:03 +0000</pubDate>
      <link>https://dev.to/oceandata4ai/from-neurons-to-code-the-forgetting-design-behind-powermem-11n9</link>
      <guid>https://dev.to/oceandata4ai/from-neurons-to-code-the-forgetting-design-behind-powermem-11n9</guid>
      <description>&lt;p&gt;&lt;em&gt;Brains forget on purpose. PowerMem does too — three memory tiers, exponential decay, and ranking by relevance × freshness, not just semantic match.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Foigt1xhcbn8cvfnliw6o.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Foigt1xhcbn8cvfnliw6o.webp" alt=" " width="799" height="450"&gt;&lt;/a&gt;Photo by &lt;a href="https://unsplash.com/@thefredyjacob" rel="noopener noreferrer"&gt;Fredy Jacob&lt;/a&gt; on &lt;a href="https://unsplash.com/" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Takeaways&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;PowerMem treats forgetting as a first-class capability — not a bug — using a three-tier memory model (working, short_term, long_term) backed by Ebbinghaus-style exponential decay.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Decay-rate multipliers differ by tier (×2.0 / ×1.5 / ×1.0), so unimportant memories fade quickly while frequently accessed ones are promoted and stabilized — directly mirroring synaptic plasticity and memory consolidation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Retrieval ranking combines semantic similarity with a decay factor (final_score = relevance × decay), turning forgetting into a quality regulator rather than a delete switch.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If every memory carries equal weight at retrieval time, two problems compound:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Retrieval quality decays. New and old memories interfere with each other in the embedding space. As the corpus grows, the signal-to-noise ratio of any query drops.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storage costs spiral. Most low-value content is never retrieved, yet it keeps consuming space, index time, and embedding budget forever.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PowerMem’s forgetting mechanism decides two things: when a memory dies, and how much weight it carries during retrieval. Before walking through the code in a follow-up post, it is worth tracing the cognitive-science principles the system is modeled on.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Nature Forgets
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Synaptic Plasticity
&lt;/h3&gt;

&lt;p&gt;The biological substrate of memory is the synaptic connection between neurons. Those connections are anything but static — two opposing mechanisms continuously modulate them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Long-Term Potentiation (LTP) — frequently used pathways are strengthened. This is the basis of remembering.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Long-Term Depression (LTD) — rarely used pathways are weakened. This is the basis of forgetting.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LTP and LTD are partners, not adversaries. If every synapse were strengthened equally, the brain would lose its ability to distinguish signal from noise. LTD selectively weakens inactive connections so that limited synaptic resources concentrate on the active pathways. Forgetting is the price memory pays for discrimination.&lt;/p&gt;

&lt;h3&gt;
  
  
  From Hippocampus to Neocortex
&lt;/h3&gt;

&lt;p&gt;A new memory is first held in the hippocampus, which is high-throughput and low-capacity, much like RAM. During sleep, the brain replays these traces and gradually transfers selected ones to the neocortex for long-term storage.&lt;/p&gt;

&lt;p&gt;The transfer is selective. Only memories that are repeatedly activated, richly associated with prior knowledge, or marked by strong emotion are prioritized. Isolated, single-occurrence, emotionally neutral information falls off during the move. Nature performs filtering automatically during consolidation, and this is the direct biological blueprint for PowerMem’s three-tier model: working → short_term → long_term.&lt;/p&gt;

&lt;h3&gt;
  
  
  Forgetting Is a Retrieval Problem
&lt;/h3&gt;

&lt;p&gt;Cognitive psychology adds another lens: interference theory. Forgetting is often not about information being erased, but about it becoming unretrievable. It comes in two forms. In proactive interference, old memories disrupt the recall of new ones (you keep typing your old phone number). In retroactive interference, new memories disrupt the recall of old ones (learning Spanish makes Italian vocabulary slip).&lt;/p&gt;

&lt;p&gt;The hard problem is not writing — it is reading under interference. As the store grows, cross-memory interference rises super-linearly. Decaying low-value entries reduces interference density and restores retrieval precision.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Shannon-Information View
&lt;/h2&gt;

&lt;p&gt;Claude Shannon’s 1948 definition of information quantifies surprise:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I(x) = -log₂(p(x))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The information content of an event is inversely related to its probability — common events carry little information; rare events carry a lot.&lt;/p&gt;

&lt;p&gt;Mapped onto a memory system this gives a natural rule. “What I had for breakfast yesterday” (happens daily, p ≈ 1, I ≈ 0) is not worth long-term storage. “The master password for our production database” (almost never asked, tiny p, huge I) must be persisted.&lt;/p&gt;
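
&lt;p&gt;To make the contrast concrete, here is a short sketch of the formula. The probabilities are invented for illustration; nothing here is measured, and self_information is a hypothetical helper name.&lt;/p&gt;

```python
import math


def self_information(p):
    """Shannon self-information, in bits, of an event with probability p."""
    if p > 1 or 0 >= p:
        raise ValueError("p must be in (0, 1]")
    return math.log2(1 / p)


# Made-up probabilities for the two examples in the text.
print(f"breakfast (p = 0.99):   {self_information(0.99):.3f} bits")
print(f"password  (p = 0.0001): {self_information(0.0001):.3f} bits")
```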

&lt;p&gt;A well-designed forgetting mechanism is therefore an information filter: high-information content (rare but critical) is retained, low-information content (frequent but trivial) is decayed and evicted, and everything in between is interpolated smoothly. PowerMem’s tiered architecture implements this filter; the forgetting curve gives it a time-varying weight, so classification keeps evolving instead of being decided once at write time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Ebbinghaus Forgetting Curve
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Memory Becomes Measurable
&lt;/h3&gt;

&lt;p&gt;In 1885, Hermann Ebbinghaus turned memory research from philosophy into laboratory science. Using roughly 2,300 nonsense syllables to avoid prior-knowledge bias, he ran a strict protocol on himself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Learn a 13-syllable list until two consecutive error-free recitations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Wait 20 minutes, 1 hour, 9 hours, 1 day, 2 days, 6 days, 31 days.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Re-learn using the savings method — measure how much faster than the first time.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The retention data:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F1dxi1esrr8bu9n67i9re.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F1dxi1esrr8bu9n67i9re.jpg" alt=" " width="800" height="688"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Two conclusions, still standing more than a century later:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Forgetting is exponential, not linear — about 40% lost in the first 20 minutes, more than half within an hour, then a long slow tail.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Spaced review rewrites the curve — repeated reviews at the right interval slow subsequent decay.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  From the Original Fit to Modern Exponential Decay
&lt;/h3&gt;

&lt;p&gt;Ebbinghaus’s original fit was logarithmic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;b = 100k / ((log t)^c + k)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;with b the savings percentage, t the time in minutes since learning, log the base-10 logarithm, and constants k ≈ 1.84, c ≈ 1.25.&lt;/p&gt;
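
&lt;p&gt;Evaluating the fit at a few of the test intervals shows its fast-then-slow shape. This sketch assumes the common (base-10) logarithm and is only defined for t of at least one minute; savings_percent is an illustrative name.&lt;/p&gt;

```python
import math

K = 1.84  # constant k from the fit
C = 1.25  # constant c from the fit


def savings_percent(minutes):
    """Fitted savings b (in percent) after `minutes`, using log base 10."""
    if 1 > minutes:
        raise ValueError("the fit is only evaluated here for t >= 1 minute")
    return 100 * K / (math.log10(minutes) ** C + K)


intervals = [
    ("20 minutes", 20),
    ("1 hour", 60),
    ("1 day", 24 * 60),
    ("31 days", 31 * 24 * 60),
]
for label, minutes in intervals:
    print(f"{label}: {savings_percent(minutes):.1f}% savings")
```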

&lt;p&gt;Later work showed that a simpler exponential model approximates the data just as well, and it is now the standard form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;R(t) = e^(-λt)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;R(t) — retention at time t, the fraction of the original information still recallable, in [0, 1].&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;e — the natural constant (≈ 2.71828), the mathematical base for any continuous, smooth exponential process.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;λ (lambda) — the decay rate. Larger λ → faster forgetting (steeper curve). Smaller λ → more durable memory (flatter curve).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;t — elapsed time since the memory was formed, typically in hours.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The graph is a fast-then-slow curve. Most of the loss happens early; whatever survives the early window is far more stable, simply because there’s not much left to forget. These equations are the mathematical foundation of PowerMem’s forgetting mechanism.&lt;/p&gt;
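
&lt;p&gt;Solving R(t) = θ for t gives t = ln(1/θ) / λ, which is a quick way to see how long a memory stays above a given retention threshold. The sketch below uses this generic form only; the λ values are made up, and PowerMem's own decay_rate parameterization may differ.&lt;/p&gt;

```python
import math


def retention(t_hours, lam):
    """R(t) = e^(-lam * t): fraction retained after t_hours."""
    return math.exp(-lam * t_hours)


def time_to_threshold(threshold, lam):
    """Hours until retention falls to `threshold`, from e^(-lam * t) = threshold."""
    return math.log(1 / threshold) / lam


# Hypothetical decay rates, per hour.
for lam in (0.1, 0.2):
    half_life = time_to_threshold(0.5, lam)
    print(f"lambda = {lam}: half-life {half_life:.2f} h, "
          f"retention after 24 h {retention(24, lam):.3f}")
```

&lt;p&gt;Doubling λ halves the time to any threshold, which is why a per-tier multiplier on the decay rate translates directly into expected lifespan.&lt;/p&gt;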

&lt;h3&gt;
  
  
  Why Exponential Is the Right Functional Form
&lt;/h3&gt;

&lt;p&gt;The defining feature of forgetting is that the rate of forgetting is proportional to what remains. The differential statement is dR/dt = -λR (rate of change proportional to current state), and with the initial condition R(0) = 1 its unique solution is exactly R(t) = e^(-λt).&lt;/p&gt;

&lt;p&gt;Newton’s law of cooling, radioactive decay, capacitor discharge — apparently unrelated phenomena that share the same equation because they share the same self-consistent relationship between rate and state. Memory decay is no exception. Modern spaced-repetition systems (SuperMemo, Anki, PowerMem) converge on exponential decay because it offers the best balance between simplicity, computability, and empirical fit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Spaced Repetition and Desirable Difficulty
&lt;/h3&gt;

&lt;p&gt;Ebbinghaus also discovered that spaced repetition resets the curve, and each reset slows the next decay. Neuroscience explains why through memory reconsolidation: when a consolidated memory is actively retrieved, it briefly returns to a plastic state, and the brain re-stabilizes it through a fresh round of protein synthesis and synaptic reinforcement.&lt;/p&gt;

&lt;p&gt;Reconsolidation needs time. Cramming ten repetitions into five minutes does not allow protein synthesis and synaptic remodeling to complete — the biological reason rote cramming is inefficient. Wait too long, however, and the trace has already decayed below retrieval threshold, leaving nothing to reconsolidate. Robert Bjork (UCLA, 1994) crystallized this into the concept of desirable difficulty: the most efficient learning happens when retrieval is just hard enough to trigger adaptation. This principle drives PowerMem’s review-scheduling logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  PowerMem’s Three-Tier Memory Architecture
&lt;/h2&gt;

&lt;p&gt;This is where the biology, the information theory, and the math all land in code. PowerMem is not the first system to talk about “memory tiers” — but the way it makes forgetting a tunable parameter at every layer is what makes the design worth examining in detail.&lt;/p&gt;

&lt;h3&gt;
  
  
  From Biology to Code
&lt;/h3&gt;

&lt;p&gt;The cognitive-science principles above translate into three engineering tiers:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fadrxhe878zhkzi52xfuh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fadrxhe878zhkzi52xfuh.png" alt=" " width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Classification is driven by an importance score:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;importance ≥ 0.8  →  long_term
importance ≥ 0.6  →  short_term
importance &amp;lt; 0.6  →  working
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The decay-rate multiplier is the key differentiating parameter. Over the same 24-hour window, a working memory decays at twice the rate of a long_term one. Importance directly controls expected lifespan: unimportant content disappears quickly, freeing retrieval space for the things that actually matter.&lt;/p&gt;
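
&lt;p&gt;The thresholds and multipliers can be combined into a small lookup. This is an illustrative sketch: classify and decay_multiplier are hypothetical names, and the tier-to-multiplier mapping (working ×2.0, short_term ×1.5, long_term ×1.0) is read off the figures quoted in this article rather than from the source code.&lt;/p&gt;

```python
DECAY_MULTIPLIER = {
    "working": 2.0,
    "short_term": 1.5,
    "long_term": 1.0,
}


def classify(importance):
    """Map an importance score in [0, 1] to a memory tier."""
    if importance >= 0.8:
        return "long_term"
    if importance >= 0.6:
        return "short_term"
    return "working"


def decay_multiplier(importance):
    """Decay-rate multiplier of the tier this importance score lands in."""
    return DECAY_MULTIPLIER[classify(importance)]


for score in (0.3, 0.72, 0.9):
    print(f"importance {score}: {classify(score)} (x{decay_multiplier(score)})")
```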

&lt;h2&gt;
  
  
  Global Architecture of the Forgetting Subsystem
&lt;/h2&gt;

&lt;p&gt;PowerMem’s forgetting subsystem has four cooperating components, arranged along the lifecycle of a memory entry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;New input → ImportanceEvaluator
         → EbbinghausAlgorithm
         → EbbinghausIntelligencePlugin
            ├─ on_add():    inject decay parameters at creation
            ├─ on_get():    check decay / promotion / archival on access
            └─ on_search(): batch-process lifecycle during search
         → MemoryOptimizer
            ├─ exact dedup (MD5 hash)
            ├─ semantic dedup (cosine similarity)
            └─ memory compression (LLM summarization)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;ImportanceEvaluator — judges how important a piece of information is and outputs a 0.0–1.0 score.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;EbbinghausAlgorithm — pure-math layer providing decay computation, review scheduling, and the forget / promote / archive decisions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;EbbinghausIntelligencePlugin — injects management logic at the key lifecycle hooks: creation, access, and search.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MemoryOptimizer — periodic global pass that performs deduplication and compression.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Forgetting Is More Than Deletion
&lt;/h2&gt;

&lt;p&gt;In retrieval, the forgetting mechanism plays an equally critical role as a ranking signal. Search results are ordered by:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;final_score = relevance_score × decay_factor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;relevance_score — semantic match (vector similarity).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;decay_factor — temporal freshness (the exponential decay value).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These two factors jointly determine the final ranking, which makes non-trivial cross-rankings possible. The numbers below are illustrative; the actual decay factor depends on the configured decay_rate:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fd9f756tiwh62mpmis8bn.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fd9f756tiwh62mpmis8bn.jpg" alt=" " width="800" height="648"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Forgetting is not a simple delete switch; it is a quality regulator for retrieval. Every result is scored on both content match and freshness at the same time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Forgetting Matters
&lt;/h2&gt;

&lt;p&gt;Pulling the threads together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Forgetting is the foundation of ranking. Decay manufactures a second axis beyond semantic similarity, so otherwise-equivalent matches can be separated cleanly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Forgetting lets memory evolve. Frequently accessed entries are promoted and assigned lower decay rates; repeated use stabilizes what is genuinely useful, much as reconsolidation does in the brain.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Forgetting is continuous, not binary. A smooth 1.0 → 0.0 spectrum mimics how human memory actually fades, and leaves room for future features — soft deletes, memory revival, tiered archival — without breaking the model.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nature designed it this way. PowerMem translates that design into code you can configure, tune, and reason about.&lt;/p&gt;

&lt;p&gt;The next post follows a single piece of information through the full PowerMem pipeline — importance evaluation → tier assignment → decay → access trigger → promotion or forgetting → global optimization — to see exactly how the theory becomes runtime behavior.&lt;/p&gt;

&lt;p&gt;PowerMem on GitHub: &lt;a href="https://github.com/oceanbase/powermem" rel="noopener noreferrer"&gt;https://github.com/oceanbase/powermem&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you find PowerMem helpful, please give it a ⭐ on GitHub. It would be a great help to the project!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Based on PowerMem v1.1.1. All code references come from the actual project files.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>agents</category>
      <category>coding</category>
    </item>
    <item>
      <title>Technical Deep Dive: How OceanBase’s Native Column Store Powers HTAP</title>
      <dc:creator>Charles Wu</dc:creator>
      <pubDate>Sun, 28 Jun 2026 03:34:43 +0000</pubDate>
      <link>https://dev.to/oceandata4ai/technical-deep-dive-how-oceanbases-native-column-store-powers-htap-5d0i</link>
      <guid>https://dev.to/oceandata4ai/technical-deep-dive-how-oceanbases-native-column-store-powers-htap-5d0i</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F6m2lqkx7vx8hxpk9j9ep.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F6m2lqkx7vx8hxpk9j9ep.webp" alt="HTAP" width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This series covers how OceanBase Analytic Processing (AP) delivers strong transactional guarantees and high-concurrency support for real-time analytics. This article focuses on the native column-store engine’s architecture — tracing the full technical path from LSM-Tree’s baseline-delta separation, through adaptive compaction, columnar encoding, and Skip Index optimizations, to the vectorized execution engine 2.0, cost-model-driven row/column path selection, and system-wide adaptations across DDL, backup/restore, and transaction consistency. Together, these mechanisms show how OceanBase serves both TP and AP workloads within a single architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. From TP to HTAP: Why a Native Column Store
&lt;/h2&gt;

&lt;p&gt;When building real-time analytics, enterprises face a classic architectural trade-off: deploy a separate OLAP database, or run analytical queries directly on the OLTP system. The first approach introduces data synchronization latency and operational complexity. The second hits a performance wall — row-store engines are not designed for analytical workloads.&lt;/p&gt;

&lt;p&gt;Starting with V4.3.0, OceanBase offers a third path: a native column-store engine that delivers high-concurrency transaction processing and complex analytical queries within the same database instance — no additional data sync pipeline required. The key breakthrough is a deep integration of row store and column store within a single codebase and a single OBServer process, built on the LSM-Tree architecture.&lt;/p&gt;

&lt;p&gt;Before V4.3.0, OceanBase’s AP capability relied on a lightweight row-store-plus-index approach. It handled simple analytical queries but bottlenecked on typical AP workloads involving multi-table joins, wide-range scans, and complex aggregations. The native column store closes this gap, enabling use cases like real-time reporting, real-time data warehousing, and user profiling.&lt;/p&gt;

&lt;p&gt;This article walks through the technical details, starting with the column-store engine’s architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Architecture of the Native Column-Store Engine
&lt;/h2&gt;

&lt;p&gt;OceanBase’s native column-store engine is not a bolt-on layer over a row-store architecture. It redesigns data organization from the ground up. The core challenge: support both high-concurrency writes and efficient analytical queries in a unified architecture — balancing storage format, data flow, and query execution. This section explains the baseline-delta separation mechanism built on LSM-Tree.&lt;/p&gt;

&lt;p&gt;Traditional column-store engines offer weak transactional support and struggle with complex transaction scenarios. OceanBase’s LSM-Tree architecture handles this by separating baseline and delta data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Baseline data: Stored in columnar format, optimized for analytical query performance. When delta data accumulates to a threshold, it merges with the baseline to produce a new columnar baseline.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Delta data: Stored in row format for high-concurrency transactional updates. DML operations first write to an in-memory MemTable, then flush to disk as SSTables.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Merge mechanism: A background process periodically merges delta data into the baseline, avoiding performance degradation from heavy random updates.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This architecture unifies row store and column store in one codebase, one architecture, and one OBServer process, and serves both TP and AP workloads.&lt;/p&gt;
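
&lt;p&gt;The baseline-delta idea can be illustrated with a toy Python model. This is only a sketch: ToyTable and its methods are hypothetical names, and the model ignores MemTables, SSTables, MVCC, and everything else a real storage engine needs.&lt;/p&gt;

```python
class ToyTable:
    """Toy model: columnar baseline plus row-format delta (not OceanBase code)."""

    def __init__(self):
        self.baseline = {"id": [], "amount": []}   # column name -> values
        self.delta = {}                            # id -> row dict, or None if deleted

    def upsert(self, row_id, amount):
        self.delta[row_id] = {"id": row_id, "amount": amount}

    def delete(self, row_id):
        self.delta[row_id] = None

    def scan(self):
        """Read path: baseline rows overridden by newer delta rows."""
        rows = dict(zip(self.baseline["id"], self.baseline["amount"]))
        for row_id, row in self.delta.items():
            if row is None:
                rows.pop(row_id, None)
            else:
                rows[row_id] = row["amount"]
        return dict(sorted(rows.items()))

    def compact(self):
        """Merge delta into a new columnar baseline and clear the delta."""
        merged = self.scan()
        self.baseline = {"id": list(merged), "amount": list(merged.values())}
        self.delta = {}

table = ToyTable()
table.upsert(1, 10)
table.upsert(2, 20)
table.compact()
table.upsert(2, 25)   # update lands in the row-format delta
table.delete(1)
before = table.scan()
table.compact()
after = table.scan()
```

&lt;p&gt;Reads return the same result before and after compaction; the merge only changes where the data lives, moving it from the row-format delta into the columnar baseline.&lt;/p&gt;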

&lt;h2&gt;
  
  
  3. Key Technical Mechanisms
&lt;/h2&gt;

&lt;p&gt;With the baseline-delta separation established at the architecture level, the next challenge is engineering efficiency: how to compact columnar data efficiently, reduce storage overhead, and minimize I/O and compute costs at query time. This section covers four core mechanisms in OceanBase’s column-store engine.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adaptive Compaction
&lt;/h3&gt;

&lt;p&gt;Column-store compaction is more complex than row-store compaction. Columnar data is organized by column, so merges involve more files and data reorganization, consuming significantly more resources. OceanBase addresses this with an adaptive compaction mechanism.&lt;/p&gt;

&lt;p&gt;The system intelligently selects which partitions to compact based on the volume of delta data written and query performance metrics per partition — avoiding resource overload from full-scale compaction scheduling. It borrows parallelization techniques from row-store compaction, splitting columnar merge tasks horizontally into sub-tasks for parallel execution. A more innovative feature is vertical splitting: column-level merge task scheduling that prioritizes hot or critical columns, optimizing resource allocation.&lt;/p&gt;

&lt;p&gt;V4.3.0 also introduces tablet-level compaction, supporting partition-level merges triggered from the system tenant. When users observe query performance degradation, they can manually trigger a partition-level merge for quick resolution — providing greater operational flexibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  Columnar Encoding
&lt;/h3&gt;

&lt;p&gt;OceanBase V4.3.0 introduces a new columnar encoding algorithm, enabled via row_format=compressed. This encoding is deeply optimized for column-store access patterns, with full-stack optimization from low-level encoding to upper-level execution.&lt;/p&gt;

&lt;p&gt;The new algorithm leverages CPU SIMD instructions to dramatically improve parallel processing of numerical computations. It applies efficient compression algorithms to numeric columns — delta encoding, run-length encoding, and others — significantly reducing storage footprint. Queries can execute filter operations directly on compressed data, eliminating decompression overhead and further boosting query performance.&lt;/p&gt;
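
&lt;p&gt;To show why filtering on compressed data saves work, here is a small Python sketch using run-length encoding. It is an illustration of the general technique only; the function names are hypothetical and OceanBase's actual encoding and SIMD code paths are not shown.&lt;/p&gt;

```python
def rle_encode(values):
    """Run-length encode a list into [value, run_length] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs

def count_matching(runs, predicate):
    """Evaluate the filter once per run rather than once per row."""
    return sum(length for value, length in runs if predicate(value))

column = [3, 3, 3, 7, 7, 9, 9, 9, 9]
runs = rle_encode(column)
matches = count_matching(runs, lambda v: v >= 7)
```

&lt;p&gt;The predicate runs three times, once per run, instead of nine times, and the column is never expanded back into individual rows.&lt;/p&gt;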

&lt;h3&gt;
  
  
  Skip Index
&lt;/h3&gt;

&lt;p&gt;Skip Index is one of the column-store engine’s core optimization features. In analytical queries, most time is spent on I/O and full data scans. Skip Index adds pre-aggregated data at the storage layer, intelligently skipping irrelevant data blocks to drastically reduce unnecessary disk access.&lt;/p&gt;

&lt;p&gt;At the implementation level, Skip Index computes statistics at the smallest storage unit (micro-block) — including min, max, and null count — then aggregates upward layer by layer: micro-block to macro-block to SSTable, building a multi-level index structure. At query time, the system uses pre-aggregated min/max values to quickly determine whether a data block contains data within the query’s filter range, skipping large volumes of irrelevant blocks.&lt;/p&gt;
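
&lt;p&gt;A minimal Python sketch of min/max block skipping. It models a single index level over hypothetical in-memory blocks; the multi-level aggregation from micro-block to macro-block to SSTable is not modeled, and the function names are assumptions.&lt;/p&gt;

```python
def build_skip_index(blocks):
    """Per-block statistics: min, max, and null count."""
    index = []
    for block in blocks:
        present = [v for v in block if v is not None]
        index.append({
            "min": min(present) if present else None,
            "max": max(present) if present else None,
            "null_count": len(block) - len(present),
        })
    return index

def scan_range(blocks, index, low, high):
    """Return values in [low, high], reading only blocks whose range overlaps."""
    hits, blocks_read = [], 0
    for block, stats in zip(blocks, index):
        if stats["min"] is None or low > stats["max"] or stats["min"] > high:
            continue  # skipped without reading the block
        blocks_read += 1
        hits.extend(v for v in block if v is not None and high >= v >= low)
    return hits, blocks_read

blocks = [[1, 4, 9], [12, None, 18], [25, 27, 30]]
index = build_skip_index(blocks)
hits, blocks_read = scan_range(blocks, index, 10, 20)
```

&lt;p&gt;For the range 10 to 20, only the middle block is read; the other two are ruled out by their min/max statistics alone.&lt;/p&gt;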

&lt;p&gt;For DDL, OceanBase provides flexible Skip Index management. Users can create different types of Skip Index on specified columns at table creation, or modify them later via ALTER TABLE. For column-store tables, the system automatically creates MIN_MAX and SUM Skip Indexes on all columns — delivering performance gains with zero additional configuration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enhanced Pushdown
&lt;/h3&gt;

&lt;p&gt;OceanBase further enhances query pushdown, moving more computation into the storage layer. All filter conditions can now be pushed down to storage, where they combine with Skip Index pre-aggregation to perform rapid data filtering at the storage level.&lt;/p&gt;

&lt;p&gt;Aggregate function pushdown is also strengthened. count, max, min, sum, and avg can execute directly at the storage layer. For aggregation queries without GROUP BY clauses, the final result is computed entirely in the storage layer, eliminating the overhead of pulling data up to the execution layer.&lt;/p&gt;

&lt;p&gt;The most innovative enhancement is GROUP BY pushdown. For low-cardinality columns, the system uses dictionary information within micro-blocks to perform localized GROUP BY computation, significantly reducing data transfer volume. This optimization is especially effective for typical analytical scenarios like user profiling and behavioral analysis.&lt;/p&gt;
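
&lt;p&gt;The idea behind dictionary-based GROUP BY pushdown can be sketched in Python as follows. The block layout and function names are hypothetical; the sketch only shows that each block can be aggregated on its small integer codes and that only per-block partial results need to move upward.&lt;/p&gt;

```python
from collections import Counter

def encode_block(values):
    """Dictionary-encode one block: distinct values plus small integer codes."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]

def group_count(block):
    """COUNT(*) GROUP BY on one block, aggregating on codes."""
    dictionary, codes = block
    counts = [0] * len(dictionary)
    for code in codes:
        counts[code] += 1
    return Counter(dict(zip(dictionary, counts)))

blocks = [encode_block(["cn", "us", "cn"]), encode_block(["us", "us", "de"])]
total = Counter()
for block in blocks:
    total += group_count(block)   # only per-block partial results move upward
```

&lt;p&gt;For a low-cardinality column, each partial result has at most as many entries as the block has distinct values, which is why the volume of data passed to the execution layer shrinks.&lt;/p&gt;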

&lt;h2&gt;
  
  
  4. System-Wide Adaptation
&lt;/h2&gt;

&lt;p&gt;The column-store engine is not an isolated feature — its value depends on tight integration with every database module. From SQL parsing to execution plan generation, from DDL operations to backup/restore, OceanBase has systematically adapted multiple core modules since V4.3.0 to ensure column-store capabilities integrate seamlessly into existing workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  DDL Support and Table Types
&lt;/h3&gt;

&lt;p&gt;OceanBase V4.3.0 provides flexible column-store DDL support. Users can create different table types based on workload requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Column-store table: Creates a pure column-store table where all data is stored in columnar format, suited for analytics-heavy workloads.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t1&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c1&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c2&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="k"&gt;column&lt;/span&gt; &lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;each&lt;/span&gt; &lt;span class="k"&gt;column&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Row-column redundant table: Maintains both row-store and column-store copies of the data. Supports both high-concurrency transactions and efficient analytical queries, at the cost of additional storage.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t2&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c1&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c2&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="k"&gt;column&lt;/span&gt; &lt;span class="k"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;all&lt;/span&gt; &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;each&lt;/span&gt; &lt;span class="k"&gt;column&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Column-store index: Creates a column-store index on a row-store table to accelerate specific query patterns.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Create a pure column-store index on columns c1, c2 of table t1&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="n"&gt;idx1&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="k"&gt;column&lt;/span&gt; &lt;span class="k"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;each&lt;/span&gt; &lt;span class="k"&gt;column&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Create a row-column redundant index on column c1 of table t1&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="n"&gt;idx2&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="k"&gt;column&lt;/span&gt; &lt;span class="k"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;all&lt;/span&gt; &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;each&lt;/span&gt; &lt;span class="k"&gt;column&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Column-store tables support the full range of DDL operations: adding columns, dropping columns, modifying column types, and more. Skip Index DDL syntax was further refined in V4.3.5, supporting online DDL for maintaining pre-aggregated data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost-Model Enhancement in the Optimizer
&lt;/h3&gt;

&lt;p&gt;Cost-based row/column path selection is a key optimization in V4.3.0. OceanBase implements a unified optimizer codebase that estimates costs differently for row-store and column-store paths, enabling automatic path selection for user queries.&lt;/p&gt;

&lt;p&gt;Cost Estimation&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Storage-layer cost evaluation: The optimizer estimates I/O cost, CPU cost, and memory cost for scanning both row-store and column-store data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data characteristics: It considers the number of columns accessed, data distribution, and filter selectivity to dynamically select the optimal storage path.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hybrid path support: For complex queries, the optimizer may use both row-store and column-store paths simultaneously, achieving best performance through intelligent data reorganization.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Path Selection Strategy&lt;/p&gt;

&lt;p&gt;When the optimizer identifies an analytical operation (full table scan, multi-column aggregation, complex filtering), it prefers the column-store path. For point lookups and high-concurrency updates (TP operations), it selects the row-store path. This intelligent routing allows OceanBase to efficiently handle both TP and AP workloads within a single database.&lt;/p&gt;
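
&lt;p&gt;A toy Python cost comparison can make the routing intuition concrete. The formulas and the per_column_overhead constant below are invented for illustration and are not OceanBase's cost model; they only capture that row-store cost grows with table width while column-store cost grows with the number of columns accessed.&lt;/p&gt;

```python
def row_store_cost(rows_read, total_columns):
    """Row store reads whole rows, so cost grows with table width."""
    return rows_read * total_columns

def column_store_cost(rows_read, columns_accessed, per_column_overhead=1000):
    """Column store reads only needed columns but pays a fixed cost per column."""
    return columns_accessed * (rows_read + per_column_overhead)

def choose_path(rows_read, columns_accessed, total_columns):
    """Pick the cheaper path under the assumed toy costs."""
    row_cost = row_store_cost(rows_read, total_columns)
    col_cost = column_store_cost(rows_read, columns_accessed)
    return "column" if row_cost > col_cost else "row"

point_lookup = choose_path(rows_read=1, columns_accessed=20, total_columns=20)
wide_scan = choose_path(rows_read=1_000_000, columns_accessed=2, total_columns=20)
```

&lt;p&gt;Under these assumptions the single-row lookup that needs every column goes to the row store, and the million-row scan over two columns goes to the column store.&lt;/p&gt;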

&lt;h3&gt;
  
  
  Vectorized Engine 2.0
&lt;/h3&gt;

&lt;p&gt;V4.3.0 introduces a new vectorized execution engine based on the Column data format. Compared to the earlier Uniform format, the new engine offers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Native columnar data support: Optimized for column-store access patterns, eliminating data format conversion overhead.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Full SIMD utilization: More efficient data layout enables better use of modern CPU SIMD instructions for numerical computation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Memory access optimization: Improved in-memory data arrangement increases cache hit rates and memory access efficiency.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Backup/Restore and Transaction Adaptation
&lt;/h3&gt;

&lt;p&gt;OceanBase has adapted multiple modules around the column-store engine — from optimizer to executor, from DDL to backup/restore and transaction processing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Backup/restore support: Column-store backup and restore is fully compatible with row-store mechanisms, supporting both full and incremental backups.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Transaction consistency: The column-store engine natively supports distributed strong-consistency transactions via MVCC, guaranteeing consistent data views.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;High-concurrency processing: The LSM-Tree-based architecture supports high-concurrency transactional and query operations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mixed-workload capability: High-concurrency transaction processing and complex analytical queries coexist, providing a unified data processing platform.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Through these adaptations, OceanBase delivers a new technical option for modern enterprise applications requiring real-time analytics, strong transactional guarantees, and high concurrency — particularly in finance, e-commerce, and IoT where data freshness and consistency requirements are stringent.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Core Value
&lt;/h2&gt;

&lt;p&gt;OceanBase’s native column-store engine leverages LSM-Tree architectural innovation to solve the traditional column-store bottleneck around strong transactions and high concurrency — a significant advance in HTAP database technology.&lt;/p&gt;

&lt;p&gt;Key technical breakthroughs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Architectural unification: Seamless row-column fusion under one architecture eliminates the data synchronization complexity and latency inherent in traditional HTAP systems.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Native transactional support: The column-store engine natively supports distributed strong-consistency transactions — rare in the industry.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Concurrency scalability: MVCC combined with LSM-Tree enables large-scale concurrent read/write operations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Real-time analytics: Achieves second-level data freshness for analytical queries — delta data is immediately available for analysis.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From an industry perspective, the column-store engine points to a clear direction for HTAP databases. It makes the case that a unified architecture can compete with separated architectures on both performance and cost, that strong transactions and real-time analytics are achievable simultaneously, and that high-concurrency OLTP and complex OLAP can coexist.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Use Cases
&lt;/h2&gt;

&lt;p&gt;OceanBase’s column-store engine applies broadly across multiple scenarios.&lt;/p&gt;

&lt;p&gt;OLAP workloads: In data warehouse applications, the columnar format excels at large-scale data import and transformation, significantly improving ETL throughput. Complex report generation benefits from vectorized execution and pre-aggregation optimizations.&lt;/p&gt;

&lt;p&gt;Real-time analytics: For user behavior analysis, business monitoring dashboards, and similar use cases, OceanBase delivers sub-second query latency. Anomaly detection systems perform rapid identification on real-time data for timely alerting.&lt;/p&gt;

&lt;p&gt;HTAP mixed workloads: E-commerce platforms process high-concurrency transactions and complex sales analytics on the same platform, eliminating synchronization delays. Financial institutions achieve unified trading and risk control — real-time transactions and risk monitoring on one platform.&lt;/p&gt;

&lt;p&gt;IoT and monitoring: High-volume device data collection and analysis, efficient time-series storage and querying, and predictive maintenance based on real-time device data — all demand strong real-time analytics capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Summary and Roadmap
&lt;/h2&gt;

&lt;p&gt;OceanBase’s native column-store engine provides a new technical option for real-time analytics. From architectural unification to native transactions, from storage-layer pushdown to vectorized execution, the column-store engine forms a comprehensive technical system. Looking ahead, OceanBase will continue evolving across three dimensions: functionality, performance, and deployment flexibility.&lt;/p&gt;

&lt;p&gt;Richer Functionality&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Flexible column groups: Currently supports pure column storage; future releases will enable custom column group partitioning for diverse analytical needs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enhanced direct load: Further improvements to incremental direct load capabilities will shorten data preparation time for analytics.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stronger Performance&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Skip Index enhancement: Expand the statistical dimensions supported by Skip Index, covering more query patterns.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Unified storage format: Current storage formats are diverse; future releases will deeply integrate storage formats with the SQL vectorized engine, automatically recognizing different formats during SQL execution to reduce conversion overhead.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Flexible Deployment&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Heterogeneous replicas: Support OLAP-specific heterogeneous replica types for specialized deployment requirements.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storage-compute separation: Future support for storage-compute separation, enabling independent scaling of storage and compute for AP workloads at lower cost.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Continued evolution of OceanBase’s column-store engine will further strengthen its position as an enterprise-grade unified HTAP data platform.&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>database</category>
      <category>datascience</category>
    </item>
    <item>
      <title>How Does an Enterprise-Grade Query Optimizer Keep HTAP Workloads Accurate, Fast, and Stable Over Time?</title>
      <dc:creator>Charles Wu</dc:creator>
      <pubDate>Sun, 28 Jun 2026 03:11:57 +0000</pubDate>
      <link>https://dev.to/oceandata4ai/how-does-an-enterprise-grade-query-optimizer-keep-htap-workloads-accurate-fast-and-stable-over-1e24</link>
      <guid>https://dev.to/oceandata4ai/how-does-an-enterprise-grade-query-optimizer-keep-htap-workloads-accurate-fast-and-stable-over-1e24</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F6dbjycv62kuno28portj.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F6dbjycv62kuno28portj.webp" alt="Optimizer" width="800" height="533"&gt;&lt;/a&gt;Photo by &lt;a href="https://unsplash.com/@hjwinunsplsh" rel="noopener noreferrer"&gt;Jungwoo Hong&lt;/a&gt; on &lt;a href="https://unsplash.com/" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Today, many enterprises treat “running real-time analytics directly on transactional data” as a baseline capability: the same dataset must support high-concurrency point lookups and short transactions, while also delivering reports, risk-control rule evaluation, and operational dashboards within minutes or even seconds.&lt;/p&gt;

&lt;p&gt;This is not merely a question of whether something is “technically possible.” Enterprises want to minimize the servers, storage, synchronization, and operational overhead that come with building and maintaining a separate lakehouse pipeline just for analytics. At the same time, they want to shorten the path from when transactional data is generated to when analytical results become available, so that reporting, risk control, and business decisions can stay as close to real time as possible.&lt;/p&gt;

&lt;p&gt;In many enterprise deployments, transactional and analytical systems are built separately. But as analytics increasingly emphasizes the timeliness of results while overall system cost must be kept in check, this model introduces new problems: data has to be moved and synchronized across multiple systems, pipelines grow longer, consistency governance becomes more complex, and analytical results are more likely to lag behind the true state of the business.&lt;/p&gt;

&lt;p&gt;For exactly this reason, users rarely think of the word “optimizer” first — until a familiar yet unsettling pattern starts showing up in production: the same kind of SQL, on the same hardware, sometimes runs fast and sometimes runs so slowly it feels like a completely different system.&lt;/p&gt;

&lt;p&gt;When troubleshooting, people tend to first suspect network jitter, disk bottlenecks, or a sudden traffic spike during a particular window. But in HTAP scenarios, a significant share of performance fluctuations can ultimately be traced back to the same root cause: the execution plan no longer fits the current data distribution, statistics, and workload state.&lt;/p&gt;

&lt;p&gt;What an enterprise-grade query optimizer must solve is precisely this: continuously generating more reasonable execution plans in such a mixed environment — letting analytical queries complete at an acceptable resource cost, while avoiding situations where a wrong access path, a poor distributed execution strategy, or an unreasonable degree of parallelism drags every other workload on the same cluster into jitter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why HTAP Scenarios Are More Prone to Execution Plan Problems
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why SQL in HTAP Scenarios Runs “Sometimes Fast, Sometimes Slow”
&lt;/h3&gt;

&lt;p&gt;Most slow SQL in production is not caused by “having no optimizer,” but rather by “the optimizer choosing the wrong plan under current conditions.”&lt;/p&gt;

&lt;p&gt;The most typical cases include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;After data volume grows, the system still uses a join order or access path that was only suitable for a smaller data scale;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;After the data distribution becomes skewed, the cost model still estimates as if the distribution were uniform;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;After statistics are refreshed, the plan switches to a path that is theoretically better but performs worse in practice.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In HTAP scenarios, these deviations get amplified: response-time fluctuations of short queries turn into long tails, while analytical queries may — due to a poor distributed strategy or degree of parallelism — consume extra shared resources, ultimately stripping “real-time analytics” of its real-time nature.&lt;/p&gt;
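
&lt;p&gt;The skew case can be made concrete with a short Python sketch. The column contents are hypothetical, and rows divided by distinct values is the textbook uniform-distribution estimate, used here as an assumption rather than as a description of OceanBase's estimator.&lt;/p&gt;

```python
from collections import Counter

def uniform_estimate(total_rows, distinct_values):
    """Rows expected for `col = value` if every value were equally frequent."""
    return total_rows / distinct_values

# Hypothetical skewed column: one hot tenant owns most rows.
column = ["tenant_a"] * 9_000 + ["tenant_b"] * 500 + ["tenant_c"] * 500
actual = Counter(column)
estimate = uniform_estimate(len(column), len(actual))
error_factor = actual["tenant_a"] / estimate
```

&lt;p&gt;Here the uniform estimate undercounts the hot value by a factor of about 2.7, and an error of that kind feeds into every later choice of join order, distribution method, and degree of parallelism.&lt;/p&gt;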

&lt;p&gt;Therefore, the first step in performance optimization is not “making one particular SQL run a bit faster,” but making plan selection as explainable and reproducible as possible: why columnar storage was chosen this time, why a redistribution was needed this time, and why the degree of parallelism is N rather than M.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why HTAP Makes “Plan Selection” Harder Than Pure TP or AP
&lt;/h3&gt;

&lt;p&gt;In transaction-oriented (TP) workloads, each query touches little data, has tight response-time requirements, and follows highly repetitive execution patterns; the goal is typically low latency and path stability. In analytics-oriented (AP) workloads, queries more often involve large-range scans, multi-table joins, aggregation, sorting, and window functions, and rely more heavily on columnar storage, distributed execution, and parallelism; the goal is typically throughput and execution efficiency for complex queries.&lt;/p&gt;

&lt;p&gt;HTAP stacks both on top of the same engine, the same database objects (indexes, partitions, and so on), and the same set of statistics. This means the optimizer cannot simply answer “is this operator fast or not”; within a unified cost framework, it must handle several distinct plan-selection decisions at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;For short queries: row store or index path;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For large queries: columnar scan or another access path;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Under distributed execution: stay local or perform data redistribution;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How to choose the join algorithm;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Serial execution or parallel execution.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What makes it even trickier is that the runtime environment is always changing: data growth, distribution skew, statistics refreshes, and version upgrades all change the answer to “which plan is cheaper.” Once row-count estimation or cost evaluation goes wrong, every subsequent choice compounds in the wrong direction, eventually manifesting as query latency jitter, wasteful amplification of CPU and I/O, and even mutual interference with other workloads during peak hours.&lt;/p&gt;

&lt;p&gt;One boundary needs to be made clear here: the fundamental competition between transactions and analytics over physical resources is governed by system-level capabilities such as resource isolation, quotas, and scheduling. The optimizer’s value is to reduce the extra amplification caused by plan errors — not to replace system-level governance.&lt;/p&gt;

&lt;h2&gt;
  
  
  How OceanBase Selects, Runs, and Stabilizes Plans Well
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Unified CBO: How OceanBase Handles TP and AP Within a Single Framework
&lt;/h3&gt;

&lt;p&gt;OceanBase does not build two separate, unrelated optimizers for TP and AP. On top of the same CBO (Cost-Based Optimizer) framework and the same set of database objects — statistics, indexes, partitions, and so on — OceanBase selects different but cost-comparable execution plans for different SQL shapes.&lt;/p&gt;

&lt;p&gt;For TP-leaning queries, the optimizer focuses more on path stability, the cost of index access and of looking rows back up in the base table, and short-query response time. For AP-leaning queries, the optimizer also weighs columnar scans, pruning effectiveness, distributed data movement, join algorithms, and parallel costs.&lt;/p&gt;

&lt;p&gt;From an implementation standpoint, the OceanBase query optimizer follows the cost-based System-R approach: it enumerates and evaluates the cost of base-table access paths, join orders, join algorithms, and other operator combinations to generate the final execution plan. As a distributed HTAP engine, OceanBase’s main difference lies here: distributed properties and parallel properties must be considered alongside local operator costs from the plan-generation stage onward. Distributed and parallel properties are not add-on information tacked on at execution time — they are part of the plan’s cost.&lt;/p&gt;

&lt;p&gt;While fully inheriting the unified CBO framework, OceanBase has made targeted enhancements for the parts of analytical queries that are more common and more prone to amplifying cost deviations. Viewed against the main line of optimizer work, these capabilities essentially solve three things: first select the plan accurately, then make the plan run well, and finally keep the plan stable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Selecting the Plan Accurately: Row Estimation, Statistics, and Path Selection
&lt;/h3&gt;

&lt;p&gt;For the optimizer, the prerequisite for “selecting accurately” is to estimate data scale and selectivity as accurately as possible first. Whether it’s the access path, join order, or subsequent query rewriting and parallel decisions, everything is ultimately built on cost evaluation; and whether cost evaluation can be trusted depends first on the quality of row estimation and statistics.&lt;/p&gt;

&lt;p&gt;In HTAP scenarios where data changes continuously, relying solely on static statistics is often not enough. To address this, OceanBase combines online statistics collection, dynamic sampling, and storage-layer row estimation into its cost model, so that judgments of “row count and selectivity” stay closer to the real data state.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The core role of dynamic sampling is to improve estimation quality when statistics are insufficient or inaccurate, providing more reliable input to the CBO;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storage-layer row estimation stays closer to the real data distribution, helping improve the accuracy of access-path selection;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Online statistics collection is used to reduce how far statistics lag behind the real data.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In particular, OceanBase’s row-estimation method — based on logical rows and physical rows — can account for both incremental and baseline data, unlike traditional approaches that rely more heavily on static statistics. This brings the statistical profile closer to a real-time state, and it also handles predicate dependencies more naturally in composite-index scenarios. For example, when an index (a, b) encounters a condition like a = 1 and b = 1, it can obtain an estimate close to the true selectivity without over-relying on extra multi-column histograms or complex compensation logic.&lt;/p&gt;
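
&lt;p&gt;The composite-index case can be illustrated with a toy comparison. The sketch below is not OceanBase code: the table, the column values, and the variable names are all hypothetical, and counting matching index entries is only an assumed stand-in for what storage-layer row estimation might return. It contrasts multiplying per-column selectivities (the independence assumption) with counting the entries that fall in the index range for a = 1 and b = 1 when the two columns are fully correlated.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical toy table: columns a and b are perfectly correlated (b always equals a).
rows = [(i % 10, i % 10) for i in range(1000)]
total = len(rows)

# Estimate 1: multiply single-column selectivities (independence assumption).
count_a = Counter(a for a, _ in rows)
count_b = Counter(b for _, b in rows)
sel_independent = (count_a[1] / total) * (count_b[1] / total)
est_independent = sel_independent * total

# Estimate 2: count the entries of a sorted (a, b) index that match the predicate.
# Assumption: this stands in for probing the index range at the storage layer.
index_ab = sorted(rows)
est_index_range = sum(1 for key in index_ab if key == (1, 1))

print(f"independence estimate: {est_independent:.0f} rows")
print(f"index-range estimate:  {est_index_range} rows")
```

&lt;p&gt;On this made-up data the independence estimate is off by a factor of ten, while the index-range count matches the real number of qualifying rows. That gap is the kind of predicate dependency the paragraph above refers to.&lt;/p&gt;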

&lt;p&gt;Only once row estimation becomes more trustworthy can the optimizer truly do a good job of “selecting the path accurately.” This matters especially in HTAP scenarios, where the same semantics can often be served either by a row-store index path or by a columnar scan path, and the optimizer must make an explainable comparison within a unified cost framework. For columnar scans, if the benefit of a Skip Index cannot be correctly evaluated, it’s easy to fall into a “looks cheap, actually expensive” path mis-selection; for row-store and index paths, row-estimation errors can likewise inflate an access that should have been local into a far more costly execution method.&lt;/p&gt;

&lt;p&gt;The same logic applies to complex SQL. The value of query rewriting lies not in “the more rules the better,” but in turning SQL into a structure that more easily yields a globally optimal plan, which gives subsequent path selection, join ordering, and parallel decisions a better starting point. In other words, accurate estimation is not the finish line; it is the prerequisite that lets the entire chain of downstream decisions start from the right place.&lt;/p&gt;

&lt;h3&gt;
  
  
  Running the Plan Well: Distributed Global Cost and Parallel Decisions
&lt;/h3&gt;

&lt;p&gt;In a Shared-Nothing architecture, the plan that is “cheapest on a single node” is often not the one that is “cheapest for the cluster.” Once data is spread across multiple nodes, joins and aggregations may introduce broadcasts, redistributions, and multiple rounds of data movement. These costs are often insignificant in a single-node model, but at the cluster level they can become the dominant expense.&lt;/p&gt;

&lt;p&gt;Therefore, the real problem a distributed database optimizer must solve is not “can it generate a distributed plan,” but whether it can bring distribution properties, data movement, and operator implementations into a global comparison at the plan-generation stage. This determines whether it falls into the trap of “locally optimal, globally inefficient.”&lt;/p&gt;

&lt;p&gt;From an optimization-framework perspective, distributed plan optimization is harder than local optimization in three main ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The operator implementation space is larger&lt;br&gt;
Take Hash Join as an example: in a distributed environment, different data distributions correspond to different distributed implementation algorithms, giving a far larger choice space than single-node algorithms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Physical properties are more complex&lt;br&gt;
Beyond sort properties, a distributed plan must also maintain properties such as partition information and data location, which directly determine whether a given operator can adopt a particular distributed execution method.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Parallelism and partitioning further expand the search space&lt;br&gt;
Partition pruning, intra-partition parallelism, inter-partition parallelism, and degree-of-parallelism selection all add to optimization complexity.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To address these issues, one important direction for OceanBase is to bring distributed properties into plan selection as early as possible, rather than simply “generating a locally optimal plan first and then bolting on a distributed execution method.”&lt;/p&gt;
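
&lt;p&gt;To make the data-movement trade-off concrete, here is a minimal sketch that compares two ways of distributing a join purely by the number of rows sent over the network. The function names, the uniform-hashing assumption, and the “rows moved” cost unit are all illustrative assumptions; a real optimizer folds many more factors into its cost model.&lt;/p&gt;

```python
def broadcast_rows_moved(small_rows: int, nodes: int) -> int:
    # Broadcast: every row of the smaller input is copied to the other (nodes - 1) nodes.
    return small_rows * (nodes - 1)


def redistribute_rows_moved(left_rows: int, right_rows: int, nodes: int) -> int:
    # Hash redistribution: each row goes to the node that owns its join key.
    # Assuming uniform hashing, about (nodes - 1) / nodes of all rows leave their node.
    return (left_rows + right_rows) * (nodes - 1) // nodes


def pick_distribution(left_rows: int, right_rows: int, nodes: int, co_partitioned: bool) -> str:
    # If both inputs are already partitioned on the join key, nothing has to move.
    if co_partitioned:
        return "partition-wise (no movement)"
    costs = {
        "broadcast": broadcast_rows_moved(min(left_rows, right_rows), nodes),
        "redistribute": redistribute_rows_moved(left_rows, right_rows, nodes),
    }
    return min(costs, key=costs.get)


print(pick_distribution(1_000_000, 1_000, nodes=4, co_partitioned=False))
print(pick_distribution(1_000_000, 800_000, nodes=8, co_partitioned=False))
```

&lt;p&gt;Under these assumptions, a tiny dimension table is cheaper to broadcast, while two large inputs are cheaper to redistribute. The answer also changes with the partitioning property of the inputs, which is why that property has to be known while the plan is being generated rather than afterwards.&lt;/p&gt;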

&lt;p&gt;Closely related to distributed planning is parallel decision-making. Parallelism can indeed shorten the wall-clock time of complex queries, but a higher degree of parallelism is not always better. Too much parallelism amplifies CPU and memory consumption and system jitter; too little leaves long-running queries slow to finish. So letting the optimizer balance “whether to parallelize and how much” between response time and resource cost is more aligned with engineering reality than relying on fixed hints across large numbers of SQL statements.&lt;/p&gt;

&lt;p&gt;This is exactly the point of OceanBase’s optimizer Auto DOP (Automatic Degree of Parallelism) capability: the optimizer automatically decides “whether to parallelize and what degree to use” based on query cost and resource state. It does not simply chase a shorter single-query response time; it brings parallelism itself into the plan cost, aiming to reduce “bad parallelism” within the given resource constraints.&lt;/p&gt;
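
&lt;p&gt;The idea of pricing parallelism into the plan can be sketched with a toy cost model. This is not how OceanBase computes Auto DOP; the formula, the &lt;code&gt;per_worker_overhead&lt;/code&gt; parameter, and the function names are assumptions chosen only to show why “more parallel” stops paying off.&lt;/p&gt;

```python
def estimated_elapsed(serial_cost: float, dop: int, per_worker_overhead: float) -> float:
    # The work divides across workers, but every extra worker adds coordination overhead.
    return serial_cost / dop + per_worker_overhead * dop


def choose_dop(serial_cost: float, max_dop: int, per_worker_overhead: float = 5.0) -> int:
    # Pick the degree of parallelism with the lowest estimated elapsed cost,
    # never exceeding the resource cap max_dop. Ties go to the lower degree.
    candidates = range(1, max_dop + 1)
    return min(candidates, key=lambda dop: estimated_elapsed(serial_cost, dop, per_worker_overhead))


for cost in (10, 500, 2000):
    print(cost, choose_dop(cost, max_dop=16))
```

&lt;p&gt;In this model a cheap query stays serial, a mid-sized query settles on a moderate degree, and only the expensive query runs into the cap. That matches the qualitative behavior described above: the degree of parallelism is an output of cost comparison, not a fixed hint.&lt;/p&gt;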

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Columnar storage and Skip Index mainly solve “read less, read the right things.”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Distributed planning and parallel decisions mainly solve “how to move less data and how to scale operators out.”&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, the two form a more complete real-time analytical execution-plan capability for HTAP scenarios.&lt;/p&gt;

&lt;h3&gt;
  
  
  Keeping the Plan Stable: Caching and Evolution Mechanisms for High-Frequency SQL
&lt;/h3&gt;

&lt;p&gt;If row estimation, distribution, and parallelism mainly solve “selecting and running the plan well,” then for an enterprise-grade system there is a third thing to consider: whether the plan can stay stable over the long term.&lt;/p&gt;

&lt;p&gt;For complex analytical queries, the core problems usually remain row estimation, distribution, and parallel selection themselves; plan caching and SPM are not the default main path. But for highly reusable SQL — especially high-concurrency, low-latency business requests — plan stability becomes a very practical concern. Changes in statistics, growth in data scale, and version upgrades all lead the optimizer to generate new plans, and a “theoretically better” new plan may not actually perform better under real workloads.&lt;/p&gt;

&lt;p&gt;For scenarios like these, OceanBase has built a fairly complete plan-caching and plan-evolution mechanism.&lt;/p&gt;

&lt;p&gt;First is parameterized plan caching.&lt;br&gt;
For high-concurrency workloads, generating and caching a separate plan for every specific parameter is both costly and impractical. The value of parameterized caching is to let a large number of SQL statements that share the same shape but differ in parameters reuse the same plan, keeping the execution overhead low when a cached plan is hit.&lt;/p&gt;

&lt;p&gt;But parameterized caching does not mean “one shared plan is always optimal for all parameters.” In real business, the data scale corresponding to different parameters can vary enormously. For example, even for the same task — “compute a merchant’s sales over the past year” — a large merchant may be better served by a main-table scan, while a small merchant is better served by an index. Forcing the same plan to be reused in both cases leads to obvious parameter-sensitivity problems. In other words, plan caching solves “can plans be reused efficiently,” but it does not automatically solve “whether this plan will remain suitable for all parameters and the current data state over the long term.”&lt;/p&gt;
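
&lt;p&gt;A minimal sketch of the reuse-versus-sensitivity tension follows. Everything here is hypothetical: &lt;code&gt;parameterize&lt;/code&gt;, &lt;code&gt;PlanCache&lt;/code&gt;, and &lt;code&gt;toy_optimize&lt;/code&gt; are illustrative names, the regular expression is a crude stand-in for real SQL parameterization, and the rule that merchant 1 is the large merchant is invented for the demo.&lt;/p&gt;

```python
import re


def parameterize(sql: str) -> str:
    # Replace quoted strings and standalone integers with '?' so that
    # statements of the same shape map to the same cache key.
    return re.sub(r"'[^']*'|\b\d+\b", "?", sql)


class PlanCache:
    def __init__(self, optimize):
        self.optimize = optimize  # callable: SQL text to plan description
        self.plans = {}
        self.hard_parses = 0

    def get_plan(self, sql: str) -> str:
        key = parameterize(sql)
        if key not in self.plans:
            # Cache miss: pay for optimization once, then reuse the plan.
            self.hard_parses += 1
            self.plans[key] = self.optimize(sql)
        return self.plans[key]


def toy_optimize(sql: str) -> str:
    # Invented rule: merchant 1 is huge, so a full scan wins; others prefer the index.
    return "full scan" if sql.endswith("= 1") else "index scan"


cache = PlanCache(toy_optimize)
big = cache.get_plan("SELECT SUM(amount) FROM orders WHERE merchant_id = 1")
small = cache.get_plan("SELECT SUM(amount) FROM orders WHERE merchant_id = 42")
print(big, small, cache.hard_parses)
```

&lt;p&gt;Both statements hit the same cache entry, so the second call skips optimization. It also inherits the plan that was chosen for the large merchant, even though the toy optimizer would have picked the index for merchant 42. That is the parameter-sensitivity problem in miniature.&lt;/p&gt;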

&lt;p&gt;At this point, the question evolves from “whether to cache the plan” into “when should the plan change, and can that change happen safely.” Building on this, the plan-evolution mechanism (SPM) further addresses “can a new plan be switched in safely.”&lt;br&gt;
For highly reusable SQL, the core value of SPM is not to boost analytical performance, but to turn plan stability from a matter of human experience into mechanized governance: a new plan is not immediately and fully adopted the moment it is generated; instead, it is validated against real business traffic. Only when validation shows it is genuinely better than the baseline plan will subsequent SQL continue to use the new plan; otherwise, execution falls back to the baseline plan.&lt;/p&gt;
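
&lt;p&gt;The “validate first, then promote” flow can be sketched as a small state machine. This is an illustration under stated assumptions, not OceanBase’s SPM implementation: the class name &lt;code&gt;PlanBaseline&lt;/code&gt;, the &lt;code&gt;trial_runs&lt;/code&gt; threshold, and the use of mean latency as the comparison metric are all hypothetical.&lt;/p&gt;

```python
from statistics import mean


class PlanBaseline:
    def __init__(self, baseline_plan: str, trial_runs: int = 5):
        self.baseline_plan = baseline_plan
        self.candidate_plan = None
        self.trial_runs = trial_runs
        self.samples = {"baseline": [], "candidate": []}

    def propose(self, plan: str) -> None:
        # A newly generated plan becomes a candidate; it does not replace the baseline yet.
        if plan != self.baseline_plan:
            self.candidate_plan = plan
            self.samples = {"baseline": [], "candidate": []}

    def record(self, which: str, latency_ms: float) -> None:
        # which is "baseline" or "candidate"; latencies come from real executions.
        self.samples[which].append(latency_ms)
        self._maybe_evolve()

    def _maybe_evolve(self) -> None:
        if self.candidate_plan is None:
            return
        base, cand = self.samples["baseline"], self.samples["candidate"]
        if min(len(base), len(cand)) >= self.trial_runs:
            if mean(base) > mean(cand):
                self.baseline_plan = self.candidate_plan  # candidate is faster: promote it
            self.candidate_plan = None  # otherwise the baseline stays in force


spm = PlanBaseline("index scan", trial_runs=2)
spm.propose("full scan")
for ms in (10.0, 12.0):
    spm.record("baseline", ms)
for ms in (30.0, 28.0):
    spm.record("candidate", ms)
print(spm.baseline_plan)
```

&lt;p&gt;Here the candidate turns out slower under the recorded traffic, so the baseline stays in place. The point of the design is that the decision rests on observed executions rather than on the optimizer’s estimate alone.&lt;/p&gt;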

&lt;p&gt;So SPM is more accurately positioned as: a means of governing plan stability for highly reusable SQL. It is part of the optimization loop, but it is not the main path for improving complex analytical performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  From “Selecting the Right Plan” to “Keeping the Plan Under Control”
&lt;/h3&gt;

&lt;p&gt;A query optimizer generally faces two structural constraints. First, statistics, dynamic sampling, and cost parameters describe “the data from some earlier period,” while the query happens “right now.” Second, search and decision time is limited — even within a mature framework like System-R, it is impossible to enumerate all equivalent plans and find the global optimum within an acceptable time.&lt;/p&gt;

&lt;p&gt;Given this, “sometimes choosing the wrong plan” is not a weakness unique to any one product, but the normal state of the CBO problem itself. OceanBase weighs row-store versus columnar paths, data distribution, and the degree of parallelism within the same “estimate, compare, select” logic, so errors in row estimation, join order, shuffle, and parallel decisions can compound in a chain reaction. HTAP stacking (TP and AP sharing statistics and database objects on the same engine) only makes it easier for “sensitive SQL” to expose plan jitter during peaks or statistics-refresh windows.&lt;/p&gt;

&lt;p&gt;The maturity of an enterprise-grade optimizer should not be measured by a single benchmark. What matters more is whether, when a plan no longer matches the real workload, the execution plan can be turned into a process that is observable, intervenable, evolvable, and reversible. Beyond “selecting plans,” OceanBase also shares the responsibility of “governing plans” together with the execution side, database objects (indexes, partitions, and so on), and the operations toolchain: letting plans be continuously corrected as data, versions, and business evolve, rather than treating planning as a one-time result.&lt;/p&gt;

&lt;p&gt;The common sources of “choosing the wrong plan” fall into three categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Row estimation deviating from the data state: changes in table size, data skew, or correlated predicates make estimates inaccurate. The first response should be refreshing statistics and adjusting the collection strategy, not jumping to hints;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Parameter sensitivity of high-frequency SQL: parameterized caching does not guarantee that all parameters are equally well served, so it has to be coordinated with business traffic patterns and the timing of baseline establishment;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Changes to database objects (indexes, partitions, and so on) or version upgrades: SPM compares new and old plans under real traffic, rather than switching the entire cluster instantaneously after a single statistics task.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OceanBase handles execution-plan risk through a combination of runtime reuse, switch-time governance, and manual intervention during troubleshooting.&lt;/p&gt;

&lt;p&gt;At runtime, parameterized plan caching is used first to reduce hard-parsing overhead, letting SQL with the same structure but different parameters reuse existing plans as much as possible. If parameter distributions vary widely and a single plan cannot cover all scenarios, further mechanisms such as SPM, adaptive plan matching, or targeted hints are needed to constrain plan behavior.&lt;/p&gt;

&lt;p&gt;Among these, SPM’s role is “validate the new plan first, then promote it.” A candidate new plan does not immediately replace the existing one; instead, it enters the baseline-and-evolution process and is validated under real traffic before the system decides whether to switch. Through &lt;code&gt;sql_plan_management_mode&lt;/code&gt;, the system can also make policy trade-offs between “online evolution” and “baseline plans.”&lt;/p&gt;

&lt;p&gt;For problems that have already been clearly diagnosed, outlines and hints are better suited as local correction tools, used to pin down access paths, join orders, or parallel strategies. During the diagnosis phase, tools such as EXPLAIN, SQL Trace, and DBMS_XPLAN can help locate execution-plan problems.&lt;/p&gt;

&lt;p&gt;Therefore, the OceanBase optimizer cannot guarantee that every SQL statement uses the optimal plan on every execution. Instead, it places plans within a continuously governable process: at runtime it relies on plan caching to reduce repeated optimization cost; when plans change, it controls switching risk through SPM; and after problems surface, it uses hints, outlines, and diagnostic tools to locate and correct them.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Capabilities to Scenarios: Where the Optimizer Lands in Typical HTAP / Real-Time Analytics Workloads
&lt;/h2&gt;

&lt;p&gt;The “select accurately, run well, stay stable” described above is not an abstract optimizer slogan; it translates into different benefit priorities across different business workloads. For enterprise systems, the value of the optimizer is often demonstrated not through a single benchmark, but through whether it can reliably support several typical query scenarios.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex queries and financial core scenarios: the focus is whether global cost is trustworthy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In scenarios like financial cores and real-time risk control, queries are often not simple point lookups; they need to combine transaction data, account information, rule conditions, market data, or credit-limit information to make high-frequency judgments. Such queries are characterized by many joined tables, strong real-time requirements, and sensitivity to result latency, while also being easily affected by data distribution and cross-node access.&lt;/p&gt;

&lt;p&gt;In these scenarios, the most critical capability of the OceanBase optimizer is not making a particular operator locally optimal, but whether it can form a more trustworthy global cost judgment at the plan-generation stage. Whether statistics and row estimation are accurate determines whether the access path and join order are reasonable; whether the distributed plan considers data location and network cost early enough determines whether cross-node latency keeps getting amplified in high-frequency queries. For these mixed TP-and-real-time-risk-control queries, the value of the OceanBase optimizer shows up first in keeping complex joined queries stably within an acceptable latency, rather than letting plan fluctuations become a source of tail latency.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTAP / real-time analytics scenarios: the focus is whether row/columnar paths, distribution, and parallel decisions can be handled together within a unified framework&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In more analytics-leaning workloads such as reporting, operational analytics, and real-time data warehouses, the problems the optimizer faces become even more complex, because the same dataset may serve both high-concurrency short queries and large-range scans, multi-table joins, and aggregation analytics.&lt;/p&gt;

&lt;p&gt;The OceanBase optimizer solves these problems within the same CBO decision framework: row/columnar path selection, cost-based rewriting of analytical SQL, distributed plan generation, Auto DOP, and, when necessary, plan-stability governance can all happen in coordination within a unified optimization process.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High-concurrency write scenarios: the optimizer’s focus lies more in stability governance for highly reusable SQL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For more OLTP-leaning high-concurrency write scenarios, the priorities differ from analytical workloads. Here, plan caching, parameterized matching, execution-plan stability, and the commit efficiency of the write path itself usually deserve more attention. Capabilities like Plan Cache are more directly relevant to typical transactional write paths; the performance of complex analytical SQL, on the other hand, often still needs to be considered together with sharding design, index design, and specific query patterns.&lt;/p&gt;

&lt;p&gt;This precisely illustrates that the value of the OceanBase optimizer does not lie in “solving every scenario with the same single point,” but in whether it can, for different workloads, combine capabilities like row estimation, path selection, distributed execution, parallelism, and stability governance into plan-selection logic suited to that scenario.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Key to the OceanBase Optimizer Is a Closed Optimization Loop Within a Unified Framework
&lt;/h2&gt;

&lt;p&gt;Let’s return to the question from the beginning of this article: why does the same kind of SQL run “sometimes fast, sometimes slow” within the same system?&lt;br&gt;
The real answer often lies not in whether some single feature exists, but in whether the system has a mechanism that can continuously generate reasonable plans under continuously changing data and workload conditions.&lt;/p&gt;

&lt;p&gt;With distributed-native design and HTAP integration as its through-line, the OceanBase optimizer completes a closed loop within a unified CBO framework:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Through statistics, dynamic sampling, storage-layer row estimation, and other capabilities, it makes cost judgments as trustworthy as possible;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Through row/columnar path selection, complex SQL rewriting, distributed plan generation, and parallel decisions, it selects and runs plans well;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Through parameterized plan caching, adaptive plan matching, and SPM, it brings stability governance for highly reusable SQL into a systematic mechanism.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the true value of an enterprise-grade query optimizer in HTAP and real-time analytics scenarios: it does not just make a single SQL run faster, but keeps the system able to maintain performance, resource efficiency, and stability within an explainable, governable, and dependable range — even under complex queries, distributed execution, and continuously changing data conditions.&lt;/p&gt;

</description>
      <category>database</category>
      <category>sql</category>
      <category>dataengineering</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>AI Writes Code Faster. Why Hasn’t Delivery?</title>
      <dc:creator>Charles Wu</dc:creator>
      <pubDate>Wed, 27 May 2026 15:59:00 +0000</pubDate>
      <link>https://dev.to/oceandata4ai/ai-writes-code-faster-why-hasnt-delivery-3fgj</link>
      <guid>https://dev.to/oceandata4ai/ai-writes-code-faster-why-hasnt-delivery-3fgj</guid>
      <description>&lt;p&gt;&lt;em&gt;The bottleneck didn’t disappear — it moved downstream.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiq2rb08b5wazhqe98hlk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiq2rb08b5wazhqe98hlk.png" alt="Photo by Ben Hershey on Unsplash" width="799" height="531"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When coding gets 10× faster but review, CI, and release don’t, you don’t get 10× delivery. You get a longer queue behind the keyboard.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If your stack includes a database layer, that queue often shows up twice — once in application CI, again in migrations, backups, and failover. The closing section links how we think about that at OceanBase.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I recently watched &lt;a href="https://www.youtube.com/watch?v=PplmzlgE0kg" rel="noopener noreferrer"&gt;an interview with Cat Wu&lt;/a&gt; on how Anthropic’s product team went from shipping a feature every few months to every few weeks, sometimes days, and for small slices of work — even within a single day. Our team has been having a parallel conversation: what concrete practices actually turn AI speed into delivery speed?&lt;/p&gt;

&lt;p&gt;My main takeaway isn’t “AI writes code scary fast.”&lt;/p&gt;

&lt;p&gt;That’s the shallow read.&lt;/p&gt;

&lt;p&gt;Lots of teams now claim delivery is &lt;strong&gt;10× or 50× faster&lt;/strong&gt; with AI.&lt;/p&gt;

&lt;p&gt;I’m skeptical.&lt;/p&gt;

&lt;p&gt;Because one thing gets conflated all the time:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Faster code generation is not the same thing as faster software delivery.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An agent can draft a patch in ten minutes. Sure. But whether that patch can land on &lt;code&gt;main&lt;/code&gt;, be validated, reach real users, and be debugged or rolled back when it breaks—that’s a different system entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If only the “writing code” step speeds up while review, testing, release, monitoring, and rollback stay the same, “50× faster” is often a local illusion: one part of the pipe got hot; the org is still stuck.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most teams, in my view, aren’t at the “quantity becomes quality” inflection yet.&lt;/p&gt;

&lt;p&gt;The real shift isn’t “everyone runs more coding agents.” It’s that &lt;strong&gt;how you set goals, verify code, ship features, and contain risk&lt;/strong&gt; has to change. Otherwise AI doesn’t multiply delivery — it multiplies &lt;strong&gt;PRs waiting for review, features waiting for validation, and branches waiting to merge&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;As &lt;a href="https://circleci.com/landing-pages/assets/2026-state-of-software-delivery-report.pdf" rel="noopener noreferrer"&gt;CircleCI&lt;/a&gt; puts it, success in the AI era is “no longer determined by how fast code can be written” — the decisive factor is whether you can “validate, integrate, and recover at scale.” For most teams, that’s a sharper framing than asking whether AI can generate code at all.&lt;/p&gt;

&lt;p&gt;What’s more interesting: Cat has said outright that &lt;strong&gt;new internal models weren’t the main driver&lt;/strong&gt; of faster iteration. The lever was &lt;strong&gt;process&lt;/strong&gt; — how goals are set, how docs are written, how previews ship, how cross-functional work runs, and who has authority to put something in front of users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shipping a feature a day isn’t about whether the model can code. It’s about whether the org removed what used to slow releases down.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here’s the thesis up front:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When code gets cheap, the expensive thing becomes judgment.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Judgment about what’s worth building, how good is good enough to ship, where a human must decide, and where an agent can run to completion.&lt;/p&gt;

&lt;p&gt;So “AI-native” speed isn’t “everyone uses AI to write code.” Two things have to happen together:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Less idle motion in the process. More clarity in the rules.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Less idle motion means fewer documents, handoffs, approvals, and waits that exist only because “that’s how we’ve always worked.” More clarity means goals, evidence, verification, permissions, release, and rollback are specified — not reinvented in every hallway conversation.&lt;/p&gt;

&lt;p&gt;Only when both are true does “one release per day” stop being a gamble.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Most teams misunderstand “AI speedup”
&lt;/h2&gt;

&lt;p&gt;Seeing Anthropic ship often, many people jump to: strong models, engineers on Claude Code, therefore fast coding.&lt;/p&gt;

&lt;p&gt;That helps. It isn’t the main story.&lt;/p&gt;

&lt;p&gt;If coding is the only thing that speeds up, what do you get?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;More PRs waiting for review&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;More features waiting for validation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;More half-finished branches&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;More production risk&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;More arguments about &lt;strong&gt;whether this is shippable at all&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s not faster delivery. It’s &lt;strong&gt;congestion moved downstream&lt;/strong&gt; from typing to everything after typing.&lt;/p&gt;

&lt;p&gt;A routine feature might have taken days from kickoff to first PR. Now a crisp small ask can get a first diff in minutes. But “code exists” is a short leg of the journey.&lt;/p&gt;

&lt;p&gt;After that comes a whole chain: Should we build this at all? Is this the right shape? Who owns blast radius? Is test evidence enough? Can we spin a preview? Should this sit behind a flag? How do we roll back?&lt;/p&gt;

&lt;p&gt;When coding was slow, that chain was &lt;strong&gt;masked by coding time&lt;/strong&gt;. Once code takes thirty minutes, with CI still at twenty, review still at half a day, and QA still queued, the backlog becomes visible.&lt;/p&gt;
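
&lt;p&gt;A quick back-of-the-envelope version of that arithmetic, with loudly assumed numbers: “days” of coding is taken as three eight-hour days, “half a day” of review as four hours, and the QA queue is left out because the text gives no figure for it. The function name is made up for the sketch.&lt;/p&gt;

```python
def lead_time_minutes(coding: float, ci: float, review: float) -> float:
    # End-to-end lead time is the sum of the stages a change waits through.
    return coding + ci + review


# Assumed figures: three 8-hour days of coding before, 30 minutes after;
# CI stays at 20 minutes and review at 4 hours in both cases.
before = lead_time_minutes(coding=3 * 8 * 60, ci=20, review=4 * 60)
after = lead_time_minutes(coding=30, ci=20, review=4 * 60)

coding_speedup = (3 * 8 * 60) / 30
delivery_speedup = before / after

print(f"coding speedup:   {coding_speedup:.0f}x")
print(f"delivery speedup: {delivery_speedup:.2f}x")
```

&lt;p&gt;With these assumptions, coding gets roughly 48 times faster while end-to-end lead time improves by less than 6 times, and review now accounts for most of what remains. Different inputs change the ratio, not the shape of the result.&lt;/p&gt;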

&lt;p&gt;&lt;a href="https://www.qovery.com/blog/ai-devops-2026-cicd-pipeline-bottleneck" rel="noopener noreferrer"&gt;Qovery’s piece on AI and DevOps in 2026&lt;/a&gt; makes the same point from the platform side: when AI coding tools explode code throughput, &lt;strong&gt;CI/CD, environment provisioning, and deployment pipelines&lt;/strong&gt; — not typing speed — become the constraint. As they put it, the bottleneck has &lt;strong&gt;flipped&lt;/strong&gt;: less time coding, more time waiting on builds, previews, and deploys.&lt;/p&gt;

&lt;p&gt;So I’ve stopped framing R&amp;amp;D efficiency as &lt;strong&gt;lines of code per hour&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A better line:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI compresses the coding segment and forces you to face the real system bottleneck.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc9lkz5j8u7m5gmj2k5uj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc9lkz5j8u7m5gmj2k5uj.png" alt="AI impact" width="720" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The lesson from Anthropic’s public interviews isn’t simply “they had a better model.” Cat Wu has said internal model use raised shipping speed only “a little bit,” with “the bulk of the increase” coming from process and team expectations. When code gets cheaper to write, both Wu and Mike Krieger describe bottlenecks shifting — to deciding what to build, merge queues, CI, and the other steps that used to hide behind slow coding.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Layer one: nail the goal, thin the doc
&lt;/h2&gt;

&lt;p&gt;In classic product development, the PRD (product requirements document) often functions as &lt;strong&gt;comfort food&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Longer feels more professional. More edge cases upfront feels more in control. Before engineering starts, you get fifteen pages of background, scope, flows, exceptions, competitors, and timeline.&lt;/p&gt;

&lt;p&gt;In AI-native teams, that pattern ages badly.&lt;/p&gt;

&lt;p&gt;Not because docs don’t matter — because &lt;strong&gt;long docs often disguise decisions as description&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;What matters isn’t pre-specifying every button and edge case. It’s being explicit about &lt;strong&gt;goal, principles, and how you’ll know it worked&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Cat’s point in the interview: routine features don’t need a novel-length PRD; large infrastructure bets still might. Many features need &lt;strong&gt;one page&lt;/strong&gt; (goal, principles, metrics), and then people with context make local calls.&lt;/p&gt;

&lt;p&gt;That’s a real role change.&lt;/p&gt;

&lt;p&gt;PM value used to be: “I specified it completely; engineering executes.”&lt;/p&gt;

&lt;p&gt;Now it’s closer to: &lt;strong&gt;“I made the goal and tradeoffs legible so people with context can decide quickly.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That matters more with agents in the loop. If every micro-decision waits on PM, you’re back to pre-AI cadence. If the goal is fuzzy, agents and humans will sprint in the wrong direction together.&lt;/p&gt;

&lt;p&gt;A one-pager isn’t laziness. It should answer:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Goal
What user problem are we improving?

## Non-goals
What are we explicitly not solving this round?

## Principles
When tradeoffs appear, what wins?

## Success signals
What observable outcomes mean "keep going"?

## Risks
What must never be touched without a human? What needs explicit sign-off?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That beats twenty pages of “looks complete” for agents and engineers alike. Agents don’t need literary prose — they need &lt;strong&gt;bounds and decision rules&lt;/strong&gt;. Engineers don’t need a script — they need to know where they can decide alone.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Layer two: ship research previews, lower the promise
&lt;/h2&gt;

&lt;p&gt;Another Anthropic habit: lots of capabilities land first as &lt;strong&gt;research previews&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That can sound like “ship half-baked work.” It isn’t lowering quality — it’s &lt;strong&gt;narrowing the promise&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Why do traditional teams ship slowly?&lt;/p&gt;

&lt;p&gt;Every release carries a hidden contract: stable, complete, friendly to all users, documented, consistent, no obvious landmines.&lt;/p&gt;

&lt;p&gt;That contract is heavy.&lt;/p&gt;

&lt;p&gt;So teams enter wait mode: one more edge case, one more design pass, one more test round, one more stakeholder sync.&lt;/p&gt;

&lt;p&gt;Six months later, users still haven’t touched it.&lt;/p&gt;

&lt;p&gt;Research previews change &lt;strong&gt;expectations upfront&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is early. We’re still finding the shape. You can try it — it isn’t the final product.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Three effects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;You don’t have to wait for perfect.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Real user signal arrives earlier.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bad product directions fail faster.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The trap: research preview ≠ &lt;strong&gt;irresponsible preview&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You can’t throw junk over the wall and blame “early access” for UX debt.&lt;/p&gt;

&lt;p&gt;A healthy preview still needs three guardrails: &lt;strong&gt;a bounded blast radius, a way to turn the risk off or roll it back, and users who know what stage they’re in.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The promise can be lighter. Safety work cannot.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That’s why “less process idle time” only works alongside &lt;strong&gt;clearer rules&lt;/strong&gt; — fewer approvals, not fewer checks; thinner docs, not thinner goals; earlier ship, but still flags, monitoring, and rollback.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fed6w8gjxqqcww4zyv0w4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fed6w8gjxqqcww4zyv0w4.png" alt="Research preview" width="720" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Layer three: looser roles, pricier taste
&lt;/h2&gt;

&lt;p&gt;Cat noted something representative: many PMs on Claude Code have engineering backgrounds; designers ship frontend code. Product, engineering, and design are &lt;strong&gt;less siloed&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That trend will accelerate.&lt;/p&gt;

&lt;p&gt;When code is cheap, work that existed only to &lt;strong&gt;hand off&lt;/strong&gt; starts to feel wasteful.&lt;/p&gt;

&lt;p&gt;PMs used to validate an idea via spec → design → eng queue → wait. Now a PM who codes can spike a prototype; an engineer with product sense can fix interaction gaps; a designer who ships UI can get to something runnable.&lt;/p&gt;

&lt;p&gt;That doesn’t mean everyone becomes full-stack or specialties vanish.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Role is default responsibility — not the ceiling on what you can do.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You might be a PM — but can you prototype when the team needs it? You might be an engineer — but can you flag when the problem framing is wrong?&lt;/p&gt;

&lt;p&gt;The scarce skill isn’t typing code. It’s &lt;strong&gt;product taste&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When everything is buildable, the expensive question becomes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Of all the things we could build, which ones should exist?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That’s what Cat means when she says that as code gets cheaper, &lt;strong&gt;choosing what to write&lt;/strong&gt; gets more valuable.&lt;/p&gt;

&lt;p&gt;Without taste, AI teams do something worse than traditional teams:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They ship the wrong things faster.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Bad ideas used to die on engineering cost. Not anymore — agents will diligently implement them.&lt;/p&gt;

&lt;p&gt;In AI-native teams, PM value isn’t scheduling or chasing status. It’s picking the highest-leverage slice, defining a &lt;strong&gt;small but real&lt;/strong&gt; unit of ship, and separating noise from signal in early feedback.&lt;/p&gt;

&lt;p&gt;That’s harder than “writing good prompts.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fae8x164lvjbpqfnx7p29.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fae8x164lvjbpqfnx7p29.png" alt="The cheaper" width="720" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Layer four: don’t build for imaginary AGI
&lt;/h2&gt;

&lt;p&gt;One of the harder points in the interview: many AI product people &lt;strong&gt;build for a future super-model&lt;/strong&gt;, not today’s model.&lt;/p&gt;

&lt;p&gt;“Models will catch up — ship the scrappy version; AGI will fix it.”&lt;/p&gt;

&lt;p&gt;Dangerous.&lt;/p&gt;

&lt;p&gt;The hard job is &lt;strong&gt;shipping for the model you have now&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;Know where it’s strong and weak, what you can delegate, what needs guardrails, what needs external memory, task lists, or human takeover.&lt;/p&gt;

&lt;p&gt;In early Claude Code, big refactors sometimes stalled mid-flight, so the team added &lt;strong&gt;todo lists&lt;/strong&gt; to decompose and track the work. As models improved, some of that scaffolding could fade.&lt;/p&gt;

&lt;p&gt;Cat’s line: &lt;strong&gt;the model eats your adaptation layer.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A lot of product design that feels essential today will matter less tomorrow&lt;/strong&gt; as models improve.&lt;/p&gt;

&lt;p&gt;Two buckets:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Model-gap scaffolding&lt;/strong&gt; — todos, forced continuation, verifiers when the model won’t self-check. &lt;strong&gt;Let these retire&lt;/strong&gt; as capability catches up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Task-structural constraints&lt;/strong&gt; — permissions, release policy, rollback, audit trails, user promises, gradual rollout, UX commitments.&lt;/p&gt;

&lt;p&gt;Those don’t vanish because the model got smarter — strong human engineers don’t eliminate permission systems either.&lt;/p&gt;

&lt;p&gt;Complexity often comes from mixing the buckets: scaffolding that should expire gets cast in concrete, while structural safety gets waved away as “the model will figure it out.” Both hurt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fisbflz7bsgnltls6b0en.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fisbflz7bsgnltls6b0en.png" alt="Compensation" width="720" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  6. What actually holds up “one feature per day”
&lt;/h2&gt;

&lt;p&gt;Mapped to practice, I’d break “daily ship” into something concrete:&lt;/p&gt;

&lt;p&gt;Step 1 — Intake: No novella. Goal, non-goals, principles, risks, acceptance.&lt;/p&gt;

&lt;p&gt;Step 2 — Build: Agent works in an isolated worktree or sandbox — read, edit, run tests, add tests.&lt;/p&gt;

&lt;p&gt;Step 3 — Evidence, not “I’m done”: Files touched, entry points affected, tests run / not run, touches on auth/data/money/security, flag needed?, rollback path.&lt;/p&gt;

&lt;p&gt;Step 4 — Review by risk tier: Low risk → lean on automation + evidence completeness. Medium/high → human eyes on architecture, UX, security.&lt;/p&gt;

&lt;p&gt;Step 5 — Merge without blind exposure: Default behind a flag; internal environments first when possible.&lt;/p&gt;

&lt;p&gt;Step 6 — Release train: A fixed daily window for changes that are merged, verified, and within risk appetite.&lt;/p&gt;

&lt;p&gt;Step 7 — Observe: Error rates, latency, key conversion, user feedback, log anomalies — not “we shipped, done.”&lt;/p&gt;

&lt;p&gt;Step 8 — Incidents: Flip the flag or roll back first; don’t spend thirty minutes in a blame meeting.&lt;/p&gt;
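
&lt;p&gt;To make Steps 3 and 4 concrete, here is a minimal Python sketch of routing a change by its evidence. Everything in it is an assumption for illustration: the &lt;code&gt;Evidence&lt;/code&gt; fields, the &lt;code&gt;review_route&lt;/code&gt; function, and the rule that incomplete evidence escalates to a human are hypothetical, not a schema from the interview or from any tool.&lt;/p&gt;

```python
from dataclasses import dataclass, field

# Hypothetical names throughout: the article defines no schema for evidence.
SENSITIVE_AREAS = {"auth", "data", "money", "security"}


@dataclass
class Evidence:
    files_touched: list
    tests_run: list
    tests_not_run: list
    sensitive_areas: set = field(default_factory=set)
    rollback_path: str = ""


def review_route(evidence):
    """Return who reviews a change: "automation" or "human"."""
    touches_sensitive = bool(evidence.sensitive_areas.intersection(SENSITIVE_AREAS))
    # Assumption: no tests run, or no rollback path, counts as incomplete evidence.
    incomplete = not evidence.tests_run or not evidence.rollback_path
    if touches_sensitive or incomplete:
        return "human"
    return "automation"
```

&lt;p&gt;The design choice worth noting is the default: anything that touches a sensitive area, or arrives without tests and a rollback path, goes to a person, so automation only clears changes that are both low risk and fully evidenced.&lt;/p&gt;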

&lt;p&gt;Agents matter. &lt;strong&gt;Clear rules at each gate matter more.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I call that bundle a Release Harness — not a single tool, but constraints on how work is sliced, how evidence is submitted, how risk is tiered, what must be automated, what must stay human, and when merge / ship / rollback is allowed.&lt;/p&gt;

&lt;p&gt;A minimal release checklist is seven fields: &lt;strong&gt;goal, scope, risk tier, verification evidence, ship method, rollback method, watch metrics&lt;/strong&gt;. The point isn’t a pretty template — it’s that every PR can answer them.&lt;/p&gt;
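
&lt;p&gt;One way to hold every PR to those seven fields is a small check that refuses blanks. This is a rough sketch, assuming the checklist arrives as a plain dictionary; the snake_case keys and the &lt;code&gt;missing_fields&lt;/code&gt; helper are hypothetical names, not part of any existing tool.&lt;/p&gt;

```python
# The seven fields named in the article; the snake_case keys are an assumption.
REQUIRED_FIELDS = (
    "goal",
    "scope",
    "risk_tier",
    "verification_evidence",
    "ship_method",
    "rollback_method",
    "watch_metrics",
)


def missing_fields(checklist):
    """Return the required fields that are absent or blank, in checklist order."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = checklist.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing
```

&lt;p&gt;A check like this could run in CI and block the merge whenever the returned list is non-empty, which keeps the template honest without making it longer.&lt;/p&gt;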

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5vadi1itkshhxzfqmrqf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5vadi1itkshhxzfqmrqf.png" alt="Daily release" width="720" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Without that, “one release a day” becomes &lt;strong&gt;a daily merge of opaque AI diffs&lt;/strong&gt; with users as QA.&lt;/p&gt;

&lt;p&gt;That isn’t leverage. That’s &lt;strong&gt;risk moved downstream again&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Traps I’ve stepped in
&lt;/h2&gt;

&lt;h2&gt;
  
  
  7.1 Treating “daily ship” as “one big feature per day”
&lt;/h2&gt;

&lt;p&gt;Common mistake.&lt;/p&gt;

&lt;p&gt;Daily ship means &lt;strong&gt;small changes can enter production safely each day&lt;/strong&gt; — not a finished epic every sunset.&lt;/p&gt;

&lt;p&gt;Big bets should decompose: schema → API → internal entry → flag → user-visible UI. Each step can ship; not each step must be user-visible.&lt;/p&gt;

&lt;p&gt;If asks stay huge and agents only accelerate implementation, you get &lt;strong&gt;bigger PRs, harder review, nastier integration&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;AI amplifies how well you slice work. Slice well → parallel progress. Slice badly → a fast, scary monolith.&lt;/p&gt;

&lt;h2&gt;
  
  
  7.2 One agent, one shot, whole feature
&lt;/h2&gt;

&lt;p&gt;“Here’s the full ticket — do all of it” works in demos. In production, it’s risky.&lt;/p&gt;

&lt;p&gt;Agents finish local tasks well; they don’t automatically know your release bar. They may refactor the wrong layer, add a “reasonable” compatibility shim, and bundle UI, API, tests, and config into one diff.&lt;/p&gt;

&lt;p&gt;The PR looks productive. Reviewers suffer.&lt;/p&gt;

&lt;p&gt;I prefer task types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Explore — read-only, options, no edits&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement — bounded scope, tests required&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Verify — reviewer mindset, hunt risks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fix — confirmed issues only, no scope creep&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More stable than one agent from start to finish.&lt;/p&gt;

&lt;h2&gt;
  
  
  7.3 Green tests, broken product
&lt;/h2&gt;

&lt;p&gt;Worse and common: unit tests pass, the PR claims “done,” but &lt;strong&gt;no one walked the real user path&lt;/strong&gt; — flag missing in staging, menu hidden, role matrix gap, field mismatch on the live API.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://agentfield.ai/blog/beyond-vibe-coding" rel="noopener noreferrer"&gt;AgentField’s writeup on ~200 autonomous agents writing production code&lt;/a&gt; describes the same gap: parallel agents can leave every issue green while the merged product still fails — tests passed with mocked dependencies, every acceptance criterion met while a module stayed invisible to consumers. The system optimizes for the criteria and checks you encode, not for coherence you never specified.&lt;/p&gt;

&lt;p&gt;Green means the covered paths didn’t explode. It doesn’t mean the product entry point works.&lt;/p&gt;

&lt;p&gt;Evidence needs an integration slice: the entry point, routes, config, permissions, analytics, and the end-to-end path.&lt;/p&gt;

&lt;p&gt;Without that, you ship a half-plugged-in feature.&lt;/p&gt;

&lt;h2&gt;
  
  
  7.4 No feature flags, so long branches become “the plan”
&lt;/h2&gt;

&lt;p&gt;“We want high frequency” without feature flags doesn’t add up.&lt;/p&gt;

&lt;p&gt;No flags → unfinished work can’t live on &lt;code&gt;main&lt;/code&gt; → long-lived branches → merge pain → release frequency drops.&lt;/p&gt;

&lt;p&gt;With agents touching many surfaces at once, branch merges get uglier.&lt;/p&gt;

&lt;p&gt;Flags aren’t just “gradual rollout.” They let incomplete capability exist safely on &lt;code&gt;main&lt;/code&gt;.&lt;/p&gt;
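
&lt;p&gt;As a sketch of what incomplete capability on &lt;code&gt;main&lt;/code&gt; can look like, assume flags are read from environment variables with a &lt;code&gt;FLAG_&lt;/code&gt; prefix. That prefix, &lt;code&gt;flag_enabled&lt;/code&gt;, and &lt;code&gt;export_report&lt;/code&gt; are all hypothetical; a real team would more likely use a flag service with per-user targeting.&lt;/p&gt;

```python
import os


def flag_enabled(name, env=None):
    """Read a boolean flag from environment-style variables; default is off."""
    source = os.environ if env is None else env
    raw = source.get("FLAG_" + name.upper(), "")
    return raw.strip().lower() in {"1", "true", "on"}


def export_report(rows, env=None):
    # The unfinished CSV path is merged but unreachable until the flag is on.
    if flag_enabled("csv_export", env):
        return "\n".join(",".join(str(cell) for cell in row) for row in rows)
    return "export not available"
```

&lt;p&gt;Because the flag defaults to off, the half-built path can merge daily in small pieces while users keep seeing the old behavior, and turning it off again is the rollback.&lt;/p&gt;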

&lt;h2&gt;
  
  
  7.5 Automation that’s 95% right
&lt;/h2&gt;

&lt;p&gt;Cat’s advice for individuals applies to teams: if automation isn’t reliable, it isn’t automation — it’s a new chore.&lt;/p&gt;

&lt;p&gt;95% correct release notes that omit risks → humans re-read every PR.&lt;/p&gt;

&lt;p&gt;Auto-fix CI that patches wrong → humans re-audit every patch.&lt;/p&gt;

&lt;p&gt;Auto-verify that false-passes → humans redo manual QA.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The last 5% has to be trustworthy enough to depend on.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing: speed isn’t the goal — feedback is
&lt;/h2&gt;

&lt;p&gt;Not every team should ship daily.&lt;/p&gt;

&lt;p&gt;For core infrastructure, regulated financial flows, and heavy-compliance work, &lt;strong&gt;release cadence isn’t the only metric.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Anthropic’s practice is useful because it surfaces a sharper question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After AI made code fast, did your feedback loop get fast?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If not, you mostly get more half-done work, more PRs, more verification load, more integration risk.&lt;/p&gt;

&lt;p&gt;If yes, the shape of the team changes: clearer goals, thinner docs, looser handoffs, faster path to trunk, hidden-by-default features, evidence that accumulates automatically, rollback that isn’t improvised.&lt;/p&gt;

&lt;p&gt;Then AI becomes &lt;strong&gt;organizational R&amp;amp;D capacity&lt;/strong&gt;, not a typing sidecar.&lt;/p&gt;

&lt;p&gt;When I gauge maturity, I don’t count agents, LOC, or open PRs.&lt;/p&gt;

&lt;p&gt;I ask three questions at end of day:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;What actually reached production today?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Why was that safe?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If we were wrong, how fast can we revert?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Teams that answer those three consistently are the ones that can talk about shipping every day without gambling.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means if you ship database changes
&lt;/h2&gt;

&lt;p&gt;Application delivery is only half the picture when agents touch &lt;strong&gt;schema, tenants, replication, or ops runbooks&lt;/strong&gt;. The same rules apply: thin goals, explicit evidence, flags or staged rollout, and rollback you have practiced, not “the agent said the cluster is fine.”&lt;/p&gt;

&lt;p&gt;At &lt;strong&gt;OceanBase&lt;/strong&gt;, we see the same bottleneck shift in the open-source community: AI speeds up how people generate deploy scripts and config, but production still depends on verification, integration, and recovery, especially for distributed databases where a green unit test does not prove a safe cutover.&lt;/p&gt;

&lt;p&gt;If you are experimenting with agent-assisted database work, three places to start:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/oceanbase/oceanbase-skills" rel="noopener noreferrer"&gt;oceanbase-skills&lt;/a&gt; (including &lt;a href="https://github.com/oceanbase/oceanbase-skills/tree/master/skills/oceanbase-deploy" rel="noopener noreferrer"&gt;oceanbase-deploy&lt;/a&gt;): Skills that wrap deployment, tenant management, and benchmarks in governed, repeatable flows rather than one-off prompts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://en.oceanbase.com/docs" rel="noopener noreferrer"&gt;OceanBase documentation&lt;/a&gt; — canonical steps for install, upgrade, and operations so agents and humans share the same source of truth.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Contributions and feedback — open an issue or PR on &lt;a href="https://github.com/oceanbase/oceanbase" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; if you are hardening a Release Harness that includes the data plane; we are interested in what breaks when coding gets 10× faster but cluster validation does not.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Your move&lt;/strong&gt;: Pick one change you shipped (or almost shipped) in the last month. Ask whether evidence covered app + data + rollback — not just “tests passed.” If the data path was hand-waved, that is your downstream bottleneck.&lt;/p&gt;

</description>
      <category>softwareengineering</category>
      <category>ai</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Stop Editing JSON by Hand</title>
      <dc:creator>Charles Wu</dc:creator>
      <pubDate>Sun, 24 May 2026 16:36:32 +0000</pubDate>
      <link>https://dev.to/oceandata4ai/stop-editing-json-by-hand-pj4</link>
      <guid>https://dev.to/oceandata4ai/stop-editing-json-by-hand-pj4</guid>
      <description>&lt;p&gt;&lt;em&gt;How ClawMaster helps you set up OpenClaw, manage Skills, and use PowerMem — without living in config files&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F69mtjcufmm9xlnyelo3e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F69mtjcufmm9xlnyelo3e.png" alt="ClawMaster" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;TL;DR: &lt;a href="https://github.com/openmaster-ai/clawmaster" rel="noopener noreferrer"&gt;ClawMaster&lt;/a&gt; is an open-source OpenClaw companion with a setup wizard, channel and model management, cost observability, Skill hosting, and &lt;a href="https://github.com/oceanbase/powermem" rel="noopener noreferrer"&gt;PowerMem&lt;/a&gt;-backed memory. Install and launch it with two commands, then open &lt;a href="http://localhost:16223" rel="noopener noreferrer"&gt;http://localhost:16223&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Skills pile up
&lt;/h2&gt;

&lt;p&gt;Once Skills under an agent multiply, overlaps and clutter show up fast. A common pattern today is filesystem scanning: Skills live as Markdown on disk, and the runtime may walk through every SKILL.md when it needs one.&lt;/p&gt;

&lt;p&gt;As the library grows, a few frictions appear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Finding the right Skill in a large tree takes longer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Long SKILL.md files are easy to read incompletely; recall gets less stable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Version, dependencies, and “when do I use this?” are hard to track with folders and filenames alone.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Context windows limit how much you can load in one turn.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the backdrop for tools that make OpenClaw easier to run day to day — not only to install once.&lt;/p&gt;

&lt;h2&gt;
  
  
  What ClawMaster is
&lt;/h2&gt;

&lt;p&gt;ClawMaster is an open-source project from openmaster-ai. It is described as an “OpenClaw companion for real life” — a path from install to daily use, not just getting config files right.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F852ave6xsizd6cqzu3km.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F852ave6xsizd6cqzu3km.png" alt="GUI" width="799" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You get an OpenClaw management console that spans install through configuration: a setup wizard for models and channels, plus a view of what each AI call costs, without memorizing commands or hand-editing JSON.&lt;/p&gt;

&lt;p&gt;It also supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;PaddleOCR document parsing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ERNIE (Wenxin) image generation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cost observability&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scheduled tasks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In-browser Skill refresh and hosting&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Who it’s for
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcoljb9u5x3jgaw2pqyh2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcoljb9u5x3jgaw2pqyh2.png" alt="Intended audiences" width="799" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm i &lt;span class="nt"&gt;-g&lt;/span&gt; clawmaster
clawmaster
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;a href="http://localhost:16223" rel="noopener noreferrer"&gt;http://localhost:16223&lt;/a&gt;. The setup wizard walks you through the rest.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fultoro5dfihptcmaarax.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fultoro5dfihptcmaarax.png" alt="LLM provider" width="800" height="359"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After launch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Pick or create an OpenClaw profile.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Connect at least one model provider and set a default.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add channels, plugins, Skills, or MCP servers as needed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enable gateway or observability when you want runtime inspection.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  PowerMem and LLM Wiki
&lt;/h2&gt;

&lt;p&gt;ClawMaster integrates PowerMem — the memory engine open-sourced by the OceanBase team. Instead of a pile of Markdown files, memory becomes structured, queryable storage with a forgetting curve.&lt;/p&gt;

&lt;p&gt;ClawMaster also aligns with Andrej Karpathy’s &lt;a href="https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f" rel="noopener noreferrer"&gt;LLM Wiki&lt;/a&gt; idea: content goes in once, the knowledge base keeps compounding, and the agent can draft with views you have already captured — even when you do not paste a link again. ClawMaster tracks a related LLM Knowledge module on the &lt;a href="https://github.com/openmaster-ai/clawmaster/milestone/1" rel="noopener noreferrer"&gt;v0.4.0 milestone&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Run &lt;code&gt;npm i -g clawmaster &amp;amp;&amp;amp; clawmaster&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Finish the wizard at &lt;a href="http://localhost:16223" rel="noopener noreferrer"&gt;http://localhost:16223&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure one channel or model you actually use&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Install or refresh one Skill from the UI&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want guided exercises, see &lt;a href="https://github.com/openmaster-ai/clawmaster-workshop" rel="noopener noreferrer"&gt;clawmaster-workshop&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;ClawMaster: &lt;a href="https://github.com/openmaster-ai/clawmaster" rel="noopener noreferrer"&gt;https://github.com/openmaster-ai/clawmaster&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PowerMem: &lt;a href="https://github.com/oceanbase/powermem" rel="noopener noreferrer"&gt;https://github.com/oceanbase/powermem&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;LLM Wiki (Karpathy): &lt;a href="https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f" rel="noopener noreferrer"&gt;https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Based on the OceanBase community write-up on ClawMaster and the public ClawMaster README (v0.3.1). Check the repo for the latest release and roadmap.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>openclaw</category>
      <category>automation</category>
    </item>
    <item>
      <title>When Your Database Agent Speaks Plain English</title>
      <dc:creator>Charles Wu</dc:creator>
      <pubDate>Sun, 24 May 2026 13:25:30 +0000</pubDate>
      <link>https://dev.to/oceandata4ai/when-your-database-agent-speaks-plain-english-4f9m</link>
      <guid>https://dev.to/oceandata4ai/when-your-database-agent-speaks-plain-english-4f9m</guid>
      <description>&lt;p&gt;&lt;em&gt;The OceanBase community just shipped agent Skills that turn natural language into OBD commands.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1zg19jk86pwtz7fucz66.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1zg19jk86pwtz7fucz66.png" alt="Photo by Enchanted Tools on Unsplash" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;TL;DR: The OceanBase community just open-sourced oceanbase-skills, a set of agent Skills that turn natural language into OBD commands. The first release is oceanbase-deploy, for cluster ops, benchmarks, and backup/restore.&lt;/p&gt;

&lt;p&gt;Someone on the OceanBase community team recently shipped something that made me pause: a repository called &lt;a href="https://github.com/oceanbase/oceanbase-skills" rel="noopener noreferrer"&gt;oceanbase-skills&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here is the pitch: instead of memorizing OBD flags and config files, you describe what you want in plain English and let a Skill handle the rest.&lt;/p&gt;

&lt;p&gt;The first release — &lt;a href="https://github.com/oceanbase/oceanbase-skills/tree/master/skills/oceanbase-deploy" rel="noopener noreferrer"&gt;oceanbase-deploy&lt;/a&gt; — is already usable. I ran a TPC-H benchmark with a one-line prompt. All 22 queries finished in under 10 seconds.&lt;/p&gt;

&lt;p&gt;Let me show you how it works.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;If you have operated OceanBase, you know OBD is powerful. You also know the CLI has a broad command surface: config files, required flags, switchover vs failover, benchmark paths that need precise parameters.&lt;/p&gt;

&lt;p&gt;Running a TPC-H benchmark the old way:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Open docs to confirm syntax&lt;/span&gt;
&lt;span class="c"&gt;# 2. Remember --remote-tbl-dir is mandatory&lt;/span&gt;
&lt;span class="c"&gt;# 3. Create the directory&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /tmp/tpch
&lt;span class="c"&gt;# 4. Assemble the full command&lt;/span&gt;
obd &lt;span class="nb"&gt;test &lt;/span&gt;tpch ob-test &lt;span class="nt"&gt;--tenant&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mysql_test &lt;span class="nt"&gt;--remote-tbl-dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/tmp/tpch &lt;span class="nt"&gt;--scale-factor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;span class="c"&gt;# 5. Run it and pray...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the Skill installed, you just say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Run TPC-H for tenant mysql_test on cluster ob-test.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The agent fills in &lt;code&gt;--remote-tbl-dir&lt;/code&gt;, creates the path, and runs the job. In my test, all 22 TPC-H queries finished in under 10 seconds.&lt;/p&gt;

&lt;p&gt;The same pattern works for deploying seekdb (OceanBase’s AI-native hybrid search database) and driving mysqltest: describe the goal, not the subcommand tree.&lt;/p&gt;
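
&lt;p&gt;The mechanical part of the TPC-H example, filling in flags you would otherwise look up, can be pictured with a short Python sketch. This is not how oceanbase-deploy is implemented: the &lt;code&gt;tpch_command&lt;/code&gt; helper is hypothetical, its defaults are simply copied from the manual example above, and it only builds the command without running it.&lt;/p&gt;

```python
import shlex


def tpch_command(cluster, tenant, tbl_dir="/tmp/tpch", scale_factor=1):
    """Build the argv for the TPC-H command shown earlier; nothing is executed."""
    return [
        "obd", "test", "tpch", cluster,
        "--tenant=" + tenant,
        "--remote-tbl-dir=" + tbl_dir,
        "--scale-factor=" + str(scale_factor),
    ]


# Only the cluster and tenant come from the prompt; the rest are defaults.
argv = tpch_command("ob-test", "mysql_test")
print(shlex.join(argv))
```

&lt;p&gt;Printing the result reproduces the manual command from the old-way listing, which is the point: the prompt supplies the cluster and tenant, and the documented flags are filled in rather than guessed.&lt;/p&gt;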

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdj5hc9yshy3pxkt0iihv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdj5hc9yshy3pxkt0iihv.png" alt="my requirements" width="800" height="354"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9hd193z2iu3rpir8kmw6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9hd193z2iu3rpir8kmw6.png" alt="seekdb deploy" width="799" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fro5shkuqfj1kbytaeai8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fro5shkuqfj1kbytaeai8.png" alt="Result" width="799" height="562"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What oceanbase-deploy covers
&lt;/h2&gt;

&lt;p&gt;Today’s scope:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Cluster deployment and lifecycle management&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tenant operations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Benchmarks (TPC-H and more)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Backup and restore workflows&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Other routine OBD operations&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No new APIs — just curated OBD workflows in Skill form, so agents follow documented flags instead of guessing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The roadmap: a Skill catalog, not a one-off
&lt;/h2&gt;

&lt;p&gt;The repository currently focuses on deployment and ops, but the README outlines what’s next:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;More skills are on the way. Planned areas include OceanBase kernel tuning, SQL diagnostics, migration, and more.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Contributions are welcome at &lt;a href="https://github.com/oceanbase/oceanbase-skills" rel="noopener noreferrer"&gt;github.com/oceanbase/oceanbase-skills&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Think of this as the start of an OceanBase Skill layer for agents: small, composable packages instead of one monolithic “MCP server” that tries to do everything.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;The pattern here is incremental by design: ship a focused Skill, prove the value, then expand. Not a grand “one platform to rule them all” narrative — just one Skill, one workflow, one problem solved at a time.&lt;/p&gt;

&lt;p&gt;Try it: clone &lt;a href="https://github.com/oceanbase/oceanbase-skills" rel="noopener noreferrer"&gt;oceanbase-skills&lt;/a&gt;, install oceanbase-deploy in your agent, and replace one manual OBD command you run weekly with a one-line prompt.&lt;/p&gt;

&lt;p&gt;If it saves you a doc tab, star the repo and tell the team what Skill should ship next.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;oceanbase-skills: &lt;a href="https://github.com/oceanbase/oceanbase-skills" rel="noopener noreferrer"&gt;https://github.com/oceanbase/oceanbase-skills&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;oceanbase-deploy: &lt;a href="https://github.com/oceanbase/oceanbase-skills/tree/master/skills/oceanbase-deploy" rel="noopener noreferrer"&gt;https://github.com/oceanbase/oceanbase-skills/tree/master/skills/oceanbase-deploy&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;seekdb: &lt;a href="https://github.com/oceanbase/seekdb" rel="noopener noreferrer"&gt;https://github.com/oceanbase/seekdb&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Building agent tooling for databases? What’s your biggest friction point with CLI-based ops? Drop a comment below.&lt;/p&gt;

&lt;p&gt;Note: oceanbase-skills and seekdb are OceanBase community projects. The author contributes to the broader OceanBase ecosystem.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>devops</category>
      <category>opensource</category>
      <category>nlp</category>
    </item>
    <item>
      <title>Your OpenClaw Bill Is Bleeding Tokens. Here’s What We Measured — and How to Fix It.</title>
      <dc:creator>Charles Wu</dc:creator>
      <pubDate>Thu, 14 May 2026 16:33:57 +0000</pubDate>
      <link>https://dev.to/oceandata4ai/your-openclaw-bill-is-bleeding-tokens-heres-what-we-measured-and-how-to-fix-it-947</link>
      <guid>https://dev.to/oceandata4ai/your-openclaw-bill-is-bleeding-tokens-heres-what-we-measured-and-how-to-fix-it-947</guid>
      <description>&lt;p&gt;&lt;em&gt;Memory bloat, compaction loss, and a retrieval-first path: ~32% less token spend on the AppWorld dev split — without dumbing the agent down.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhll8h3dme3fhnxyt1t0m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhll8h3dme3fhnxyt1t0m.png" alt="Photo by micheile henderson on Unsplash" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Developers who actually ship with LLMs know one truth by heart: the context window is not free. Every extra thousand tokens nudges the invoice up and the latency out.&lt;/p&gt;

&lt;p&gt;If you run OpenClaw (an agent stack that leans hard on long-horizon sessions), that anxiety gets concrete fast. Picture this: last week you spent two hours with your agent debugging production (logs, configs, experiments) and burned through 30k tokens of back-and-forth. This week you pick up where you left off, and the agent answers: “Hi! Which refactor are we talking about?”&lt;/p&gt;

&lt;p&gt;So you spend a few thousand tokens re-explaining context. The model spends a few thousand more re-understanding. And you still might not land the same mental model you had last Tuesday.&lt;/p&gt;

&lt;p&gt;Those 30k tokens? Mostly gone.&lt;/p&gt;

&lt;p&gt;That is not a one-off glitch. OpenClaw’s default memory story quietly feeds two token black holes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two black holes that blow up your token budget
&lt;/h2&gt;

&lt;h2&gt;
  
  
  1) The more you remember, the more you pay
&lt;/h2&gt;

&lt;p&gt;OpenClaw’s agent writes important state into MEMORY.md, and that file gets fully injected into the system prompt on every request. The longer you use the setup, the larger MEMORY.md grows—and every API call pays for the whole thing as input tokens.&lt;/p&gt;

&lt;p&gt;Bootstrap caps exist (for example, a 20k-character default per file, 150k total), but long before you hit the ceiling, a bloated prompt starts crowding the model’s working space. OpenClaw’s agent knows information can get lost — so it writes even more aggressively into MEMORY.md, which accelerates the bloat.&lt;/p&gt;
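
&lt;p&gt;To make the cost of full injection concrete, here is a minimal back-of-the-envelope sketch. It assumes roughly 4 characters per token, which is a common rule of thumb and not an OpenClaw constant; the function name is hypothetical.&lt;/p&gt;

```python
def memory_input_tokens(file_chars: int, requests: int, chars_per_token: float = 4.0) -> int:
    """Input tokens spent re-sending one memory file on every request.

    chars_per_token is an assumed rough average, not an OpenClaw constant.
    """
    tokens_per_request = file_chars / chars_per_token
    return round(tokens_per_request * requests)


if __name__ == "__main__":
    # The same 100 requests get more expensive as the file grows toward the cap.
    for file_chars in (2_000, 10_000, 20_000):
        print(file_chars, "chars:", memory_input_tokens(file_chars, requests=100), "input tokens")
```

&lt;p&gt;Under that assumption, a single file at the 20k-character default cap costs about half a million input tokens per hundred requests before the model reads a word of the actual conversation.&lt;/p&gt;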

&lt;h2&gt;
  
  
  2) The more you forget, the more you burn tokens fixing mistakes
&lt;/h2&gt;

&lt;p&gt;When sessions get long, OpenClaw leans on two mechanisms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Compaction: OpenClaw asks an LLM to summarize older conversation chunks to free context.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Memory flush: before compaction, OpenClaw spins up an embedded agent to decide what to persist into memory/YYYY-MM-DD.md.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But compaction is lossy compression by design, and OpenClaw’s retrieval-side slicing hard-cuts along line and character budgets (by default, 400 tokens per chunk) without respecting semantic boundaries. Important context can get cut mid-thought, so recall quality drops and your agent makes mistakes. You rework, the rework creates more chat, and you trigger compaction again sooner.&lt;/p&gt;
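
&lt;p&gt;The difference between a hard cut and a boundary-aware cut is easy to see in a toy example. The sketch below is illustrative only: it uses a character budget instead of tokens, both function names are hypothetical, and neither is OpenClaw’s actual chunker.&lt;/p&gt;

```python
import re


def hard_cut(text: str, budget: int) -> list[str]:
    """Slice purely by size, ignoring sentence boundaries."""
    return [text[i:i + budget] for i in range(0, len(text), budget)]


def cut_on_sentences(text: str, budget: int) -> list[str]:
    """Pack whole sentences into chunks; a sentence longer than the budget stays whole."""
    sentences = re.findall(r"[^.!?]+[.!?]?\s*", text)
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and len(current) + len(sentence) > budget:
            chunks.append(current.strip())
            current = ""
        current += sentence
    if current.strip():
        chunks.append(current.strip())
    return chunks


if __name__ == "__main__":
    note = "Rotate the key on Friday. Do not restart the gateway first."
    print(hard_cut(note, 30))          # the instruction "Do not" is split across chunks
    print(cut_on_sentences(note, 30))  # each chunk is a complete instruction
```

&lt;p&gt;In the hard-cut version, a retriever that returns only the second chunk hands the agent “restart the gateway first” without the negation, which is exactly the kind of mid-thought cut described above.&lt;/p&gt;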

&lt;h2&gt;
  
  
  Tool calls are an accelerant
&lt;/h2&gt;

&lt;p&gt;Tool outputs such as web_fetch pages and exec dumps can be huge: up to 400k characters per tool result in the worst case. That fills sessions fast. Those intermediates usually should not land in MEMORY.md, but they can still contain value you do not want to discard. Either way, tool-heavy runs tighten the doom loop.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The uncomfortable tradeoff: remembering everything gets expensive; forgetting costs correctness. You need a third path.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  A third path: cloud memory that steers tokens instead of hoarding them
&lt;/h2&gt;

&lt;p&gt;seekdb M0 is a cloud memory plugin for OpenClaw. The idea in one sentence:&lt;/p&gt;

&lt;p&gt;Do not dump all memory into the system prompt. Before each turn, retrieve only the memory slices that match the current topic — and inject just those.&lt;/p&gt;

&lt;p&gt;Unlike loading the full MEMORY.md on every request, M0 stores memory as discrete facts in a cloud database, with vector embeddings and full-text indexes. At conversation start, M0 runs hybrid retrieval (BM25 keyword scoring + vector similarity) and injects the top relevant facts. After each chat, M0 extracts new facts from the dialogue, compares them to what already exists, and decides whether to add, update, or skip.&lt;/p&gt;

&lt;p&gt;What that buys you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;MEMORY.md stops ballooning—durable memory lives outside the always-on system prompt, so input tokens drop.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Session resets stop being catastrophic — memory persists and rehydrates without you paying again to restate context you already gave.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cross-device continuity — your memory is not trapped on one laptop.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most users, this is meant to feel invisible: you talk; M0 manages memory in the background.&lt;/p&gt;

&lt;p&gt;OpenClaw’s native persistence tends to route through compaction over the full session (including tool outputs) and a flush agent that decides what to write — both are comparatively heavy and lossy. M0 splits what to store from how to store it into two phases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 1: fact extraction
&lt;/h2&gt;

&lt;p&gt;After a conversation, M0 extracts facts from user ↔ assistant text only — not from tool-call intermediates — and uses an LLM to produce atomic facts.&lt;/p&gt;

&lt;p&gt;Example: the sentence “The user is Alex, a database engineer based in Austin.” becomes three independent facts.&lt;/p&gt;

&lt;p&gt;Hard rules we enforce during extraction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Preserve time information (do not collapse “went to Hawaii last year” into a timeless “went to Hawaii”).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Keep the original language (no automatic translation during extraction).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Do not extract sensitive information.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Phase 2: memory decisions
&lt;/h2&gt;

&lt;p&gt;M0 does not blindly insert facts. M0 retrieves similar existing memories, then asks an LLM whether the new fact should be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;ADD&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;UPDATE&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DELETE (contradictions)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;NONE (already covered)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, M0 treats DELETE conservatively as NONE for auto-capture: it adds new memories and updates existing ones but does not proactively delete them, which reduces accidental erasure.&lt;/p&gt;

&lt;p&gt;Example decisions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;New fact: "Went to Hawaii last May."
Existing memory: "Has been to Hawaii."
→ UPDATE (time detail added)

New fact: "Doesn't like pizza anymore."
Existing memory: "Likes pizza."
→ UPDATE (preference changed)

New fact: "Is a database engineer."
Existing memories: "Name is Alex" + "Is a database engineer."
→ NONE (already covered)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Implementation detail worth noting: in the memory-decision LLM call, each existing memory’s original ID is replaced with a short temporary index (0, 1, 2, …) so the decision model is less likely to hallucinate or garble long integer IDs. If the decision model returns an index that cannot be mapped back, M0 gracefully falls back to treating the fact as new.&lt;/p&gt;
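
&lt;p&gt;The index-swapping trick can be sketched in a few lines. This is a minimal illustration, not M0’s code: the function names, the decision dictionary shape, and the example IDs are all assumptions made for the sketch.&lt;/p&gt;

```python
def build_index_map(memories: dict[int, str]) -> tuple[list[str], dict[int, int]]:
    """Replace long database IDs with short temporary indexes for the LLM prompt."""
    prompt_lines, index_to_id = [], {}
    for temp_index, (memory_id, text) in enumerate(memories.items()):
        prompt_lines.append(f"[{temp_index}] {text}")
        index_to_id[temp_index] = memory_id
    return prompt_lines, index_to_id


def resolve_decision(decision: dict, index_to_id: dict[int, int]) -> dict:
    """Map the model's answer back to real IDs (hypothetical decision shape)."""
    action = decision.get("action", "NONE")
    if action == "DELETE":
        action = "NONE"  # auto-capture never deletes, per the conservative rule above
    if action == "UPDATE":
        index = decision.get("index")
        memory_id = index_to_id.get(index) if isinstance(index, int) else None
        if memory_id is None:
            # Unmappable index: fall back to treating the fact as new.
            return {"action": "ADD", "text": decision["text"]}
        return {"action": "UPDATE", "id": memory_id, "text": decision["text"]}
    if action == "ADD":
        return {"action": "ADD", "text": decision["text"]}
    return {"action": "NONE"}


if __name__ == "__main__":
    stored = {918273645546372819: "Likes pizza.", 918273645546372820: "Name is Alex."}
    lines, mapping = build_index_map(stored)
    print(lines)
    print(resolve_decision({"action": "UPDATE", "index": 0, "text": "Doesn't like pizza anymore."}, mapping))
    print(resolve_decision({"action": "UPDATE", "index": 7, "text": "Prefers tabs."}, mapping))
```

&lt;p&gt;The decision model only ever sees [0] and [1], so there is no long integer for it to mistype, and a bad index degrades to an ADD instead of corrupting an unrelated memory.&lt;/p&gt;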

&lt;p&gt;Why this matters for tokens: M0’s fact-extraction stage ignores tool transcripts, so you avoid paying an LLM to read 400k-character blobs just to mint memories.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool-result compression: deterministic, zero LLM spend
&lt;/h2&gt;

&lt;p&gt;M0 also attacks session inflation at persistence time. When OpenClaw persists tool results to session history, M0’s tool_result_persist hook replaces raw output with a structured summary—rule-based, no LLM tokens.&lt;/p&gt;

&lt;p&gt;Illustrative shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Raw&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;curl returned a 3,000-line JSON payload&lt;/span&gt;

&lt;span class="na"&gt;Compressed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;tool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;web_fetch&lt;/span&gt;
  &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;success&lt;/span&gt;
  &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3,000 lines / 48K characters&lt;/span&gt;
  &lt;span class="na"&gt;preview&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;users"&lt;/span&gt;&lt;span class="pi"&gt;:[{&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id"&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;1&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name"&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alice"&lt;/span&gt;&lt;span class="nv"&gt;... (300 chars)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;M0’s summaries are not about perfect fidelity. They aim for high compression while preserving what happened, whether the tool succeeded, and a short preview.&lt;/p&gt;
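
&lt;p&gt;A rule-based summarizer of this kind needs no model call at all. The sketch below mirrors the illustrative shape above; the function name and signature are hypothetical and do not describe the real tool_result_persist hook interface.&lt;/p&gt;

```python
def compress_tool_result(tool: str, ok: bool, output: str, preview_chars: int = 300) -> dict:
    """Deterministically summarize a raw tool result: no LLM tokens are spent."""
    return {
        "tool": tool,
        "status": "success" if ok else "error",
        "output": f"{len(output.splitlines()):,} lines / {len(output):,} characters",
        "preview": output[:preview_chars],
    }


if __name__ == "__main__":
    raw = '{"id": 1, "name": "Alice"}\n' * 3000
    summary = compress_tool_result("web_fetch", ok=True, output=raw)
    print(summary["output"])
    print(len(summary["preview"]), "preview characters kept out of", len(raw))
```

&lt;p&gt;Because the summary is a pure function of the output, the same tool result always compresses to the same record, which keeps session history reproducible.&lt;/p&gt;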

&lt;p&gt;Compared with OpenClaw’s native compaction, which feeds the entire session (including tool dumps) into a summarizer, M0’s hook-based compression is closer to upstream budgeting: you control what enters the LLM pipeline, instead of waiting until you overflow and then compressing reactively.&lt;/p&gt;

&lt;h2&gt;
  
  
  Experience + Skill: spend tokens on the right kind of reuse
&lt;/h2&gt;

&lt;p&gt;M0’s memory layer answers who this user is and what they care about. Another common waste pattern in agent stacks is different:&lt;/p&gt;

&lt;p&gt;Your OpenClaw agent may have skills, but not durable, reusable know-how distilled from real runs — so every similar task becomes another expensive exploration loop.&lt;/p&gt;

&lt;p&gt;M0 splits playbooks into two layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Experience (strategy layer): a tight summary of approach + key cautions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Skill (operations layer): structured steps, prerequisites, and pitfalls.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The two layers link by reference: your OpenClaw agent can pull strategy first, then expand operational detail only when needed — which helps keep the active prompt compact.&lt;/p&gt;

&lt;p&gt;Under the hood, M0 stores these in OceanBase (a distributed SQL database) with separate tables for Experience and Skill, indexing title and description with both vector and full-text indexes. Retrieval runs four parallel signals — title vector, description vector, title full-text, description full-text — then merges with RRF (Reciprocal Rank Fusion).&lt;/p&gt;

&lt;p&gt;Why four channels? In M0’s retrieval stack, title matching helps lock onto the right name, description matching helps lock onto the right content, vectors help with semantic equivalence (for example, “build a playlist” vs “create a playlist”), and full-text tends to win on exact strings like API names and error codes. That complementary mix is meant to make retrieval both accurate and broad. Your OpenClaw agent should not need ten mid-confidence hits (think ~0.6 relevance) just to be safe when three high-confidence items (~0.9) are enough to execute, and that gap maps straight to fewer tokens in the prompt.&lt;/p&gt;
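
&lt;p&gt;Reciprocal Rank Fusion itself is short enough to show in full. The sketch below fuses four ranked lists, one per signal; the skill IDs are made up, and k = 60 is the constant conventionally used with RRF, not a value confirmed for M0.&lt;/p&gt;

```python
from collections import defaultdict


def rrf_merge(rankings: list[list[str]], k: int = 60, top_n: int = 3) -> list[str]:
    """Fuse ranked lists: each list contributes 1 / (k + rank) per item it contains."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))[:top_n]


if __name__ == "__main__":
    title_vector = ["create_playlist", "like_song", "follow_artist"]
    description_vector = ["like_song", "create_playlist", "send_payment"]
    title_fulltext = ["like_song", "follow_artist", "create_playlist"]
    description_fulltext = ["send_payment", "like_song", "create_playlist"]
    print(rrf_merge([title_vector, description_vector, title_fulltext, description_fulltext]))
```

&lt;p&gt;RRF only looks at ranks, so the four channels do not need comparable score scales, which is convenient when mixing vector similarity with full-text scores.&lt;/p&gt;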

&lt;p&gt;M0 also stages knowledge ingestion. The pipeline detects a procedure in traces → structures a Skill (steps / pitfalls / prerequisites) → dedupes (for example, merging when vector similarity &amp;gt; 0.75) → runs moderation → stores. When M0 extracts Experience records, the extractor can see stored skills and reference skill IDs, so links are generated rather than hand-maintained.&lt;/p&gt;
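
&lt;p&gt;The dedupe step can be sketched as a cosine-similarity check against stored skill embeddings. This is a simplified illustration with toy two-dimensional vectors and hypothetical names; the 0.75 threshold is the example figure from the text, and a real pipeline would follow a match with the merge step.&lt;/p&gt;

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_duplicate(new_vec: list[float], stored: dict[str, list[float]], threshold: float = 0.75):
    """Return the ID of the most similar stored skill above the threshold, else None."""
    best_id, best_sim = None, threshold
    for skill_id, vec in stored.items():
        sim = cosine(new_vec, vec)
        if sim > best_sim:
            best_id, best_sim = skill_id, sim
    return best_id


if __name__ == "__main__":
    stored = {"create_playlist": [1.0, 0.0], "send_payment": [0.0, 1.0]}
    print(find_duplicate([0.9, 0.1], stored))  # close to an existing skill: merge candidate
    print(find_duplicate([0.7, 0.7], stored))  # not close enough to either: store as new
```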

&lt;h2&gt;
  
  
  AppWorld numbers: how much did we actually save?
&lt;/h2&gt;

&lt;p&gt;Early on, we used LoCoMo to probe memory behavior, but found it skews toward chit-chat agents rather than work agents like OpenClaw — where evaluation is harder (skills, multi-step reasoning, structured API payloads).&lt;/p&gt;

&lt;p&gt;For a fairer workload, we switched to the AppWorld benchmark, a suite of 750 autonomous agent tasks framed as realistic, stateful challenges. AppWorld’s evaluation is built around state-based unit tests, so an agent can complete a task in different ways while the harness still checks for unintended harm during the run.&lt;/p&gt;

&lt;p&gt;The AppWorld benchmark paper (ACL 2024 resource paper, &lt;a href="https://arxiv.org/abs/2407.18901" rel="noopener noreferrer"&gt;arXiv:2407.18901&lt;/a&gt;) states in the abstract:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The state-of-the-art LLM, GPT4O, solves only ~49% of our ‘normal’ tasks and ~30% of ‘challenge’ tasks, while other models solve at least 16% fewer.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The &lt;a href="https://appworld.dev/" rel="noopener noreferrer"&gt;AppWorld blog&lt;/a&gt; puts it plainly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Even the best LLM, GPT-4o, performs quite poorly. E.g., it completes only ~30% of the tasks in the challenge test set correctly.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In our controlled setup on AppWorld dev (54 tasks, 15-step cap, no pre-loaded distilled skills), GPT-4o’s baseline was ~24% (13/54 solved) — below the headline pass rates quoted in AppWorld’s public materials, which reflect a different task mix and evaluation harness than this stripped-down run.&lt;/p&gt;

&lt;h2&gt;
  
  
  Controlled comparison on AppWorld dev (54 tasks, 15-step cap)
&lt;/h2&gt;

&lt;p&gt;Our setup: we ran traces with Hermes + Qwen 3.6-plus (34/54 solved, 63%), kept all 54 trajectories, then distilled into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;M0 path: 85 experiences (with skill_refs)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hermes path: 44 SKILL.md files&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then we evaluated GPT-4o on each distilled knowledge base. Only two knobs differ: distillation + storage/retrieval.&lt;/p&gt;

&lt;p&gt;Results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc5isn221utb1i9au8o2g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc5isn221utb1i9au8o2g.png" alt="Comparison" width="800" height="330"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note: pp = percentage points (absolute change in pass rate, not relative % change).&lt;/p&gt;

&lt;p&gt;Headline takeaways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;M0 net +8 tasks (examples mentioned: Spotify-style flows, cross-app tasks, Venmo-style flows), with some wins traded for losses.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hermes net -1 on GPT-4o in this setup — no positive gain versus baseline.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why M0 beat file-skill matching in our analysis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Retrieval precision: M0’s vector search can match the task description semantically; Hermes’ filename/tag matching does not capture semantics the same way, so Hermes misses paraphrases. Example: “Create a Beyoncé playlist” and “Bundle twenty Taylor Swift tracks together” should route to the same underlying skill. M0’s vectors tolerate wording drift better than brittle naming.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Context hygiene: M0’s Experience records stay light (title-line scale); Hermes’ SKILL.md files can read like full manuals and crowd the model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;On-demand expansion + dedupe: M0 uses skill_refs to load operational detail only when needed, and M0 performs semantic deduplication by pairing vector-similarity checks with an LLM merge so near-duplicate skills fold together instead of piling up. Hermes may inject all matching skills at once, and collisions among Hermes’ SKILL.md filenames can overwrite useful variants.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Efficiency (same GPT-4o runs as the table): average steps 9.5 → 6.2 (-35%), tokens 2.56M → 1.74M (-32%). Even failures become cheaper failures — less thrash, less exploration tax.&lt;/p&gt;
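
&lt;p&gt;The percentages in this section can be rechecked from the figures quoted in the text. The short sketch below does only that arithmetic; the helper names are hypothetical, and the inputs are the article’s reported numbers (13 of 54 baseline tasks, a net gain of 8 tasks, and the step and token averages).&lt;/p&gt;

```python
def pct_reduction(before: float, after: float) -> float:
    """Relative reduction in percent."""
    return (before - after) / before * 100


def pp_change(net_tasks: int, total_tasks: int) -> float:
    """Absolute change in pass rate, in percentage points."""
    return net_tasks / total_tasks * 100


if __name__ == "__main__":
    print(f"baseline pass rate: {13 / 54 * 100:.1f}%")
    print(f"pass-rate change from +8 tasks: {pp_change(8, 54):.1f} pp")
    print(f"step reduction: {pct_reduction(9.5, 6.2):.1f}%")
    print(f"token reduction: {pct_reduction(2.56, 1.74):.1f}%")
```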

&lt;h2&gt;
  
  
  Teach once with a strong model, run forever with a cheaper one
&lt;/h2&gt;

&lt;p&gt;Rough cost sketch (our pricing assumptions — not a live vendor quote):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;GPT-5.4 one full pass: ~$57.6 at $22.5 / 1M tokens&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;GPT-4o baseline: 2.56M tokens → ~$25.6 at $10 / 1M&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;GPT-4o + M0 distilled experience: 1.74M tokens → ~$17.4 at $10 / 1M&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Note (GPT-5.4 line, illustrative): the $22.5 / 1M figure is a blended rate applied to ~2.56M tokens in our draft, not a literal line item on OpenAI’s price list. Recompute from your own traces, then confirm current rates on the &lt;a href="https://openai.com/api/pricing/" rel="noopener noreferrer"&gt;OpenAI API pricing page&lt;/a&gt; before you budget.&lt;/p&gt;
&lt;/blockquote&gt;
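
&lt;p&gt;To rerun this sketch with your own numbers, the arithmetic fits in a few lines. The rates below are the article’s illustrative assumptions, not vendor prices, and the break-even helper is an added illustration that treats the strong-model pass as a one-time teaching cost.&lt;/p&gt;

```python
import math


def run_cost(tokens_millions: float, usd_per_million: float) -> float:
    """Cost in USD of one full pass at a blended per-million-token rate."""
    return tokens_millions * usd_per_million


def break_even_runs(teach_cost: float, baseline_cost: float, distilled_cost: float) -> int:
    """Repeat passes needed before the per-pass saving covers the teaching pass."""
    saving_per_run = baseline_cost - distilled_cost
    if saving_per_run > 0:
        return math.ceil(teach_cost / saving_per_run)
    raise ValueError("distilled runs must be cheaper than the baseline")


if __name__ == "__main__":
    teach = run_cost(2.56, 22.5)      # strong model, one pass (assumed blended rate)
    baseline = run_cost(2.56, 10.0)   # GPT-4o without distilled experience
    distilled = run_cost(1.74, 10.0)  # GPT-4o with M0 distilled experience
    print(f"teach ${teach:.1f}, baseline ${baseline:.1f}, distilled ${distilled:.1f}")
    print("break-even after", break_even_runs(teach, baseline, distilled), "repeat passes")
```

&lt;p&gt;Under these assumed rates the saving is about $8.2 per full pass, so the one-time teaching pass is recovered after roughly eight repeat passes; with different traces or prices the break-even point moves accordingly.&lt;/p&gt;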

&lt;p&gt;Our playbook: let GPT-5.4 or Claude Sonnet 4.6 solve the hard version once; M0 distills traces into Experience + Skill; then route repeat work to GPT-4o (or cheaper) with higher pass rates, fewer steps, and a smaller bill than the old naive rerun.&lt;/p&gt;

&lt;p&gt;The production takeaway: in a typical agent product, most requests are repetitive patterns, so you do not need the most expensive model on every call. Either let a strong model teach the task once, or have a human guide a weaker model through one clean run. Later runs can then finish on their own, grounded in distilled experience.&lt;/p&gt;

&lt;p&gt;Beyond one user’s workspace: once an Experience picks up enough positive feedback, M0 can publish it to a shared space where any other M0-connected agent can retrieve it — your solved mistakes stop being only yours. M0’s vector dedupe folds overlapping discoveries together, contributor metadata accrues, and that crowd knowledge is meant to grow out of distillation itself — not through a separate manual editorial pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  One-sentence install
&lt;/h2&gt;

&lt;p&gt;OpenClaw is built around the idea that the assistant should do the heavy lifting, not a human babysitting every step — and seekdb M0’s install path is written the same way: you send your OpenClaw assistant a single line, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read https://m0.seekdb.ai/SKILL.md and install and configure M0 per the instructions.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, the agent is expected to check the installed OpenClaw version, obtain an Access Key, install the m0 plugin, apply the openclaw.json / gateway settings in one shot, and restart the gateway—without you clicking through a setup wizard.&lt;/p&gt;

&lt;p&gt;Humans can still sanity-check the service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# health check&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; https://m0.seekdb.ai/health

&lt;span class="c"&gt;# create a memory instance&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://m0.seekdb.ai/api/instances/ &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"name": "my-memory"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The returned ak field is your Access Key for authenticated memory operations.&lt;/p&gt;

&lt;p&gt;Try it: wire up M0, then tell your OpenClaw agent a handful of real details about you — seekdb M0 will usually auto-extract about five or six facts, run them through the memory-decision step, and persist them in the cloud. On later chats it should pull your technical preferences back in instead of cold-starting the interview from zero.&lt;/p&gt;

&lt;p&gt;At that point it already knows who you are — so you should not have to spend tokens re-introducing yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;So why does OpenClaw token usage spike? Because the default memory path leans on MEMORY.md full-load plus reactive compaction and file-scattered recall. The prompt gets crowded; history gets summarized away; OpenClaw’s agent may not even know what to search for. You pay for remembering, you pay again for forgetting, and you pay a third time for re-discovery.&lt;/p&gt;

&lt;p&gt;M0’s bet is simpler to state than it is to build:&lt;/p&gt;

&lt;p&gt;Free memory from the always-on context — store independently, retrieve on relevance, persist across sessions.&lt;/p&gt;

&lt;p&gt;More crucially: distill execution into reusable Experience + Skill, then retrieve sharply. M0-style high-precision recall beats padding the prompt with maybe-relevant bulk.&lt;/p&gt;

&lt;p&gt;Our AppWorld comparison is the punchline: same model, same tasks, swap the knowledge system, and you move from 2.56M → 1.74M tokens while pass rate climbs ~15 pp in our reported setup.&lt;/p&gt;

&lt;p&gt;Spend tokens on thinking — not on re-learning what you already solved.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;OpenClaw: &lt;a href="https://openclaw.ai/" rel="noopener noreferrer"&gt;https://openclaw.ai/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;seekdb M0: &lt;a href="https://m0.seekdb.ai/" rel="noopener noreferrer"&gt;https://m0.seekdb.ai/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PowerMem (open source): &lt;a href="https://github.com/oceanbase/powermem" rel="noopener noreferrer"&gt;https://github.com/oceanbase/powermem&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AppWorld: &lt;a href="https://appworld.dev/" rel="noopener noreferrer"&gt;https://appworld.dev/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;seekdb D0: &lt;a href="https://d0.seekdb.ai/" rel="noopener noreferrer"&gt;https://d0.seekdb.ai/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Existing M0 users: this upgrade applies automatically — Experience and Skill records accumulate in M0 during normal agent use, with no extra configuration.&lt;/p&gt;

&lt;p&gt;New users: send the one-liner install prompt to your OpenClaw agent and let it walk the setup.&lt;/p&gt;

&lt;p&gt;The first time you pay tuition on a mistake, you should not have to pay full tuition again.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>openclawchallenge</category>
      <category>vectordatabase</category>
      <category>rag</category>
    </item>
    <item>
      <title>DeepMind’s CEO Says AGI May Be ~4 Years Away. The Last Three Missing Pieces Are Not What Most People Think.</title>
      <dc:creator>Charles Wu</dc:creator>
      <pubDate>Wed, 13 May 2026 02:53:18 +0000</pubDate>
      <link>https://dev.to/oceandata4ai/deepminds-ceo-says-agi-may-be-4-years-away-the-last-three-missing-pieces-are-not-what-most-19dl</link>
      <guid>https://dev.to/oceandata4ai/deepminds-ceo-says-agi-may-be-4-years-away-the-last-three-missing-pieces-are-not-what-most-19dl</guid>
      <description>&lt;p&gt;&lt;em&gt;Three gaps — continual learning, long reasoning, memory — and why they decide whether agents ship safely.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj2aym6wafem4z4hebod7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj2aym6wafem4z4hebod7.png" alt="Photo by Tanja Tepavac on Unsplash" width="800" height="539"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Prologue
&lt;/h2&gt;

&lt;p&gt;A few days ago (April 29), Demis Hassabis — CEO of Google DeepMind and 2024 Nobel laureate in Chemistry — appeared on the podcast &lt;a href="https://www.youtube.com/watch?v=JNyuX1zoOgU" rel="noopener noreferrer"&gt;Agents, AGI &amp;amp; The Next Big Scientific Breakthrough&lt;/a&gt;. He predicted that AGI (artificial general intelligence) could arrive around 2030, and outlined several critical weaknesses in today’s AI.&lt;/p&gt;

&lt;p&gt;Hassabis spent much of the time on one question: What is today’s AI still missing?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Continual learning: unlike humans, it cannot keep learning for life and constantly renew what it knows.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Long-term reasoning: very weak on long logic chains and multi-step planning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Real memory: not just a context window, but structured, indexable long-term memory.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hassabis describes today’s models as exhibiting “jagged intelligence” — he contrasts solving IMO-level problems with still making elementary mistakes when a question is rephrased: strong peaks next to brittle failures.&lt;/p&gt;

&lt;p&gt;The interview lists continual learning, long-term reasoning, and aspects of memory as gaps that AGI must solve; Hassabis spends much of the memory segment arguing that scaling the context window alone does not fix durable recall. This article’s reading is that continual learning and long-horizon reliability are much harder to ship without a selective, retrievable memory layer — that is an interpretive link, not a single verbatim sentence from Hassabis ordering the three problems.&lt;/p&gt;

&lt;p&gt;What does that mean in products? A model can look brilliant on a contest task yet still fail “easy” follow-ups if it cannot persistently remember past conversations and user preferences.&lt;/p&gt;

&lt;p&gt;Next, I’ll walk through these core points from the interview.&lt;/p&gt;

&lt;h2&gt;
  
  
  A brute-force context window ≠ AI memory
&lt;/h2&gt;

&lt;p&gt;Everyone has noticed the race lately: who has the longer context window.&lt;/p&gt;

&lt;p&gt;From 4K to 128K, to 1 million tokens, to 10 million. It’s as if a long enough context could brute-force every problem.&lt;/p&gt;

&lt;p&gt;Hassabis makes the point that context window size alone doesn’t equal memory. Doing the math on today’s limits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;1M tokens ≈ 20 minutes of video&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;10M tokens ≈ 200 minutes of video (a little over 3 hours)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
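
&lt;p&gt;To make the arithmetic concrete, here is a back-of-the-envelope sketch. It assumes the ratio above (about 1M tokens per 20 minutes of video) scales linearly; the function names are illustrative only.&lt;/p&gt;

```python
# Back-of-the-envelope: how much video fits in a context window, assuming the
# article's ratio of roughly 1M tokens per 20 minutes holds linearly.
TOKENS_PER_MINUTE = 1_000_000 / 20  # assumption taken from the figures above


def minutes_of_video(context_tokens):
    """Minutes of video that fit in a window of the given size."""
    return context_tokens / TOKENS_PER_MINUTE


def tokens_for_days(days):
    """Tokens needed to hold `days` of continuous video at the same rate."""
    return int(days * 24 * 60 * TOKENS_PER_MINUTE)


for window in (1_000_000, 10_000_000):
    print(window, "tokens =", minutes_of_video(window), "minutes")

print("one week of video =", tokens_for_days(7), "tokens")
```

&lt;p&gt;At that rate, a single week of continuous video would need about 504 million tokens, roughly fifty times a 10M window.&lt;/p&gt;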

&lt;p&gt;For an AI assistant that needs to understand your habits across days, weeks, months, even years of life and work — what is 200 minutes?&lt;/p&gt;

&lt;p&gt;And the issue isn’t only capacity. More importantly, today’s approach is to shove everything into the context window, whether it is important or not, wrong or stale. And each conversation is stateless in essence.&lt;/p&gt;

&lt;p&gt;Close the window, and what you talked about last round is gone.&lt;/p&gt;

&lt;p&gt;A context window is really the counterpart of working memory in the human brain.&lt;/p&gt;

&lt;p&gt;How much fits in human working memory? Psychology’s classic number is about seven items. Ask someone to memorize a friend’s phone number — they can usually hold about seven digits before things “overflow.”&lt;/p&gt;

&lt;p&gt;Large models? They’re already at 1 million tokens. By that logic, the model’s working memory is more than a hundred thousand times larger than a human’s, so it should be more than a hundred thousand times smarter.&lt;/p&gt;

&lt;p&gt;Clearly, it isn’t.&lt;/p&gt;

&lt;h2&gt;
  
  
  The nature of memory: hippocampus &amp;amp; continual learning
&lt;/h2&gt;

&lt;p&gt;Hassabis contrasts AI with the human brain — his PhD was on how the hippocampus elegantly folds new knowledge into an existing knowledge system.&lt;/p&gt;

&lt;p&gt;That’s exactly where the problem lies. AI habitually stuffs everything into the context window: unimportant things, wrong things, outdated things. It looks like a lot of information; in practice it’s a mess.&lt;/p&gt;

&lt;p&gt;So why is human working memory — seven digits — enough?&lt;/p&gt;

&lt;p&gt;Because another system sits behind it. We remember things from years ago, from childhood, from a few hours ago. None of that lives in working memory; it relies on another system: the hippocampus we just mentioned, the part of the brain that integrates new knowledge into the long-term store.&lt;/p&gt;

&lt;p&gt;Hassabis explains on the podcast that during REM sleep, the brain replays the day’s experiences, decides what to remember and what to forget, and integrates valuable experience into long-term memory.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frxb0787yl5u8aig69h2w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frxb0787yl5u8aig69h2w.png" alt="REM sleep" width="799" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;DeepMind’s famous DQN, introduced in 2013 and the first deep RL system to reach human-level play on Atari games, borrowed a key idea from this: experience replay, which stores past experience and replays random samples of it during training.&lt;/p&gt;
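
&lt;p&gt;In its basic form, experience replay keeps past transitions in a fixed-size buffer and trains on random samples drawn from it. A minimal sketch of that idea follows; the &lt;code&gt;ReplayBuffer&lt;/code&gt; class and its field names are illustrative, not DeepMind’s code.&lt;/p&gt;

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest item once full
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Never ask for more items than the buffer holds
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.add(state=step, action=0, reward=1.0, next_state=step + 1, done=False)

batch = buffer.sample(batch_size=2)
print(len(buffer), batch)
```

&lt;p&gt;With a capacity of 3 and five inserts, only the three most recent transitions survive, which is the crudest possible form of forgetting.&lt;/p&gt;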

&lt;p&gt;In AI years, that’s ancient history. The process of folding the new into the old knowledge base is what we call continual learning.&lt;/p&gt;

&lt;p&gt;In 2026, AI still broadly hasn’t gotten there.&lt;/p&gt;

&lt;h2&gt;
  
  
  What should an “AI hippocampus” look like?
&lt;/h2&gt;

&lt;p&gt;Hassabis is clear: AI needs a standalone, efficiently indexable memory module — one that can actively choose what to remember and what to forget. That is a precondition for AI agents to run autonomously and reliably over long horizons.&lt;/p&gt;

&lt;p&gt;In other words, the context window is only a desk that keeps getting bigger. What AI really lacks is a hippocampus.&lt;/p&gt;

&lt;h2&gt;
  
  
  PowerMem
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/oceanbase/powermem" rel="noopener noreferrer"&gt;PowerMem&lt;/a&gt;, an open-source project I work on, adds that “hippocampus” for AI agents — a persistent, continually learning memory system.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhlh7no377sbzwm1i9zxe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhlh7no377sbzwm1i9zxe.png" alt="PowerMem" width="799" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It aligns closely with Hassabis’s direction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Instead of dumping whole conversations into context, it extracts key facts and tiers working, short-term, and long-term memory.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It uses an Ebbinghaus forgetting-curve mechanism — used memories strengthen; unused memories fade and may be pruned.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It supports hybrid retrieval: vector + full-text + graph; multiple agents can isolate or share memory.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
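
&lt;p&gt;To illustrate the forgetting-curve idea in the second bullet, here is a small sketch using the textbook form R = exp(-t / S). It is not PowerMem’s implementation; the &lt;code&gt;Memory&lt;/code&gt; class, the default stability, and the doubling boost are all assumptions chosen for the example.&lt;/p&gt;

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    stability_hours: float = 24.0   # larger = fades more slowly (assumed default)
    last_access_hours: float = 0.0  # time of last use, in hours


def retention(memory, now_hours):
    """Ebbinghaus-style retention R = exp(-t / S), a value between 0 and 1."""
    elapsed = now_hours - memory.last_access_hours
    return math.exp(-elapsed / memory.stability_hours)


def recall(memory, now_hours, boost=2.0):
    """Using a memory resets its clock and makes it decay more slowly."""
    memory.stability_hours *= boost
    memory.last_access_hours = now_hours


def prune(memories, now_hours, keep):
    """Keep only the `keep` memories with the highest retention."""
    ranked = sorted(memories, key=lambda m: retention(m, now_hours), reverse=True)
    return ranked[:keep]


used = Memory("prefers metric units")
unused = Memory("asked about the weather once")
recall(used, now_hours=12.0)  # touched at hour 12
survivors = prune([used, unused], now_hours=48.0, keep=1)
print([m.text for m in survivors])
```

&lt;p&gt;At hour 48 the recalled memory retains about 0.47 while the untouched one has dropped to about 0.14, so only the first survives pruning.&lt;/p&gt;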

&lt;p&gt;The numbers are stark. On the long-dialogue memory benchmark &lt;a href="https://github.com/snap-research/locomo" rel="noopener noreferrer"&gt;LOCOMO&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feiden5faj1elceojlmz7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feiden5faj1elceojlmz7.png" alt="Comparison" width="800" height="165"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the same tasks, PowerMem uses 18% of the tokens of the full-context approach (82% less) — yet scores higher, because not every old line of dialogue is worth keeping.&lt;/p&gt;

&lt;p&gt;Besides PowerMem, another project I’m involved in, &lt;a href="https://m0.seekdb.ai/" rel="noopener noreferrer"&gt;seekdb M0&lt;/a&gt;, is a still-evolving cloud memory built for AI agents. Its goals: plug in fast, share experience, and keep learning and evolving on its own.&lt;/p&gt;

&lt;p&gt;Of course, PowerMem and seekdb M0 may both fall short of the ultimate memory system Hassabis describes: the human brain replaying and integrating experience in sleep. But the direction is right: memory should not be propped up only by brute-force context windows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Model distillation: however strong the big model is, your phone catches up within a year
&lt;/h2&gt;

&lt;p&gt;Another point I kept rewinding to is distillation.&lt;/p&gt;

&lt;p&gt;Host Garry Tan asks what many people wonder: how smart can small models get? Is there a theoretical limit to distillation?&lt;/p&gt;

&lt;p&gt;Hassabis answers plainly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I don’t believe we’ve yet hit any fundamental information-theoretic limit — nor does anyone know if such a ceiling exists. Perhaps someday we’ll encounter an information-density ceiling — but for now, our assumption is that within six months to a year of a cutting-edge Pro model’s release, its capabilities can be compressed into models small enough to run on edge devices.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;He gives numbers: a distilled small model can reach 90–95% of the frontier model’s capability at about one-tenth the cost.&lt;/p&gt;

&lt;p&gt;That isn’t far future — it’s happening. DeepMind’s own product line follows that logic: Gemini Pro (frontier flagship) → Flash (distilled consumer inference) → Nano (on-device). Open Gemma 4 hit 40 million downloads in two and a half weeks.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Small models serve multiple purposes. First, lower cost — and speed brings additional benefits. In coding or similar tasks, faster iteration accelerates progress, especially when collaborating with systems. A rapid system — even if only 90–95% as capable as the frontier — often delivers more net value due to dramatically improved iteration speed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Hassabis also stresses edge settings: in-car, wearables, embodied robots — these need efficiency, privacy, and security, not just raw power.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For a home robot, you’d want a locally-run, efficient, yet powerful model — delegating specific tasks to cloud-based large models only when necessary. Audio and video streams processed locally, data retained locally — I envision this as an ideal end state.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That makes me think of a trend in motion: as large-model capability flows to the edge on a 6–12 month rhythm, an obvious question is — on the edge, what provides the data substrate for these small models?&lt;/p&gt;

&lt;p&gt;You need a full traditional database instance on the device, plus vector search, full-text search, and structured queries.&lt;/p&gt;

&lt;p&gt;That’s what another project I work on — &lt;a href="https://github.com/oceanbase/seekdb" rel="noopener noreferrer"&gt;seekdb&lt;/a&gt; — is aimed at.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Server mode needs only 1 CPU core and 2 GB of RAM (1C2G), installs with a single &lt;code&gt;pip install&lt;/code&gt;, and starts in seconds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Embedded mode can ship as a Python / JS / TS dynamic library inside the app, with no separate DB process and almost no overhead.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It packs vector search, full-text search, JSON, and GIS into one engine, and it is MySQL-compatible, which keeps the learning curve low.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hassabis’s read is convincing: edge intelligence isn’t “someday”; it is arriving on a 6–12 month cadence. Infrastructure that delivers full AI data capability at tiny cost will soon go from “nice-to-have” to “must-have.”&lt;/p&gt;

&lt;h2&gt;
  
  
  AI safety only in the prompt is not enough
&lt;/h2&gt;

&lt;p&gt;Hassabis ties powerful models to misuse risk — for example, after Garry Tan calls the moment “Promethean,” Hassabis answers:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Exactly. And — as the Prometheus myth warns — we must handle this power with great care: how it’s used, where it’s applied, and the risks of misuse.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;He also stresses privacy and security as a reason to run capable models on edge devices (see the home-robot quote in the distillation section above).&lt;/p&gt;

&lt;p&gt;Author’s framing (not a verbatim Hassabis checklist in the transcript): teams shipping agents still worry about two classes of failure: (1) bad actors using AI to scale attacks, and (2) more autonomy making “oops, it touched prod” incidents more consequential. The second is why stories like agents deleting data are no longer pure thought experiments — for a concrete write-up, see &lt;a href="https://dev.to/seekdb/nine-seconds-no-backups-an-agents-confession-k11"&gt;Nine Seconds, No Backups: An Agent’s “Confession”&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;My view: as capabilities accelerate, guardrails cannot live only in the prompt — part of the responsibility belongs in infrastructure that limits blast radius.&lt;/p&gt;

&lt;p&gt;At the database layer, for example, you can design multiple lines of defense for agent-heavy systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Data branch / fork (like Git): agents experiment on a fork; primary DB/tables don’t move. Merge if good; throw away if bad.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Recycle bin + flashback: dropped tables sit in recycle bin; FLASHBACK brings them back. Flashback query can read snapshots at arbitrary past times.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Primary/standby physical isolation: backups run on separate storage from the primary — not the same blast radius.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
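
&lt;p&gt;The first bullet, branch / fork, can be pictured with a toy in-memory store. This is only a conceptual sketch: &lt;code&gt;BranchableStore&lt;/code&gt; is a hypothetical class, not seekdb’s API, and a real database would not deep-copy its data to create a branch.&lt;/p&gt;

```python
import copy


class BranchableStore:
    """Toy in-memory key-value store with Git-like fork and merge."""

    def __init__(self, data=None):
        self.data = dict(data or {})

    def fork(self):
        # Deep copy so nothing the agent does can reach the primary
        return BranchableStore(copy.deepcopy(self.data))

    def merge(self, branch):
        # Naive merge: the reviewed branch replaces the primary wholesale
        self.data = copy.deepcopy(branch.data)


primary = BranchableStore({"orders": [1, 2, 3]})

# The agent only ever receives the fork.
scratch = primary.fork()
del scratch.data["orders"]  # a destructive mistake
print(primary.data)         # primary is untouched

# A good branch is merged; a bad one is simply dropped.
good = primary.fork()
good.data["orders"].append(4)
primary.merge(good)
print(primary.data)
```

&lt;p&gt;The point of the design is that the destructive path never has a handle on the primary, so the mistake is contained without relying on the agent to behave.&lt;/p&gt;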

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjdh2ybso6pmyai284tv7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjdh2ybso6pmyai284tv7.png" alt="Database adaptation" width="799" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bottom line: assume agents will make destructive mistakes sometimes — then weld shut those paths at the storage layer, not only in system prompts.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI is still waiting for its “Einstein”
&lt;/h2&gt;

&lt;p&gt;Near the end, Hassabis offers what he calls the “Einstein Test”:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I sometimes call it the ‘Einstein Test’: Can you train a system using only knowledge available in 1901, then have it independently derive Einstein’s 1905 breakthroughs — including special relativity? Once achieved, these systems will be close to inventing genuinely novel concepts.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Today’s strongest systems can still look brilliant inside a fixed framework (including hard physics puzzles). Hassabis’s bar is higher: inventing the framework, not only acing questions within it.&lt;/p&gt;

&lt;p&gt;On AlphaGo and inventing Go, Hassabis continues:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;But Move 37 alone wasn’t enough. It was cool and useful — but can this system invent Go itself? If you give it a high-level description — e.g., ‘a game whose rules take five minutes to learn but a lifetime to master; aesthetically elegant; playable in an afternoon’ — and it returns Go as the answer? Today’s systems cannot do this. Why not?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AlphaGo could play a shocking Move 37; it couldn’t invent Go. That’s today’s AI in one line: full marks on the exam, still hasn’t learned to write the exam.&lt;/p&gt;

&lt;p&gt;Hassabis says the field is still waiting for an Einstein-level breakthrough. Until then, what we can do is: build memory, roll out the edge, and shore up safety — so AI trips less on the road to AGI.&lt;/p&gt;

&lt;p&gt;Doing those three takes more than the model layer. Infrastructure has to evolve too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Video: Agents, AGI &amp;amp; The Next Big Scientific Breakthrough — &lt;a href="https://www.youtube.com/watch?v=JNyuX1zoOgU" rel="noopener noreferrer"&gt;YouTube&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;TechFlow compilation (quotes): &lt;a href="https://www.techflowpost.com/en-US/article/31409" rel="noopener noreferrer"&gt;https://www.techflowpost.com/en-US/article/31409&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PowerMem: &lt;a href="https://github.com/oceanbase/powermem" rel="noopener noreferrer"&gt;https://github.com/oceanbase/powermem&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;LOCOMO: &lt;a href="https://github.com/snap-research/locomo" rel="noopener noreferrer"&gt;https://github.com/snap-research/locomo&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;seekdb M0: &lt;a href="https://m0.seekdb.ai" rel="noopener noreferrer"&gt;https://m0.seekdb.ai&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;seekdb: &lt;a href="https://github.com/oceanbase/seekdb" rel="noopener noreferrer"&gt;https://github.com/oceanbase/seekdb&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Case study (agents + data loss): &lt;a href="https://medium.com/ob4ai/nine-seconds-no-backups-an-agents-confession-ec5c2959c95a" rel="noopener noreferrer"&gt;Nine Seconds, No Backups&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Building agent memory systems? What patterns are you using for long-term recall? Drop your approach below.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>softwareengineering</category>
    </item>
  </channel>
</rss>
