<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Shilpa Mitra</title>
    <description>The latest articles on DEV Community by Shilpa Mitra (@shilpamitra).</description>
    <link>https://dev.to/shilpamitra</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3943639%2F687febe9-259f-4c1e-a2a0-798ce0d6cc2b.png</url>
      <title>DEV Community: Shilpa Mitra</title>
      <link>https://dev.to/shilpamitra</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shilpamitra"/>
    <language>en</language>
    <item>
      <title>How to Set Up Persistent Memory for Codex Using Obsidian (3 Approaches)</title>
      <dc:creator>Shilpa Mitra</dc:creator>
      <pubDate>Mon, 25 May 2026 18:04:46 +0000</pubDate>
      <link>https://dev.to/shilpamitra/how-to-set-up-persistent-memory-for-codex-using-obsidian-3-approaches-383l</link>
      <guid>https://dev.to/shilpamitra/how-to-set-up-persistent-memory-for-codex-using-obsidian-3-approaches-383l</guid>
      <description>&lt;p&gt;Codex has no long-term memory. Every session starts clean. You explain your project structure, your naming conventions, your testing preferences, the thing you decided last Tuesday about the API design. Then you close the terminal and do it all over again tomorrow.&lt;/p&gt;

&lt;p&gt;The fix is giving Codex a memory layer that persists between sessions. And the best place to store that memory is &lt;a href="https://obsidian.md" rel="noopener noreferrer"&gt;Obsidian&lt;/a&gt;, because it's just markdown files on disk. No proprietary database. No sync service you don't control. Every note is a plain text file you can read, edit, search, and version control yourself.&lt;/p&gt;

&lt;p&gt;I tested three different approaches to wiring Codex memory into Obsidian. Each one solves the problem differently, and the right pick depends on how much setup you want to deal with and how deep you want the integration to go.&lt;/p&gt;

&lt;p&gt;Here's every approach, what it actually does, and how to set it up from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  First: Understanding How Codex Memory Actually Works
&lt;/h2&gt;

&lt;p&gt;Before wiring anything to Obsidian, you need to understand the two memory layers Codex already has built in.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: AGENTS.md (Static Instructions)
&lt;/h3&gt;

&lt;p&gt;This is a markdown file you place at the root of your repo. Codex reads it at the start of every session before doing any work. Think of it as a briefing document. You put your project conventions, testing commands, directory layout, and anything the agent needs to know every single time.&lt;/p&gt;

&lt;p&gt;AGENTS.md is checked into version control. It's shared across the team. It's the right place for rules that should always apply.&lt;/p&gt;

&lt;p&gt;Quick example of what goes in here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# AGENTS.md&lt;/span&gt;

&lt;span class="gu"&gt;## Project&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Next.js 14 app with TypeScript
&lt;span class="p"&gt;-&lt;/span&gt; Tailwind for styling, no CSS modules
&lt;span class="p"&gt;-&lt;/span&gt; All API routes live in /app/api/

&lt;span class="gu"&gt;## Testing&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Run &lt;span class="sb"&gt;`pnpm test`&lt;/span&gt; before committing
&lt;span class="p"&gt;-&lt;/span&gt; All new functions need at least one unit test
&lt;span class="p"&gt;-&lt;/span&gt; Test files go next to the source file, named &lt;span class="err"&gt;*&lt;/span&gt;.test.ts

&lt;span class="gu"&gt;## Conventions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Use kebab-case for file names
&lt;span class="p"&gt;-&lt;/span&gt; Commit messages follow Conventional Commits: feat(scope): description
&lt;span class="p"&gt;-&lt;/span&gt; Never modify files in /config/production/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Codex loads this automatically. No config needed. Just create the file.&lt;/p&gt;

&lt;p&gt;You can also run &lt;code&gt;/init&lt;/code&gt; inside a Codex session, and it will scaffold an AGENTS.md based on your project's detected tech stack, directory structure, and config files. Good starting point if you don't want to write it from scratch.&lt;/p&gt;

&lt;p&gt;One thing to watch: Codex concatenates AGENTS.md files from the repo root down to your current directory, and stops at 32 KiB combined size. If your instructions are being ignored, you might be hitting the size limit. You can verify by asking Codex: "Summarize the instructions you have loaded for this session."&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: Native Memories (Auto-Generated)
&lt;/h3&gt;

&lt;p&gt;This is the newer system. When enabled, Codex automatically summarizes your sessions in the background and writes those summaries to &lt;code&gt;~/.codex/memories/&lt;/code&gt;. The next time you start a session, it reads those summaries back in. You don't paste anything. You don't reference anything. The context just shows up.&lt;/p&gt;

&lt;p&gt;The memory pipeline works in two phases. Phase 1 runs after a session has been idle long enough (six hours by default, configurable via &lt;code&gt;min_rollout_idle_hours&lt;/code&gt; from 1 to 48). It extracts key context from the conversation, redacts any secrets it finds, and stores a structured summary. Phase 2 periodically consolidates all those individual summaries into a unified memory file that gets injected into future sessions.&lt;/p&gt;

&lt;p&gt;The storage layout under &lt;code&gt;~/.codex/memories/&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/.codex/memories/
├── memory_summary.md      # High-level summary injected into every session
├── MEMORY.md              # Searchable registry of aggregated insights
├── raw_memories.md        # Temporary merge used during consolidation
├── rollout_summaries/     # Per-thread recaps with lessons learned
└── skills/                # Reusable procedures the agent discovered
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To enable it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# ~/.codex/config.toml&lt;/span&gt;
&lt;span class="nn"&gt;[features]&lt;/span&gt;
&lt;span class="py"&gt;memories&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or as a one-time CLI override: &lt;code&gt;codex -c features.memories=true&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Or in the Codex app: Settings &amp;gt; Memories &amp;gt; Enable.&lt;/p&gt;

&lt;p&gt;Once it's on, you can fine-tune the behavior:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[memories]&lt;/span&gt;
&lt;span class="py"&gt;generate_memories&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;    &lt;span class="c"&gt;# Let new threads create memory entries&lt;/span&gt;
&lt;span class="py"&gt;use_memories&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;         &lt;span class="c"&gt;# Inject existing memories into new sessions&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also run these independently. Want Codex to read old memories but not generate new ones? Set &lt;code&gt;generate_memories = false&lt;/code&gt; and &lt;code&gt;use_memories = true&lt;/code&gt;. Useful for debugging or when you want to freeze the memory state.&lt;/p&gt;

&lt;p&gt;Inside a running session, type &lt;code&gt;/memories&lt;/code&gt; to control whether that specific thread can use or generate memories. This doesn't touch your global settings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important caveat:&lt;/strong&gt; Native memories are off by default and currently unavailable in the EEA, UK, or Switzerland. Also, memories are per-user. If your team shares a Codex environment, individual memories don't pool across teammates. Team-wide context belongs in AGENTS.md.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where Obsidian Comes In
&lt;/h3&gt;

&lt;p&gt;The two layers above work, but they have limits. AGENTS.md is static and manual. Native memories are auto-generated but opaque and not easily searchable. Obsidian gives you a visual, organized, searchable knowledge base that your agent can read from and write to. And because Obsidian is just a folder of markdown files, it plays nicely with every tool in the chain.&lt;/p&gt;

&lt;h2&gt;
  
  
  Approach 1: Basic Memory + MCP (Easiest Setup, Cross-Tool Compatible)
&lt;/h2&gt;

&lt;p&gt;This is the fastest path to persistent Codex memory that syncs with Obsidian.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/basicmachines-co/basic-memory" rel="noopener noreferrer"&gt;Basic Memory&lt;/a&gt; is an MCP server that gives any AI tool (Codex, Claude Code, Cursor, Claude Desktop) persistent context through plain markdown files. You store notes in a folder. Basic Memory indexes them. Codex queries them through MCP. And because the storage format is just markdown, you point Obsidian at the same folder and everything shows up in your vault with full graph view, backlinks, and search.&lt;/p&gt;

&lt;h3&gt;
  
  
  What this looks like in practice
&lt;/h3&gt;

&lt;p&gt;You're three weeks into building an API. You've made decisions about auth strategy, database schema, rate limiting approach, error handling patterns. All of that context lives in Basic Memory notes.&lt;/p&gt;

&lt;p&gt;You open a new Codex session and say: "What decisions have we made about the API design? Check my notes."&lt;/p&gt;

&lt;p&gt;Codex uses semantic search through MCP, finds the relevant notes across your project, and answers grounded in your actual project history. No re-explaining. No pasting old conversations.&lt;/p&gt;

&lt;p&gt;You switch to Claude Code for a different task on the same project. Same notes. Same context. Zero re-setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup (Local)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex mcp add basic-memory bash &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"uvx basic-memory mcp"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the entire install. One command. The &lt;code&gt;uvx&lt;/code&gt; approach handles dependency resolution automatically and runs Basic Memory as a child process.&lt;/p&gt;

&lt;p&gt;To scope it to a specific project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex mcp add basic-memory bash &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"uvx basic-memory mcp --project your-project-name"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify it's connected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex mcp list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see &lt;code&gt;basic-memory&lt;/code&gt; listed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup (Cloud, for remote access)
&lt;/h3&gt;

&lt;p&gt;If you want cloud-hosted memory:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1.&lt;/strong&gt; Create an API key at &lt;a href="https://app.basicmemory.com" rel="noopener noreferrer"&gt;app.basicmemory.com&lt;/a&gt; under Settings &amp;gt; API Keys&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.&lt;/strong&gt; Add it to your shell profile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'export BASIC_MEMORY_API_KEY=your-key-here'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; ~/.zshrc
&lt;span class="nb"&gt;source&lt;/span&gt; ~/.zshrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3.&lt;/strong&gt; Add to your Codex config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# ~/.codex/config.toml&lt;/span&gt;
&lt;span class="nn"&gt;[mcp_servers.basic-memory]&lt;/span&gt;
&lt;span class="py"&gt;url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://cloud.basicmemory.com/mcp"&lt;/span&gt;
&lt;span class="py"&gt;bearer_token_env_var&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"BASIC_MEMORY_API_KEY"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Connecting Obsidian
&lt;/h3&gt;

&lt;p&gt;Open Obsidian. Create a new vault. Point it at your Basic Memory directory (&lt;code&gt;~/basic-memory&lt;/code&gt; by default, or your project folder). That's it. The same markdown files your AI writes show up in Obsidian with graph view, backlinks, and rich editing. No import or export step.&lt;/p&gt;

&lt;p&gt;Notes you create in Obsidian are immediately available to Codex. Notes Codex creates show up in Obsidian. Same files, two interfaces.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use this approach
&lt;/h3&gt;

&lt;p&gt;You want the fastest setup, you use multiple AI tools (not just Codex), and you want your memory notes to be plain markdown you can browse and edit in Obsidian.&lt;/p&gt;

&lt;h2&gt;
  
  
  Approach 2: Structured Obsidian Vault with AGENTS.md + Codex Hooks (Deepest Integration)
&lt;/h2&gt;

&lt;p&gt;This is the power-user option. Instead of a third-party memory layer, you build a structured Obsidian vault that Codex reads directly through AGENTS.md and lifecycle hooks.&lt;/p&gt;

&lt;p&gt;The idea is simple: your Obsidian vault becomes your project's knowledge base. AGENTS.md tells Codex how the vault is organized, what the naming conventions are, and where to find things. &lt;a href="https://developers.openai.com/codex/hooks" rel="noopener noreferrer"&gt;Codex hooks&lt;/a&gt; automatically inject context from the vault at session start so you never have to re-explain what's going on.&lt;/p&gt;

&lt;p&gt;Where Basic Memory gives you a shared note store through MCP, this approach gives you full control with zero external dependencies. Everything stays in your vault, everything is plain markdown, and Codex reads it natively.&lt;/p&gt;

&lt;h3&gt;
  
  
  What this looks like in practice
&lt;/h3&gt;

&lt;p&gt;You open the terminal in your vault directory and run Codex. The &lt;code&gt;SessionStart&lt;/code&gt; hook fires automatically, reads your vault's index file, and injects a summary of active projects, recent decisions, and open tasks into the session. Codex knows what's going on before you type a single word.&lt;/p&gt;

&lt;p&gt;You say: "What did we decide about the caching strategy last week?" Codex reads the decision records in your vault and pulls the answer from your own notes.&lt;/p&gt;

&lt;p&gt;During the day, every note you create gets filed with YAML frontmatter, tagged, and linked. Decision records, project notes, architecture docs. Codex follows the structure defined in AGENTS.md and files things consistently.&lt;/p&gt;

&lt;h3&gt;
  
  
  The vault structure
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;projects/          # One folder per active project
decisions/         # Architecture and design decision records
memory/            # Persistent context Codex reads across sessions
memory/goals.md    # Current priorities and focus areas
memory/index.md    # Map of everything in the vault
templates/         # Note templates with YAML frontmatter
reference/         # Codebase knowledge, API docs, architecture maps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1: Create AGENTS.md at the vault root
&lt;/h3&gt;

&lt;p&gt;This is Codex's operating manual for your vault. Here's a practical example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# AGENTS.md&lt;/span&gt;

&lt;span class="gu"&gt;## Vault Structure&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; /projects/ contains one folder per active project
&lt;span class="p"&gt;-&lt;/span&gt; /decisions/ contains architecture decision records (ADR format)
&lt;span class="p"&gt;-&lt;/span&gt; /memory/goals.md has current priorities. Read this first every session.
&lt;span class="p"&gt;-&lt;/span&gt; /memory/index.md is the vault map. Scan it to know what exists.

&lt;span class="gu"&gt;## Note Conventions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; All notes use YAML frontmatter with: title, date, status, tags
&lt;span class="p"&gt;-&lt;/span&gt; Status values: active, completed, archived, deprecated
&lt;span class="p"&gt;-&lt;/span&gt; File names use kebab-case: my-decision-about-caching.md
&lt;span class="p"&gt;-&lt;/span&gt; Link related notes using [[wikilinks]]

&lt;span class="gu"&gt;## When Creating Notes&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Decision records go in /decisions/ with ADR format
&lt;span class="p"&gt;-&lt;/span&gt; Project notes go in /projects/{project-name}/
&lt;span class="p"&gt;-&lt;/span&gt; Always update /memory/index.md when creating new notes

&lt;span class="gu"&gt;## When Starting a Session&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Read /memory/goals.md for current priorities
&lt;span class="p"&gt;-&lt;/span&gt; Check /memory/index.md for vault overview
&lt;span class="p"&gt;-&lt;/span&gt; Look at recent git commits to see what changed since last session
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Set up the SessionStart hook
&lt;/h3&gt;

&lt;p&gt;Codex hooks let you run scripts at specific lifecycle events. The &lt;code&gt;SessionStart&lt;/code&gt; event fires when a session begins and can inject context automatically.&lt;/p&gt;

&lt;p&gt;Hooks are experimental and currently disabled on Windows. You need to enable the feature flag first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# ~/.codex/config.toml&lt;/span&gt;
&lt;span class="nn"&gt;[features]&lt;/span&gt;
&lt;span class="py"&gt;codex_hooks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then create &lt;code&gt;.codex/hooks.json&lt;/code&gt; in your vault:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"SessionStart"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"startup|resume"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cat memory/goals.md memory/index.md"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"timeout"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The structure is: event name as a key, then an array of matcher groups, each containing a &lt;code&gt;matcher&lt;/code&gt; regex and a &lt;code&gt;hooks&lt;/code&gt; array. For &lt;code&gt;SessionStart&lt;/code&gt;, the matcher filters on how the session started (&lt;code&gt;startup&lt;/code&gt; or &lt;code&gt;resume&lt;/code&gt;). The &lt;code&gt;timeout&lt;/code&gt; is in seconds (default is 600 if omitted). Any plain text the command writes to stdout gets injected as developer context into the session.&lt;/p&gt;

&lt;p&gt;This reads your goals and vault index, then injects them as context at the start of every Codex session. Codex sees this before you type a single word.&lt;/p&gt;

&lt;p&gt;You can make the hook smarter. A script that pulls recent git changes, scans for notes modified in the last 48 hours, and builds a compact briefing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# .codex/session-start.sh&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"## Current Goals"&lt;/span&gt;
&lt;span class="nb"&gt;cat &lt;/span&gt;memory/goals.md
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"## Recently Modified Notes"&lt;/span&gt;
find &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;-name&lt;/span&gt; &lt;span class="s2"&gt;"*.md"&lt;/span&gt; &lt;span class="nt"&gt;-mtime&lt;/span&gt; &lt;span class="nt"&gt;-2&lt;/span&gt; &lt;span class="nt"&gt;-not&lt;/span&gt; &lt;span class="nt"&gt;-path&lt;/span&gt; &lt;span class="s2"&gt;"./.codex/*"&lt;/span&gt; | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-20&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"## Recent Changes"&lt;/span&gt;
git log &lt;span class="nt"&gt;--oneline&lt;/span&gt; &lt;span class="nt"&gt;-10&lt;/span&gt; 2&amp;gt;/dev/null &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"No git history"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then update the hook to point to the script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"SessionStart"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"startup|resume"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bash .codex/session-start.sh"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"timeout"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"statusMessage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Loading vault context"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Open the vault in Obsidian
&lt;/h3&gt;

&lt;p&gt;Open the same folder as an Obsidian vault. You get graph view across all your notes, backlinks between decisions and projects, full-text search, and a visual interface for browsing everything Codex writes. Same files, two interfaces.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Run Codex from the vault directory
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ~/your-vault &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; codex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Codex loads AGENTS.md, the SessionStart hook fires and injects your goals and index, and you're working with full context from the first prompt.&lt;/p&gt;

&lt;h3&gt;
  
  
  What sets this apart from the other approaches
&lt;/h3&gt;

&lt;p&gt;No external tools. No MCP servers. No API keys. Everything is AGENTS.md (instructions), hooks (automation), and markdown files (knowledge). The vault is fully portable. You can version control it with git, sync it however you want, and switch to a different agent later without rebuilding anything. A well-documented vault in markdown is not locked to any single AI tool.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use this approach
&lt;/h3&gt;

&lt;p&gt;You want Obsidian as the center of your workflow with zero external dependencies. You want to control exactly what Codex sees and how the vault is organized. You're comfortable writing an AGENTS.md and a simple hook script.&lt;/p&gt;

&lt;h2&gt;
  
  
  Approach 3: Native Memories + Manual Obsidian Sync (Minimal Setup, Good Enough for Most)
&lt;/h2&gt;

&lt;p&gt;If you don't want to install anything extra, you can use Codex's built-in memory system and just point Obsidian at the memory folder.&lt;/p&gt;

&lt;p&gt;This is the simplest approach. Enable native memories, let Codex auto-generate summaries, and open &lt;code&gt;~/.codex/memories/&lt;/code&gt; as an Obsidian vault (or add it as a folder inside an existing vault). You get visual browsing and search over everything Codex remembers.&lt;/p&gt;

&lt;p&gt;The tradeoff: this is read-only from Obsidian's perspective. You can look at the memory files, but hand-editing them isn't the supported path. Codex treats &lt;code&gt;~/.codex/memories/&lt;/code&gt; as generated state that it manages itself. If you want to give Codex persistent instructions, put them in AGENTS.md instead.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1.&lt;/strong&gt; Enable memories:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# ~/.codex/config.toml&lt;/span&gt;
&lt;span class="nn"&gt;[features]&lt;/span&gt;
&lt;span class="py"&gt;memories&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2.&lt;/strong&gt; Open &lt;code&gt;~/.codex/memories/&lt;/code&gt; as an Obsidian vault, or symlink it into an existing vault:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; ~/.codex/memories/ ~/your-vault/codex-memories
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3.&lt;/strong&gt; Work normally. After sessions go idle (six hours by default), Codex processes them in the background. Summaries appear in the folder. Obsidian picks them up automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  What this looks like in practice
&lt;/h3&gt;

&lt;p&gt;You've been using Codex on a project for two weeks. You open the codex-memories folder in Obsidian and see rollout summaries for each session, a consolidated memory summary, and any skills the agent discovered. You can search across all of them, see patterns in your workflow, and spot context that Codex is carrying forward.&lt;/p&gt;

&lt;p&gt;When you start a new Codex session, the agent reads &lt;code&gt;memory_summary.md&lt;/code&gt; (capped at 5,000 tokens to preserve context window) and has access to the rest of the memories folder if it needs deeper context.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use this approach
&lt;/h3&gt;

&lt;p&gt;You just want Codex to remember things between sessions and you want a way to browse what it remembers. You don't need cross-tool memory sharing or a structured vault system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Which Approach Should You Pick?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Start with Approach 3&lt;/strong&gt; if you just want Codex to stop forgetting things. It takes 30 seconds to enable native memories and one more minute to point Obsidian at the folder.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Move to Approach 1&lt;/strong&gt; when you start using multiple AI tools or want your notes to be the source of truth (not Codex's auto-generated summaries). Basic Memory gives you clean two-way sync between your vault and every MCP-compatible tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Go with Approach 2&lt;/strong&gt; when you want full control with zero external dependencies. AGENTS.md plus a SessionStart hook gives you a self-contained vault that Codex reads natively. No MCP servers, no API keys, just markdown and a hook script.&lt;/p&gt;

&lt;p&gt;You can also combine them. I run native memories for auto-capture plus Basic Memory for structured project knowledge that I want searchable across tools. The native layer catches things I forget to document. Basic Memory holds the deliberate notes I want to persist long-term.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Gotchas
&lt;/h2&gt;

&lt;p&gt;A few things that tripped me up during setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AGENTS.md vs Memories:&lt;/strong&gt; Don't rely on memories for rules that must always apply. Memories are a personal recall layer. Put team-wide conventions and project rules in AGENTS.md where they're version controlled and shared.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory timing:&lt;/strong&gt; Codex doesn't generate memories immediately when you close a session. It waits until the thread has been idle long enough (six hours by default). Don't panic when the memories folder doesn't update right away.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Size limits:&lt;/strong&gt; The memory summary injected into each session is capped at 5,000 tokens. If you're building a massive knowledge base, not everything will make it into every session. The agent can still read deeper into the memories folder when it needs to, but the automatic injection has a ceiling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Secret redaction:&lt;/strong&gt; Codex redacts secrets from generated memories, but review your memory files before sharing your &lt;code&gt;~/.codex&lt;/code&gt; directory or committing any memory artifacts. The redaction is good but not perfect.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;EEA/UK/Switzerland:&lt;/strong&gt; Native memories aren't available in these regions yet. Use Approach 1 or 2 instead.&lt;/p&gt;




&lt;p&gt;If you found the breakdown of Hermes and how it scales from one agent to a full team useful, the architecture behind that post pairs well with this one: &lt;a href="https://webafterai.substack.com" rel="noopener noreferrer"&gt;I Put an Autonomous AI Agent on My Laptop. It Saved Me 7 Hours in Week One.&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;We cover practical AI agent setups like this twice a week at &lt;a href="https://webafterai.substack.com" rel="noopener noreferrer"&gt;Web After AI&lt;/a&gt;. No hype, just what works and how to set it up.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>openai</category>
      <category>obsidian</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How Claude Code Achieves a 92% Cache Hit Rate: A Deep Dive Into Prompt Caching for AI Agents</title>
      <dc:creator>Shilpa Mitra</dc:creator>
      <pubDate>Sun, 24 May 2026 17:08:31 +0000</pubDate>
      <link>https://dev.to/shilpamitra/how-claude-code-achieves-a-92-cache-hit-rate-a-deep-dive-into-prompt-caching-for-ai-agents-1hca</link>
      <guid>https://dev.to/shilpamitra/how-claude-code-achieves-a-92-cache-hit-rate-a-deep-dive-into-prompt-caching-for-ai-agents-1hca</guid>
      <description>&lt;p&gt;If you're running AI agents in production, there's a cost you're probably not thinking about.&lt;/p&gt;

&lt;p&gt;Every turn in an agentic conversation sends the full prompt to the model. That includes the system instructions, all the tool definitions, any project context that was loaded earlier, and the entire conversation history. The model processes all of it. From the top. Every single time.&lt;/p&gt;

&lt;p&gt;For a quick two-turn interaction, this doesn't matter much. But for a 50-turn coding session where the system prompt alone is 20,000 tokens? That's 1 million tokens of repeated computation across the session, all billed at full input price, all producing zero new insight. The model already processed that system prompt 49 turns ago. It's just doing it again because nothing told it not to.&lt;/p&gt;

&lt;p&gt;This is the problem prompt caching solves. And Claude Code is probably the best case study of how to do it right.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Two Parts of Every Prompt
&lt;/h2&gt;

&lt;p&gt;The first thing to understand is that not all tokens in a prompt are created equal.&lt;/p&gt;

&lt;p&gt;Look at any agentic API call and you'll see two distinct layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The foundation.&lt;/strong&gt; This is everything that stays the same from turn to turn. System instructions, tool schemas, project-level context like a &lt;code&gt;CLAUDE.md&lt;/code&gt; file, behavioral rules. If you looked at turn 1 and turn 47 side by side, this part would be identical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The conversation.&lt;/strong&gt; This is everything that's different each turn. The user's latest message, tool call results, file contents that were just read, terminal output. This grows with every interaction and is genuinely new information the model needs to process.&lt;/p&gt;

&lt;p&gt;The entire trick behind prompt caching is recognizing that the foundation doesn't need to be reprocessed. You compute it once, store the result, and reuse it on every subsequent turn. The model only does fresh work on the conversation layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Actually Being Cached: The Transformer Angle
&lt;/h2&gt;

&lt;p&gt;This isn't just skipping a string comparison. To understand why caching cuts costs so dramatically, you need to know what the model does when it reads a prompt.&lt;/p&gt;

&lt;p&gt;LLM inference has two stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prefill&lt;/strong&gt;: the model takes your entire input and runs it through dense matrix multiplications, token by token, building an internal representation. This is computationally expensive and it's where most of the time and cost goes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Decode&lt;/strong&gt;: the model generates its response one token at a time, mostly just reading from the state it already built during prefill.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;During prefill, the model computes three vectors for every token: &lt;strong&gt;Query&lt;/strong&gt;, &lt;strong&gt;Key&lt;/strong&gt;, and &lt;strong&gt;Value&lt;/strong&gt;. These are the building blocks of the attention mechanism, how the model figures out which parts of the input matter for which other parts.&lt;/p&gt;

&lt;p&gt;The important property: Key and Value vectors for any given token only depend on the tokens before it. They're deterministic. If the input is the same, the output is the same.&lt;/p&gt;

&lt;p&gt;So once you've computed the Key-Value pairs for a 20,000-token system prompt, you can store them. Next time a request comes in with that same prefix, you skip the entire prefill computation for those 20,000 tokens and go straight to processing the new content.&lt;/p&gt;

&lt;p&gt;Anthropic's infrastructure does this by hashing the input prefix. Same hash, same cached tensors, no recomputation. Different hash (even one byte different), full recomputation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Economics
&lt;/h2&gt;

&lt;p&gt;Here's where this gets concrete. Anthropic's caching pricing has three tiers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Multiplier&lt;/th&gt;
&lt;th&gt;What it means&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cache reads&lt;/td&gt;
&lt;td&gt;0.1x base input price&lt;/td&gt;
&lt;td&gt;90% discount on every cached token&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5-minute cache writes&lt;/td&gt;
&lt;td&gt;1.25x base input price&lt;/td&gt;
&lt;td&gt;Small premium to store the KV tensors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1-hour cache writes&lt;/td&gt;
&lt;td&gt;2x base input price&lt;/td&gt;
&lt;td&gt;Extended TTL for longer sessions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For Claude Sonnet 4.6 (&lt;code&gt;$3/MTok&lt;/code&gt; base input), here's what that looks like in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Standard input:     $3.00 / MTok
Cache read:         $0.30 / MTok   (90% savings)
5-min cache write:  $3.75 / MTok   (25% premium, one-time)
1-hour cache write: $6.00 / MTok   (2x premium, one-time)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A cache hit costs 10% of standard input. That means caching pays for itself after just one subsequent read for the 5-minute duration. For a 50-turn session reusing a 20,000-token prefix, the savings compound on every single turn.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tracking a Real Claude Code Session
&lt;/h2&gt;

&lt;p&gt;Theory is nice. Let's trace the actual token economics of a single debugging session to see where the money goes.&lt;/p&gt;

&lt;p&gt;You open Claude Code in a Next.js project. The moment the session starts, it loads the system prompt, all available tool definitions (&lt;code&gt;file read&lt;/code&gt;, &lt;code&gt;file write&lt;/code&gt;, &lt;code&gt;bash&lt;/code&gt;, &lt;code&gt;grep&lt;/code&gt;, &lt;code&gt;glob&lt;/code&gt;, and others), and your project's &lt;code&gt;CLAUDE.md&lt;/code&gt;. That initial payload lands somewhere around 20,000 tokens. Every single one of those tokens is processed fresh. This is the only time you pay full price for them.&lt;/p&gt;

&lt;p&gt;You type:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"There's a race condition in the checkout flow. Orders are occasionally duplicating when users double-click the submit button."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Claude Code doesn't just start editing files. First, it spins up an &lt;strong&gt;Explore subagent&lt;/strong&gt; to understand the codebase. That subagent reads your API routes, checks your database schema, looks at your order processing logic, and examines the frontend form handler. All of those file reads and grep results get appended to the growing conversation as tool outputs.&lt;/p&gt;

&lt;p&gt;Here's the key: none of that new content touches the 20,000-token prefix. The system prompt, the tool definitions, the &lt;code&gt;CLAUDE.md&lt;/code&gt;, all of that is still sitting in cache from turn one. Every subsequent API call reads those 20,000 tokens at &lt;code&gt;$0.30/MTok&lt;/code&gt; instead of &lt;code&gt;$3.00/MTok&lt;/code&gt;. You're only paying full price for the new stuff: your message and the tool outputs.&lt;/p&gt;

&lt;p&gt;The Explore subagent finishes and hands its findings back to the main agent. But it doesn't dump 15,000 tokens of raw file contents into the conversation. It passes a &lt;strong&gt;condensed summary&lt;/strong&gt;: which files are relevant, what the current logic does, where the race condition likely lives. This is a deliberate design choice. Keeping the dynamic tail compact means the cache ratio stays high.&lt;/p&gt;

&lt;p&gt;Now the &lt;strong&gt;Plan subagent&lt;/strong&gt; kicks in. It takes the summary, reasons through the fix (idempotency key on the frontend, deduplication check on the API, database unique constraint as a safety net), and produces a step-by-step implementation plan. You approve it. Claude Code starts writing code.&lt;/p&gt;

&lt;p&gt;Over the next 15 minutes, you go back and forth. It writes the idempotency logic, you ask it to also handle the case where the page refreshes mid-checkout, it adjusts. Each of these turns adds new content to the dynamic tail. But the foundation, those 20,000 tokens, is read from cache every single time. Each cache hit also resets the TTL, so the cache never expires as long as you keep working.&lt;/p&gt;

&lt;p&gt;By the end of the session, you've gone through maybe 25 turns. The total tokens processed easily exceeds 1.5 million. But if you run &lt;code&gt;/cost&lt;/code&gt;, the bill tells a very different story than 1.5M tokens at full price. The vast majority were cache reads at a 90% discount.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's the difference between a $4.50 session and a $0.90 session. For one debugging task.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Production Numbers
&lt;/h2&gt;

&lt;p&gt;This isn't theoretical. Claude Code's production metrics:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cache hit rate&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;92%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost reduction&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;81%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;First-token latency reduction&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;79%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In active sessions, 95%+ of input tokens are typically cache hits, billed at 0.1x the base price. Out of 400K tokens in a session, maybe 20K to 40K are billed at full price.&lt;/p&gt;

&lt;p&gt;Without prompt caching, a long Opus coding session (100 turns with compaction cycles) can cost $50 to $100 in input tokens. With it, $10 to $19.&lt;/p&gt;

&lt;h2&gt;
  
  
  The One Thing That Will Tank Your Cache Hit Rate
&lt;/h2&gt;

&lt;p&gt;Prompt caching has a gotcha that trips up almost everyone the first time.&lt;/p&gt;

&lt;p&gt;The cache key is a hash of the &lt;strong&gt;exact byte sequence&lt;/strong&gt; of your prompt prefix. Not the meaning. Not the content. The exact bytes, in the exact order. If you rearrange two paragraphs in your system prompt, the hash changes. Full cache miss. Everything recomputed at full price.&lt;/p&gt;

&lt;p&gt;This has three practical consequences:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Don't change your tool set mid-session
&lt;/h3&gt;

&lt;p&gt;Tool definitions are part of the cached prefix. If you add a tool on turn 12 that wasn't there on turn 1, every token after the change point is a cache miss. Load everything you might need at the start.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Don't switch models mid-conversation
&lt;/h3&gt;

&lt;p&gt;Each model has its own cache. Moving from Opus to Sonnet to save money on a later turn means rebuilding the cache from zero for the new model. You'll spend more on the rebuild than you saved on the cheaper rate.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Don't edit the system prompt to update state
&lt;/h3&gt;

&lt;p&gt;If your agent needs to track something (like "user is now authenticated"), don't inject that into the system prompt. Append it as a note in the next user message instead. The system prompt stays byte-identical, the cache stays valid.&lt;/p&gt;

&lt;p&gt;Claude Code follows all three of these rules religiously. That's how it maintains a 92% hit rate across millions of sessions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Applying This to Your Own Agents
&lt;/h2&gt;

&lt;p&gt;If you're building on the Anthropic API, the same principles apply. Here's the practical playbook.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt structure matters
&lt;/h3&gt;

&lt;p&gt;Put the most stable content at the top:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. System instructions and rules        (most stable, cached first)
2. Tool definitions                      (stable for session duration)
3. Reference documents / retrieved context
4. Conversation history + tool outputs   (dynamic, grows each turn)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cache works from the top down. Everything above the first change point stays cached. Everything below it gets recomputed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use auto-caching
&lt;/h3&gt;

&lt;p&gt;Anthropic's API now supports automatic cache management. You add a single &lt;code&gt;cache_control&lt;/code&gt; field to your request and the system handles breakpoint placement for you:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-sonnet-4-6-20260514"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"max_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cache_control"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ephemeral"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Your system prompt here..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It moves the cache boundary forward as the conversation grows and more content becomes stable. Before this existed, you had to manually calculate token boundaries. Getting it wrong meant missing the cache entirely.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compact without breaking the cache
&lt;/h3&gt;

&lt;p&gt;When your conversation hits the context limit and you need to summarize it down, keep the system prompt and tool definitions identical. Add the compaction instruction as a new user message. The cached prefix stays valid. You only pay fresh tokens for the compaction prompt itself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Monitor your hit rate
&lt;/h3&gt;

&lt;p&gt;Every API response includes three fields you should be tracking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cache_creation_input_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;15200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cache_read_input_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;184800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"input_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3400&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;cache_creation_input_tokens&lt;/code&gt;: tokens written to cache (first time processing)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cache_read_input_tokens&lt;/code&gt;: tokens read from cache (the cheap ones)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;input_tokens&lt;/code&gt;: tokens processed at full price (no cache available)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ratio of &lt;code&gt;cache_read_input_tokens&lt;/code&gt; to total input tokens is your cache efficiency score. Track it like you'd track uptime. A sudden drop means something in your prompt structure changed and invalidated the cache.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;Prompt caching isn't a setting you flip on and forget about. It's an architectural pattern that has to be baked into how your agent constructs its prompts, manages its tools, and handles long conversations.&lt;/p&gt;

&lt;p&gt;Claude Code shows what this looks like when it's done well: 92% cache hit rate, 81% cost reduction, built on stable prefixes, subagent summarization, and cache-aware context management.&lt;/p&gt;

&lt;p&gt;If you're building agents and not thinking about your cache architecture, you're leaving most of your budget on the table.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;We break down AI infrastructure and tooling like this regularly at &lt;a href="https://webafterai.substack.com" rel="noopener noreferrer"&gt;Web After AI&lt;/a&gt;. Practical, no hype, explained so it actually makes sense.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>claude</category>
      <category>architecture</category>
    </item>
    <item>
      <title>The 4 Levels of Hermes Agent Scaling Framework: From One Hermes Agent to a Fully Automated Team</title>
      <dc:creator>Shilpa Mitra</dc:creator>
      <pubDate>Fri, 22 May 2026 11:56:54 +0000</pubDate>
      <link>https://dev.to/shilpamitra/the-4-levels-of-hermes-agent-scaling-framework-from-one-hermes-agent-to-a-fully-automated-team-2gdp</link>
      <guid>https://dev.to/shilpamitra/the-4-levels-of-hermes-agent-scaling-framework-from-one-hermes-agent-to-a-fully-automated-team-2gdp</guid>
      <description>&lt;p&gt;Most people set up an AI agent and immediately start thinking about multi-agent architectures. Orchestrators, specialist swarms, automated pipelines. That's Level 4 thinking applied to a Level 1 setup, and it's how you end up with a fleet of agents shipping garbage at scale.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;Hermes Agent&lt;/a&gt; by Nous Research (160K+ stars, fastest-growing open-source agent of 2026) is built for exactly this kind of progressive scaling. It's self-hosted, self-improving, stores everything locally in SQLite, and supports multi-agent orchestration out of the box as of v0.6.0.&lt;/p&gt;

&lt;p&gt;But the framework below isn't Hermes-specific. It applies to any agent system. The tool doesn't matter as much as the progression.&lt;/p&gt;

&lt;p&gt;Here are the four levels, what each one looks like in practice, and how to know when you're actually ready to move up.&lt;/p&gt;

&lt;h2&gt;
  
  
  First: What Hermes Agent Is
&lt;/h2&gt;

&lt;p&gt;Hermes is an autonomous AI agent that runs on your machine or VPS. It takes a goal, breaks it into steps, picks from 47 built-in tools to execute, and iterates until the task is done. Everything stays local.&lt;/p&gt;

&lt;p&gt;What sets it apart: after each task, Hermes writes a structured record of what worked and what didn't into episodic memory. On future tasks with similar patterns, it retrieves those records and adjusts its approach before starting. It also creates reusable "skills" from experience, essentially building procedural memory that improves over time.&lt;/p&gt;

&lt;p&gt;It connects to 20+ messaging platforms (Telegram, Discord, Slack, WhatsApp, Signal, and more), supports MCP servers, and runs across 6 terminal backends (local, Docker, SSH, Daytona, Singularity, Modal).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Install:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or via pip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;hermes-agent
hermes postinstall
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then configure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes doctor      &lt;span class="c"&gt;# check your environment&lt;/span&gt;
hermes model       &lt;span class="c"&gt;# pick a model&lt;/span&gt;
hermes config &lt;span class="nb"&gt;set&lt;/span&gt;  &lt;span class="c"&gt;# add API keys&lt;/span&gt;
hermes             &lt;span class="c"&gt;# start the agent&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Takes about 60 seconds on Linux, macOS, or WSL2.&lt;/p&gt;




&lt;h2&gt;
  
  
  Level 1: The Main Agent
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You → Your Soul Hermes Agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where everyone starts, and where most people should stay for &lt;strong&gt;weeks&lt;/strong&gt;, not days.&lt;/p&gt;

&lt;p&gt;Your single Hermes instance is your prototype area. You test workflows here. You refine prompts. You figure out which tasks the agent handles well and which ones it fumbles. You build up its memory and skills on your specific work.&lt;/p&gt;

&lt;p&gt;At this level, Hermes doubles as your orchestrator by default. You give it a complex task, it breaks it down, it executes. The self-improving loop is already running: every completed task makes it slightly better at similar tasks next time.&lt;/p&gt;

&lt;h3&gt;
  
  
  What to do at Level 1
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Run real work through it daily.&lt;/strong&gt; Not toy examples. Actual tasks from your workflow. The memory system only gets useful with real data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manage its memory actively.&lt;/strong&gt; Use &lt;code&gt;/recall&lt;/code&gt; to search what it remembers and &lt;code&gt;/remember&lt;/code&gt; to manually save important context. Correct it when it gets things wrong.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Install skills or let it create them.&lt;/strong&gt; Skills are procedural memory. Hermes can build them from experience, or you can install community-contributed ones from the Skills Hub.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connect one messaging platform.&lt;/strong&gt; Telegram is the easiest. Run &lt;code&gt;hermes gateway setup&lt;/code&gt; to get always-on access from your phone. This changes the dynamic from "sitting at my terminal to use AI" to "texting my agent whenever I need something."&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to move on
&lt;/h3&gt;

&lt;p&gt;When you have at least 2-3 workflows that are &lt;strong&gt;consistently producing good output&lt;/strong&gt;. Not acceptable output. Not "close enough." Good output that you'd be comfortable shipping without heavy editing.&lt;/p&gt;

&lt;p&gt;This is the most important checkpoint in the entire framework. Everything that comes after multiplies the quality you establish here.&lt;/p&gt;




&lt;h2&gt;
  
  
  Level 2: Specialized Agents
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You → SEO Agent
You → Content Pipeline Agent
You → DevOps Agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once a workflow is solid and repeatable, break it out into its own Hermes instance with its own credentials, memory, and scope.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why separate instances?
&lt;/h3&gt;

&lt;p&gt;Context pollution. An agent that handles your SEO research, your email drafting, and your code reviews is juggling three different domains in one memory space. Its SEO skills get diluted by code review patterns. Its writing voice gets contaminated by technical documentation habits.&lt;/p&gt;

&lt;p&gt;Specialized agents have cleaner memory, more focused skills, and better output because they only learn from one domain.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to do this practically
&lt;/h3&gt;

&lt;p&gt;Each Hermes instance runs independently. Use different configuration profiles, or spin each one up in its own Docker container or VPS.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Different profiles for different agents&lt;/span&gt;
&lt;span class="nv"&gt;HERMES_PROFILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;seo hermes
&lt;span class="nv"&gt;HERMES_PROFILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;contentpipeline hermes
&lt;span class="nv"&gt;HERMES_PROFILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;devops hermes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each profile gets its own SQLite database, its own memory, its own skill library. You talk to each one directly. You're still the orchestrator at this stage, manually deciding which agent handles which task.&lt;/p&gt;

&lt;h3&gt;
  
  
  What to do at Level 2
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Write a scope document for each agent.&lt;/strong&gt; What it does, what it doesn't do, what tools it has access to. This isn't bureaucracy. It's how you prevent scope creep across agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Let each agent build its own skill library&lt;/strong&gt; within its domain. The SEO agent's skills should be about keyword research and competitor analysis, not email copywriting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep the count low.&lt;/strong&gt; 2-3 specialists is plenty to start. The temptation to spin up a new agent for every task is strong. Resist it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to move on
&lt;/h3&gt;

&lt;p&gt;When you're spending more time routing tasks between agents than actually reviewing their output.&lt;/p&gt;




&lt;h2&gt;
  
  
  Level 3: Orchestrated Team
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You → Orchestrator Agent
           ↓
     Your Specialized Agents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you bring the orchestrator agent back. But this time it's not your prototype agent wearing multiple hats. It's a dedicated Hermes instance whose only job is routing tasks to your specialists and synthesizing their outputs.&lt;/p&gt;

&lt;p&gt;Hermes v0.6.0 added multi-agent orchestration. The orchestrator analyzes a complex task, identifies the optimal work breakdown, and spawns specialist worker agents with tailored context. Each worker gets its own scope and tools, returns a verifiable artifact, and records the handoff.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example workflow
&lt;/h3&gt;

&lt;p&gt;You tell the orchestrator: "Research competitors in the CRM space and draft a blog post about our differentiators."&lt;/p&gt;

&lt;p&gt;The orchestrator:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Routes the research task to your Research agent&lt;/li&gt;
&lt;li&gt;Takes the research output and routes the writing task to your Content agent&lt;/li&gt;
&lt;li&gt;Synthesizes the outputs into a final deliverable&lt;/li&gt;
&lt;li&gt;Returns it to you for review&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You still review the final output. You're not out of the loop. You're just not manually routing between agents anymore.&lt;/p&gt;

&lt;h3&gt;
  
  
  What to do at Level 3
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Set up task tracking.&lt;/strong&gt; Kanban-style works well. You need visibility into what each agent is working on, what's queued, and what's done.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Define handoff protocols.&lt;/strong&gt; What does the research agent pass to the content agent? What format? What level of detail? Ambiguous handoffs create ambiguous output.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review regularly.&lt;/strong&gt; Quality issues compound fast in multi-agent setups. A small drift in the research agent's output becomes a big problem by the time it's been through two more agents.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to move on
&lt;/h3&gt;

&lt;p&gt;When the orchestrator's routing decisions are consistently correct and the specialist outputs consistently meet your quality bar without heavy editing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Level 4: Automated Team
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cron Job / Trigger Events → Orchestrator Agent
                       ↓
                 Full Agent Team
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where you step out of the loop for routine work. Cron jobs and event triggers fire tasks into the orchestrator. The orchestrator routes them to the team. The team handles the work asynchronously.&lt;/p&gt;

&lt;h3&gt;
  
  
  What this looks like in practice
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Every Monday at 8am&lt;/strong&gt;, the orchestrator triggers your SEO agent to pull keyword rankings, your content agent to draft the weekly newsletter outline, and your ops agent to generate a metrics report.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When a new competitor blog post is published&lt;/strong&gt; (event trigger), the research agent analyzes it and the content agent drafts a response piece.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When a support ticket hits a specific tag&lt;/strong&gt;, the ops agent drafts a response for your review queue.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The task bus handles queuing and routing. Agents pick up work, complete it, and log results. You check in when you want to, not because you have to.&lt;/p&gt;

&lt;h3&gt;
  
  
  What to do at Level 4
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Start with one automated workflow&lt;/strong&gt;, not ten. Get one cron job running reliably before adding more. Debugging a broken automation is harder when you have twelve of them running simultaneously.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build in quality gates.&lt;/strong&gt; Not every output needs your review, but have the orchestrator flag anything that falls below a confidence threshold for human review.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor closely at first.&lt;/strong&gt; The trust you build here is earned, not assumed. Look at outputs daily for the first two weeks, then taper to spot-checks.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Part That Matters More Than Any of This
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Take small steps. You do NOT want to automate slop.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If your output at Level 1 is mediocre, you are about to scale mediocrity. 20 agents shipping low-quality work at speed is worse than 3 shipping great work slowly. Every level multiplies whatever quality you've established at the level before it.&lt;/p&gt;

&lt;p&gt;I'd rather run fewer agents with better output than max the agent count and spit out more of the same.&lt;/p&gt;

&lt;p&gt;The progression isn't about moving fast. It's about moving when you're ready. Level 1 might take you a month. Level 2 might take another month. That's fine. The agents aren't going anywhere. Your quality bar is what matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;NousResearch/hermes-agent&lt;/a&gt; (160K+ stars)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hermes-agent.nousresearch.com/docs/" rel="noopener noreferrer"&gt;Official documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hermes-agent.nousresearch.com/docs/getting-started/installation" rel="noopener noreferrer"&gt;Installation guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hermes-agent.ai/features/multi-agent" rel="noopener noreferrer"&gt;Multi-agent orchestration (v0.6.0)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/NousResearch/hermes-agent/blob/main/website/docs/reference/skills-catalog.md" rel="noopener noreferrer"&gt;Skills catalog&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;I write about practical AI agent workflows, open-source tools, and the infrastructure behind them at &lt;a href="https://webafterai.substack.com" rel="noopener noreferrer"&gt;Web After AI&lt;/a&gt;. No hype, just stuff you can actually use.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>automation</category>
      <category>productivity</category>
    </item>
    <item>
      <title>4 GitHub Repos That Prove AI Agents Aren't Just for Coding Anymore</title>
      <dc:creator>Shilpa Mitra</dc:creator>
      <pubDate>Thu, 21 May 2026 17:08:15 +0000</pubDate>
      <link>https://dev.to/shilpamitra/4-github-repos-that-prove-ai-agents-arent-just-for-coding-anymore-13g1</link>
      <guid>https://dev.to/shilpamitra/4-github-repos-that-prove-ai-agents-arent-just-for-coding-anymore-13g1</guid>
      <description>&lt;p&gt;Six months ago, "AI agent" basically meant "coding assistant." Claude Code, Copilot, Cursor. All doing the same thing: helping you write code.&lt;/p&gt;

&lt;p&gt;That's changing. The most interesting open-source projects right now aren't building yet another coding agent. They're building agents that specialize: agents that trade stocks, agents that run your entire content marketing operation, agents that make your coding agent actually follow engineering discipline.&lt;/p&gt;

&lt;p&gt;The model is the same underneath. The harness around it is what makes it useful for a specific job.&lt;/p&gt;

&lt;p&gt;Here are four repos that show where this is heading, with setup instructions for each.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. mattpocock/skills (91.7K stars) — Make Your Coding Agent an Actual Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/mattpocock/skills" rel="noopener noreferrer"&gt;github.com/mattpocock/skills&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Matt Pocock (the TypeScript educator behind Total TypeScript) open-sourced his personal &lt;code&gt;.claude&lt;/code&gt; directory. It's a collection of skills that fix the most common failure modes of AI coding agents: building the wrong thing, skipping tests, producing code that works but is impossible to maintain, and declaring "done" when nothing actually compiles.&lt;/p&gt;

&lt;p&gt;Most people treat their coding agent like an intern with no process. Matt's skills give it the process.&lt;/p&gt;

&lt;h3&gt;
  
  
  The standout: &lt;code&gt;/grill-me&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This skill forces the agent to interrogate you about what you actually want before writing a single line of code. It's a structured interview that catches misalignment before it becomes a wasted hour. There's also &lt;code&gt;/grill-with-docs&lt;/code&gt;, which does the same thing but additionally builds a shared vocabulary between you and the agent in a &lt;code&gt;CONTEXT.md&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;CONTEXT.md&lt;/code&gt; approach is quietly brilliant. Instead of the agent using 20 words to describe something, you teach it your project's jargon. Over time, the agent's outputs get shorter, more precise, and the variables and functions it creates use consistent naming. It also reduces token usage, because concise terminology means shorter prompts and responses.&lt;/p&gt;

&lt;h3&gt;
  
  
  Other skills worth knowing
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/tdd&lt;/code&gt;&lt;/strong&gt; — Test-driven development with red-green-refactor. The agent writes a failing test first, then fixes it. Far better code quality than "write the feature, then maybe add tests."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/diagnose&lt;/code&gt;&lt;/strong&gt; — Disciplined debugging loop: reproduce, minimise, hypothesise, instrument, fix, regression-test.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/improve-codebase-architecture&lt;/code&gt;&lt;/strong&gt; — Finds structural improvements using your project's domain language from &lt;code&gt;CONTEXT.md&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/handoff&lt;/code&gt;&lt;/strong&gt; — Compacts the current conversation into a handoff document so another agent (or a new session) can continue the work without losing context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/caveman&lt;/code&gt;&lt;/strong&gt; — Ultra-compressed communication mode. Cuts token usage by roughly 75% while keeping full technical accuracy. Useful when you're burning through credits.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills@latest add mattpocock/skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pick the skills you want and which coding agents to install them on. Make sure you select &lt;code&gt;/setup-matt-pocock-skills&lt;/code&gt; during install. Then run that command in your agent, and it'll configure your issue tracker (GitHub, Linear, or local files), triage labels, and docs location. Works with Claude Code, Cursor, Codex, and others.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it compares to Addy Osmani's agent-skills
&lt;/h3&gt;

&lt;p&gt;If you've seen &lt;a href="https://github.com/addyosmani/agent-skills" rel="noopener noreferrer"&gt;addyosmani/agent-skills&lt;/a&gt;, you might wonder how these differ. Addy's skills focus on the full development lifecycle with slash commands like &lt;code&gt;/spec&lt;/code&gt;, &lt;code&gt;/plan&lt;/code&gt;, &lt;code&gt;/build&lt;/code&gt;, &lt;code&gt;/ship&lt;/code&gt;. Matt's skills focus more on engineering fundamentals: alignment, testing discipline, debugging, and architecture quality. They're complementary, not competing. You can run both in the same project.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. AI-Trader (13.7K stars) — Let AI Agents Trade for You
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/HKUDS/AI-Trader" rel="noopener noreferrer"&gt;github.com/HKUDS/AI-Trader&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AI-Trader is an agent-native trading platform built by researchers at the University of Hong Kong. The core idea: just like humans have their trading platforms, AI agents need their own.&lt;/p&gt;

&lt;p&gt;You connect your AI agent (Claude Code, Cursor, OpenClaw, Codex, whatever), and it can publish trading signals, copy trades from top-performing agents, participate in strategy discussions, and access real-time market data across stocks, crypto, forex, options, and futures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why it's interesting
&lt;/h3&gt;

&lt;p&gt;This isn't just one agent making trades. It's a platform where multiple agents collaborate, debate strategies, and learn from each other. They call it "collective intelligence trading."&lt;/p&gt;

&lt;p&gt;Agents publish three types of signals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Strategies&lt;/strong&gt; — for discussion and analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operations&lt;/strong&gt; — for direct copy trading&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discussions&lt;/strong&gt; — for collaborative reasoning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's a reward system where agents earn points for successful predictions, and a $100K paper trading mode so you can test without risk.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;

&lt;p&gt;The simplest way to connect an agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read https://ai4trade.ai/SKILL.md and register.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Send that message to your AI agent. It reads the integration guide, installs the necessary components, and registers itself on the platform. For human traders, visit &lt;a href="https://ai4trade.ai" rel="noopener noreferrer"&gt;ai4trade.ai&lt;/a&gt; and sign up directly.&lt;/p&gt;

&lt;p&gt;For developers who want to self-host:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/HKUDS/AI-Trader.git
&lt;span class="nb"&gt;cd &lt;/span&gt;AI-Trader
npm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The backend is FastAPI (Python), frontend is React. Full OpenAPI docs are in &lt;code&gt;docs/api/openapi.yaml&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  A word of caution
&lt;/h3&gt;

&lt;p&gt;Automated trading carries real financial risk. AI-Trader includes paper trading mode for a reason. Start there. The fact that it comes from a university research group rather than a fintech startup trying to sell you something is a point in its favor, but treat any trading system with healthy skepticism.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. AiToEarn (12.2K stars) — AI Agent for Content Marketing Across 14 Platforms
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/yikart/AiToEarn" rel="noopener noreferrer"&gt;github.com/yikart/AiToEarn&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AiToEarn is an open-source content marketing platform with an AI agent built in. You create content once, and it publishes across 14 platforms simultaneously: TikTok, YouTube, Instagram, Twitter/X, LinkedIn, Pinterest, Facebook, Threads, plus Chinese platforms like Douyin, Xiaohongshu (Rednote), Bilibili, WeChat, and Kuaishou.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "All In Agent"
&lt;/h3&gt;

&lt;p&gt;This is the interesting part. It's an AI agent that can automatically generate content, publish it, and manage your accounts across all platforms. Beyond publishing, the platform includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trend radar&lt;/strong&gt; — what's going viral right now across platforms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Case library&lt;/strong&gt; — how posts with 10K+ likes were structured&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart comment search&lt;/strong&gt; — finds high-conversion signals like "link please" or "how to buy" across your accounts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-platform analytics&lt;/strong&gt; — unified dashboard for all your channels&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The comment search feature is particularly useful for anyone doing content-driven sales. It surfaces purchase-intent comments so you can reply fast and convert.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;

&lt;p&gt;Docker (recommended):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/yikart/AiToEarn.git
&lt;span class="nb"&gt;cd &lt;/span&gt;AiToEarn
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This starts the frontend, backend, MongoDB, and Redis in one command. Access the web interface at &lt;code&gt;http://localhost:8080&lt;/code&gt;. There's also an Electron desktop app available from the GitHub releases page.&lt;/p&gt;

&lt;h3&gt;
  
  
  Note on documentation
&lt;/h3&gt;

&lt;p&gt;The project originated in China. The English README and Docker deployment guide are solid, but some deeper configuration docs are still in Chinese. AI video model integrations (Kling, Sora, Runway, etc.) are listed as coming soon.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. DeepSeek-TUI (Trending) — Claude Code, but for DeepSeek
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/Hmbown/DeepSeek-TUI" rel="noopener noreferrer"&gt;github.com/Hmbown/DeepSeek-TUI&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A terminal-based coding agent built specifically for DeepSeek models. If you've used Claude Code, the experience is similar: you type prompts in your terminal, the agent reads your files, edits code, runs shell commands, does git operations, and browses the web. The difference is it's built from the ground up for DeepSeek's API, which is significantly cheaper than Claude or GPT-4.&lt;/p&gt;

&lt;h3&gt;
  
  
  Three modes
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Plan&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Review a plan before the agent makes changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Default interactive mode with multi-step tool use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;YOLO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Auto-approve everything in a trusted workspace&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Tab to cycle between them. It also supports MCP servers, session resume, and can run as an HTTP/SSE API server.&lt;/p&gt;

&lt;p&gt;Built in Rust, so it's fast and lightweight.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; deepseek-tui
deepseek-tui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On first launch it'll ask for your DeepSeek API key. You can also set it beforehand:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;deepseek-tui login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or via environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;DEEPSEEK_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-key"&lt;/span&gt; deepseek-tui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configuration lives in &lt;code&gt;~/.deepseek/config.toml&lt;/code&gt;. Useful commands: &lt;code&gt;deepseek-tui doctor&lt;/code&gt; (check setup), &lt;code&gt;deepseek-tui models&lt;/code&gt; (list available models).&lt;/p&gt;

&lt;p&gt;Also available via Rust:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo &lt;span class="nb"&gt;install &lt;/span&gt;deepseek-tui &lt;span class="nt"&gt;--locked&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Pattern
&lt;/h2&gt;

&lt;p&gt;What connects all four of these: &lt;strong&gt;the model isn't the product anymore. The harness is.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Matt Pocock's skills don't change what Claude can do. They change how disciplined it is. AI-Trader doesn't invent a new trading algorithm. It builds a platform where existing agents collaborate. AiToEarn doesn't create a new content AI. It builds distribution infrastructure around existing ones. DeepSeek-TUI takes the Claude Code interaction pattern and wraps it around a different, cheaper model.&lt;/p&gt;

&lt;p&gt;Every one of these is the same insight applied to a different domain: wrap the right structure around a capable model, and you get something genuinely useful. The structure is where the value is.&lt;/p&gt;

&lt;p&gt;This is what the industry is starting to call &lt;strong&gt;harness engineering&lt;/strong&gt;, the practice of building the environment, constraints, and feedback loops around an AI agent so it produces reliable results. It's not prompt engineering. It's not fine-tuning. It's designing the system the model operates inside.&lt;/p&gt;

&lt;p&gt;If you want to go deeper on this and see how to actually chain free tools into a working setup, I wrote a step-by-step breakdown of building a zero-cost AI coding stack (9router + agentmemory + agent-skills) in my newsletter: &lt;a href="https://webafterai.substack.com" rel="noopener noreferrer"&gt;Web After AI&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What specialized AI agents are you seeing in your domain? Drop a comment. I'm collecting examples for a follow-up piece.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>opensource</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
