This article is inspired by Beads, a Go-based memory system for coding agents that is quietly gaining traction among AI engineers. Special shoutout to @nicepkg and the AI coding community driving this forward.
Why Your AI Coding Agent Keeps Forgetting Everything
Here is a scenario that will feel painfully familiar: You have been pair-programming with Claude Code or OpenCode for two hours. The agent has read your codebase, understood your architecture, refactored three modules, and written a handful of tests. Then you close the session. When you open a new terminal and fire up the agent again? It is a blank slate.
No memory. No context. No institutional knowledge of your 200,000-line monorepo.
We obsess over which model to use, how many tokens per dollar we are burning, and whether to use RAG pipelines. But the single biggest friction point in AI-assisted coding is something far more mundane: context does not survive between sessions.
A new GitHub project called Beads is tackling this exact problem -- and it just landed on GitHub trending. Let us dig into what it does and the five hidden patterns it reveals about building memory for AI agents.
"The agent is not the hard part. The scaffolding around it is." -- Matt Stratton, DEV community
1. Semantic Session Snapshots -- Capture What Actually Matters
Most session history tools just dump raw conversation logs. That is useless. A 2-hour coding session with an AI agent produces 50,000 tokens of back-and-forth, most of which is failed attempts, clarifying questions, boilerplate generation, and system prompts.
The actual knowledge worth preserving is the decisions made, the files touched, and the rationale. Beads uses semantic chunking to extract meaningful beads from a session.
# Semantic snapshot extraction -- filter for high-value conversation beads
import re

def extract_session_beads(messages):
    # High-value = code decisions, architecture changes, file edits
    action_keywords = [
        "implemented", "refactored", "created",
        "changed strategy", "decided to use",
        "updated", "fixed", "migrated"
    ]
    beads = []
    for msg in messages:
        content = msg.get("content", "").lower()
        role = msg.get("role", "")
        # Only capture assistant actions, not user clarifications
        if role != "assistant":
            continue
        # Check if this message contains a meaningful action
        if any(indicator in content for indicator in action_keywords):
            beads.append({
                "type": "action_bead",
                "summary": summarize_action(msg),
                "files_touched": extract_file_paths(content),
                "tokens": len(msg.get("content", "")) // 4  # rough token estimate
            })
    return beads

def summarize_action(message):
    # Extract a one-line summary of what was done
    content = message.get("content", "")
    for line in reversed(content.split("\n")):
        if any(w in line.lower() for w in ["created", "updated", "wrote", "modified"]):
            return line.strip()[:120]
    return content[:120]

def extract_file_paths(content):
    # Find path-like strings (e.g. src/models.py) mentioned in the message
    return re.findall(r"[\w/\-]+\.[a-zA-Z]+", content)
Why most developers miss this: They think session memory means saving chat logs. It does not -- it means distilling decisions, not dialogue.
2. Cross-Session Context Injection -- The Agent Remembers Yesterday
The real magic of Beads is not just saving a session. It is injecting relevant historical context into a new session without flooding the context window.
Imagine starting a new Claude Code session and the agent already knows you migrated from REST to GraphQL last week, the PaymentService module is off-limits waiting for QA, and your naming convention uses use* for React hooks and handle* for event handlers.
This is what cross-session memory looks like. The agent does not just know facts -- it knows your codebase history.
# Cross-session memory retrieval for AI coding agents
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

class BeadsMemory:
    def __init__(self, embedding_model="sentence-transformers/all-MiniLM-L6-v2"):
        self.beads_db = {}  # project_name -> list of bead dicts
        self.encoder = SentenceTransformer(embedding_model)

    def retrieve_relevant_beads(self, project, query, top_k=5):
        # Given a new task, retrieve the most relevant historical beads
        beads = self.beads_db.get(project, [])
        if not beads:
            return []
        bead_texts = [b["summary"] for b in beads]
        query_vec = self.encoder.encode([query])
        bead_vecs = self.encoder.encode(bead_texts)
        # Cosine similarity between the new task and each stored bead
        scores = cosine_similarity(query_vec, bead_vecs)[0]
        threshold = 0.6
        scored = [(score, bead) for score, bead in zip(scores, beads) if score > threshold]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Keep the top_k highest-scoring beads, truncated to save tokens
        return [bead["full_context"][:2000] for _, bead in scored[:top_k]]

    def inject_context(self, project, task):
        # Build a context injection prompt for a new agent session
        beads = self.retrieve_relevant_beads(project, task, top_k=5)
        if not beads:
            return ""
        header = "## Relevant Historical Context\n"
        footer = "\n---\n*Context provided by Beads memory system*"
        return header + "\n---\n".join(beads) + footer

# Usage
memory = BeadsMemory()
context = memory.inject_context(
    project="my-webapp",
    task="add user authentication with OAuth2"
)
print(context)
# --> "## Relevant Historical Context\n...You migrated from REST to GraphQL last week..."
3. Context Window Awareness -- The Agent Knows When It Is Running Out
Most AI coding agents have no idea how much context they have left. They keep adding files, reading code, and generating until they suddenly hit the context limit and the whole session collapses.
Beads implements a proactive context monitoring layer.
import tiktoken  # Accurate token counting

class ContextMonitor:
    # Monitors context window usage and warns before running out of space
    def __init__(self, model="gpt-4", max_tokens=128_000):
        self.encoder = tiktoken.get_encoding("cl100k_base")
        self.max_tokens = max_tokens
        self.used_tokens = 0
        self.file_costs = {}

    def register_file(self, filepath, content):
        # Load a file into context and track its token cost
        tokens = len(self.encoder.encode(content))
        self.file_costs[filepath] = tokens
        self.used_tokens += tokens
        return tokens

    def get_utilization(self):
        # Return context utilization as a percentage
        return (self.used_tokens / self.max_tokens) * 100

    def get_warning(self):
        # Return a warning message if context is running low
        util = self.get_utilization()
        if util > 90:
            return "CRITICAL: Context at {:.1f}%. Archive session now.".format(util)
        elif util > 75:
            return "WARNING: Context at {:.1f}%. Summarize and archive.".format(util)
        elif util > 60:
            return "NOTE: Context at {:.1f}%. Offload to long-term memory.".format(util)
        return None

    def auto_archive_trigger(self):
        # Auto-trigger session save when threshold is hit
        return self.get_utilization() > 80

# Integration example
monitor = ContextMonitor(model="claude-3-5-sonnet", max_tokens=200_000)
with open("src/main.py") as f:
    monitor.register_file("src/main.py", f.read())
with open("src/models.py") as f:
    monitor.register_file("src/models.py", f.read())
print(monitor.get_warning())
Why it matters: proactive monitoring is cheap to add and pays for itself immediately. Every time an agent hits a context limit mid-task, you waste API calls re-explaining the codebase; a monitor that archives at 80% utilization eliminates that waste.
4. Tool Call Memory -- Tracking What the Agent Actually Did
Most AI coding agents execute commands but do not remember what they executed. This creates repetition and lost audit trails. Beads solves this by logging every tool call with its outcome.
from datetime import datetime
import json
import os

class ToolCallMemory:
    # Records every tool/command execution with its outcome
    def __init__(self, project):
        self.project = project
        self.log = []

    def record(self, tool_name, args, result, success):
        entry = {
            "timestamp": datetime.utcnow().isoformat(),
            "tool": tool_name,
            "args": args,
            "result_preview": result[:500] if result else "",
            "success": success,
            "tokens_used": len(result) // 4 if result else 0  # rough estimate
        }
        self.log.append(entry)
        self._persist()

    def was_already_done(self, tool_name, args):
        # Check if this exact command was already run successfully
        for entry in self.log:
            if entry["tool"] == tool_name and entry["success"]:
                if str(entry["args"]) == str(args):
                    return True
        return False

    def suggest_next_steps(self, completed):
        # Based on what was done, suggest what to do next
        suggestions = {
            ("npm install",): ["npm run dev", "npm test"],
            ("git init",): ["git add .", "git commit -m 'initial'"],
            ("create component",): ["add tests", "add styles", "check imports"],
        }
        for done in completed:
            for key, vals in suggestions.items():
                if done in key:
                    return [v for v in vals if v not in completed]
        return []

    def _persist(self):
        path = ".beads/{}/tool_log.json".format(self.project)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            json.dump(self.log, f, indent=2)
5. Beads Architecture -- What Makes It Different
The key insight behind Beads is a tiered memory architecture:
- Long-Term Memory (Beads DB): Persists across sessions. Semantic search, cross-project learning.
- Session Memory (RAM): Current session context. Active files, conversation state.
- Working Memory (Context): What the LLM sees right now. Truncated to fit context window.
This tiered approach means the LLM only sees what is relevant right now, session memory is summarized before archiving, and long-term memory is retrieved via semantic search, not dumped wholesale.
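To make the tiers concrete, here is a minimal sketch of how the three layers could interact. The class and method names are illustrative, not Beads' actual API, and a naive keyword match stands in for the semantic search a real implementation would use:

```python
# Hypothetical three-tier memory: long-term (archived), session (current),
# working (what actually fits in the LLM's context budget).
class TieredMemory:
    def __init__(self, context_budget_tokens=8_000):
        self.long_term = []  # archived bead summaries from past sessions
        self.session = []    # notes from the current session
        self.context_budget = context_budget_tokens

    def note(self, text):
        # Session memory: record everything observed this session
        self.session.append(text)

    def archive_session(self):
        # Summarize-then-archive: session notes collapse into one long-term bead
        if self.session:
            self.long_term.append(" | ".join(self.session)[:500])
            self.session = []

    def working_context(self, query):
        # Working memory: most relevant items first, capped by the token budget.
        # Keyword overlap is a stand-in for real embedding similarity.
        ranked = sorted(
            self.long_term + self.session,
            key=lambda t: sum(w in t.lower() for w in query.lower().split()),
            reverse=True,
        )
        out, used = [], 0
        for item in ranked:
            cost = len(item) // 4  # rough token estimate
            if used + cost > self.context_budget:
                break
            out.append(item)
            used += cost
        return out

mem = TieredMemory(context_budget_tokens=50)
mem.note("Migrated REST endpoints to GraphQL")
mem.archive_session()
mem.note("PaymentService is frozen pending QA")
print(mem.working_context("GraphQL migration status"))
```

The point of the sketch is the flow, not the ranking: nothing reaches the LLM without passing through the budget check, and nothing reaches long-term storage without being summarized first.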
GitHub: nicepkg/beads -- Star it if you are building AI coding tools.
What This Means for the AI Coding Landscape
The AI coding agent space is maturing fast. We have moved from "which model is best" to "which workflow is best". And the next frontier is clearly memory and continuity.
Just like DevOps did not become a discipline until we had continuous integration, AI coding will not reach its potential until we have continuous agent memory.
The Vercel breach discussion on DEV reminds us: the scaffolding around AI is more important than the AI itself. Beads is part of that scaffolding.
What Is Your Agent Memory Strategy?
I would love to hear how you are handling context continuity in your AI coding workflows:
- Do you use any memory systems today?
- What is the biggest context-related pain point you have hit?
- Would you trust an agent more if you knew it had perfect recall?
Drop your thoughts in the comments. And if you found this useful, share it with a fellow developer who is still fighting the blank slate problem.