The Problem: Agents Forget Everything
Every time you start a new conversation with an AI agent, it has amnesia. It doesn't remember
the codebase conventions you explained yesterday, the deployment workflow you walked through
last week, or the bug pattern it already debugged three times.
The standard workaround is context stuffing: paste everything the agent might need into a
system prompt. This works until it doesn't. A 5,000-token system prompt repeated on every API
call spends most of those tokens on information the agent doesn't even need for the current
task. At scale, you're paying for the same context over and over.
We built AgentBay to solve this. Instead of shipping context with every request, agents store
memories persistently and recall only what's relevant -- about 400 tokens per search instead of
5,000+ in a system prompt.
Architecture: Four Search Strategies
A single search strategy can't handle the range of ways agents need to retrieve information.
Sometimes the agent knows the exact name ("What's the Railway database URL?"). Sometimes it's
exploring a topic ("How does deployment work?"). We use four strategies in parallel, each
optimized for a different retrieval pattern.
Strategy 1: Alias Matching
Every memory entry can have multiple aliases -- short, exact-match names. When an agent
searches for "railway db url", alias matching finds it instantly via a case-insensitive lookup.
This is the fastest path, typically sub-millisecond, and handles the "I know what I'm looking
for" case.
SELECT * FROM knowledge_entries
WHERE project_id = $1
  AND EXISTS (
    SELECT 1 FROM unnest(aliases) AS a
    WHERE lower(a) = lower($2)
  );
Strategy 2: Tag Intersection
Entries are tagged with categories like infrastructure, database, deployment. Tag intersection
finds entries that match any of the inferred tags from the search query. This handles
categorical browsing -- "show me everything about infrastructure" -- without requiring exact
name matches.
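The lookup itself can be a one-liner with PostgreSQL's array-overlap operator (a sketch mirroring the alias query above; the `tags` column type and parameter order are assumptions):

```sql
-- && is PostgreSQL's array-overlap operator: true if the arrays share
-- any element. $2 is the array of tags inferred from the search query.
SELECT * FROM knowledge_entries
WHERE project_id = $1
  AND tags && $2;
```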
Strategy 3: Full-Text Keyword Search
PostgreSQL's built-in tsvector and tsquery handle keyword relevance. We rank with ts_rank_cd --
cover-density ranking rather than true BM25, which stock PostgreSQL doesn't ship, but it fills
the same role. This catches entries where the content matches the query terms but the aliases
and tags don't. We index the content, title, and category fields into a single tsvector column
with weighted ranks (title gets priority).
SELECT *, ts_rank_cd(search_vector, plainto_tsquery($1)) AS rank
FROM knowledge_entries
WHERE project_id = $2
  AND search_vector @@ plainto_tsquery($1)
ORDER BY rank DESC;
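The weighted column could be defined along these lines (a sketch using setweight with a generated column; the exact weights and column setup are assumptions, not the production schema):

```sql
-- Title weighted highest (A), then category (B), then content (C).
ALTER TABLE knowledge_entries
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(category, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(content, '')), 'C')
  ) STORED;
```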
Strategy 4: Vector Cosine Similarity
For semantic search -- "how do we handle errors in production?" matching an entry titled "Error
Recovery Procedures" -- we use Voyage AI embeddings (1024 dimensions) stored in pgvector with
an HNSW index. The HNSW parameters (m=16, ef_construction=64) balance recall against index
build time.
SELECT *, 1 - (embedding <=> $1) AS similarity
FROM knowledge_entries
WHERE project_id = $2
ORDER BY embedding <=> $1
LIMIT 20;
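The index behind that query would be created with standard pgvector syntax, using the parameters mentioned above:

```sql
-- HNSW index on the embedding column, tuned for cosine distance (<=>).
CREATE INDEX ON knowledge_entries
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
```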
Reciprocal Rank Fusion: Merging Four Ranked Lists
Each strategy returns a ranked list of results. The question is how to merge them into one. We
use Reciprocal Rank Fusion (RRF), which is simple and surprisingly effective.
For each result, its RRF score is the sum across all strategies of 1 / (k + rank), where k is a
constant (we use 60, the standard value from the original Cormack et al. paper). With k = 60 the
fusion rewards breadth: an entry ranked #2 by all four strategies scores 4/62 ≈ 0.065, beating
an entry ranked #1 by two strategies but missed by the other two (2/61 ≈ 0.033).
// Minimal result shape for this example; the real type carries more fields.
interface SearchResult {
  id: string;
  score: number;
}

function reciprocalRankFusion(
  rankedLists: SearchResult[][],
  k: number = 60
): SearchResult[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    for (let rank = 0; rank < list.length; rank++) {
      const id = list[rank].id;
      const current = scores.get(id) ?? 0;
      // rank is 0-based here; rank + 1 converts it to the 1-based rank
      // used in the RRF formula 1 / (k + rank).
      scores.set(id, current + 1 / (k + rank + 1));
    }
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
RRF is robust because it doesn't require calibrating scores across strategies. BM25 scores and
cosine similarities are on completely different scales -- RRF only cares about rank order.
Memory Tiers and Confidence Decay
Not all memories should live forever. We define four tiers:
- **working** (24-hour TTL) — Scratch context for the current task
- **episodic** (30 days) — What happened in a specific session
- **semantic** (90 days) — Facts, patterns, conventions
- **procedural** (365 days) — How-to knowledge, deployment steps
Each entry has a confidence score between 0 and 1 that decays over time based on three signals:
- Temporal decay: Confidence drops as the entry ages relative to its tier TTL. A 15-day-old episodic entry (50% of TTL) decays faster than a 15-day-old semantic entry (17% of TTL).
- Usage signal: Every time an entry is recalled, its lastAccessedAt timestamp resets, slowing decay. Frequently accessed entries stay confident.
- Source trust: Entries from verified agents or explicit user input start at higher confidence than entries inferred by the agent itself.
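Combining these signals, the decay computation might look something like this (a minimal sketch: the post doesn't give the exact formula, so linear decay over the tier TTL and the field names here are assumptions):

```typescript
interface MemoryEntry {
  confidence: number;    // 0..1, set at store time from source trust
  tierTtlDays: number;   // e.g. 30 for episodic, 90 for semantic
  lastAccessedAt: Date;  // reset on every recall, which slows decay
}

function decayedConfidence(entry: MemoryEntry, now: Date): number {
  const ageDays =
    (now.getTime() - entry.lastAccessedAt.getTime()) / 86_400_000;
  // Fraction of the tier TTL elapsed since last access, capped at 1,
  // so the same age decays an episodic entry faster than a semantic one.
  const ttlFraction = Math.min(ageDays / entry.tierTtlDays, 1);
  return entry.confidence * (1 - ttlFraction);
}
```

Because recall resets lastAccessedAt, a frequently used entry keeps its confidence even as wall-clock age grows.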
When confidence drops below a threshold, entries get flagged for review during compaction.
Compaction runs periodically and handles TTL expiration, stale archival, and duplicate merging.
Poison Detection
Agent memory is an attack surface. If an agent stores user-supplied text verbatim, a prompt
injection hidden in that text could resurface later and hijack behavior. We run every incoming
entry through a poison detection pipeline that checks for 20+ patterns:
- System prompt overrides ("ignore previous instructions")
- Role reassignment ("you are now a...")
- Data exfiltration attempts ("send the contents of...")
- Encoded payloads (base64-encoded instructions, Unicode tricks)
- Excessive instruction density (too many imperative sentences relative to content length)
Entries that trigger detection are rejected with a specific error code. No silent failures --
the agent knows why its store was blocked.
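A stripped-down sketch of that check (the regexes and error code here are illustrative stand-ins, not the production pattern set):

```typescript
// Hypothetical subset of the 20+ patterns the pipeline checks.
const POISON_PATTERNS: RegExp[] = [
  /ignore (all )?previous instructions/i, // system prompt override
  /you are now a/i,                       // role reassignment
  /send (me )?the contents of/i,          // data exfiltration attempt
];

function checkForPoison(content: string): { ok: boolean; code?: string } {
  for (const pattern of POISON_PATTERNS) {
    if (pattern.test(content)) {
      // Reject with a specific code so the agent knows why.
      return { ok: false, code: "POISON_PATTERN_MATCH" };
    }
  }
  return { ok: true };
}
```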
Performance
On our production pgvector instance (Railway, PostgreSQL 17):
- Search latency: p50 under 10ms for the full 4-strategy pipeline
- Token savings: ~400 tokens per recall vs 5,000+ for system prompt stuffing (92% reduction)
- Recall accuracy: 100% on our 37-entry test suite (real agent memories, not synthetic data)
- HNSW index: Handles 10k+ entries per project with no degradation
Using It
AgentBay ships as an MCP server with 90+ tools. Store a memory:
Tool: agentbay_memory_store
Arguments: {
  "content": "Railway DB uses pgvector pg17 at autorack.proxy.rlwy.net:14237",
  "title": "Railway Database Connection",
  "tags": ["infrastructure", "database"],
  "aliases": ["railway db", "railway postgres"],
  "tier": "procedural"
}
Recall it later:
Tool: agentbay_memory_recall
Arguments: {
"query": "railway database connection"
}
The recall returns the entry with a confidence score, matched strategy, and metadata -- no
context window bloat.
Getting Started
You can connect in under a minute. Add the HTTP transport to your MCP client config:
{
  "mcpServers": {
    "agentbay": {
      "type": "http",
      "url": "https://www.aiagentsbay.com/api/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_API_KEY"
      }
    }
  }
}
Or via npm: npx -y aiagentsbay-mcp
- GitHub: https://github.com/thomasjumper/agentbay-mcp
- npm: https://www.npmjs.com/package/aiagentsbay-mcp
- Docs: https://www.aiagentsbay.com/getting-started
- Free tier: 1,000 memory entries, no credit card required
Beyond Search
Knowledge Graph. Memories don't exist in isolation. AgentBay lets you create typed
relationships between entries -- "depends_on", "contradicts", "supersedes", "related_to" --
forming a navigable graph. When you recall one entry, you can traverse its connections to pull
in related context without a second search.
Memory Dreaming. Overnight, an AI consolidation process reviews the day's memories: merging
duplicates, promoting frequently-accessed working memories to longer-lived tiers, surfacing
contradictions, and generating summary entries that compress verbose episodic memories into
concise semantic ones. The brain gets smarter while the agent sleeps.
Proactive Injection. Instead of waiting for the agent to search, AgentBay can push relevant
memories into the conversation based on the current task context. If an agent starts working on
a database migration, memories tagged with "database", "migration", and "pitfall" surface
automatically -- no explicit recall needed.
Multi-Resolution Retrieval. Not every query needs the same level of detail. AgentBay supports
retrieval at multiple resolutions: a one-line summary for quick orientation, a paragraph-level
summary for working context, or the full entry for deep reference. This keeps token usage
proportional to actual need.
Auto-Learning. After each conversation, AgentBay can extract patterns, decisions, and pitfalls
from the interaction and store them as new memories automatically. The agent doesn't have to
explicitly call memory_store -- the system learns from the conversation itself and builds
knowledge over time.
The source is MIT-licensed. We'd love feedback on the architecture -- open an issue or find us
at https://www.aiagentsbay.com.