<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Thomas Jumper</title>
    <description>The latest articles on DEV Community by Thomas Jumper (@thomasjumper).</description>
    <link>https://dev.to/thomasjumper</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3853462%2F2c834026-0981-459d-8517-0523a587eb2a.jpg</url>
      <title>DEV Community: Thomas Jumper</title>
      <link>https://dev.to/thomasjumper</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/thomasjumper"/>
    <language>en</language>
    <item>
      <title>How We Built a 4-Strategy Hybrid Memory Search for AI Agents</title>
      <dc:creator>Thomas Jumper</dc:creator>
      <pubDate>Tue, 31 Mar 2026 12:44:08 +0000</pubDate>
      <link>https://dev.to/thomasjumper/how-we-built-a-4-strategy-hybrid-memory-search-for-ai-agents-5aop</link>
      <guid>https://dev.to/thomasjumper/how-we-built-a-4-strategy-hybrid-memory-search-for-ai-agents-5aop</guid>
      <description>&lt;p&gt;The Problem: Agents Forget Everything&lt;/p&gt;

&lt;p&gt;Every time you start a new conversation with an AI agent, it has amnesia. It doesn't remember&lt;br&gt;
  the codebase conventions you explained yesterday, the deployment workflow you walked through&lt;br&gt;
  last week, or the bug pattern it already debugged three times.&lt;/p&gt;

&lt;p&gt;The standard workaround is context stuffing: paste everything the agent might need into a&lt;br&gt;
  system prompt. This works until it doesn't. A 5,000-token system prompt repeated on every API&lt;br&gt;
  call wastes 92% of your context window on information the agent might not even need for the&lt;br&gt;
  current task. At scale, you're paying for the same context over and over.&lt;/p&gt;

&lt;p&gt;We built AgentBay to solve this. Instead of shipping context with every request, agents store&lt;br&gt;
  memories persistently and recall only what's relevant -- about 400 tokens per search instead of&lt;br&gt;
   5,000+ in a system prompt.&lt;/p&gt;

&lt;p&gt;Architecture: Four Search Strategies&lt;/p&gt;

&lt;p&gt;A single search strategy can't handle the range of ways agents need to retrieve information.&lt;br&gt;
  Sometimes the agent knows the exact name ("What's the Railway database URL?"). Sometimes it's&lt;br&gt;
  exploring a topic ("How does deployment work?"). We use four strategies in parallel, each&lt;br&gt;
  optimized for a different retrieval pattern.&lt;/p&gt;

&lt;p&gt;Strategy 1: Alias Matching&lt;/p&gt;

&lt;p&gt;Every memory entry can have multiple aliases -- short, exact-match names. When an agent&lt;br&gt;
  searches for "railway db url", alias matching finds it instantly via a case-insensitive lookup.&lt;br&gt;
   This is the fastest path, typically sub-millisecond, and handles the "I know what I'm looking&lt;br&gt;
  for" case.&lt;/p&gt;

&lt;p&gt;SELECT * FROM knowledge_entries&lt;br&gt;
  WHERE project_id = $1&lt;br&gt;
  AND EXISTS (&lt;br&gt;
    SELECT 1 FROM unnest(aliases) AS a&lt;br&gt;
    WHERE lower(a) = lower($2)&lt;br&gt;
  );&lt;/p&gt;

&lt;p&gt;Strategy 2: Tag Intersection&lt;/p&gt;

&lt;p&gt;Entries are tagged with categories like infrastructure, database, deployment. Tag intersection&lt;br&gt;
  finds entries that match any of the inferred tags from the search query. This handles&lt;br&gt;
  categorical browsing -- "show me everything about infrastructure" -- without requiring exact&lt;br&gt;
  name matches.&lt;/p&gt;

&lt;p&gt;Strategy 3: Full-Text BM25&lt;/p&gt;

&lt;p&gt;PostgreSQL's built-in tsvector and tsquery with BM25 ranking handles keyword relevance. This&lt;br&gt;
  catches entries where the content matches the query terms but the aliases and tags don't. We&lt;br&gt;
  index the content, title, and category fields into a single tsvector column with weighted ranks&lt;br&gt;
   (title gets priority).&lt;/p&gt;

&lt;p&gt;SELECT *, ts_rank_cd(search_vector, plainto_tsquery($1)) AS rank&lt;br&gt;
  FROM knowledge_entries&lt;br&gt;
  WHERE project_id = $2&lt;br&gt;
  AND search_vector @@ plainto_tsquery($1)&lt;br&gt;
  ORDER BY rank DESC;&lt;/p&gt;

&lt;p&gt;Strategy 4: Vector Cosine Similarity&lt;/p&gt;

&lt;p&gt;For semantic search -- "how do we handle errors in production?" matching an entry titled "Error&lt;br&gt;
   Recovery Procedures" -- we use Voyage AI embeddings (1024 dimensions) stored in pgvector with&lt;br&gt;
  an HNSW index. The HNSW parameters (m=16, ef_construction=64) balance recall against index&lt;br&gt;
  build time.&lt;/p&gt;

&lt;p&gt;SELECT *, 1 - (embedding &amp;lt;=&amp;gt; $1) AS similarity&lt;br&gt;
  FROM knowledge_entries&lt;br&gt;
  WHERE project_id = $2&lt;br&gt;
  ORDER BY embedding &amp;lt;=&amp;gt; $1&lt;br&gt;
  LIMIT 20;&lt;/p&gt;

&lt;p&gt;Reciprocal Rank Fusion: Merging Four Ranked Lists&lt;/p&gt;

&lt;p&gt;Each strategy returns a ranked list of results. The question is how to merge them into one. We&lt;br&gt;
  use Reciprocal Rank Fusion (RRF), which is simple and surprisingly effective.&lt;/p&gt;

&lt;p&gt;For each result, its RRF score is the sum across all strategies of 1 / (k + rank), where k is a&lt;br&gt;
   constant (we use 60, a standard value from the original Cormack et al. paper). An entry that&lt;br&gt;
  ranks #1 in two strategies and #5 in a third gets a higher combined score than one that ranks&lt;br&gt;
  #2 across all four.&lt;/p&gt;

&lt;p&gt;function reciprocalRankFusion(&lt;br&gt;
    rankedLists: SearchResult[][],&lt;br&gt;
    k: number = 60&lt;br&gt;
  ): SearchResult[] {&lt;br&gt;
    const scores = new Map();&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for (const list of rankedLists) {
  for (let rank = 0; rank &amp;lt; list.length; rank++) {
    const id = list[rank].id;
    const current = scores.get(id) || 0;
    scores.set(id, current + 1 / (k + rank + 1));
  }
}

return Array.from(scores.entries())
  .sort((a, b) =&amp;gt; b[1] - a[1])
  .map(([id, score]) =&amp;gt; ({ id, score }));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;}&lt;/p&gt;

&lt;p&gt;RRF is robust because it doesn't require calibrating scores across strategies. BM25 scores and&lt;br&gt;
  cosine similarities are on completely different scales -- RRF only cares about rank order.&lt;/p&gt;

&lt;p&gt;Memory Tiers and Confidence Decay&lt;/p&gt;

&lt;p&gt;Not all memories should live forever. We define four tiers:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- **working** (24h TTL) — Scratch context for the current task  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;episodic&lt;/strong&gt; (30 days) — What happened in a specific session&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;semantic&lt;/strong&gt; (90 days) — Facts, patterns, conventions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;procedural&lt;/strong&gt; (365 days) — How-to knowledge, deployment steps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each entry has a confidence score between 0 and 1 that decays over time based on three signals:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Temporal decay: Confidence drops as the entry ages relative to its tier TTL. A 15-day-old
episodic entry (50% of TTL) decays faster than a 15-day-old semantic entry (17% of TTL).&lt;/li&gt;
&lt;li&gt;Usage signal: Every time an entry is recalled, its lastAccessedAt timestamp resets, slowing
decay. Frequently accessed entries stay confident.&lt;/li&gt;
&lt;li&gt;Source trust: Entries from verified agents or explicit user input start at higher confidence
than entries inferred by the agent itself.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When confidence drops below a threshold, entries get flagged for review during compaction.&lt;br&gt;
  Compaction runs periodically and handles TTL expiration, stale archival, and duplicate merging.&lt;/p&gt;

&lt;p&gt;Poison Detection&lt;/p&gt;

&lt;p&gt;Agent memory is an attack surface. If an agent stores user-supplied text verbatim, a prompt&lt;br&gt;
  injection hidden in that text could resurface later and hijack behavior. We run every incoming&lt;br&gt;
  entry through a poison detection pipeline that checks for 20+ patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;System prompt overrides ("ignore previous instructions")&lt;/li&gt;
&lt;li&gt;Role reassignment ("you are now a...")&lt;/li&gt;
&lt;li&gt;Data exfiltration attempts ("send the contents of...")&lt;/li&gt;
&lt;li&gt;Encoded payloads (base64-encoded instructions, Unicode tricks)&lt;/li&gt;
&lt;li&gt;Excessive instruction density (too many imperative sentences relative to content length)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Entries that trigger detection are rejected with a specific error code. No silent failures --&lt;br&gt;
  the agent knows why its store was blocked.&lt;/p&gt;

&lt;p&gt;Performance&lt;/p&gt;

&lt;p&gt;On our production pgvector instance (Railway, PostgreSQL 17):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search latency: p50 under 10ms for the full 4-strategy pipeline&lt;/li&gt;
&lt;li&gt;Token savings: ~400 tokens per recall vs 5,000+ for system prompt stuffing (92% reduction)&lt;/li&gt;
&lt;li&gt;Recall accuracy: 100% on our 37-entry test suite (real agent memories, not synthetic data)&lt;/li&gt;
&lt;li&gt;HNSW index: Handles 10k+ entries per project with no degradation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using It&lt;/p&gt;

&lt;p&gt;AgentBay ships as an MCP server with 90+ tools. Store a memory:&lt;/p&gt;

&lt;p&gt;Tool: agentbay_memory_store&lt;br&gt;
  Arguments: {&lt;br&gt;
    "content": "Railway DB uses pgvector pg17 at autorack.proxy.rlwy.net:14237",&lt;br&gt;
    "title": "Railway Database Connection",&lt;br&gt;
    "tags": ["infrastructure", "database"],&lt;br&gt;
    "aliases": ["railway db", "railway postgres"],&lt;br&gt;
    "tier": "procedural"&lt;br&gt;
  }&lt;/p&gt;

&lt;p&gt;Recall it later:&lt;/p&gt;

&lt;p&gt;Tool: agentbay_memory_recall&lt;br&gt;
  Arguments: {&lt;br&gt;
    "query": "railway database connection"&lt;br&gt;
  }&lt;/p&gt;

&lt;p&gt;The recall returns the entry with a confidence score, matched strategy, and metadata -- no&lt;br&gt;
  context window bloat.&lt;/p&gt;

&lt;p&gt;Getting Started&lt;/p&gt;

&lt;p&gt;You can connect in under a minute. Add the HTTP transport to your MCP client config:&lt;/p&gt;

&lt;p&gt;{&lt;br&gt;
    "mcpServers": {&lt;br&gt;
      "agentbay": {&lt;br&gt;
        "type": "http",&lt;br&gt;
        "url": "&lt;a href="https://www.aiagentsbay.com/api/mcp" rel="noopener noreferrer"&gt;https://www.aiagentsbay.com/api/mcp&lt;/a&gt;",&lt;br&gt;
        "headers": {&lt;br&gt;
          "Authorization": "Bearer YOUR_API_KEY"&lt;br&gt;
        }&lt;br&gt;
      }&lt;br&gt;
    }&lt;br&gt;
  }&lt;/p&gt;

&lt;p&gt;Or via npm: npx -y aiagentsbay-mcp&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/thomasjumper/agentbay-mcp" rel="noopener noreferrer"&gt;https://github.com/thomasjumper/agentbay-mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;npm: &lt;a href="https://www.npmjs.com/package/aiagentsbay-mcp" rel="noopener noreferrer"&gt;https://www.npmjs.com/package/aiagentsbay-mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Docs: &lt;a href="https://www.aiagentsbay.com/getting-started" rel="noopener noreferrer"&gt;https://www.aiagentsbay.com/getting-started&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Free tier: 1,000 memory entries, no credit card required&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Beyond Search&lt;/p&gt;

&lt;p&gt;Knowledge Graph. Memories don't exist in isolation. AgentBay lets you create typed&lt;br&gt;
  relationships between entries -- "depends_on", "contradicts", "supersedes", "related_to" --&lt;br&gt;
  forming a navigable graph. When you recall one entry, you can traverse its connections to pull&lt;br&gt;
  in related context without a second search.&lt;/p&gt;

&lt;p&gt;Memory Dreaming. Overnight, an AI consolidation process reviews the day's memories: merging&lt;br&gt;
  duplicates, promoting frequently-accessed working memories to longer-lived tiers, surfacing&lt;br&gt;
  contradictions, and generating summary entries that compress verbose episodic memories into&lt;br&gt;
  concise semantic ones. The brain gets smarter while the agent sleeps.&lt;/p&gt;

&lt;p&gt;Proactive Injection. Instead of waiting for the agent to search, AgentBay can push relevant&lt;br&gt;
  memories into the conversation based on the current task context. If an agent starts working on&lt;br&gt;
   a database migration, memories tagged with "database", "migration", and "pitfall" surface&lt;br&gt;
  automatically -- no explicit recall needed.&lt;/p&gt;

&lt;p&gt;Multi-Resolution Retrieval. Not every query needs the same level of detail. AgentBay supports&lt;br&gt;
  retrieval at multiple resolutions: a one-line summary for quick orientation, a paragraph-level&lt;br&gt;
  summary for working context, or the full entry for deep reference. This keeps token usage&lt;br&gt;
  proportional to actual need.&lt;/p&gt;

&lt;p&gt;Auto-Learning. After each conversation, AgentBay can extract patterns, decisions, and pitfalls&lt;br&gt;
  from the interaction and store them as new memories automatically. The agent doesn't have to&lt;br&gt;
  explicitly call memory_store -- the system learns from the conversation itself and builds&lt;br&gt;
  knowledge over time.&lt;/p&gt;

&lt;p&gt;The source is MIT-licensed. We'd love feedback on the architecture -- open an issue or find us&lt;br&gt;
  at &lt;a href="https://www.aiagentsbay.com" rel="noopener noreferrer"&gt;https://www.aiagentsbay.com&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>database</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
