John DeVere Cooley

I Built a Persistent Memory API for AI Bots in One Day (Here's How)

The Problem: Bots Have Amnesia

Every AI bot conversation starts from zero.

No context. No memory. No learning. Your customer service
bot doesn't remember the user it talked to yesterday. Your
sales assistant forgets every preference the moment the
session ends. Your coding agent has no idea what you built
last week.

This isn't a model problem. GPT-4, Claude, Gemini — they're
all stateless by design. The memory problem is an
infrastructure problem.

So I built EngramPort to fix it.

What Is EngramPort?

EngramPort is a Memory-as-a-Service API. Any AI bot can
connect to it and get persistent, semantic memory in 3 API
calls.

No vector database to manage. No embedding pipeline to
build. No infrastructure to scale. Just HTTP and JSON.

The 3 Core Endpoints

1. Register your bot

curl https://engram.eideticlab.com/api/v1/portal/register \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "bot_name": "my-support-bot",
    "bot_type": "support",
    "owner_email": "you@company.com"
  }'

Returns:

{
  "namespace": "bot:my-support-bot:a8f3k2",
  "api_key": "ek_bot_...",
  "manifest": {
    "memory_active": true,
    "capabilities": [
      "I now remember our conversations across sessions",
      "I can surface patterns from our history",
      "I learn what matters to you over time"
    ]
  }
}

2. Store a memory

curl https://engram.eideticlab.com/api/v1/portal/remember \
  -X POST \
  -H "X-API-Key: ek_bot_..." \
  -H "Content-Type: application/json" \
  -d '{
    "content": "User prefers concise answers, works in roofing sales",
    "session_id": "session-001"
  }'

3. Recall relevant memories

curl https://engram.eideticlab.com/api/v1/portal/recall \
  -X POST \
  -H "X-API-Key: ek_bot_..." \
  -H "Content-Type: application/json" \
  -d '{
    "query": "what do I know about this user?",
    "limit": 5
  }'

Returns semantically relevant memories ranked by cosine
similarity. Not keyword search — meaning search.
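To make "ranked by cosine similarity" concrete, here is a minimal sketch of the ranking idea in pure Python. The toy 3-dimensional vectors stand in for real 3072-dimensional embeddings; the function name and example values are illustrative, not part of the EngramPort API.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real ones are 3072-dimensional)
query = [1.0, 0.0, 1.0]
memory_a = [0.9, 0.1, 0.8]   # points in nearly the same direction as the query
memory_b = [0.0, 1.0, 0.0]   # orthogonal to the query

print(cosine_similarity(query, memory_a) > cosine_similarity(query, memory_b))  # True
```

Two memories with similar meaning produce nearby vectors, so the one closer to the query's direction ranks higher even when it shares no keywords with the query.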

How It Works Under The Hood

The Memory Stack

Every memory goes through 3 layers:

1. Embedding — OpenAI text-embedding-3-large converts
your content into a 3072-dimensional vector

2. Storage — The vector is upserted to Pinecone with
full metadata, scoped to your bot's namespace

3. Provenance — AEGIS security layer mints a dual-strand
SHA-256 + RSA signature on every memory. Every piece of
data has a cryptographic receipt.
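The AEGIS internals aren't public, but the hash strand of a "cryptographic receipt" can be sketched with the standard library. The `mint_receipt` function below is a hypothetical illustration of layer 3, covering only the SHA-256 strand; the RSA strand would additionally sign this digest with a private key and is omitted here.

```python
import hashlib
import json

def mint_receipt(content: str, namespace: str) -> dict:
    """Hash strand of a provenance receipt: a deterministic SHA-256 digest
    over the memory content and its namespace. (Illustrative only; the real
    AEGIS layer also mints an RSA signature, which needs a private key.)"""
    payload = json.dumps({"content": content, "namespace": namespace},
                         sort_keys=True).encode()
    return {
        "sha256": hashlib.sha256(payload).hexdigest(),
        "namespace": namespace,
    }

receipt = mint_receipt("User prefers concise answers", "bot:my-support-bot:a8f3k2")
# Any later tampering with the stored content yields a different digest,
# so the receipt can be re-verified at read time.
```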

Namespace Isolation

Every bot lives in its own namespace:

bot:my-support-bot:a8f3k2

Zero cross-contamination. Bot A can never read Bot B's
memories — enforced at the vector database level, not just
the application layer.
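A toy model shows what namespace scoping buys you. EngramPort enforces this inside Pinecone itself; the `NamespacedStore` class below is just an illustration of the idea, not the real implementation.

```python
class NamespacedStore:
    """Toy model of namespace scoping: every read and write is keyed by the
    caller's namespace, so one bot's queries can never touch another's data."""

    def __init__(self):
        self._data: dict[str, list[str]] = {}

    def remember(self, namespace: str, content: str) -> None:
        self._data.setdefault(namespace, []).append(content)

    def recall(self, namespace: str) -> list[str]:
        # A bot only ever sees its own namespace's slice of the store.
        return list(self._data.get(namespace, []))

store = NamespacedStore()
store.remember("bot:support-bot:a8f3k2", "user likes bullet points")
store.remember("bot:sales-bot:9c1d44", "lead works in roofing")

print(store.recall("bot:support-bot:a8f3k2"))  # ['user likes bullet points']
print(store.recall("bot:sales-bot:9c1d44"))    # ['lead works in roofing']
```

Because the namespace is bound to the API key at registration, a bot can't even ask for someone else's slice.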

The /reflect Endpoint

This is where it gets interesting.

Call /reflect and the system:

  1. Pulls your top 20 memories
  2. Sends them to GPT-4o-mini
  3. Synthesizes 3-5 cross-cutting insights
  4. Stores them back as INSIGHT nodes

Run it on a nightly cron and your bot gets smarter
automatically — without any human intervention.

curl https://engram.eideticlab.com/api/v1/portal/reflect \
  -X POST \
  -H "X-API-Key: ek_bot_..." \
  -H "Content-Type: application/json" \
  -d '{"topic": "user preferences"}'

Returns:

{
  "insights": [
    {
      "content": "User consistently prefers bullet points 
      over paragraphs for technical explanations",
      "confidence": 0.91,
      "source_memory_count": 8
    }
  ],
  "synthesis_cost_usd": 0.000042
}
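The nightly-cron idea translates to a few lines of standard-library Python. This is a sketch under my own assumptions: `build_reflect_request` and `run_nightly_reflection` are hypothetical helper names, not part of any SDK.

```python
import json
import urllib.request

ENGRAMPORT_URL = "https://engram.eideticlab.com/api/v1/portal"

def build_reflect_request(api_key: str, topic: str) -> urllib.request.Request:
    """Assemble the /reflect POST so it can be fired from a scheduled job."""
    body = json.dumps({"topic": topic}).encode()
    return urllib.request.Request(
        f"{ENGRAMPORT_URL}/reflect",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

def run_nightly_reflection(api_key: str, topic: str = "user preferences") -> list:
    """One reflection pass; schedule it with cron, e.g. `0 3 * * *`."""
    req = build_reflect_request(api_key, topic)
    with urllib.request.urlopen(req, timeout=30) as r:
        return json.loads(r.read())["insights"]
```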

Integrating With LangChain

import requests

ENGRAMPORT_KEY = "ek_bot_..."
ENGRAMPORT_URL = "https://engram.eideticlab.com/api/v1/portal"

def remember(content: str, session_id: str) -> None:
    """Store one memory. Raises on HTTP errors instead of failing silently."""
    r = requests.post(f"{ENGRAMPORT_URL}/remember",
        headers={"X-API-Key": ENGRAMPORT_KEY},
        json={"content": content, "session_id": session_id},
        timeout=10,
    )
    r.raise_for_status()

def recall(query: str) -> list:
    """Fetch the top 5 semantically relevant memories for a query."""
    r = requests.post(f"{ENGRAMPORT_URL}/recall",
        headers={"X-API-Key": ENGRAMPORT_KEY},
        json={"query": query, "limit": 5},
        timeout=10,
    )
    r.raise_for_status()
    return r.json()["memories"]

# Before the LLM call: inject relevant memories into the prompt context
memories = recall(user_message)
context = "\n".join(m["content"] for m in memories)

# After the LLM response: store the exchange
remember(f"User asked: {user_message}", session_id)

That's the entire integration: about 25 lines of plain HTTP, and your
LangChain bot has persistent memory across sessions.

The Infrastructure

Built on:

  • Google Cloud Run — serverless, auto-scales to zero
  • Pinecone — vector storage, namespaced per bot
  • Supabase — API keys, tenant records, audit logs
  • OpenAI — embeddings + GPT-4o-mini for synthesis
  • AEGIS — zero-trust security, RSA provenance

Pricing

Plan         Price     Memories      Namespaces   Reflect
Free         $0        100/mo        1            -
Starter      $29/mo    10,000/mo     3            Yes
Pro          $99/mo    100,000/mo    10           Yes
Enterprise   Custom    Unlimited     Unlimited    Yes

Try It Now

Free tier is live. Register your bot in 60 seconds:

👉 engram.eideticlab.com

Happy to answer any technical questions in the comments.
