Your AI agent is stateless. Every session starts from zero — no memory of past conversations, decisions, or user preferences. This is the biggest limitation holding back truly useful AI agents.
In this post I will show you how to add persistent semantic memory to any AI agent in under 5 minutes using BlueColumn — a memory infrastructure API built specifically for this problem.
The Problem With Stateless Agents
Every time a user starts a new session with your agent, it has no idea who they are, what they care about, or what was discussed before. You end up with two bad options:
- Stuff everything into the context window — expensive, hits limits fast, gets worse over time
- Start from zero every session — frustrating user experience, feels dumb
What you actually need is a memory layer that persists between sessions, scales infinitely, and retrieves the right information at the right time. That is what BlueColumn does.
What BlueColumn Does
BlueColumn gives your agent three simple REST endpoints:
- `/agent-remember` — ingest text, audio, or documents into semantic memory
- `/agent-recall` — query memory with natural language, get an AI-synthesized answer back
- `/agent-note` — store lightweight agent observations and preferences
Everything is backed by Pinecone vector storage and Voyage AI embeddings. You do not need to think about chunking, embedding models, or retrieval pipelines — BlueColumn handles all of it.
Setup
First, sign up at bluecolumn.ai and grab your free API key. You get 60 minutes of audio ingestion and 100 queries per month on the free tier — no credit card required.
Your key will look like: `bc_live_XXXXXXXXXXXXXXXXXXXX`
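The snippets below hardcode the key for brevity. In practice you may prefer to read it from an environment variable — note that `BLUECOLUMN_API_KEY` is a name I chose for this post, not an official convention:

```python
import os

# Read the API key from the environment instead of hardcoding it.
# BLUECOLUMN_API_KEY is my naming choice, not an official convention.
key = os.environ.get("BLUECOLUMN_API_KEY", "bc_live_YOUR_KEY")
```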
Step 1: Store Something in Memory
Let us say a user just told your agent something important:
```python
import requests

key = "bc_live_YOUR_KEY"
base = "https://xkjkwqbfvkswwdmbtndo.supabase.co/functions/v1"

# Store user context
response = requests.post(
    f"{base}/agent-remember",
    headers={"Authorization": f"Bearer {key}"},
    json={
        "text": "User is building a customer support agent for an e-commerce company. They prefer concise responses and are using Python with LangChain.",
        "title": "User Profile - Session 1"
    }
)

data = response.json()
print(data["summary"])      # AI-generated summary
print(data["key_topics"])   # Extracted topics
print(data["session_id"])   # Reference ID for this memory
```
The response gives you a summary, key topics, and action items — automatically extracted by AI.
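For reference, the response used above looks roughly like this — an illustrative shape based on the fields shown in the code, with made-up values, not the exact schema:

```json
{
  "session_id": "abc123",
  "summary": "User is building a customer support agent for an e-commerce company...",
  "key_topics": ["customer support", "e-commerce", "LangChain"],
  "action_items": []
}
```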
Step 2: Recall Memory Later
In a future session, before responding to the user, query their memory first:
```python
# At the start of each session — recall relevant context
response = requests.post(
    f"{base}/agent-recall",
    headers={"Authorization": f"Bearer {key}"},
    json={"q": "What do I know about this user and their project?"}
)

data = response.json()
context = data["answer"]    # AI-synthesized answer from stored memories
sources = data["sources"]   # Which memories were used

# Now inject context into your agent prompt
system_prompt = f"""You are a helpful assistant.

User context from memory:
{context}

Use this context to personalize your responses."""
```
The recall endpoint does not just return raw chunks — it synthesizes an actual answer from your stored memories using RAG. You get back something you can drop directly into a system prompt.
Step 3: Let the Agent Save Its Own Notes
Your agent can also save its own observations between sessions:
```python
# Agent saves an observation after the session
requests.post(
    f"{base}/agent-note",
    headers={"Authorization": f"Bearer {key}"},
    json={
        "text": "User gets frustrated when responses are too long. Keep answers under 3 sentences when possible.",
        "tags": ["preference", "communication-style"]
    }
)
```
Next session, this preference is in memory and gets recalled automatically.
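To confirm the note is retrievable, a later session can ask for it explicitly. This is a sketch reusing the `/agent-recall` call from Step 2; the query wording is my own:

```python
import requests

key = "bc_live_YOUR_KEY"
base = "https://xkjkwqbfvkswwdmbtndo.supabase.co/functions/v1"

def recall_preferences() -> str:
    # Ask memory for communication preferences saved in earlier sessions
    res = requests.post(
        f"{base}/agent-recall",
        headers={"Authorization": f"Bearer {key}"},
        json={"q": "What are this user's communication preferences?"}
    )
    return res.json().get("answer", "")
```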
Putting It All Together
Here is a simple agent loop with BlueColumn memory:
```python
import requests
from openai import OpenAI

key = "bc_live_YOUR_KEY"
base = "https://xkjkwqbfvkswwdmbtndo.supabase.co/functions/v1"
openai = OpenAI()

def chat_with_memory(user_message: str, user_id: str) -> str:
    # 1. Recall relevant memory
    recall = requests.post(
        f"{base}/agent-recall",
        headers={"Authorization": f"Bearer {key}"},
        json={"q": user_message}
    ).json()
    memory_context = recall.get("answer", "No prior context.")

    # 2. Build prompt with memory
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"You are a helpful assistant.\n\nMemory context:\n{memory_context}"},
            {"role": "user", "content": user_message}
        ]
    )
    answer = response.choices[0].message.content

    # 3. Store this interaction, tagged with the user ID so it can be found later
    requests.post(
        f"{base}/agent-note",
        headers={"Authorization": f"Bearer {key}"},
        json={"text": f"[user {user_id}] User asked: {user_message}. Agent answered: {answer[:200]}"}
    )
    return answer
```
That is a complete memory-enabled agent in under 30 lines.
Node.js Version
```javascript
const key = "bc_live_YOUR_KEY";
const base = "https://xkjkwqbfvkswwdmbtndo.supabase.co/functions/v1";

async function recallMemory(query) {
  const res = await fetch(`${base}/agent-recall`, {
    method: "POST",
    headers: { "Authorization": `Bearer ${key}`, "Content-Type": "application/json" },
    body: JSON.stringify({ q: query })
  });
  return res.json();
}

async function storeMemory(text, title) {
  const res = await fetch(`${base}/agent-remember`, {
    method: "POST",
    headers: { "Authorization": `Bearer ${key}`, "Content-Type": "application/json" },
    body: JSON.stringify({ text, title })
  });
  return res.json();
}

async function saveNote(text, tags = []) {
  const res = await fetch(`${base}/agent-note`, {
    method: "POST",
    headers: { "Authorization": `Bearer ${key}`, "Content-Type": "application/json" },
    body: JSON.stringify({ text, tags })
  });
  return res.json();
}
```
Common Gotchas
A few things that tripped me up when testing:
- `/agent-remember` — the field is `text`, not `content`
- `/agent-recall` — the field is `q`, not `query`
- `/agent-note` — the field is `text`, not `note`, with a 5-character minimum
The error messages tell you exactly what is wrong, but knowing upfront saves time.
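If you would rather not memorize the field names, a thin wrapper that surfaces the server's error body makes mistakes obvious immediately. This is a sketch of my own, not part of the BlueColumn SDK:

```python
import requests

def bc_post(base: str, key: str, endpoint: str, payload: dict) -> dict:
    # POST to a BlueColumn endpoint and raise with the server's error
    # body if the request is rejected (e.g. a wrong field name)
    res = requests.post(
        f"{base}/{endpoint}",
        headers={"Authorization": f"Bearer {key}"},
        json=payload,
        timeout=30
    )
    if not res.ok:
        raise RuntimeError(f"{endpoint} failed ({res.status_code}): {res.text}")
    return res.json()
```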
Pricing
BlueColumn has a generous free tier:
| Plan | Price | Audio | Queries |
|---|---|---|---|
| Free | $0 | 60 min/mo | 100/mo |
| Developer | $29/mo | 600 min | 2,000 |
| Builder | $79/mo | 2,000 min | 8,000 |
| Scale | $249/mo | 6,000 min | 20,000 |
Conclusion
Persistent memory is the difference between an AI agent that feels smart and one that feels like a toy. BlueColumn abstracts away all the complexity — vector storage, embeddings, chunking, retrieval — into three API calls.
If you are building an AI agent, give it memory. Your users will notice.
Sign up free at bluecolumn.ai — no credit card required.
Have questions about the implementation? Drop them in the comments.