Your AI agent is stateless. Every session starts from zero — no memory of past conversations, decisions, or user preferences. This is the biggest limitation holding back truly useful AI agents.
In this post I will show you how to add persistent semantic memory to any AI agent in under 5 minutes using BlueColumn — a memory infrastructure API built specifically for this problem.
The Problem With Stateless Agents
Every time a user starts a new session with your agent, it has no idea who they are, what they care about, or what was discussed before. You end up with two bad options:
- Stuff everything into the context window — expensive, hits limits fast, gets worse over time
- Start from zero every session — frustrating user experience, feels dumb
What you actually need is a memory layer that persists between sessions, scales infinitely, and retrieves the right information at the right time. That is what BlueColumn does.
What BlueColumn Does
BlueColumn gives your agent three simple REST endpoints:
- `/agent-remember` — ingest text, audio, or documents into semantic memory
- `/agent-recall` — query memory with natural language, get an AI-synthesized answer back
- `/agent-note` — store lightweight agent observations and preferences
Everything is backed by Pinecone vector storage and Voyage AI embeddings. You do not need to think about chunking, embedding models, or retrieval pipelines — BlueColumn handles all of it.
Setup
First, sign up at bluecolumn.ai and grab your free API key. You get 60 minutes of audio ingestion and 100 queries per month on the free tier — no credit card required.
Your key will look like: `bc_live_XXXXXXXXXXXXXXXXXXXX`
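The snippets below hardcode the key for brevity. In practice you may prefer to read it from an environment variable — note that `BLUECOLUMN_API_KEY` is a name I chose for this post, not an official convention:

```python
import os

# Read the API key from the environment instead of hardcoding it.
# BLUECOLUMN_API_KEY is my naming choice, not an official convention.
key = os.environ.get("BLUECOLUMN_API_KEY", "bc_live_YOUR_KEY")
```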
Step 1: Store Something in Memory
Let us say a user just told your agent something important:
```python
import requests

key = "bc_live_YOUR_KEY"
base = "https://xkjkwqbfvkswwdmbtndo.supabase.co/functions/v1"

# Store user context
response = requests.post(
    f"{base}/agent-remember",
    headers={"Authorization": f"Bearer {key}"},
    json={
        "text": "User is building a customer support agent for an e-commerce company. They prefer concise responses and are using Python with LangChain.",
        "title": "User Profile - Session 1"
    }
)

data = response.json()
print(data["summary"])      # AI-generated summary
print(data["key_topics"])   # Extracted topics
print(data["session_id"])   # Reference ID for this memory
```
The response gives you a summary, key topics, and action items — automatically extracted by AI.
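For reference, the response used above looks roughly like this — an illustrative shape based on the fields shown in the code, with made-up values, not the exact schema:

```json
{
  "session_id": "abc123",
  "summary": "User is building a customer support agent for an e-commerce company...",
  "key_topics": ["customer support", "e-commerce", "LangChain"],
  "action_items": []
}
```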
Step 2: Recall Memory Later
In a future session, before responding to the user, query their memory first:
```python
# At the start of each session — recall relevant context
response = requests.post(
    f"{base}/agent-recall",
    headers={"Authorization": f"Bearer {key}"},
    json={"q": "What do I know about this user and their project?"}
)

data = response.json()
context = data["answer"]    # AI-synthesized answer from stored memories
sources = data["sources"]   # Which memories were used

# Now inject context into your agent prompt
system_prompt = f"""You are a helpful assistant.

User context from memory:
{context}

Use this context to personalize your responses."""
```
The recall endpoint does not just return raw chunks — it synthesizes an actual answer from your stored memories using RAG. You get back something you can drop directly into a system prompt.
Step 3: Let the Agent Save Its Own Notes
Your agent can also save its own observations between sessions:
```python
# Agent saves an observation after the session
requests.post(
    f"{base}/agent-note",
    headers={"Authorization": f"Bearer {key}"},
    json={
        "text": "User gets frustrated when responses are too long. Keep answers under 3 sentences when possible.",
        "tags": ["preference", "communication-style"]
    }
)
```
Next session, this preference is in memory and gets recalled automatically.
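To confirm the note is retrievable, a later session can ask for it explicitly. This is a sketch reusing the `/agent-recall` call from Step 2; the query wording is my own:

```python
import requests

key = "bc_live_YOUR_KEY"
base = "https://xkjkwqbfvkswwdmbtndo.supabase.co/functions/v1"

def recall_preferences() -> str:
    # Ask memory for communication preferences saved in earlier sessions
    res = requests.post(
        f"{base}/agent-recall",
        headers={"Authorization": f"Bearer {key}"},
        json={"q": "What are this user's communication preferences?"}
    )
    return res.json().get("answer", "")
```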
Putting It All Together
Here is a simple agent loop with BlueColumn memory:
```python
import requests
from openai import OpenAI

key = "bc_live_YOUR_KEY"
base = "https://xkjkwqbfvkswwdmbtndo.supabase.co/functions/v1"
openai = OpenAI()

def chat_with_memory(user_message: str, user_id: str) -> str:
    # 1. Recall relevant memory
    recall = requests.post(
        f"{base}/agent-recall",
        headers={"Authorization": f"Bearer {key}"},
        json={"q": user_message}
    ).json()
    memory_context = recall.get("answer", "No prior context.")

    # 2. Build prompt with memory
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"You are a helpful assistant.\n\nMemory context:\n{memory_context}"},
            {"role": "user", "content": user_message}
        ]
    )
    answer = response.choices[0].message.content

    # 3. Store this interaction, tagged with the user ID so it can be found later
    requests.post(
        f"{base}/agent-note",
        headers={"Authorization": f"Bearer {key}"},
        json={"text": f"[user {user_id}] User asked: {user_message}. Agent answered: {answer[:200]}"}
    )
    return answer
```
That is a complete memory-enabled agent in under 30 lines.
Node.js Version
```javascript
const key = "bc_live_YOUR_KEY";
const base = "https://xkjkwqbfvkswwdmbtndo.supabase.co/functions/v1";

async function recallMemory(query) {
  const res = await fetch(`${base}/agent-recall`, {
    method: "POST",
    headers: { "Authorization": `Bearer ${key}`, "Content-Type": "application/json" },
    body: JSON.stringify({ q: query })
  });
  return res.json();
}

async function storeMemory(text, title) {
  const res = await fetch(`${base}/agent-remember`, {
    method: "POST",
    headers: { "Authorization": `Bearer ${key}`, "Content-Type": "application/json" },
    body: JSON.stringify({ text, title })
  });
  return res.json();
}

async function saveNote(text, tags = []) {
  const res = await fetch(`${base}/agent-note`, {
    method: "POST",
    headers: { "Authorization": `Bearer ${key}`, "Content-Type": "application/json" },
    body: JSON.stringify({ text, tags })
  });
  return res.json();
}
```
Common Gotchas
A few things that tripped me up when testing:
- `/agent-remember` — the field is `text`, not `content`
- `/agent-recall` — the field is `q`, not `query`
- `/agent-note` — the field is `text`, not `note`, with a 5-character minimum
The error messages tell you exactly what is wrong, but knowing upfront saves time.
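If you would rather not memorize the field names, a thin wrapper that surfaces the server's error body makes mistakes obvious immediately. This is a sketch of my own, not part of the BlueColumn SDK:

```python
import requests

def bc_post(base: str, key: str, endpoint: str, payload: dict) -> dict:
    # POST to a BlueColumn endpoint and raise with the server's error
    # body if the request is rejected (e.g. a wrong field name)
    res = requests.post(
        f"{base}/{endpoint}",
        headers={"Authorization": f"Bearer {key}"},
        json=payload,
        timeout=30
    )
    if not res.ok:
        raise RuntimeError(f"{endpoint} failed ({res.status_code}): {res.text}")
    return res.json()
```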
Pricing
BlueColumn has a generous free tier:
| Plan | Price | Audio | Queries |
|---|---|---|---|
| Free | $0 | 60 min/mo | 100/mo |
| Developer | $29/mo | 600 min | 2,000 |
| Builder | $79/mo | 2,000 min | 8,000 |
| Scale | $249/mo | 6,000 min | 20,000 |
Conclusion
Persistent memory is the difference between an AI agent that feels smart and one that feels like a toy. BlueColumn abstracts away all the complexity — vector storage, embeddings, chunking, retrieval — into three API calls.
If you are building an AI agent, give it memory. Your users will notice.
Sign up free at bluecolumn.ai — no credit card required.
Have questions about the implementation? Drop them in the comments.