How I Build AI Agents That Actually Remember

#ai #agents #memory #production

I watched an agent complete 12 steps of a 15-step browser workflow, then crash because it couldn't remember what it had already submitted. The form was half filled. The data was lost. The session was worthless.

That was early 2024. Since then I've built production agent systems that handle this differently. A job board platform scoring 10,000+ listings daily. An AI resume tailor generating dozens of tailored documents in parallel. Browser automation agents navigating multi-page workflows without losing state.

The difference between those early failures and what works today comes down to one thing: how you handle context persistence.

Why Most Agent Memory Architectures Fail

Three patterns kill production agents.

First, the stateless loop. You call an LLM, get a response, act on it, then throw away everything except the final output. The next call starts from zero. This works for single-shot tasks but fails for anything multi-step.

Second, the infinite context window trap. You dump everything into the prompt. Every prior output, every observation, the entire conversation history. This works until it doesn't. Token costs explode. The model loses focus in a sea of noise. And you hit context limits on long sessions.

Third, fragile in-memory state. You keep session data in a running object. It works fine in development. In production, a server restart or a crash wipes it, and a concurrent request that lands on another worker never sees it.

I've hit all three. The fix is a layered memory architecture that separates what the agent knows right now from what it should remember long term.

Pattern 1: Checkpointing with Structured Logs

The simplest pattern that makes a real difference: log every meaningful state transition to a database.

For the job board platform's LLM scoring pipeline, each scoring call against a listing produces a structured record. The agent records what listing it scored, what criteria it used, what score it assigned, and why. If the process is interrupted, the next run picks up where it left off instead of re-scoring everything.

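interface AgentCheckpoint {
  id: string; // unique per checkpoint; parentStep refers to this value
  sessionId: string;
  stepNumber: number;
  agentState: 'running' | 'completed' | 'failed';
  input: {
    task: string;
    context: Record<string, unknown>; // state the step started from
  };
  output: {
    result: string;
    confidence: number;
    metadata: Record<string, unknown>;
  };
  parentStep: string | null; // id of the preceding checkpoint, null for the first step
  createdAt: Date;
}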

The key insight is the parentStep field. It holds the id of the checkpoint that led to this one, which creates a tree of actions, not just a flat list. If the agent backtracks or retries, you can trace the exact path it took. This is critical for debugging and for recovery.
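To make the tree idea concrete, here is a minimal sketch of walking from any checkpoint back to the first step through parentStep. It assumes every checkpoint has a unique id that parentStep points at and that the session's checkpoints are already loaded into memory. StepNode and tracePath are hypothetical names used only for illustration.

```typescript
// Minimal shape needed to walk the tree. `id` is assumed unique per checkpoint.
interface StepNode {
  id: string;
  parentStep: string | null;
  stepNumber: number;
}

// Returns the chain of checkpoints from the root down to `leafId`.
// Returns an empty array if `leafId` is not in the log.
function tracePath(nodes: StepNode[], leafId: string): StepNode[] {
  const byId = new Map<string, StepNode>();
  for (const node of nodes) byId.set(node.id, node);

  const path: StepNode[] = [];
  const seen = new Set<string>(); // guards against a corrupted log with cycles
  let current = byId.get(leafId);

  while (current && !seen.has(current.id)) {
    seen.add(current.id);
    path.push(current);
    current = current.parentStep ? byId.get(current.parentStep) : undefined;
  }
  return path.reverse();
}
```

Retried or abandoned branches stay in the log but simply do not appear on the path to the checkpoint you are inspecting.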

In practice, I write a checkpoint before and after every LLM call. The cost is two small database writes per step. The benefit is that a crashed agent recovers in seconds instead of restarting from scratch.
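The recovery step can be sketched as a pure function over the checkpoint log. This is an illustration under assumptions, not the platform's actual code: it assumes each unit of work (a listing, for example) has a stable id recorded with its checkpoints. WorkCheckpoint, workId, and pendingWork are hypothetical names.

```typescript
type AgentState = 'running' | 'completed' | 'failed';

interface WorkCheckpoint {
  workId: string; // hypothetical: id of the listing this step worked on
  agentState: AgentState;
}

// Returns the work items that still need processing after a restart.
// Items with a 'completed' checkpoint are skipped. Items whose only
// checkpoints are 'running' or 'failed' were interrupted and are retried.
function pendingWork(allIds: string[], log: WorkCheckpoint[]): string[] {
  const done = new Set<string>();
  for (const cp of log) {
    if (cp.agentState === 'completed') done.add(cp.workId);
  }
  return allIds.filter((id) => !done.has(id));
}
```

With a "before" checkpoint written as running and an "after" checkpoint written as completed, an item that crashed mid-call shows up here as pending on the next run.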

Pattern 2: Vector Stores for Long-Term Memory

Checkpoints handle short-term task state. But agents also need to remember things across sessions. Preferences. Past decisions. Recurring patterns.

This is where vector stores earn their keep.

In the AI resume tailor, the agent needs to remember a user's base resume content across multiple tailoring jobs. Instead of reloading the full document into each prompt, I store embeddings of the resume sections and retrieve only what's relevant for each new job.

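import { OpenAIEmbeddings } from '@langchain/openai';
import { PineconeStore } from '@langchain/pinecone';
import { Pinecone } from '@pinecone-database/pinecone';

const embeddings = new OpenAIEmbeddings({
  model: 'text-embedding-3-small',
});

// fromExistingIndex expects a Pinecone index handle, not the index name.
// The client reads PINECONE_API_KEY from the environment.
const pinecone = new Pinecone();
const pineconeIndex = pinecone.index('agent-memory');

const vectorStore = await PineconeStore.fromExistingIndex(embeddings, {
  pineconeIndex,
});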

// Store a memory
await vectorStore.addDocuments([
  {
    pageContent: 'User prefers concise bullet points over paragraphs',
    metadata: {
      sessionId: 'user_abc',
      type: 'preference',
      source: 'observation',
    },
  },
]);

// Retrieve relevant memories for a new task
const memories = await vectorStore.similaritySearch(
  'What format does this user prefer?',
  3
);

The trick is to be deliberate about what you store. Not every observation. Not every output. Store decisions, preferences, and facts that have reuse value. Filter aggressively at write time, not read time.
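Filtering at write time can be as simple as a function that runs before anything is embedded. The sketch below is one possible shape, not the exact filter I run: the MemoryType categories, the CandidateMemory shape, and the exact-text duplicate check are all assumptions made for illustration.

```typescript
type MemoryType = 'decision' | 'preference' | 'fact' | 'observation';

interface CandidateMemory {
  text: string;
  type: MemoryType;
}

// Types assumed to have reuse value. Raw observations are dropped.
const REUSABLE_TYPES: ReadonlySet<MemoryType> = new Set<MemoryType>([
  'decision',
  'preference',
  'fact',
]);

// Normalizes text so trivially different copies compare equal.
function normalizeMemoryText(s: string): string {
  return s.trim().toLowerCase().replace(/\s+/g, ' ');
}

// Returns only the candidates worth writing to the vector store:
// reusable type, non-empty, and not already stored (by normalized text).
function selectMemoriesToStore(
  candidates: CandidateMemory[],
  alreadyStored: string[],
): CandidateMemory[] {
  const seen = new Set(alreadyStored.map(normalizeMemoryText));
  const kept: CandidateMemory[] = [];
  for (const candidate of candidates) {
    const key = normalizeMemoryText(candidate.text);
    if (!REUSABLE_TYPES.has(candidate.type)) continue; // no reuse value
    if (key.length === 0 || seen.has(key)) continue; // empty or duplicate
    seen.add(key);
    kept.push(candidate);
  }
  return kept;
}
```

Exact-text matching only catches literal repeats. Catching paraphrased duplicates would need a similarity check against the store, which this sketch leaves out.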

Pattern 3: The Hybrid Architecture

Here's what I actually run in production. It combines both patterns into a single agent loop.

Short-term task state lives in a structured checkpoint log. The agent reads its latest checkpoint on startup and resumes from there. No in-memory state to lose.

Long-term knowledge lives in a vector store. The agent queries it at the start of each new task and appends the results to its system prompt. This keeps the context window focused on what's immediately relevant.

The agent also writes new observations back to the vector store after each task completes, so the memory grows over time.

// loadCheckpoint, retrieveMemories, buildPrompt, llmCall, saveCheckpoint, and
// storeMemories are application helpers that are not shown here.
async function agentLoop(task: Task, sessionId: string) {
  // Resume from the last checkpoint, if there is one
  const checkpoint = await loadCheckpoint(sessionId);
  const state = checkpoint?.input.context ?? {};

  // Load relevant long-term memories
  const memories = await retrieveMemories(task.description, 5);
  const contextPrompt = buildPrompt(task, state, memories);

  // Execute the step. llmCall is assumed to return
  // { text, confidence, observations? } rather than a bare string.
  const result = await llmCall(contextPrompt);

  // Save a checkpoint linked to the one we resumed from.
  // saveCheckpoint is assumed to assign the new checkpoint's id.
  await saveCheckpoint({
    sessionId,
    stepNumber: (checkpoint?.stepNumber ?? 0) + 1,
    agentState: 'running',
    input: { task: task.description, context: state },
    output: { result: result.text, confidence: result.confidence, metadata: {} },
    parentStep: checkpoint?.id ?? null,
    createdAt: new Date(),
  });

  // Store new observations
  if (result.observations?.length) {
    await storeMemories(sessionId, result.observations);
  }

  return result;
}

This loop is simple. That's the point. Complexity kills agent reliability faster than anything else.
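The loop leans on helpers it does not define. As one illustration, buildPrompt could look roughly like the sketch below. The PromptTask shape, the section labels, the character cap, and the assumption that memories arrive as plain strings sorted most relevant first are all mine, not the production prompt.

```typescript
interface PromptTask {
  description: string;
}

// Assembles a prompt from retrieved memories, resumed state, and the task.
// `maxMemoryChars` is a crude cap that keeps memories from crowding out the task.
function buildPrompt(
  task: PromptTask,
  state: Record<string, unknown>,
  memories: string[],
  maxMemoryChars = 1000,
): string {
  const kept: string[] = [];
  let used = 0;
  for (const memory of memories) {
    // Stop at the first memory that would exceed the cap. Memories are
    // assumed to be sorted most relevant first, so the tail is dropped.
    if (used + memory.length > maxMemoryChars) break;
    kept.push(`- ${memory}`);
    used += memory.length;
  }

  const sections = [
    'Relevant memories:',
    kept.length ? kept.join('\n') : '(none)',
    'Current state:',
    JSON.stringify(state),
    'Task:',
    task.description,
  ];
  return sections.join('\n');
}
```

Keeping the task last means a truncated or noisy memory section never pushes the actual instruction out of view.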

The Production Reality

Context persistence is not a feature you add later. It's a structural decision you make on day one.

The job board platform's scoring pipeline runs thousands of LLM calls daily. Without checkpointing, a single crash would lose hours of work. Without the vector store, the system would re-analyze the same listing patterns repeatedly. The cost would be absurd.

The agent that crashed on step 12? I rebuilt it with this architecture. It hasn't lost a session since. If your team is wrestling with agents that lose context mid-task and shipping slower because of it, that's the kind of thing I help with. Happy to compare notes on what works at scale.

Written by Abdul Rehman, full-stack AI engineer building production SaaS, MVPs, and AI automation. More at PrimeStrides.