We often confuse "Context" with "Memory" in LLMs.
When you paste a 100-page PDF into an LLM, you aren't giving it memory; you're giving it a very long short-term memory. True memory isn't about stuffing everything into the prompt - it's about state persistence, emotional continuity, and the ability to recall specific facts without needing them constantly repeated.
I built LongMind, a proof-of-concept demo to visualize exactly how Memory and Context differ in AI behavior. Here is how it works.
The Architecture
The stack is simple but effective:
- Backend: Node.js + Express
- Frontend: React (Vite)
- AI Engine: Ollama (local Llama 3.2) or AIsa.one (cloud)
- Memory: a simple JSON store (no vector DB complexity needed for this demo); a minimal sketch follows below
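"Memory" here really is just a JSON file of facts. The demo's actual store isn't shown in this post, so here is a minimal sketch of what it might look like; the file name, helper names, and CommonJS style are assumptions:

```js
// server/memory/store.js - a minimal sketch (file and function names are assumptions)
const fs = require("fs");
const path = require("path");

const MEMORY_FILE = path.join(__dirname, "memories.json");

// Load all persisted facts as an array of strings.
function loadMemories() {
  if (!fs.existsSync(MEMORY_FILE)) return [];
  return JSON.parse(fs.readFileSync(MEMORY_FILE, "utf8"));
}

// Append a new fact and write the whole store back to disk.
function saveMemory(fact) {
  const memories = loadMemories();
  memories.push(fact);
  fs.writeFileSync(MEMORY_FILE, JSON.stringify(memories, null, 2));
}

module.exports = { loadMemories, saveMemory };
```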
The 3 Modes of Cognition
The core feature of LongMind is the ability to toggle between three inference strategies:
Context Only (Amnesia)
The LLM sees only the current message. It has no idea who you are or what happened 5 minutes ago.
Result: You betray the NPC, and 10 seconds later, he greets you warmly.
Memory Only (Rigid Retrieval)
The LLM sees only the persisted facts, ignoring the current conversational nuance.
Result: You say "Hello", and the NPC ignores the greeting to rant about a past betrayal. This mimics bad RAG implementations where retrieval overpowers flow.
Memory + Context (The Holy Grail)
The LLM sees both. It integrates the past (Memory) with the present (Context).
Result: You apologize. The NPC hears the apology (Context) but refuses it because he remembers the betrayal (Memory). This feels "human."
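What the NPC "remembers" is nothing more exotic than a flat list of strings. After the betrayal scene, the store might contain something like this (the facts and their wording are illustrative, not the demo's exact data):

```json
[
  "The player betrayed me at the bridge and sided with the bandits.",
  "I no longer trust the player's promises."
]
```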
The Code: LLM Abstraction
I wrote a unified generateResponse function that changes how the prompt is assembled based on the selected mode:
```js
// server/llm/provider.js
async function generateResponse({ systemPrompt, memory, userMessage, mode, provider }) {
  const messages = [{ role: "system", content: systemPrompt }];

  // Inject memory unless we're in "Context Only" mode
  if (mode !== "context_only" && memory && memory.length > 0) {
    const memoryText = memory.map((m) => `- ${m}`).join("\n");
    messages.push({ role: "system", content: `RELEVANT MEMORIES:\n${memoryText}` });
  }

  // Inject the user's message unless we're in "Memory Only" mode
  if (mode !== "memory_only" && userMessage) {
    messages.push({ role: "user", content: userMessage });
  } else if (mode === "memory_only") {
    // Force a reaction to memory even without fresh input
    messages.push({ role: "system", content: "React solely to your memories." });
  }

  // ... Call Ollama or AIsa
}
```
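Wiring this into Express is then a thin route. The sketch below is how I'd expect the handler to look, assuming generateResponse returns the model's reply text; the route path, the NPC's system prompt, and the loadMemories helper are assumptions for illustration:

```js
// server/index.js - hypothetical wiring (route and names are assumptions)
const express = require("express");
const { loadMemories } = require("./memory/store");
const { generateResponse } = require("./llm/provider");

const app = express();
app.use(express.json());

app.post("/api/chat", async (req, res) => {
  const { message, mode, provider } = req.body;
  const reply = await generateResponse({
    systemPrompt: "You are Garrick, a wary NPC guard.",
    memory: loadMemories(), // loaded fresh each request - it lives outside the inference cycle
    userMessage: message,
    mode, // "context_only", "memory_only", or the combined mode
    provider, // "ollama" or the cloud provider
  });
  res.json({ reply });
});

app.listen(3001);
```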
This simple demo illustrates why "Context Windows" aren't a silver bullet. You can have a 1M token context, but if you treat it as a scratchpad, you get drift. True agentic behavior requires a persistent "Self" that exists outside the inference cycle.
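The write path is what makes that "Self" persistent: after each exchange, distill anything durable and push it into the store before the context evaporates. A minimal sketch, assuming the saveMemory helper from earlier and that generateResponse returns plain text; the extraction prompt is illustrative:

```js
// After each turn, distill a durable fact and persist it (illustrative, not the demo's exact code)
async function rememberTurn(userMessage, npcReply) {
  const fact = await generateResponse({
    systemPrompt:
      "Extract at most one durable fact about the player from this exchange. " +
      "Reply with the fact alone, or with NONE.",
    userMessage: `Player: ${userMessage}\nNPC: ${npcReply}`,
    mode: "context_only", // summarizing the present needs no memory
  });
  if (fact && fact.trim() !== "NONE") {
    saveMemory(fact.trim());
  }
}
```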
Check out the code on GitHub to run your own local NPC with Llama 3.2!