Harish Kotra (he/him)

Beyond the Context Window: Simulating True AI Memory with Ollama and AIsa.one

We often confuse "Context" with "Memory" in LLMs.

When you paste a 100-page PDF into an LLM, you aren't giving it memory; you're giving it a very long short-term memory. True memory isn't about stuffing everything into the prompt - it's about state persistence, emotional continuity, and the ability to recall specific facts without needing them constantly repeated.

I built LongMind, a proof-of-concept demo to visualize exactly how Memory and Context differ in AI behavior. Here is how it works.

The Architecture

The stack is simple but effective:

  • Backend: Node.js + Express
  • Frontend: React (Vite)
  • AI Engine: Ollama (Local Llama 3.2) OR AIsa.one (Cloud)
  • Memory: A simple JSON store (no vector DB complexity needed for this demo; a minimal sketch follows the list)
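
The memory store itself can be as simple as a JSON file on disk. Here is a minimal sketch of what that could look like; the memory.json path and the loadMemories/saveMemory helper names are my own placeholders, not necessarily what LongMind ships with:

// server/memory/store.js (hypothetical layout)
const fs = require("fs");
const path = require("path");

const MEMORY_FILE = path.join(__dirname, "memory.json");

// Load every persisted fact (an array of strings)
function loadMemories() {
    if (!fs.existsSync(MEMORY_FILE)) return [];
    return JSON.parse(fs.readFileSync(MEMORY_FILE, "utf8"));
}

// Append a new fact and write the store back to disk
function saveMemory(fact) {
    const memories = loadMemories();
    memories.push(fact);
    fs.writeFileSync(MEMORY_FILE, JSON.stringify(memories, null, 2));
}

module.exports = { loadMemories, saveMemory };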

The 3 Modes of Cognition

The core feature of LongMind is the ability to toggle between three inference strategies (a sketch of the backend wiring follows the list):

  1. Context Only (Amnesia)
    The LLM sees only the current message. It has no idea who you are or what happened 5 minutes ago.
    Result: You betray the NPC, and 10 seconds later, he greets you warmly.

  2. Memory Only (Rigid Retrieval)
    The LLM sees only the persisted facts, ignoring the current conversational nuance.
    Result: You say "Hello", and the NPC ignores the greeting to rant about a past betrayal. This mimics bad RAG implementations where retrieval overpowers flow.

  3. Memory + Context (The Holy Grail)
    The LLM sees both. It integrates the past (Memory) with the present (Context).
    Result: You apologize. The NPC hears the apology (Context) but refuses it because he remembers the betrayal (Memory). This feels "human."
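
On the backend, that toggle is just a field on the chat request. Here is a rough sketch of how an Express route could wire it up; the /api/chat path, the request shape, and the loadMemories helper are assumptions for illustration, and generateResponse is the provider function shown in the next section (assuming it ends by returning the model's reply):

// server/routes/chat.js (hypothetical wiring)
const express = require("express");
const { loadMemories } = require("../memory/store");
const { generateResponse } = require("../llm/provider");

const router = express.Router();

router.post("/api/chat", async (req, res) => {
    // mode comes straight from the UI toggle:
    // 'context_only', 'memory_only', or the combined memory + context mode
    const { message, mode, provider } = req.body;

    const reply = await generateResponse({
        systemPrompt: "You are an NPC guard with a long memory.",
        memory: loadMemories(),
        userMessage: message,
        mode,
        provider
    });

    res.json({ reply });
});

module.exports = router;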

The Code: LLM Abstraction

We created a unified generateResponse function that changes how the prompt is assembled based on the selected mode:

// server/llm/provider.js
async function generateResponse({ systemPrompt, memory, userMessage, mode, provider }) {
    const messages = [{ role: "system", content: systemPrompt }];

    // Inject persisted memories unless we are in "Context Only" mode
    if (mode !== 'context_only' && memory && memory.length > 0) {
        const memoryText = memory.map(m => `- ${m}`).join("\n");
        messages.push({ role: "system", content: `RELEVANT MEMORIES:\n${memoryText}` });
    }

    // Inject the user's message unless we are in "Memory Only" mode
    if (mode !== 'memory_only' && userMessage) {
        messages.push({ role: "user", content: userMessage });
    } else if (mode === 'memory_only') {
        // Force a reaction to the memories even without fresh input
        messages.push({ role: "system", content: "React solely to your memories." });
    }

    // ... Call Ollama or AIsa
}
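
For the Ollama path, the elided call can be a plain HTTP request to Ollama's /api/chat endpoint (Node 18+ ships a global fetch). This is just one possible way to finish the function; the model name and options here are assumptions:

// One way the elided Ollama branch could look
async function callOllama(messages) {
    const response = await fetch("http://localhost:11434/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
            model: "llama3.2",   // pulled locally via `ollama pull llama3.2`
            messages,            // the messages array built above
            stream: false        // single JSON response instead of a token stream
        })
    });

    const data = await response.json();
    return data.message.content; // /api/chat returns { message: { role, content }, ... }
}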

This simple demo illustrates why "Context Windows" aren't a silver bullet. You can have a 1M token context, but if you treat it as a scratchpad, you get drift. True agentic behavior requires a persistent "Self" that exists outside the inference cycle.
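
That persistent "Self" is simply the write path of the loop: after each exchange, decide what is worth keeping and push it into the store so it survives the next inference cycle. A minimal sketch, reusing the hypothetical saveMemory helper from the store above (a fuller version might ask the model itself to summarise the exchange into a fact):

// After each exchange, persist anything worth remembering
const { saveMemory } = require("./memory/store");

function rememberIfImportant(userMessage, npcReply) {
    // Naive keyword heuristic for the demo; swap in an LLM-based extractor for real use
    const keywords = ["betray", "promise", "secret"];
    if (keywords.some(k => userMessage.toLowerCase().includes(k))) {
        saveMemory(`The player said: "${userMessage}" and the NPC replied: "${npcReply}"`);
    }
}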

Check out the code on GitHub to run your own local NPC with Llama 3.2!

Here's what the output looks like

[Screenshots: Memory Only (reacting to the betrayal), Context Only, and Memory + Context]
