Adding Long-Term Memory to Your On-Device Agent: Integrating IndexedDB with Chrome’s Prompt API

#promptapi #aiforwebdevelopment #agents #ai

In my previous post, we explored how to build a stateless, LangChain-style agent architecture using the Chrome Prompt API and Web Workers. We successfully offloaded the “heavy lifting” (the ReAct reasoning loop, JSON parsing, and tool execution) into background threads to keep the UI smooth.

But there was a catch. Our agent was essentially a “goldfish.” Every time you called runAgent, it started with a blank slate. It had no recollection of the previous turn, the user's preferences, or the results of a tool call from five minutes ago. To build truly sophisticated web-based AI tools, we need Persistence.

Today, let’s evolve that architecture by adding a memory layer using the browser’s native IndexedDB, and see why this is the missing piece for truly autonomous in-browser agents.

The Challenge: Where to Store the Past?

When working with Web Workers, your storage options are surprisingly limited. You cannot use localStorage or sessionStorage: both are synchronous APIs, and neither is available inside a Web Worker.

The solution is IndexedDB.

IndexedDB is asynchronous, transactional, and, crucially, accessible from both the main thread and Web Workers. This allows our agent to persist its conversation history directly from the background thread, without ever interrupting the UI’s main execution loop.

The Architectural Change

To integrate memory, we need to move from a stateless “request-response” model to a “session-based” model. That change has three parts:

The Memory Layer: A dedicated class to handle the IndexedDB transaction logic.
Context Loading: Before the ReAct loop starts, the worker fetches the history associated with a specific sessionId and injects it into the prompt.
Incremental Updates: After each successful agent run, the worker commits the updated conversation state back to the database.

The Memory Manager

Here is the implementation of our AgentMemory class. We keep this generic so it can be used across different agent implementations.

class AgentMemory {
    constructor(dbName = "AgentMemoryDB", storeName = "conversations") {
        this.dbName = dbName;
        this.storeName = storeName;
        this.db = null;
    }

    init() {
        return new Promise((resolve, reject) => {
            // Reuse the open connection if init() has already succeeded
            if (this.db) return resolve();
            const request = indexedDB.open(this.dbName, 1);

            request.onupgradeneeded = (e) => {
                const db = e.target.result;
                if (!db.objectStoreNames.contains(this.storeName)) {
                    // Keyed by sessionId (e.g., "session_123")
                    db.createObjectStore(this.storeName, { keyPath: "sessionId" });
                }
            };

            request.onsuccess = (e) => {
                this.db = e.target.result;
                resolve();
            };

            request.onerror = (e) => reject(e.target.error);
        });
    }

    getHistory(sessionId) {
        return new Promise((resolve) => {
            const tx = this.db.transaction(this.storeName, "readonly");
            const store = tx.objectStore(this.storeName);
            const request = store.get(sessionId);

            request.onsuccess = () => {
                resolve(request.result ? request.result.history : []);
            };
            request.onerror = () => resolve([]);
        });
    }

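    saveHistory(sessionId, history) {
        return new Promise((resolve, reject) => {
            const tx = this.db.transaction(this.storeName, "readwrite");
            const store = tx.objectStore(this.storeName);
            // put() inserts or overwrites the record for this sessionId
            store.put({ sessionId, history });

            // Resolve once the transaction has committed, not merely when the request succeeds
            tx.oncomplete = () => resolve();
            // A failed request aborts the transaction; tx.error carries the cause
            tx.onabort = () => reject(tx.error);
        });
    }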
}
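
To make the round trip concrete, here is a minimal sketch of how the class above might be used to append one finished turn to a session. The rememberTurn helper is hypothetical (it is not part of the original code) and only relies on the init, getHistory, and saveHistory methods shown above, so any object with the same interface would work.

```javascript
// Hypothetical helper: append one finished turn to a session and return the
// updated history. `memory` is any object exposing the init/getHistory/
// saveHistory methods of the AgentMemory class above.
async function rememberTurn(memory, sessionId, turnLog) {
    await memory.init();                                 // open (or create) the database
    const history = await memory.getHistory(sessionId);  // [] for a brand-new session
    const updated = [...history, turnLog];
    await memory.saveHistory(sessionId, updated);        // overwrite the session record
    return updated;
}

// In a browser page or worker, usage could look like this:
// const memory = new AgentMemory();
// rememberTurn(memory, "session_123", "User: Hi\nAssistant: Hello!")
//     .then((history) => console.log(history.length));
```

Because the helper takes the memory object as a parameter, it can also be exercised outside the browser with a simple Map-backed stand-in.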

Updating the ReAct Loop

The true power of this implementation lies in how we bridge the memory layer with the Prompt API. We don’t just “dump” the history into the prompt. We manage it as part of the context window.

In our runReActLoop, we fetch the history and include it in the context string. This gives the Gemini Nano model a "working memory" of what happened previously.

    async function runReActLoop(userPrompt, sessionId) {
        let isComplete = false;
        let finalResult = "";
        let loopCount = 0;

        // Load historical turns from IndexedDB entirely in the background
        const historyTurns = await memory.getHistory(sessionId);

        const toolDescriptions = toolsArray.map(t => `- ${t.name}: ${t.description}`).join('\n');

        // Inject instructions, available tools, and historical conversation
        let context = `System: You are an AI agent with long-term memory. Think step-by-step.
Available tools:
${toolDescriptions}
- none: Use this if you do not need a tool and can answer the user directly.

Rules:
1. If you need data, set "toolName" to a tool and "toolInput" to the query. Leave "finalAnswer" as "".
2. If you know the answer, set "toolName" to "none" and put the answer in "finalAnswer".

Prior Conversation History:
${historyTurns.length > 0 ? historyTurns.join('\n') : "No prior history."}

Current Turn:
User: ${userPrompt}\n`;

        // Local turn buffer to record the ongoing ReAct execution sequence
        let currentTurnLog = `User: ${userPrompt}\n`;

        while (!isComplete && loopCount < 7) {
            loopCount++;

            const responseText = await askLLM(`${context}\nOutput your next step as JSON:`);
            let response;

            try {
                response = JSON.parse(responseText);
            } catch (e) {
                context += `Observation: Invalid JSON format. Please output strictly valid JSON.\n`;
                continue;
            }

            if (response.thought) {
                logToMain(`Thought: ${response.thought}`);
                currentTurnLog += `Thought: ${response.thought}\n`;
            }

            if (response.finalAnswer && response.finalAnswer.trim() !== "") {
                finalResult = response.finalAnswer;
                currentTurnLog += `Assistant: ${response.finalAnswer}\n`;
                isComplete = true;
            }
            else if (response.toolName && response.toolName !== "none" && toolsMap.has(response.toolName)) {
                logToMain(`Action: Running ${response.toolName} with input "${response.toolInput}"`);

                try {
                    const tool = toolsMap.get(response.toolName);
                    const toolResult = await tool.executeFn(response.toolInput);

                    const actionStr = `Action: ${response.toolName}("${response.toolInput}")\nObservation: ${toolResult}\n`;
                    context += actionStr;
                    currentTurnLog += actionStr;

                    logToMain(`Observation: ${toolResult}`);
                } catch (err) {
                    context += `Observation: Tool failed with error: ${err.message}\n`;
                }
            }
            else if (response.toolName === "none" || response.toolName === "") {
                context += `Observation: You selected no tools, but didn't provide a finalAnswer. Please provide the final answer.\n`;
            }
            else {
                context += `Observation: Tool '${response.toolName}' does not exist. Use an available tool or 'none'.\n`;
            }
        }

        // Persist updated history back to IndexedDB before wrapping up
        if (finalResult) {
            historyTurns.push(currentTurnLog.trim());
            // Optional: keep the sliding window under control (e.g., last 10 full turns)
            if (historyTurns.length > 10) historyTurns.shift();
            await memory.saveHistory(sessionId, historyTurns);
        }

        return finalResult || "Error: Reached maximum iterations.";
    }

    // Handle incoming commands
    self.addEventListener('message', async (e) => {
        const { id, type, payload } = e.data;

        if (type === 'llm_response') {
            resolvers.get(id)?.resolve(payload);
            resolvers.delete(id);
        } else if (type === 'llm_error') {
            resolvers.get(id)?.reject(new Error(payload));
            resolvers.delete(id);
        } else if (type === 'start_loop') {
            try {
                // Ensure IndexedDB is initialized before running the ReAct loop
                await memory.init();
                const answer = await runReActLoop(payload.userPrompt, payload.sessionId);
                self.postMessage({ id, type: 'agent_complete', payload: answer });
            } catch (err) {
                self.postMessage({ id, type: 'agent_error', payload: err.message });
            }
        }
    });
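
On the main thread, the only new requirement is to pass a sessionId along with the prompt. The following is a rough sketch of what that caller could look like, assuming the worker replies with the agent_complete and agent_error messages from the handler above. The runAgent signature and the nextRequestId counter are my assumptions here, and the messages used to proxy LLM calls between the worker and the page are deliberately left out.

```javascript
// Sketch of a main-thread caller. Assumes `worker` runs the handler above and
// answers a 'start_loop' message with 'agent_complete' or 'agent_error'
// carrying the same id.
let nextRequestId = 0;

function runAgent(worker, userPrompt, sessionId) {
    const id = ++nextRequestId;
    return new Promise((resolve, reject) => {
        const onMessage = (e) => {
            const { id: replyId, type, payload } = e.data;
            if (replyId !== id) return; // reply belongs to a different request
            if (type === 'agent_complete') {
                worker.removeEventListener('message', onMessage);
                resolve(payload);
            } else if (type === 'agent_error') {
                worker.removeEventListener('message', onMessage);
                reject(new Error(payload));
            }
            // Any other message type is assumed to be handled elsewhere
        };
        worker.addEventListener('message', onMessage);
        worker.postMessage({
            id,
            type: 'start_loop',
            payload: { userPrompt, sessionId },
        });
    });
}

// Usage in the page:
// const answer = await runAgent(agentWorker, "What did I ask earlier?", "session_123");
```

Calling runAgent again with the same sessionId continues the stored conversation, while a different sessionId starts from an empty history.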

The Performance Trade-off

One architectural concern here is context window inflation. If a user chats with your agent for a long time, the conversation history will eventually exceed the token limit of the on-device model.

In the implementation above, I added simple sliding-window logic. Once the history holds more than 10 turns, the oldest one is dropped before saving:

// Optional: keep the sliding window under control (e.g., last 10 full turns)
if (historyTurns.length > 10) historyTurns.shift();
await memory.saveHistory(sessionId, historyTurns);
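
Note that a turn cap bounds the number of entries, not their size, so a single turn with long tool observations can still be large. One possible refinement is to trim by size as well. The sketch below uses a character budget as a crude stand-in for a token limit. The trimHistory name and the 4000-character default are illustrative assumptions, not values taken from the Prompt API.

```javascript
// Keep the most recent turns that fit both a turn cap and a character budget.
// maxChars is an assumed, rough proxy for the model's real token limit.
function trimHistory(historyTurns, maxTurns = 10, maxChars = 4000) {
    const kept = [];
    let used = 0;
    // Walk backwards so the newest turns are considered first
    for (let i = historyTurns.length - 1; i >= 0 && kept.length < maxTurns; i--) {
        const cost = historyTurns[i].length + 1; // +1 for the joining newline
        if (used + cost > maxChars) break;       // budget exhausted, drop older turns
        kept.unshift(historyTurns[i]);           // preserve chronological order
        used += cost;
    }
    return kept;
}

// Possible usage before saving:
// await memory.saveHistory(sessionId, trimHistory(historyTurns));
```

With this approach a single turn larger than the whole budget is dropped entirely, so the budget should be chosen with the longest expected tool output in mind.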

Summary

By delegating the memory management to the worker thread and using IndexedDB, we have achieved three critical goals:

Thread Safety: The UI thread knows nothing about the storage mechanism. It simply passes a sessionId.
Resilience: Because the history is persisted in IndexedDB, the agent “remembers” the conversation even after a page refresh.
Isolation: By using a sessionId, we can support multi-tenant agents in a single browser tab. Users can switch between different chat contexts without conflict.

This is exactly how enterprise-grade browser applications should handle AI state. We aren’t just building a chatbot; we are building an autonomous system that respects the browser’s constraints while delivering a sophisticated, stateful experience.

If you are interested in the code, you can find it on my GitHub: https://github.com/gilf/prompt-chain.