"Agentic AI" has become a marketing term that means everything and nothing. Most "agentic" tools are just language models with tool use, where the tools make API calls to cloud services, and the orchestration happens on some company's server.
I wanted to build something genuinely different: an agentic assistant where the entire execution — the model inference, the tool calls, the memory — happens on your device.
That's Pocket guIDE.
What "agentic" actually means here
Pocket guIDE can execute multi-step tasks autonomously. Given a goal, it will:
- Break the goal into steps
- Execute each step (searching, reading, writing, calculating)
- Use the output of each step to inform the next
- Return a consolidated result
The key distinction from a chatbot: it doesn't just respond to your message. It acts on it over multiple iterations until the task is done.
What tools does it have access to?
The agent has access to:
- Web search — via a local search proxy, not through a cloud AI service
- Calculator / code execution — runs JavaScript expressions in a sandboxed worker
- Note-taking / memory — persists context between sessions in local storage
- Document reading — can process PDFs and text files you drop in
- Conversation mode — standard back-and-forth when you don't need multi-step execution
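As a sketch of how the calculator tool can stay safe, here's a minimal expression evaluator (names and the allow-list regex are my assumptions, not Pocket guIDE's actual code; the real tool runs inside a sandboxed Worker rather than on the main thread):

```typescript
// Hypothetical sketch of a calculator tool. A strict character allow-list
// plus the Function constructor (which sees no enclosing scope) keeps
// evaluation to plain arithmetic; real isolation belongs in a Web Worker.
type ToolResult = { ok: boolean; value?: unknown; error?: string };

function evaluateExpression(expr: string): ToolResult {
  // Only digits, whitespace, and arithmetic operators are allowed through.
  if (!/^[\d\s+\-*\/%().,eE]+$/.test(expr)) {
    return { ok: false, error: "unsupported characters in expression" };
  }
  try {
    // "use strict" + Function: the expression cannot touch local variables.
    const value = Function(`"use strict"; return (${expr});`)();
    return { ok: true, value };
  } catch (e) {
    return { ok: false, error: String(e) };
  }
}
```

The allow-list is the important part: even before sandboxing, an expression that mentions any identifier is rejected outright.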
All this happens in the browser. The model inference runs on your CPU/GPU via WebAssembly-compiled llama.cpp (or via a local llama.cpp server if you want faster responses).
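The memory tool's shape can be sketched like this (interface and names are my assumptions; in the browser the store sits on IndexedDB, so a plain Map stands in here to keep the example self-contained):

```typescript
// Hypothetical sketch of the session-memory interface. The real store
// would be backed by IndexedDB; an in-memory Map stands in for it here.
interface MemoryStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
  keys(): Promise<string[]>;
}

class InMemoryStore implements MemoryStore {
  private data = new Map<string, string>();
  async get(key: string) { return this.data.get(key); }
  async set(key: string, value: string) { this.data.set(key, value); }
  async keys() { return [...this.data.keys()]; }
}
```

Keeping the interface async from the start means swapping the Map for IndexedDB (whose API is callback/event based) doesn't ripple into the agent code.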
The privacy story
Because inference and tool execution are local:
- Your conversations aren't logged anywhere — no company has a history of what you've asked
- Your files stay on your machine — documents you analyze never leave your device
- No API key exposure — there's no key to leak or rotate
- Works offline — no internet required for the AI components
The honest trade-offs
Running everything locally means your model is bounded by your hardware. A WebAssembly-compiled 3B model running in the browser is noticeably slower and less capable than GPT-4 over an API.
For tasks that need frontier model capability — complex reasoning, very long context — local models aren't there yet. For everyday assistant tasks, research summaries, and multi-step workflows on local documents, they're surprisingly capable.
The experience is better when running Pocket guIDE with a local llama.cpp server (accessible at localhost) rather than full WASM inference. The server can use your GPU and the larger quantized models.
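Talking to that local server is just an HTTP call — llama.cpp's server exposes an OpenAI-compatible chat endpoint. A sketch (the port and request shape reflect llama.cpp's defaults, but treat the details as assumptions for your setup):

```typescript
// Sketch: calling a local llama.cpp server's OpenAI-compatible endpoint.
// localhost:8080 is llama.cpp's default port; adjust for your setup.
interface ChatMessage { role: "system" | "user" | "assistant"; content: string; }

function buildChatRequest(messages: ChatMessage[], maxTokens = 256) {
  return {
    url: "http://localhost:8080/v1/chat/completions",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ messages, max_tokens: maxTokens, stream: false }),
    },
  };
}

async function chat(messages: ChatMessage[]): Promise<string> {
  const { url, init } = buildChatRequest(messages);
  const res = await fetch(url, init);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

Because the endpoint speaks the OpenAI wire format, the same frontend code can target the WASM path or the server path by changing only the transport.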
The architecture
The frontend is a React application that communicates with a local inference backend (llama.cpp server or WASM). The agent loop is implemented in TypeScript:
```typescript
async function runAgent(goal: string, maxIterations = 10) {
  const context = { goal, steps: [] as { action: any; output: unknown }[], result: null };
  for (let i = 0; i < maxIterations; i++) {
    // Ask the model what to do next, given the goal and all steps so far.
    const action = await model.decide(context);
    if (action.type === 'finish') return action.result;
    // Run the chosen tool and feed its output back into the context.
    const output = await tools[action.type].execute(action.params);
    context.steps.push({ action, output });
  }
  // Out of iterations: fail loudly rather than returning undefined.
  throw new Error(`Agent did not finish within ${maxIterations} iterations`);
}
```
The tool system is pluggable — adding a new tool means implementing a simple interface and registering it with the agent.
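That interface can be sketched roughly like this (names are my assumptions; the actual Pocket guIDE interface may differ):

```typescript
// Hypothetical sketch of the pluggable tool interface.
interface Tool {
  name: string;
  description: string; // shown to the model so it can choose tools
  execute(params: Record<string, unknown>): Promise<unknown>;
}

const tools: Record<string, Tool> = {};

function registerTool(tool: Tool) {
  tools[tool.name] = tool;
}

// Example: registering a trivial echo tool.
registerTool({
  name: "echo",
  description: "Returns its input unchanged",
  async execute(params) { return params["text"]; },
});
```

The `description` field is what makes the system pluggable in practice: the agent's decide step only needs the registry, not hard-coded knowledge of each tool.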
Try it
Pocket guIDE runs in any modern browser. For the best experience, run a local llama.cpp server alongside it.
Built with: React, TypeScript, llama.cpp (WASM + local server), WebAssembly, IndexedDB