"Agentic AI" has become a marketing term that means everything and nothing. Most "agentic" tools are just language models with tool use, where the tools make API calls to cloud services, and the orchestration happens on some company's server.
I wanted to build something genuinely different: an agentic assistant where the entire execution — the model inference, the tool calls, the memory — happens on your device.
That's Pocket guIDE.
What "agentic" actually means here
Pocket guIDE can execute multi-step tasks autonomously. Given a goal, it will:
- Break the goal into steps
- Execute each step (searching, reading, writing, calculating)
- Use the output of each step to inform the next
- Return a consolidated result
The key distinction from a chatbot: it doesn't just respond to your message. It acts on it over multiple iterations until the task is done.
What tools does it have access to?
The agent has access to:
- Web search — via a local search proxy, not through a cloud AI service
- Calculator / code execution — runs JavaScript expressions in a sandboxed worker
- Note-taking / memory — persists context between sessions in local storage
- Document reading — can process PDFs and text files you drop in
- Conversation mode — standard back-and-forth when you don't need multi-step execution
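As a sketch of how the calculator tool can stay safe, here's a minimal expression evaluator (names and the allow-list regex are my assumptions, not Pocket guIDE's actual code; the real tool runs inside a sandboxed Worker rather than on the main thread):

```typescript
// Hypothetical sketch of a calculator tool. A strict character allow-list
// plus the Function constructor (which sees no enclosing scope) keeps
// evaluation to plain arithmetic; real isolation belongs in a Web Worker.
type ToolResult = { ok: boolean; value?: unknown; error?: string };

function evaluateExpression(expr: string): ToolResult {
  // Only digits, whitespace, and arithmetic operators are allowed through.
  if (!/^[\d\s+\-*\/%().,eE]+$/.test(expr)) {
    return { ok: false, error: "unsupported characters in expression" };
  }
  try {
    // "use strict" + Function: the expression cannot touch local variables.
    const value = Function(`"use strict"; return (${expr});`)();
    return { ok: true, value };
  } catch (e) {
    return { ok: false, error: String(e) };
  }
}
```

The allow-list is the important part: even before sandboxing, an expression that mentions any identifier is rejected outright.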
All this happens in the browser. The model inference runs on your CPU/GPU via WebAssembly-compiled llama.cpp (or via a local llama.cpp server if you want faster responses).
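The memory tool's shape can be sketched like this (interface and names are my assumptions; in the browser the store sits on IndexedDB, so a plain Map stands in here to keep the example self-contained):

```typescript
// Hypothetical sketch of the session-memory interface. The real store
// would be backed by IndexedDB; an in-memory Map stands in for it here.
interface MemoryStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
  keys(): Promise<string[]>;
}

class InMemoryStore implements MemoryStore {
  private data = new Map<string, string>();
  async get(key: string) { return this.data.get(key); }
  async set(key: string, value: string) { this.data.set(key, value); }
  async keys() { return [...this.data.keys()]; }
}
```

Keeping the interface async from the start means swapping the Map for IndexedDB (whose API is callback/event based) doesn't ripple into the agent code.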
The privacy story
Because inference and tool execution are local:
- Your conversations aren't logged anywhere — no company has a history of what you've asked
- Your files stay on your machine — documents you analyze never leave your device
- No API key exposure — there's no key to leak or rotate
- Works offline — no internet required for the AI components
The honest trade-offs
Running everything locally means your model is bounded by your hardware. A WebAssembly-compiled 3B model running in the browser is noticeably slower and less capable than GPT-4 over an API.
For tasks that need frontier model capability — complex reasoning, very long context — local models aren't there yet. For everyday assistant tasks, research summaries, and multi-step workflows on local documents, they're surprisingly capable.
The experience is better when running Pocket guIDE with a local llama.cpp server (accessible at localhost) rather than full WASM inference. The server can use your GPU and the larger quantized models.
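Talking to that local server is just an HTTP call — llama.cpp's server exposes an OpenAI-compatible chat endpoint. A sketch (the port and request shape reflect llama.cpp's defaults, but treat the details as assumptions for your setup):

```typescript
// Sketch: calling a local llama.cpp server's OpenAI-compatible endpoint.
// localhost:8080 is llama.cpp's default port; adjust for your setup.
interface ChatMessage { role: "system" | "user" | "assistant"; content: string; }

function buildChatRequest(messages: ChatMessage[], maxTokens = 256) {
  return {
    url: "http://localhost:8080/v1/chat/completions",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ messages, max_tokens: maxTokens, stream: false }),
    },
  };
}

async function chat(messages: ChatMessage[]): Promise<string> {
  const { url, init } = buildChatRequest(messages);
  const res = await fetch(url, init);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

Because the endpoint speaks the OpenAI wire format, the same frontend code can target the WASM path or the server path by changing only the transport.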
The architecture
The frontend is a React application that communicates with a local inference backend (llama.cpp server or WASM). The agent loop is implemented in TypeScript:
```typescript
async function runAgent(goal: string, maxIterations = 10) {
  const context = { goal, steps: [] as { action: any; output: unknown }[], result: null };
  for (let i = 0; i < maxIterations; i++) {
    // Ask the model what to do next, given the goal and all steps so far.
    const action = await model.decide(context);
    if (action.type === 'finish') return action.result;
    // Run the chosen tool and feed its output back into the context.
    const output = await tools[action.type].execute(action.params);
    context.steps.push({ action, output });
  }
  // Out of iterations: fail loudly rather than returning undefined.
  throw new Error(`Agent did not finish within ${maxIterations} iterations`);
}
```
The tool system is pluggable — adding a new tool means implementing a simple interface and registering it with the agent.
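That interface can be sketched roughly like this (names are my assumptions; the actual Pocket guIDE interface may differ):

```typescript
// Hypothetical sketch of the pluggable tool interface.
interface Tool {
  name: string;
  description: string; // shown to the model so it can choose tools
  execute(params: Record<string, unknown>): Promise<unknown>;
}

const tools: Record<string, Tool> = {};

function registerTool(tool: Tool) {
  tools[tool.name] = tool;
}

// Example: registering a trivial echo tool.
registerTool({
  name: "echo",
  description: "Returns its input unchanged",
  async execute(params) { return params["text"]; },
});
```

The `description` field is what makes the system pluggable in practice: the agent's decide step only needs the registry, not hard-coded knowledge of each tool.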
Try it
Pocket guIDE runs in any modern browser. For the best experience, run a local llama.cpp server alongside it.
Built with: React, TypeScript, llama.cpp (WASM + local server), WebAssembly, IndexedDB