How I built Glia: A local-first shared memory layer for browser chats and IDEs

Eshaan — Mon, 18 May 2026 11:42:18 +0000

Have you ever spent 15 minutes re-explaining your database schema, architecture, or codebase decisions to Claude or ChatGPT because your chat limit was reached or you opened a new session?

If you use AI web chats and IDE coding agents (Cursor, Windsurf, Claude Code), this constant context loss is the biggest friction point in daily development.

To solve this, I built Glia, a 100% offline, local-first shared memory layer that links your browser chats and IDE coding tools through a unified local database.

Website: https://glia-ai.vercel.app/
Codebase: https://github.com/Eshaan-Nair/Glia-AI

The Core Problem: Context Amnesia

AI assistants are brilliant, but they live in isolated silos. When you chat on Claude.ai web, that planning is invisible to your Cursor agent. When your browser session resets, your context is wiped clean.

Glia acts as a shared local brain. You save your web chat with a browser extension, and it is instantly recalled by your coding agents inside the IDE.

The Architecture: Built for Local Speed & Absolute Privacy

I wanted a lightweight tool that runs entirely offline with zero cloud fees and absolute security.

Here is how Glia works under the hood:

Chrome MV3 Extension

Runs on Claude, ChatGPT, DeepSeek, Gemini, Grok, Mistral, and Copilot. It intercepts SPA navigations to prevent context bleeding, performs local PII scrubbing (redacting credentials, emails, and tokens), and automatically injects relevant context into your prompt text box before submission.
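To make the local PII scrubbing step concrete, here is a minimal sketch of regex-based redaction. The rule list, the `scrubPII` name, and the `[REDACTED_...]` placeholders are assumptions for illustration only; the patterns Glia actually ships may differ and are likely more thorough.

```typescript
// Hypothetical redaction rules, applied in order. Not Glia's actual patterns.
type Rule = { label: string; pattern: RegExp };

const RULES: Rule[] = [
  // Email addresses.
  { label: "EMAIL", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  // Prefixed API tokens such as "sk-..." or "ghp_..." (assumed prefixes).
  { label: "TOKEN", pattern: /\b(?:sk|ghp|pk)[-_][A-Za-z0-9_-]{16,}\b/g },
  // key=value style credentials, e.g. "password=hunter2".
  { label: "CREDENTIAL", pattern: /\b(?:password|passwd|secret|api[_-]?key)\s*[:=]\s*\S+/gi },
];

// Replace every match of every rule with a labelled placeholder.
function scrubPII(text: string): string {
  return RULES.reduce(
    (acc, rule) => acc.replace(rule.pattern, `[REDACTED_${rule.label}]`),
    text,
  );
}
```

Running the scrubber in the extension, before anything is sent to the local server, means the raw secret never reaches the database at all.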

Express & SQLite Backend

Glia runs a local server that uses sqlite-vec for 768-dimension local vector search (using nomic-embed-text via Ollama) and SQLite FTS5 for literal keyword prefix matching. The two searches run in parallel, and their scores are fused.
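One common way to fuse a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). The sketch below uses RRF as a stand-in, since the exact fusion formula is not spelled out here; `fuseRRF`, `hybridSearch`, the two search callbacks, and the constant `k = 60` are all assumptions for illustration, not Glia's actual code.

```typescript
type Fused = { id: string; score: number };

// Reciprocal rank fusion: every ranking contributes 1 / (k + rank) per id.
// Each input list is assumed to be ordered best-first.
function fuseRRF(rankings: string[][], k = 60): Fused[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Run both (hypothetical) searches concurrently, then fuse their rankings.
async function hybridSearch(
  query: string,
  vectorSearch: (q: string) => Promise<string[]>,
  keywordSearch: (q: string) => Promise<string[]>,
): Promise<Fused[]> {
  const [vectorIds, keywordIds] = await Promise.all([
    vectorSearch(query),
    keywordSearch(query),
  ]);
  return fuseRRF([vectorIds, keywordIds]);
}
```

Rank-based fusion avoids having to normalize cosine distances against FTS5 relevance scores, which live on unrelated scales.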

Surgical Context Trimming

Traditional RAG dumps massive text blocks that bloat prompts. Glia indexes text at the individual sentence level. On recall, it extracts only the specific sentences that match your query. In benchmarks, this surgical approach cuts prompt noise by up to 95%.
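The idea of sentence-level recall can be sketched in a few lines. This toy version matches sentences by plain keyword overlap instead of embeddings and FTS5, and its sentence splitter is deliberately naive; `splitSentences`, `tokenize`, and `recallSentences` are hypothetical names, not Glia's API.

```typescript
// Naive splitter: a run of non-terminator characters plus its trailing . ! or ?
// (It will mis-split decimals and abbreviations; good enough for a sketch.)
function splitSentences(text: string): string[] {
  const parts = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Lowercased alphanumeric tokens.
function tokenize(s: string): string[] {
  return s.toLowerCase().match(/[a-z0-9_]+/g) ?? [];
}

// Keep only the sentences sharing at least `minOverlap` tokens with the query.
function recallSentences(chat: string, query: string, minOverlap = 1): string[] {
  const queryTerms = new Set(tokenize(query));
  return splitSentences(chat).filter((sentence) => {
    const overlap = tokenize(sentence).filter((t) => queryTerms.has(t)).length;
    return overlap >= minOverlap;
  });
}
```

For a saved chat of three sentences where only one mentions the users table, a query like "users table schema" returns just that one sentence rather than the whole chat.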

Local Knowledge Graph

A background queue uses a local LLM (llama3.1:8b via Ollama) to extract entity triples (subject-relation-object) and builds an offline knowledge graph from them, visualizable in an interactive React + D3.js dashboard served locally.
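As a rough illustration of the step after extraction, the sketch below turns a list of subject-relation-object triples into the nodes/links shape that D3 force layouts commonly consume. The `Triple` and `Graph` types and the `buildGraph` function are assumptions for illustration; Glia's real schema may look different.

```typescript
type Triple = { subject: string; relation: string; object: string };

type Graph = {
  nodes: { id: string }[];
  links: { source: string; target: string; label: string }[];
};

// Collapse triples into unique nodes and de-duplicated labelled edges.
function buildGraph(triples: Triple[]): Graph {
  const ids = new Set<string>();
  const seen = new Set<string>();
  const links: Graph["links"] = [];
  for (const t of triples) {
    ids.add(t.subject);
    ids.add(t.object);
    const key = JSON.stringify([t.subject, t.relation, t.object]);
    if (seen.has(key)) continue; // the LLM may emit the same triple twice
    seen.add(key);
    links.push({ source: t.subject, target: t.object, label: t.relation });
  }
  return { nodes: Array.from(ids).map((id) => ({ id })), links };
}
```

De-duplicating at build time matters because a background extractor that revisits overlapping chats will tend to emit the same fact repeatedly.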

Model Context Protocol (MCP) Server

An integrated MCP server exposes tools like recall_context, store_memory, and get_project_summary to Cursor, Windsurf, and Claude Code over stdio, allowing coding agents to manage project context autonomously.
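For readers new to MCP, this is roughly what a tool invocation looks like on the wire: a JSON-RPC 2.0 request with the method `tools/call`, sent as a single line over stdio. The helper names and the `query` argument are assumptions for illustration; the real argument schema of recall_context is defined by the Glia server.

```typescript
type ToolCallRequest = {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
};

let nextId = 1;

// Build a JSON-RPC 2.0 request that asks an MCP server to run one tool.
function buildToolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// The stdio transport sends one JSON message per line.
function frame(request: ToolCallRequest): string {
  return JSON.stringify(request) + "\n";
}

// Example: the "query" argument name is hypothetical.
const recallRequest = buildToolCall("recall_context", { query: "users table schema" });
const wireMessage = frame(recallRequest);
```

In practice the IDE agent builds these requests itself; the sketch only shows what travels between the agent and the local server.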

Zero-Docker Setup

If you want to try it out, you can set up the SQLite engine, Chrome extension, and MCP server in a single command:

npx glia-ai-setup

Glia is completely open-source (MIT). Check out the codebase, leave a star if you like it, and let me know your thoughts on the local-first vector search architecture!

GitHub: https://github.com/Eshaan-Nair/Glia-AI
Website: https://glia-ai.vercel.app/

DEV Community: Eshaan
