Tombri Bowei

I Built ContextFabric: One Private Memory Layer Across Claude, ChatGPT, Cursor, and More with Local Gemma 4

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

AI tools remember now, but they remember in separate silos. Claude has projects, ChatGPT has personalization, Cursor indexes your codebase, and somehow you still end up re-explaining the same decisions, constraints, preferences, and project state every time you move between tools.

That felt backwards to me.

If memory is becoming part of the AI operating system, then personal context should not be trapped inside one vendor's product. It should be portable, permissioned, local-first, and owned by the user.

So I built ContextFabric: a local AI memory layer powered by Gemma 4.

What I Built

ContextFabric combines a desktop app, a local daemon, a memory graph, and a browser extension bridge so that AI tools can share approved context without sending your personal memory to a cloud memory server.

The idea is simple:

  1. Import your real project context: repos, folders, markdown, PDFs, ChatGPT exports, Claude exports, notes, and documents.
  2. Gemma 4 runs locally through Ollama and extracts structured memory nodes.
  3. ContextFabric stores those nodes in a local SQLite graph.
  4. External tools request access.
  5. You approve the request.
  6. The browser extension injects the right context into Claude, ChatGPT, Cursor, Gemini, Perplexity, and other AI tools.

The five core memory node types are:

  • project: what you are building
  • decision: choices already made and why
  • preference: stable working preferences
  • style: how you communicate, design, or code
  • person: collaborators and relevant human context

This is not meant to replace Claude projects, ChatGPT memory, or Cursor indexing.

It solves a different problem: your context should be portable across them.

Demo

Browser extension injection:

The demo shows the full loop:

  • paste messy project context
  • Gemma 4 extracts structured memory nodes
  • nodes are saved locally with confidence scores
  • AI Query answers with sources
  • a permission request controls external access
  • the browser extension injects approved context into an AI chat tool

Code

GitHub:

https://github.com/Boweii22/ContextFabric

Live Site:

https://boweii22.github.io/ContextFabric/

The project is built with Electron, React, TypeScript, SQLite, Express, Ollama, and a Manifest V3 browser extension.

The local app exposes two loopback APIs:

  • 127.0.0.1:47821 for the desktop app permission/token API
  • 127.0.0.1:7749 for the simple demo daemon UI and compatibility endpoints

Both are bound to loopback, not 0.0.0.0.

That matters because the privacy claim is not just a paragraph in a README. The architecture does not expose a public server for your memory graph.
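
To make the loopback point concrete, here is a minimal sketch of a server that refuses to start on anything other than a loopback address. It uses Node's built-in http module rather than the project's Express setup, and isLoopbackHost and listenOnLoopback are hypothetical names, not functions from the ContextFabric repo.

```typescript
import { createServer, Server } from "node:http";

// Hypothetical helper: accepts "::1" and addresses of the form 127.x.y.z.
function isLoopbackHost(host: string): boolean {
  return host === "::1" || /^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(host);
}

// Hypothetical helper: starts a tiny JSON server, but only on loopback.
function listenOnLoopback(port: number, host: string = "127.0.0.1"): Server {
  if (!isLoopbackHost(host)) {
    throw new Error(`Refusing to bind to non-loopback host: ${host}`);
  }
  const server = createServer((_req, res) => {
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ ok: true }));
  });
  // Passing the host explicitly keeps the socket off external interfaces.
  server.listen(port, host);
  return server;
}
```

With this guard, listenOnLoopback(47821) serves only local clients, while listenOnLoopback(47821, "0.0.0.0") throws before any socket is opened.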

How I Built It

The architecture has six parts:

User-owned sources
repos, exports, docs, notes, PDFs
        |
        v
Local ingestion
chunking + metadata
        |
        v
Gemma 4 via Ollama
extract + reason
        |
        v
SQLite memory graph
nodes + embeddings
        |
        v
Permissioned daemon
localhost only
        |
        v
Browser extension
injects context

The first hard problem was extraction.

I did not want a generic summary. I wanted durable memory. That means the model has to decide whether a piece of text contains a project fact, a decision, a preference, a style signal, or a person.

Here is the actual extraction schema prompt from the project:

export const CONTEXT_NODE_TYPES = ['project', 'style', 'decision', 'preference', 'person'] as const

export const CONTEXT_EXTRACTION_SYSTEM_PROMPT = `You are ContextFabric's local Gemma 4 context extractor.

Your job is to read one piece of user-owned context and output ONLY valid JSON.
No markdown. No prose. No comments. No trailing commas.

Extract durable context nodes that another AI assistant should remember later.
Use only facts supported by the input. Do not invent people, projects, tools, or decisions.

Allowed node types:
- project: what the user is building, maintaining, researching, or planning.
- style: how the user writes, communicates, designs, codes, or prefers answers to be shaped.
- decision: a choice already made, including why, tradeoffs, rejected alternatives, or reversibility.
- preference: a stable working preference, constraint, tool choice, privacy preference, format preference, or habit.
- person: a collaborator, stakeholder, user, client, author, or named human with relevant relationship/role context.

Return this exact JSON shape:
{
  "nodes": [
    {
      "type": "project" | "style" | "decision" | "preference" | "person",
      "title": "short human-readable title",
      "summary": "one factual sentence, max 220 characters",
      "confidence": 0.0,
      "evidence": "short direct evidence phrase from the input, max 180 characters",
      "entities": ["important names, tools, projects, people"],
      "tags": ["lowercase-keywords"]
    }
  ]
}`

The parser is intentionally defensive. Gemma 4 is good at structured output, but production code still needs repair paths.

export function parseContextExtraction(raw: string): ContextExtractionParseResult {
  const errors: string[] = []
  const parsed = parseJsonObject(raw)

  if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) {
    return { ok: false, result: { nodes: [] }, errors: ['Output is not a JSON object.'] }
  }

  const root = parsed as Record<string, unknown>
  if (!Array.isArray(root.nodes)) {
    return { ok: false, result: { nodes: [] }, errors: ['Missing nodes array.'] }
  }

  const nodes: ExtractedContextNode[] = []
  for (const [index, value] of root.nodes.entries()) {
    const node = normalizeNode(value, index, errors)
    if (node) nodes.push(node)
  }

  return { ok: errors.length === 0, result: { nodes: nodes.slice(0, 6) }, errors }
}
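
The parseJsonObject helper is not shown above. As a rough illustration of what a repair path can look like, the sketch below strips code fences, keeps the outermost object, and retries after removing trailing commas. parseJsonObjectSketch is a hypothetical stand-in, not the project's implementation.

```typescript
// Hypothetical repair helper; returns null when nothing parseable is found.
function parseJsonObjectSketch(raw: string): unknown {
  // 1. Strip markdown code fences the model may add despite instructions.
  const fencePattern = new RegExp("`{3}(?:json)?", "gi");
  const unfenced = raw.replace(fencePattern, "").trim();

  // 2. Keep only the outermost {...} span, dropping any prose around it.
  const start = unfenced.indexOf("{");
  const end = unfenced.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  const candidate = unfenced.slice(start, end + 1);

  // 3. Try the text as-is, then retry with trailing commas removed.
  //    The comma regex is naive: it would also touch ",}" inside a string value.
  const attempts = [candidate, candidate.replace(/,\s*([}\]])/g, "$1")];
  for (const text of attempts) {
    try {
      return JSON.parse(text);
    } catch {
      // Fall through to the next repair attempt.
    }
  }
  return null;
}
```

Returning null instead of throwing lets a caller such as parseContextExtraction report "Output is not a JSON object." through its normal error path.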

The second hard problem was assembling context for different tools.

Claude, ChatGPT, and Cursor do not want the same payload. Claude benefits from concise prose sections. ChatGPT works well with a compact bullet brief. Cursor needs engineering-focused context.

So ContextFabric asks Gemma 4 to assemble app-aware context briefs:

export const PAYLOAD_ASSEMBLY_SYSTEM_PROMPT = `You are ContextFabric's local Gemma 4 payload assembler.

Goal:
Turn user-approved local memory nodes into one coherent context brief for another AI tool.

Rules:
- Use ONLY the supplied memory nodes. Do not invent facts, names, features, dates, metrics, or claims.
- Prefer stable project, decision, style, preference, and person nodes over raw conversation/code snippets.
- Write a useful brief, not a JSON dump.
- Include source node ids inline as [node:id] after concrete claims.
- If the nodes do not support a requested claim, omit it.
- Respect the requested app format.
- Stay under the requested maximum word count.

App formats:
- claude: concise prose with sections "Context", "Decisions", "Working Style", "How to Use This".
- chatgpt: short bullet-oriented brief with "Known Context", "Preferences", "Relevant Sources".
- cursor: engineering-focused brief with "Project", "Architecture / Decisions", "Coding Preferences", "Files / Sources".
- generic: compact neutral brief with clear source ids.

Return JSON only:
{
  "payload": "the final context brief",
  "usedNodeIds": ["node-id"],
  "warnings": ["optional warning when data is thin or uncertain"]
}`
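
One way to keep the assembler honest is to check its output against the nodes that were actually supplied. The following is a sketch under assumptions: AssemblyResult mirrors the JSON shape in the prompt, while checkAssembly and its parameters are hypothetical and not taken from the repo.

```typescript
// Mirrors the JSON shape requested by the assembly prompt.
interface AssemblyResult {
  payload: string;
  usedNodeIds: string[];
  warnings: string[];
}

// Hypothetical validator: returns a list of problems, empty when the brief passes.
function checkAssembly(
  result: AssemblyResult,
  approvedIds: string[],
  maxWords: number
): string[] {
  const problems: string[] = [];
  const approved = new Set(approvedIds);

  // Every id the model claims to have used must be one that was supplied.
  for (const id of result.usedNodeIds) {
    if (!approved.has(id)) problems.push(`unknown node id in usedNodeIds: ${id}`);
  }

  // Inline citations look like [node:abc123]; check those against the same set.
  const citation = /\[node:([^\]]+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = citation.exec(result.payload)) !== null) {
    const id = match[1];
    if (id !== undefined && !approved.has(id)) problems.push(`unknown citation: ${id}`);
  }

  // Enforce the word limit on whitespace-separated tokens.
  const words = result.payload.trim().split(/\s+/).filter(Boolean).length;
  if (words > maxWords) problems.push(`payload has ${words} words, limit is ${maxWords}`);

  return problems;
}
```

A non-empty result could be turned into a warning or trigger a retry; which of those is appropriate is a product decision this sketch does not make.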

The third hard problem was making the local model usable on normal hardware.

I hit memory issues while testing Gemma locally, so ContextFabric creates a constrained Ollama profile called cf-gemma4.

const res = await fetch(`${this.baseUrl}/api/create`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: this.constrainedModelName,
    from: sourceModel,
    parameters: {
      num_ctx: this.runtimeContext,
      num_predict: 64,
      num_batch: 4,
    },
    stream: false,
  }),
  signal: ctrl.signal,
})
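
Once the profile exists, it can be called like any other Ollama model. The sketch below assumes the standard Ollama /api/generate endpoint with format set to "json" and streaming disabled. buildExtractionRequest and generateJson are illustrative names, and the 60 second timeout is an arbitrary choice.

```typescript
// Subset of the Ollama /api/generate request body used in this sketch.
interface GenerateRequest {
  model: string;
  system: string;
  prompt: string;
  format: "json";
  stream: false;
}

// Hypothetical builder: pairs a system prompt with one piece of user text.
function buildExtractionRequest(
  system: string,
  text: string,
  model: string = "cf-gemma4"
): GenerateRequest {
  return { model, system, prompt: text, format: "json", stream: false };
}

// Hypothetical caller: returns the raw model text, which still needs parsing.
async function generateJson(
  baseUrl: string,
  body: GenerateRequest,
  timeoutMs: number = 60_000
): Promise<string> {
  const ctrl = new AbortController();
  const timer = setTimeout(() => ctrl.abort(), timeoutMs);
  try {
    const res = await fetch(`${baseUrl}/api/generate`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
      signal: ctrl.signal,
    });
    if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
    // With stream: false, Ollama replies with one JSON object whose
    // "response" field holds the generated text.
    const data = (await res.json()) as { response?: string };
    return data.response ?? "";
  } finally {
    clearTimeout(timer);
  }
}
```

A call would look like generateJson("http://127.0.0.1:11434", buildExtractionRequest(systemPrompt, text)), where 11434 is Ollama's default port.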

This was not about making the model weaker.

It was about making the demo run on real laptops, not just on a perfect GPU workstation.

For the local HTTP daemon, I added a small API that judges can test without understanding the whole Electron app:

compat.post('/extract', async (req: Request, res: Response) => {
  const { text, title = 'HTTP Extract', inputType = 'api', save = false } = req.body
  if (!text?.trim()) {
    res.status(400).json({ error: 'text is required' })
    return
  }

  const result = await extractNodesFromText(db, ollama, text, title, inputType, Boolean(save))
  res.json({
    ok: true,
    saved: Boolean(save),
    savedCount: result.savedCount,
    nodes: result.nodes.map(nodeToPublicJson),
  })
})

compat.get('/context', async (req: Request, res: Response) => {
  const appId = String(req.query.app || req.query.appId || 'generic')
  const query = String(req.query.query || 'current project context, writing style, technical decisions, preferences')
  const nodes = selectTokenNodes(db.getNodes(800), query, 16)
  const assembly = await assembleTokenPayloadWithTimeout(ollama, { appId, query, nodes, maxWords: 800 })
  res.json({ ok: true, appFormat: assembly.appFormat, payload: assembly.payload })
})

The /context endpoint is what makes the browser extension bridge simple. The extension does not need to know how the graph works. It asks the local daemon for approved context and inserts it into the active AI chat box.
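
From the extension's side, the call can stay very small. The sketch below assumes the /context route and the app and query parameters shown in the handler above, plus the 7749 demo port. buildContextUrl and fetchApprovedContext are hypothetical names, and the 5 second timeout is an assumption.

```typescript
const DAEMON_ORIGIN = "http://127.0.0.1:7749";

// Builds the request URL for the local daemon's /context route.
function buildContextUrl(app: string, query?: string): string {
  const url = new URL("/context", DAEMON_ORIGIN);
  url.searchParams.set("app", app);
  if (query) url.searchParams.set("query", query);
  return url.toString();
}

// Returns the assembled brief, or null if the daemon is offline, slow,
// or replies with something unexpected. It never throws to the caller.
async function fetchApprovedContext(app: string, query?: string): Promise<string | null> {
  const ctrl = new AbortController();
  const timer = setTimeout(() => ctrl.abort(), 5000);
  try {
    const res = await fetch(buildContextUrl(app, query), { signal: ctrl.signal });
    if (!res.ok) return null;
    const data = (await res.json()) as { ok?: boolean; payload?: string };
    return data.ok && typeof data.payload === "string" ? data.payload : null;
  } catch {
    return null;
  } finally {
    clearTimeout(timer);
  }
}
```

Returning null on every failure path means the content script can simply skip injection when the daemon is not running, instead of surfacing an error inside the host page.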

Why Gemma 4

Gemma 4 is not a decorative dependency here.

It is the part of the system that turns ContextFabric from a searchable note bucket into a memory protocol.

I chose Gemma 4 E2B as the target model profile because ContextFabric is supposed to run where personal context actually lives: on laptops, desktops, and eventually smaller edge devices.

A cloud model would have defeated the core privacy constraint. If your private context graph has to leave the machine for extraction, then the product becomes a privacy policy promise instead of a privacy-preserving architecture.

A much larger local model could produce stronger answers, but it would make the product less usable for the people who need it most. The challenge specifically highlights small Gemma 4 models for edge and local use, and that is exactly the design space ContextFabric lives in.

Gemma 4 plays three roles:

1. Context extraction

It reads messy user-owned text and converts it into typed, durable memory nodes.

This is different from summarization. A summary says "what was this text about?" Context extraction asks "what should another AI assistant remember later?"

2. Conflict detection

If a new memory contradicts an existing one, Gemma 4 can mark the conflict or uncertainty. That matters because memory should not silently rot.

For example, if an old preference says "prefer short answers" and a new note says "prefer detailed long answers", ContextFabric should surface that conflict instead of pretending both are equally true forever.
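
Asking the model to compare a new node against the entire graph would be slow on a laptop, so one plausible approach is a cheap prefilter that picks likely conflict candidates first. This is an assumption about how such a step could work, not a description of the project's code; MemoryNode and findConflictCandidates are hypothetical.

```typescript
// Minimal node shape for this sketch; the real schema has more fields.
interface MemoryNode {
  id: string;
  type: string;
  summary: string;
  tags: string[];
}

// Hypothetical prefilter: existing nodes of the same type that share at
// least one tag with the incoming node are worth a model-level comparison.
function findConflictCandidates(
  incoming: MemoryNode,
  existing: MemoryNode[]
): MemoryNode[] {
  const incomingTags = new Set(incoming.tags);
  return existing.filter(
    (node) =>
      node.id !== incoming.id &&
      node.type === incoming.type &&
      node.tags.some((tag) => incomingTags.has(tag))
  );
}
```

Only the returned candidates would then be passed to Gemma 4 with a prompt asking whether the two statements contradict each other.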

3. Payload assembly

When Claude, ChatGPT, or Cursor asks for context, Gemma 4 turns relevant graph nodes into a coherent brief with citations and a word limit.

This is where the model's reasoning is useful: not to invent project facts, but to decide how to package approved facts for another tool.

The architecture also keeps Gemma 4 on the correct side of the trust boundary.

The normal challenge path uses Ollama locally. The daemon binds to loopback. The database is local. The browser extension talks to localhost. There is no ContextFabric cloud memory service receiving your data.

That is the difference between "we care about privacy" and "the data path cannot reach our server because there is no server in the path."

The Bigger Picture

I do not think the long-term version of this idea is just an app.

I think it is protocol infrastructure.

HTTP made documents portable across servers. SMTP made email portable across providers. ContextFabric is an early sketch of what a personal AI context protocol could look like.

Today, every AI company is building memory as a product feature. That makes sense. Memory improves user retention.

But as developers, we should ask a harder question:

Should personal AI context belong to the tool, or to the user?

My answer is the user.

That is why ContextFabric has permission requests, scoped grants, source citations, local storage, and a browser extension bridge. The extension is the adoption wedge: it makes the protocol useful before any AI company agrees to support it natively.

That was the "I never thought of it that way" moment for me.

The future of AI memory should not be one giant memory per vendor. It should be a user-controlled context layer that tools can request access to.

The browser extension is the wedge.

The protocol is the point.

Challenges I Ran Into

The hardest challenge was not building a chat UI.

It was keeping the system honest.

Early versions returned raw code chunks when I asked project-level questions. That was technically "retrieval", but it was bad memory. I had to improve ranking so durable nodes like project, decision, style, and preference win over random bundled JavaScript or CSS.
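
A simple way to express that kind of ranking is a per-type boost applied on top of a keyword score. The sketch below is illustrative only: the boost values, RankableNode, scoreNode, and rankNodes are assumptions, and the project's selectTokenNodes may work differently.

```typescript
interface RankableNode {
  id: string;
  type: string;
  title: string;
  summary: string;
  confidence: number; // 0.0 to 1.0
}

// Assumed weights: durable node types outrank raw chunks, which get 1.
const TYPE_BOOST: Record<string, number> = {
  project: 3,
  decision: 3,
  style: 2,
  preference: 2,
  person: 1.5,
};

// Score = (keyword hits + 1) * type boost * extraction confidence.
function scoreNode(node: RankableNode, queryTerms: string[]): number {
  const haystack = `${node.title} ${node.summary}`.toLowerCase();
  const hits = queryTerms.filter((term) => haystack.includes(term)).length;
  const boost = TYPE_BOOST[node.type] ?? 1;
  return (hits + 1) * boost * node.confidence;
}

// Returns the top `limit` nodes without mutating the input array.
function rankNodes(nodes: RankableNode[], query: string, limit: number): RankableNode[] {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  return nodes
    .slice()
    .sort((a, b) => scoreNode(b, terms) - scoreNode(a, terms))
    .slice(0, limit);
}
```

With these weights, a project node at 0.8 confidence beats a raw code chunk at 1.0 confidence when both match the query equally often.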

The second challenge was local model reliability. Gemma 4 needs enough free memory, and normal laptops are messy. People have Chrome, VS Code, Docker, Discord, and ten other things open.

That led to the constrained Ollama profile, shorter prompts, fallback parsing, and clearer error messages.

The third challenge was browser injection. Claude, ChatGPT, Cursor, and Perplexity do not share one DOM structure. The extension has to find active inputs, avoid stale text areas, handle single-page-app navigation, and never crash the page if the daemon is offline.

The fourth challenge was packaging. A project that only works on my machine is not a challenge submission. I added a one-command startup path, release assets, Chrome extension packaging, screenshots, and a GitHub Pages landing page.

What's Next

The next version is about turning the prototype into a real protocol.

My roadmap:

  • publish the Chrome Web Store listing after review
  • add native macOS and Windows installers
  • improve LAN sync between devices
  • add richer conflict resolution workflows
  • publish a formal context payload schema
  • build SDKs so indie AI tools can request ContextFabric memory directly
  • explore a standard token format for scoped context grants

The browser extension is useful now, but the bigger win is native integration.

I want AI tools to request context the way apps request OAuth scopes, except the resource is not your Google Drive or GitHub account. It is your personal working context.
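
To show what an OAuth-style scope could mean for context, here is a small sketch of a scoped grant and the filter it implies. No such token format has been published yet, so ScopedGrant, GrantableNode, and nodesForGrant are hypothetical and only illustrate the idea.

```typescript
type NodeType = "project" | "style" | "decision" | "preference" | "person";

// Hypothetical grant: which app may read which node types, and until when.
interface ScopedGrant {
  appId: string;
  allowedTypes: NodeType[];
  expiresAt: number; // Unix epoch milliseconds
}

interface GrantableNode {
  id: string;
  type: NodeType;
  summary: string;
}

// Returns only the nodes the grant covers; nothing if the grant belongs
// to another app or has expired.
function nodesForGrant(
  grant: ScopedGrant,
  appId: string,
  nodes: GrantableNode[],
  now: number = Date.now()
): GrantableNode[] {
  if (grant.appId !== appId || now >= grant.expiresAt) return [];
  return nodes.filter((node) => grant.allowedTypes.indexOf(node.type) !== -1);
}
```

A grant for Cursor limited to project and decision nodes would let the editor see architecture choices while person nodes stay private.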

Try It Yourself

Repo:

https://github.com/Boweii22/ContextFabric

Live Site:

https://boweii22.github.io/ContextFabric/

Install Ollama:

https://ollama.com

Then run:

git clone https://github.com/Boweii22/ContextFabric.git
cd ContextFabric
npm run start

On macOS, use Node 20 or 22:

nvm install 20
nvm use 20
npm run start

On Windows, use Node 20 via fnm or nvm-windows:

fnm use 20
npm run start

Open the local demo UI:

http://127.0.0.1:7749/ui

The fastest test:

  1. Paste some project context into Extract Context.
  2. Click Extract and save.
  3. Watch Gemma 4 create typed memory nodes.
  4. Open the Claude context preview.
  5. Try the browser extension bridge.

Built by Bowei Tombri for the DEV Gemma 4 Challenge.

If you build with AI tools every day, I am curious: would you rather each tool keep its own memory, or would you prefer a local memory layer that every tool has to request permission from?

