I write a lot in the browser — email, GitHub comments, contact forms — and I wanted proofreading without uploading every keystroke to a company's cloud. My workplace bans Grammarly for exactly that reason.
So I built inline-scribe: a Chrome extension that proofreads your text with an AI that runs on your own machine (Ollama). Nothing leaves your computer. And the fixes show up like Word's Track Changes — accept or reject each one individually with ✓ / ✕.
This post is about the two design decisions I found most interesting while building it. Both generalize to anyone wiring a local LLM into a product.
- The LLM never produces the diff.
- Silencing Ollama's 403 with declarativeNetRequest: zero-config, no OLLAMA_ORIGINS.
The missing ingredient was never the AI
If you write in a browser today, you pick one of three bad options:
- Grammarly — great UX, but every keystroke goes to their cloud, the good features are behind a subscription, and many workplaces ban it (legal docs, unreleased code, patient data).
- Paste into ChatGPT — you get one big rewritten blob back. Which words changed? Did it alter your meaning? You re-read everything, every time, and your text still went to someone else's server.
- Nothing — and ship the typos.
The thing is, the AI isn't the hard part anymore. Anyone can run a capable model locally with Ollama in two commands, for free. What's missing is the interface. The reason Grammarly was worth paying for was never the grammar engine — it was the friendly diff that lets you see and control each change.
That interface, on top of a model you own, is the whole product.
| | corrections | your text goes to | inline diff, per-fix accept/reject | price |
|---|---|---|---|---|
| Grammarly | cloud AI | their servers | ✅ (the reason people pay) | $12+/mo |
| Harper (10k★) | local, rule-based | nowhere ✅ | ❌ underlines typos only — can't rewrite a clumsy sentence | free |
| scramble / Typollama | local LLM ✅ | nowhere ✅ | ❌ whole-text replacement or popup | free |
| inline-scribe | local LLM ✅ | nowhere ✅ | ✅ | free |
Design decision #1: the LLM never produces the diff
This is the one I most want to share.
The intuitive move is to ask the model for structured output: "return the changes as JSON," something like [{ "delete": "...", "insert": "..." }, ...], and pipe it straight into the UI.
But small local models break when you do this. A model like llama3.2 (3B) is surprisingly good at fixing prose and terrible at structured output: it breaks the JSON, adds explanations, wraps everything in a code fence, renames your keys. A chatty 3B model means a broken UI.
So I split the responsibilities:
- The model's job: return corrected prose — just text.
- The extension's job: compute the changes (hunks) from (original, corrected) with a deterministic algorithm.
you press Alt+G in a text field
│
▼
the extension sends your text to YOUR endpoint ← default: Ollama on 127.0.0.1
(an OpenAI-compatible /chat/completions API) model: llama3.2 (~2GB, free)
│
▼
the model returns corrected prose — just text
│
▼
inline-scribe computes a word-level diff ← deterministic algorithm,
between your text and the correction NOT the LLM's opinion
│
▼
review panel: accept ✓ / reject ✕ each change → Apply writes back only what you approved
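The second step of the flow above (send the text, get plain prose back) could be sketched roughly like this. It assumes an OpenAI-compatible endpoint such as Ollama's `http://127.0.0.1:11434/v1`; the `CheckerConfig` shape, the helper names, and the system prompt are hypothetical, not inline-scribe's actual code.

```typescript
// Hypothetical config shape; field names are assumptions for this sketch.
interface CheckerConfig {
  endpoint: string;     // base URL, e.g. http://127.0.0.1:11434/v1
  model: string;        // e.g. llama3.2
  systemPrompt: string; // instruction asking for corrected text only
}

interface ChatRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

// Minimal structural type so any fetch-like function can be injected.
type FetchLike = (
  url: string,
  init: ChatRequest['init'],
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Build the /chat/completions request. The model is asked for prose only:
// no JSON schema, no tool calls, nothing structured.
function buildChatRequest(text: string, config: CheckerConfig): ChatRequest {
  return {
    url: `${config.endpoint.replace(/\/+$/, '')}/chat/completions`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: config.model,
        stream: false,
        temperature: 0,
        messages: [
          { role: 'system', content: config.systemPrompt },
          { role: 'user', content: text },
        ],
      }),
    },
  };
}

// Pull the assistant's text out of an OpenAI-style response body.
function extractProse(payload: unknown): string {
  const content = (payload as { choices?: { message?: { content?: unknown } }[] })
    ?.choices?.[0]?.message?.content;
  if (typeof content !== 'string') throw new Error('Unexpected response shape');
  return content;
}

// Send the text and return whatever prose the model produced.
async function checkText(text: string, config: CheckerConfig, fetchImpl: FetchLike): Promise<string> {
  const { url, init } = buildChatRequest(text, config);
  const res = await fetchImpl(url, init);
  if (!res.ok) throw new Error(`Endpoint returned HTTP ${res.status}`);
  return extractProse(await res.json());
}
```

In an extension you would pass the global `fetch` as `fetchImpl`. Keeping the request builder and the response parser as pure functions makes both easy to test without a running model.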
The diff tokenizes into words + whitespace + punctuation runs, then does an LCS (longest common subsequence) walk:
// Tokenize into words/whitespace/punctuation, preserving everything
export function tokenize(text: string): string[] {
  return text.match(/\s+|[^\s\w]+|\w+/gu) ?? [];
}

export function diffText(original: string, corrected: string): Hunk[] {
  const a = tokenize(original);
  const b = tokenize(corrected);
  // DP table of LCS lengths over a × b (Uint32Array rows)
  // Walk the table emitting equal / delete / insert, merging adjacent ops.
  // Collapse delete+insert neighbours into one `replace` so a phrase rewrite
  // reads as a single reviewable hunk instead of three.
  ...
}
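For readers who want to see the elided part spelled out, here is one way the LCS walk could look. This is a sketch written for this post's description, not the extension's actual source: `diffTextSketch` is a hypothetical name, the `Hunk` type is inferred from the `{ kind, original, corrected }` shape, and the tokenizer mirrors the one above without the `u` flag.

```typescript
type HunkKind = 'equal' | 'delete' | 'insert' | 'replace';
interface Hunk { kind: HunkKind; original: string; corrected: string }

// Words, whitespace runs, and punctuation runs; nothing is dropped.
function tokenize(text: string): string[] {
  return text.match(/\s+|[^\s\w]+|\w+/g) ?? [];
}

function diffTextSketch(original: string, corrected: string): Hunk[] {
  const a = tokenize(original);
  const b = tokenize(corrected);
  const n = a.length;
  const m = b.length;

  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  const lcs: Uint32Array[] = [];
  for (let i = 0; i <= n; i++) lcs.push(new Uint32Array(m + 1));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  const hunks: Hunk[] = [];
  // Append one token, merging it into the previous hunk where possible.
  const push = (kind: 'equal' | 'delete' | 'insert', token: string): void => {
    const last = hunks[hunks.length - 1];
    const del = kind === 'insert' ? '' : token; // contribution to the original side
    const ins = kind === 'delete' ? '' : token; // contribution to the corrected side
    if (last && last.kind === kind) {
      last.original += del;
      last.corrected += ins;
    } else if (last && kind !== 'equal' && last.kind !== 'equal') {
      // A delete next to an insert (in either order) becomes one replace.
      last.kind = 'replace';
      last.original += del;
      last.corrected += ins;
    } else {
      hunks.push({ kind, original: del, corrected: ins });
    }
  };

  // Walk the table from the top-left, preferring matches.
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (a[i] === b[j]) { push('equal', a[i]); i++; j++; }
    else if (lcs[i + 1][j] >= lcs[i][j + 1]) { push('delete', a[i]); i++; }
    else { push('insert', b[j]); j++; }
  }
  while (i < n) push('delete', a[i++]);
  while (j < m) push('insert', b[j++]);
  return hunks;
}
```

With this sketch, `diffTextSketch('I has a cat', 'I have a cat')` yields three hunks: an `equal` for `"I "`, a `replace` of `has` with `have`, and an `equal` for `" a cat"`. The table is O(n·m) in tokens, which is fine for a comment or an email but would need a smarter algorithm for very long documents.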
This split has a lot of happy side effects:
- Model-agnostic. Any OpenAI-compatible endpoint works (llama.cpp, LM Studio, vLLM, or your own key). Since nothing depends on structured-output quality, the UI behaves the same whether you run 3B or 70B.
- Deterministic, so reproducible. Same input/output → same hunks. Easy to unit-test.
- Accept/reject is trivial. A hunk is { kind, original, corrected }. Accepted hunks take corrected, rejected take original, concatenate, and you're done.
export function applyDecisions(hunks: Hunk[], accepted: boolean[]): string {
  let result = '';
  hunks.forEach((h, idx) => {
    if (h.kind === 'equal') result += h.original;
    else result += accepted[idx] ? h.corrected : h.original;
  });
  return result;
}
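A quick usage example may make the accept/reject behaviour concrete. The function body is repeated from above (without `export`) so the snippet runs on its own; the `Hunk` type and the hand-written hunks are assumptions for illustration.

```typescript
type HunkKind = 'equal' | 'delete' | 'insert' | 'replace';
interface Hunk { kind: HunkKind; original: string; corrected: string }

// Same logic as applyDecisions above, repeated so this snippet is standalone.
function applyDecisions(hunks: Hunk[], accepted: boolean[]): string {
  let result = '';
  hunks.forEach((h, idx) => {
    if (h.kind === 'equal') result += h.original;
    else result += accepted[idx] ? h.corrected : h.original;
  });
  return result;
}

// Hand-written hunks for "I has a cat" -> "I have a cat."
const hunks: Hunk[] = [
  { kind: 'equal', original: 'I ', corrected: 'I ' },
  { kind: 'replace', original: 'has', corrected: 'have' },
  { kind: 'equal', original: ' a cat', corrected: ' a cat' },
  { kind: 'insert', original: '', corrected: '.' },
];

// Accept both changes.
const acceptBoth = applyDecisions(hunks, [false, true, false, true]); // "I have a cat."
// Reject the verb fix, keep the added period.
const periodOnly = applyDecisions(hunks, [false, false, false, true]); // "I has a cat."
// Reject everything: the original text comes back unchanged.
const rejectAll = applyDecisions(hunks, [false, false, false, false]); // "I has a cat"
```

Note that `accepted` is indexed by hunk position, so it carries entries for `equal` hunks too; those entries are simply ignored.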
Even with "return only text," small models still wrap output in fences or quotes. I gave up on prompting that away and instead strip the obvious wrappers in post — without touching real content:
export function stripWrapping(reply: string, original: string): string {
  let out = reply.replace(/\r\n/g, '\n');
  const fence = out.match(/^\s*```[a-z]*\n([\s\S]*?)\n```\s*$/);
  if (fence) out = fence[1]; // strip ```...```
  out = out.replace(/^\s+|\s+$/g, '');
  if (/^".*"$/s.test(out) && !/^"/.test(original.trim())) out = out.slice(1, -1); // strip whole-reply quotes
  if (original.endsWith('\n') && !out.endsWith('\n')) out += '\n'; // preserve trailing-newline convention
  return out;
}
Takeaway: let small local models do what they're good at (return natural language) and keep the structure — diffs, JSON, state — in deterministic code. This isn't specific to proofreading; it's a general principle for putting a local LLM into a product.
Design decision #2: silencing Ollama's 403 with declarativeNetRequest
This is the pothole every Chrome-extension × Ollama project hits.
Stock Ollama rejects requests carrying a chrome-extension://... Origin with a 403 — a guard against cross-origin access from extensions. The official workaround is to have the user set the OLLAMA_ORIGINS env var. But asking for that means:
- exporting OLLAMA_ORIGINS=chrome-extension://<id-that-changes>
- different steps depending on shell config and how Ollama is launched
- the single biggest "I installed it and it doesn't work" drop-off point
In other words, the moment your README documents an env var, you've lost. It should just work with a stock ollama serve.
The fix: use MV3's declarativeNetRequest (DNR) to strip the Origin header from requests to the configured endpoint with a dynamic rule. No Origin, no 403.
async function syncOriginRule(): Promise<void> {
  const stored = await chrome.storage.sync.get('config');
  const endpoint = stored.config?.endpoint ?? DEFAULT_CONFIG.endpoint;
  const host = new URL(endpoint).host; // e.g. 127.0.0.1:11434
  await chrome.declarativeNetRequest.updateDynamicRules({
    removeRuleIds: [1],
    addRules: [{
      id: 1,
      priority: 1,
      condition: { urlFilter: `||${host}/`, resourceTypes: ['xmlhttprequest'] },
      action: {
        type: 'modifyHeaders',
        requestHeaders: [{ header: 'origin', operation: 'remove' }],
      },
    }],
  });
}
Key points:
- Scope the rule to the user's configured host only (urlFilter). It's not a dangerous "strip Origin everywhere" rule.
- The endpoint is configurable, so watch chrome.storage.onChanged and re-apply the rule whenever config changes.
- The only permissions needed are declarativeNetRequest plus localhost host_permissions.
// manifest.json (excerpt)
"permissions": ["storage", "activeTab", "declarativeNetRequest", "contextMenus"],
"host_permissions": ["http://127.0.0.1/*", "http://localhost/*"],
The result: the user's steps are "install Ollama, ollama serve, install the extension." Zero env vars.
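The "re-apply the rule whenever config changes" point from the list above could be wired up along these lines. This is a sketch under assumptions: `buildOriginRule`, `applyOriginRule`, and `watchEndpointChanges` are hypothetical names, the config is assumed to live under the `config` key in `chrome.storage.sync` as in the snippet above, and `chrome` is typed loosely so the block stands alone.

```typescript
// Provided by the browser inside an extension; typed loosely for this sketch.
declare const chrome: any;

// Build the dynamic rule for one endpoint. Pure, so it is easy to unit-test.
function buildOriginRule(endpoint: string, ruleId = 1) {
  const host = new URL(endpoint).host; // e.g. 127.0.0.1:11434
  return {
    id: ruleId,
    priority: 1,
    // Scoped to the configured host only, never a blanket rule.
    condition: { urlFilter: `||${host}/`, resourceTypes: ['xmlhttprequest'] },
    action: {
      type: 'modifyHeaders',
      requestHeaders: [{ header: 'origin', operation: 'remove' }],
    },
  };
}

// Replace the existing rule (same id) with one for the new endpoint.
async function applyOriginRule(endpoint: string): Promise<void> {
  const rule = buildOriginRule(endpoint);
  await chrome.declarativeNetRequest.updateDynamicRules({
    removeRuleIds: [rule.id],
    addRules: [rule],
  });
}

// Call once from the service worker: re-sync the rule when the stored config changes.
function watchEndpointChanges(): void {
  chrome.storage.onChanged.addListener(
    (changes: Record<string, { newValue?: { endpoint?: string } }>, areaName: string) => {
      if (areaName !== 'sync' || !changes.config) return;
      const endpoint = changes.config.newValue?.endpoint;
      // A real implementation would fall back to the default endpoint here.
      if (endpoint) void applyOriginRule(endpoint);
    },
  );
}
```

Reusing a fixed rule id means each update replaces the previous rule, so a stale host never keeps its Origin header stripped after the user switches endpoints.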
MV3 architecture: do the fetch in the service worker
One more thing. The actual fetch (the request to 127.0.0.1) happens in the service worker, not the content script:
// content → background message; background runs the check and replies
chrome.runtime.onMessage.addListener((msg, _sender, sendResponse) => {
  if (msg?.type !== 'inline-scribe:check') return undefined;
  (async () => {
    const config = { ...DEFAULT_CONFIG, ...(await chrome.storage.sync.get('config')).config };
    try {
      const corrected = await new OllamaChecker(config).check(msg.text);
      sendResponse({ ok: true, corrected, model: config.model });
    } catch (err) {
      // e.g. a CheckerError message
      sendResponse({ ok: false, error: err instanceof Error ? err.message : String(err) });
    }
  })();
  return true; // keep the channel open for the async response
});
Two reasons:
- It isn't bound by the page's CSP. A fetch from a content script gets blocked when the page's Content-Security-Policy restricts connect-src. The service worker runs in the extension's context and is unaffected.
- It plays well with the DNR Origin strip. The request now originates from the extension's service worker as an xmlhttprequest, so the rule above applies cleanly.
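For completeness, the content-script side of that message exchange might look like the following. The message type and response shape come from the listener above; `requestCheck` and `unwrapCheckResponse` are hypothetical names, and `chrome` is typed loosely so the block stands alone.

```typescript
// Provided by the browser inside an extension; typed loosely for this sketch.
declare const chrome: any;

// Response shape sent back by the service worker listener.
type CheckResponse =
  | { ok: true; corrected: string; model: string }
  | { ok: false; error: string };

// Turn the worker's reply into either corrected text or a thrown error.
function unwrapCheckResponse(response: CheckResponse | undefined): string {
  if (!response) throw new Error('No response from the service worker');
  if (!response.ok) throw new Error(response.error);
  return response.corrected;
}

// Content script: hand the text to the service worker and wait for the reply.
// No network request is made from the page's context.
async function requestCheck(text: string): Promise<string> {
  const response: CheckResponse | undefined = await chrome.runtime.sendMessage({
    type: 'inline-scribe:check',
    text,
  });
  return unwrapCheckResponse(response);
}
```

The content script then only deals with strings: it passes the original text in, gets corrected prose (or an error message to display) back, and feeds both into the diff.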
The UI (review panel, the ✎ selection icon, in-place replacement) is rendered in a shadow DOM from the content script so it doesn't collide with the page's CSS.
Wrapping up
inline-scribe is, at its core, "Grammarly's diff UX on top of a local LLM." The design decisions that made it work:
- Don't let the LLM build the diff — use a deterministic algorithm. The UI never breaks on small models, it's model-agnostic, and it's easy to test.
- Strip the Origin with DNR. No OLLAMA_ORIGINS, zero config.
- Fetch in the service worker. Not bound by page CSP.
Swap the system prompt and the same diff UI becomes a translator or a tone-shifter. Source is MIT.
If you're putting a local LLM into a product, the leverage is in deciding what the model does — and what it doesn't.