DEV Community: Eduard Maghakyan

Mnemonic - local-first voice notes with Gemma 4 E4B

Eduard Maghakyan — Sat, 16 May 2026 23:43:24 +0000

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

Mnemonic is a macOS menu-bar app and CLI for voice notes that go straight into your daily journal.

Press a hotkey, speak, release. One bullet appears in today's YYYY-MM-DD.md:

- 14:35 This is a new node. Let me try to see if it'll work. [audio](../audio/2026-05-10/143500.wav)
- 15:12 I want to email Sarah tomorrow about the migration plan. [audio](../audio/2026-05-10/151200.wav)
- 16:08 The bug is in how we handle the empty array case in `merge_chunks`. [audio](../audio/2026-05-10/160800.wav)

That's the whole product. No titles, no summaries, no auto-generated TODO lists, no extracted entities. No cloud, no telemetry. Transcribed thoughts dropped into a Markdown file you already control.

Early versions over-structured short voice memos - every 30-second thought came back with a title, a "summary" that restated what you said, and an "Actions" list that invented TODOs. v0.2 cut all of that. The model's job is now narrow: transcribe and lightly clean.

v0.3 added three things on top, each either opt-in or invisible by default:

Image attachments. Take a screenshot, then speak - or use Ctrl+Option+Cmd+Space to drag a region and start recording in one motion. Gemma 4 reads the WAV and the PNG together and produces one bullet referencing both. PNG saves next to the audio.
Recording queue. Recording is decoupled from structuring. Release the hotkey, the tray goes idle, fire the next one immediately. A background worker drains an on-disk inbox serially; quitting mid-job is safe, the inbox survives. Tray is now binary - gray idle, red recording.
Intent routing (opt-in, off by default). A second narrow Gemma 4 call decides whether your note is asking the OS to do something - "remind me to call Sarah at 3 PM" - and if it is, fires a macOS Shortcut you've whitelisted. Undoable for 5 seconds. No AppleScript, no shell interpolation.

The file format (YYYY-MM-DD.md at the vault root) matches Obsidian's Daily Notes plugin, so pointing notes_dir at an Obsidian vault makes bullets land in today's daily note. Graph view, backlinks, search - all free.

Everything runs locally. No network call leaves the loopback interface. The DMG is signed and notarized; no telemetry crates are linked into the binary.

Demo

Code

Repository: github.com/EduardMaghakyan/mnemonic
Latest release (signed + notarized DMG): v0.3.1
Install via Homebrew:

  brew tap EduardMaghakyan/tap
  brew install --cask mnemonic

Rust workspace: Tauri 2 for the menu-bar app, clap for the CLI, a shared mnemonic-core crate for audio + markdown + the llama-server client. Single Apple Silicon DMG, code-signed with a Developer ID, notarized, and stapled. MIT licensed.

How I Used Gemma 4

Mnemonic uses three Gemma 4 capabilities through a single local model: native audio, native vision, and lightweight reasoning. They all run against the same llama-server on 127.0.0.1 - no second model, no external API.

Why E4B

Gemma 4 ships in four sizes. Only two are audio-capable, and only one fits a 16 GB laptop. Numbers from the official Gemma 4 model card:

Size	Audio?	MMLU Pro	BBEH	CoVoST	FLEURS (↓)
E2B	✓	60.0%	21.9%	33.47	0.09
E4B	✓	69.4%	33.1%	35.54	0.08
26B A4B	✗	higher	higher	-	-
31B Dense	✗	highest	highest	-	-

E2B and E4B are the only sizes with the audio encoder (~300M params). Both also ship with a ~150M vision encoder. The 26B and 31B are vision-only - no ears. "Use the biggest model that fits" is a non-starter for this product.

Between E2B and E4B, the deltas matter:

MMLU Pro 60.0 → 69.4 (+9.4). The difference between a model that fumbles unfamiliar technical vocabulary in voice notes and one that doesn't.
BBEH 21.9 → 33.1 (+11.2). Reasoning quality matters for self-correction ("actually, scratch that…") and for intent routing - one misclassification fires the wrong Shortcut.
CoVoST 33.47 → 35.54 and FLEURS 0.09 → 0.08. Modest audio-recognition wins.

At Q4_K_M the E4B GGUF is 4.98 GB (Hugging Face), plus audio and vision mmprojs (~1 GB combined). Co-resident with an IDE and browser on 16 GB.

One model, one pass - for both audio and vision

The conventional architecture for this product is two stages:

ASR (Whisper, Parakeet, etc.) → raw transcript
Text LLM → clean and structure

Mnemonic does both in one Gemma 4 forward pass. The audio goes into the model with a system prompt that says, in effect: "transcribe this and write it the way the speaker would write it themselves." Why it works better than two stages:

One model in memory, one HTTP round-trip per recording. A 2-stage version means two model downloads, two warm processes, two failure modes.
The cleaning prompt operates on the audio, not on a flat transcript. The model can hear pauses, hesitation, restarts - the difference between "I think" as filler and "I think" as opinion. A downstream LLM working from a transcript has already lost that.
Lower end-to-end latency.

The same approach works for vision. A two-stage screenshot-with-voice product would be OCR (Tesseract, Apple Vision) → LLM merge. Mnemonic sends the WAV and the PNG to Gemma 4 in one multipart request, and the model produces a bullet that references both. The image isn't OCR'd in isolation - it's grounded by what the user said while taking it. Captions come out as "the panic the speaker mentioned in line 42" rather than generic "code editor with red error text."

Intent routing - a second narrow call, same model

Letting a voice note fire a macOS Shortcut took some thought. I didn't want to bolt on a tools/function-calling framework, an MCP server, or anything that added attack surface for a side-effect feature running on a user's machine.

What works is a second Gemma 4 call to the same llama-server, on the already-cleaned transcript, with one job - output a single JSON object:

{ "tool": "create-reminder", "input": "call Sarah at 3 PM" }

…or { "tool": "none" } if the transcript isn't a request. No tools registry, no plugin protocol - same model, same server.

Most of the work is around what doesn't fire:

Whitelist required. Mnemonic only runs Shortcuts named in allowed_shortcuts. A hallucinated name is refused before it reaches the OS.
No AppleScript, no shell interpolation. Input is piped to the Shortcut via stdin (shortcuts run NAME --stdin).
Undoable for 5 seconds. The tray menu shows Undo: <name> for the configured window. Click it to run a paired undo-<name> Shortcut.
Thought-dumps don't fire. "I was thinking about reminding Sarah, but maybe she already knows" → {tool: "none"}. Validated at 30/30 on hand-labelled transcripts, including 15 hedged/observational cases that must not fire (docs/spike/intent/PHASE-0-INTENT-FINDINGS.md).
Notes are the source of truth. A fire writes a ↳ Ran shortcut "<name>": <input> continuation line under the bullet. Greppable, auditable.

Cost is ~1.7s of warm latency per recording when enabled, off the user's critical path because the structuring queue is already async.

Implementation

Users run llama-server themselves, on loopback:

llama-server -hf unsloth/gemma-4-E4B-it-GGUF:Q4_K_M --port 5809 --mmproj-auto -ngl 99 -c 8192

The app posts one multipart request per recording (WAV + optional PNG + system prompt) to 127.0.0.1:5809, parses the JSON response, and appends a bullet. With intent routing on, a second JSON-mode call follows.

A few things that matter:

JSON mode + thinking mode. response_format: { type: "json_object" } for structure, chat_template_kwargs: { thinking: true } for chain-of-thought. The thinking tokens cost a few hundred ms but improve handling of technical terms.
The schema shrunk over three versions, then stayed put. v1 had seven fields (title, tags, summary, cleaned, actions, questions, entities). v1.5 had two. v2 has one: { cleaned: String }. Each prune was a UX win - the simpler the schema, the less the model felt the urge to narrate, summarize, or invent. The system prompt explicitly bans third-party narration ("the speaker", "the user", "the recording") and includes two calibration examples. v0.3's vision and intent work added new inputs and a new second call, but didn't grow that schema.
Failure is visible, not silent. llama-server unreachable, malformed JSON twice, silent audio - each produces a stub bullet with the timestamp and a redacted error. No recording is ever lost without a trace; the audio is preserved on disk, and the inbox queue means a job in flight survives a quit or crash.

What works well

Audio transcription and cleaning. Better than the Whisper + LLM pipeline I started with. Audio-grounded cleaning preserves intent across self-correction and hesitation in a way a downstream LLM on a flat transcript can't.
Vision captions are grounded. They answer "what was the speaker talking about" rather than describing everything in the image. Short, useful captions.
Intent routing is conservative. It refuses to fire more often than it fires. That's the right error direction for a feature that can run OS-level actions.
The queue makes the app feel instant. You stop waiting on the model. Tray returns to idle the moment you release the hotkey.
Everything is local. Loopback only. No keys to manage, no quota to worry about, no data leaves the machine.

Limits

Processing is batched after the fact, not streamed. Fine for journaling; wouldn't work for live captioning.
16 GB unified memory is the floor. With a heavy IDE + browser open, memory pressure shows.
I haven't tested non-English voice notes systematically. Gemma 4 is multilingual, but I work in English.
No speaker diarization, no noise suppression. By design, most voice memos are solo.
Intent routing requires building macOS Shortcuts by hand. Powerful if you set it up, but most users won't.
Image OCR is good for screenshots, not for dense document scans. Short captions and inline text work well; multi-column papers don't.

UX

The whole interaction is one keystroke. Default is Ctrl+Option+Space hold-to-record (push-to-talk). Both the combo and the mode (hold or toggle) are configurable in ~/.config/mnemonic/config.toml, and the config hot-reloads - change a hotkey, save, the app re-registers without a restart. Tray transitions are perceptible within 100 ms.

The CLI ships a mnemonic doctor command with a green/red checklist for every common failure mode (mic permission, llama-server reachable, model loaded, mmproj loaded, paths writable). Brew install creates the mnemonic symlink on PATH automatically - no admin password prompt anywhere in the install path.

Built solo (well... not quite, Claude was involved). Source under MIT. Thanks to the Gemma team for shipping audio, vision, and chain-of-thought in a model that fits a laptop.

IPE v0.1.17 - Keyboard Shortcuts, Crash Recovery & macOS Fix

Eduard Maghakyan — Thu, 16 Apr 2026 10:30:55 +0000

A quick update on IPE - the local PR-review UI for Claude Code plans. If you missed the original announcement, check it out here.

Here's what's new since v0.1.15:

Keyboard Shortcuts

You can now navigate the review flow without touching the mouse:

Shift+Tab - Accept the plan
x - Request changes
c - Jump to the comment box
? - Toggle the shortcuts overlay

Shortcut hints are displayed directly on the buttons, so you don't have to memorize them.

Server Crash Recovery

If the IPE server dies unexpectedly (killed process, laptop sleep, etc.), the client now recovers gracefully instead of hanging forever. New sessions also correctly open the browser when connecting to an already-running server.

macOS Binary Fix

If you upgraded to v0.1.15 or v0.1.16 and IPE suddenly stopped launching (getting killed in the terminal), this was caused by macOS Gatekeeper blocking the unsigned binary. v0.1.17 fixes this - the release binary is now properly ad-hoc signed.

If you're affected, just re-run the installer:

curl -fsSL https://raw.githubusercontent.com/eduardmaghakyan/ipe/main/install.sh | bash

What's Next

More improvements coming soon. If you have feedback or ideas, open an issue or drop a comment below!

Generate a PDF from JSON with one API call

Eduard Maghakyan — Sat, 11 Apr 2026 23:43:51 +0000

If you've ever needed to email a quarterly report from a cron job, attach an invoice to a Zapier workflow, or ship a weekly digest from an n8n trigger, you've hit the same wall I've been hitting for years: generating a PDF programmatically is weirdly hard.

The options today are all flavors of the same compromise.

What exists in 2026

PDFMonkey - Starter €5/mo for 300 documents, Pro €15/mo for 3,000. You design your template in their visual editor, then POST variables to fill it in. Solid editor, reasonable pricing. The friction is that every new document shape means a trip back to the visual tool.

APITemplate.io - PDF Basic $19/mo (annual) for 3,000 PDFs. Same core idea as PDFMonkey: design a template in a web editor, then POST JSON data to render it.

DocRaptor - from $15/mo, powered by Prince. HTML-in, PDF-out - so no template editor, but you own the problem of producing print-ready HTML and CSS (page breaks, headers, footers, paged media rules). Powerful if you already have a styled HTML pipeline; more work if you don't.

Rolling your own - Puppeteer or Chromium in a Lambda, @react-pdf/renderer, or wkhtmltopdf in a Docker image. Works until you need it to not-work: font loading, CJK support, memory limits, cold starts, timeouts, and the joy of debugging headless Chrome in CI.

The pattern: most hosted PDF APIs either hand you a template editor (design once, fill slots later) or hand you an HTML-to-PDF pipe (bring your own rendered HTML). Both are fine when the layout is known ahead of time. Both are friction when:

The shape of your data is determined at runtime (e.g., an LLM decides what to render)
You want multiple outputs from the same pipeline - a PDF, a web view, and a raw data table
You need a dashboard with metrics + charts + tables composed together, not a static invoice

The one-API-call approach

A few months ago I started building genui.sh for the workflows I couldn't make the existing tools fit. The pitch is simple: POST a JSON payload describing what you want, get back a signed URL.

Here's the smallest version - a markdown document rendered to a hosted PDF:

curl -X POST https://www.genui.sh/api/artifacts/share \
  -H "Authorization: Bearer $GENUI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "template": "pdf",
    "title": "Q4 Report",
    "content": {
      "text": "# Q4 2025 Performance\n\nRevenue grew **23%** year-over-year.\n\n## Highlights\n- Launched mobile app\n- Expanded to 3 new markets",
      "pageSize": "A4"
    },
    "expiresIn": "7d"
  }'

Response:

{
  "data": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "url": "https://genui.sh/a/550e8400-e29b-41d4-a716-446655440000?token=eyJhbGc...",
    "expiresAt": "2026-04-19T14:30:45.123Z"
  }
}

See it live: Q4 report PDF →

That's the whole workflow. No template to design first, no visual editor to maintain, no data-shape gymnastics. The content field is just markdown - the same kind you'd write in a README. The expiresIn field is optional: on Pro it defaults to 30 days, so pass "never" explicitly if you want a permanent link; on Free and Starter it's clamped to the plan max (7 and 30 days respectively). And Authorization accepts either Bearer <key> or the raw key, whichever your HTTP client prefers.

If you want a table instead of prose:

-d '{
  "template": "table",
  "title": "Recent orders",
  "content": {
    "columns": [
      {"header": "Order ID", "accessorKey": "id"},
      {"header": "Customer", "accessorKey": "customer"},
      {"header": "Amount", "accessorKey": "amount"}
    ],
    "rows": [
      {"id": "ORD-001", "customer": "Acme Corp", "amount": "$1,250"},
      {"id": "ORD-002", "customer": "Globex", "amount": "$890"},
      {"id": "ORD-003", "customer": "Initech", "amount": "$3,420"},
      {"id": "ORD-004", "customer": "Umbrella", "amount": "$560"},
      {"id": "ORD-005", "customer": "Stark Industries", "amount": "$7,800"}
    ],
    "config": {
      "enableSearch": true,
      "enablePagination": true,
      "enableExport": true
    }
  }
}'

See it live: Orders table →

Or a bar chart:

-d '{
  "template": "chart",
  "title": "Monthly revenue",
  "content": {
    "type": "bar",
    "data": [
      {"name": "Jan", "value": 84000},
      {"name": "Feb", "value": 92000},
      {"name": "Mar", "value": 108000},
      {"name": "Apr", "value": 121000},
      {"name": "May", "value": 134000},
      {"name": "Jun", "value": 142000}
    ],
    "config": {
      "showGrid": true,
      "showLegend": true,
      "colors": ["#3b82f6"]
    }
  }
}'

See it live: Monthly revenue chart →

Same URL, same auth header, same response shape. Five output types (markdown, table, chart, pdf, and a composable dashboard format called @std/dynamic) all sharing one endpoint. The chart template supports four sub-types - line, bar, area, pie - so swapping from a bar to a line chart is a one-word change. That's the one-call, multi-format part: the same POST body shape gives you a PDF, a hosted web view, or a live dashboard depending on template.

When things go wrong, errors come back in a consistent shape too:

// 403 - hit your monthly artifact quota
{
  "error": "Monthly artifact limit reached (50). Upgrade your plan for more.",
  "code": "FORBIDDEN"
}

// 429 - rate limited (100 req/min per key)
{
  "error": "Too many requests",
  "code": "RATE_LIMITED"
}

Composing a dashboard

The interesting template is @std/dynamic. Instead of a single content blob, you POST a tree of components - Metric, Chart, Table, Card, Stack - and the renderer composes them into a hosted page:

-d '{
  "template": "@std/dynamic",
  "title": "Weekly snapshot",
  "content": {
    "component": "Stack",
    "props": { "gap": "md" },
    "children": [
      {
        "component": "Metric",
        "props": { "label": "Revenue", "value": "$284,000", "delta": "+12%" }
      },
      {
        "component": "Chart",
        "props": {
          "type": "line",
          "data": [
            {"day": "Mon", "visits": 1200},
            {"day": "Tue", "visits": 1450},
            {"day": "Wed", "visits": 1380}
          ],
          "config": {
            "categoryKey": "day",
            "series": [{"key": "visits", "label": "Visits"}]
          }
        }
      }
    ]
  }
}'

See it live: Weekly snapshot dashboard →

That's the actual payoff: the same tree can be served as a web view today and re-POSTed as "template": "pdf" tomorrow to get a printable version, without re-designing anything. If an LLM is deciding the shape of the output at runtime, this is the format you want it writing.

What you don't have to think about

Hosting the output. The returned URL is a genui.sh page that renders the artifact. No S3 bucket, no CDN config, no expired-link headache.
Signing. The URL comes pre-signed with a token you can set to expire in 1h, 30d, or never.
Fonts, print styles, headers, footers. The renderer handles A4/Letter layout, page breaks, and PDF metadata.
View counts, if you care. Each hosted link tracks opens; you can see them in the dashboard.
Multiple formats from the same data. If you want a PDF and a web dashboard of the same report, you just POST twice with different template values. No duplicated template design.

Works cleanly as an HTTP Request node in n8n, a Webhooks step in Zapier, or an HTTP module in Make — the single POST plus signed-URL response maps to what those tools already expect, so you don't need a custom function to glue anything together.

Pricing

Tier	Price	Artifacts / month	Max payload	Link expiry
Free	$0	50	1MB	7 days
Starter	$7/mo	500	5MB	30 days
Pro	$19/mo	3,000	10MB	No expiry

To be upfront: on raw PDF-per-dollar, you can match or beat these numbers on the big template-editor APIs - PDFMonkey's Pro is €15/mo for 3,000 documents, APITemplate.io's PDF Basic is $19/mo (annual) for the same. genui.sh isn't trying to be the cheapest PDF factory; it's trying to be the one call you make when you want a PDF and a web view and a dashboard from the same JSON, without designing a template first.

The Free tier needs no credit card. Starter and Pro are managed through Stripe's hosted customer portal, so you can change plan, update payment, or cancel at any time without talking to anyone.

When this is not the right tool

Being honest: if your use case is a heavily-branded invoice PDF with an exact legal layout your accountant approves annually, you probably want PDFMonkey's template editor. genui.sh is optimized for the case where the data is dynamic and the layout doesn't need to match a pixel-perfect brand document.

If you need PDF/A compliance, digital signatures, or long-term archival metadata, none of the hosted APIs (including this one) will fit - you want a library like iText or Prince with full control.

Everything else - reports, dashboards, invoices that don't need legal compliance, data exports, AI-generated summaries, automation workflow outputs - one POST is probably enough.

Try it

Grab a free key at genui.sh - 50 artifacts/month, no credit card. Ping me at support@genui.sh with feedback, or open an issue if the docs are unclear.

If you want to see what the composable dashboard format looks like, there's a live example on the landing page rendering a real dashboard from a single POST body.

Building a Local GitHub-style Review Interface for Claude Code Plans

Eduard Maghakyan — Wed, 04 Mar 2026 00:27:49 +0000

Plan mode in Claude Code feels like reviewing a pull request with no comments, no diff, and no history.

Claude thinks for a moment (well much longer than a "moment"), then dumps a wall of text into your terminal - It then asks for approval with many variations of "Yes, ...."

Then I open the Markdown file, start taking notes while reading through it. After doing this for a while I felt there should be a better way.

So I built IPE.

What is IPE?

IPE intercepts Claude Code's ExitPlanMode hook and opens a browser tab showing the plan in a GitHub-style code review interface. You can:

Add inline comments on any block or text selection — just like leaving a review comment on a PR
Click any file reference (e.g. `src/index.ts`) to pop open a syntax-highlighted side drawer showing the actual file contents
Compare plan versions side-by-side when Claude revises after your feedback
Switch between sessions if you're running multiple Claude Code instances at once
Approve or request changes — clicking "Request Changes" bundles your inline comments and sends them back to Claude

The Problems I wanted to fix

You can't annotate. If step 3 looks wrong and step 8 is fine but needs a tweak, you're writing one blob of feedback hoping Claude parses it all correctly.

You lose context across revisions. Claude revises the plan based on your feedback - but there's no diff. Did it actually address your concern? You're re-reading the whole thing from scratch.

How It Works

IPE registers itself as a Claude Code hook on PermissionRequest with the ExitPlanMode matcher. When Claude finishes planning and tries to proceed, the hook fires.

The binary spins up a local HTTP server, opens your browser, and blocks - Claude is sitting there waiting for your response. You review at your own pace (the timeout is 4 days, so no rush). When you approve or request changes, the server sends the response back to the hook and the browser tab closes automatically.

The whole thing is a self-contained binary built with Bun.

Install in One Line

macOS / Linux:

curl -fsSL https://raw.githubusercontent.com/eduardmaghakyan/ipe/main/install.sh | bash

Windows (PowerShell):

irm https://raw.githubusercontent.com/eduardmaghakyan/ipe/main/install.ps1 | iex

That's it. The script downloads the binary and registers the hook in your Claude Code settings automatically. Run it again to update.

Verify it's wired up by running /hooks inside Claude Code — you should see the ExitPlanMode hook listed.

The Workflow in Practice

Work with Claude Code as normal
Claude generates a plan and calls ExitPlanMode
Your browser opens with the plan displayed
Read through it, click file references to inspect code, leave inline comments where needed
Hit Accept → Claude proceeds
Hit Request Changes → your comments go back to Claude, it revises, you review the diff

Try It Out

The project is open source: github.com/EduardMaghakyan/ipe

If you use Claude Code in plan mode regularly, give it a spin and let me know what you think. Issues and PRs welcome - there's a lot of room to grow this (comment threads, keyboard shortcuts, plan history persistence...).