<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Lachie James</title>
    <description>The latest articles on DEV Community by Lachie James (@lachiejames).</description>
    <link>https://dev.to/lachiejames</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3830450%2Fd385909f-3d24-4c9e-a1a4-cac6582d7673.jpg</url>
      <title>DEV Community: Lachie James</title>
      <link>https://dev.to/lachiejames</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lachiejames"/>
    <language>en</language>
    <item>
      <title>I added voice mode to my AI work tool using ElevenLabs as the speech layer</title>
      <dc:creator>Lachie James</dc:creator>
      <pubDate>Wed, 08 Apr 2026 08:02:13 +0000</pubDate>
      <link>https://dev.to/lachiejames/i-added-voice-mode-to-my-ai-work-tool-using-elevenlabs-as-the-speech-layer-1pfk</link>
      <guid>https://dev.to/lachiejames/i-added-voice-mode-to-my-ai-work-tool-using-elevenlabs-as-the-speech-layer-1pfk</guid>
      <description>&lt;p&gt;SlopWeaver is a desktop app that connects your work tools (Gmail, Slack, Linear, Google Docs, and more) and uses AI to handle busywork. It already had a text-based AI chat with full tool access: workspace search, task creation, cross-platform context. Adding voice meant giving the same capabilities a speech interface.&lt;/p&gt;

&lt;p&gt;The architecture:&lt;/p&gt;

&lt;p&gt;ElevenLabs Conversational AI handles the speech side. It manages the WebSocket connection, speech-to-text, text-to-speech, turn detection, and interruption handling. SlopWeaver is registered as a custom LLM provider. When you speak, ElevenLabs transcribes it and sends the text to SlopWeaver's API in OpenAI Chat Completions format. SlopWeaver processes it through the same Claude chat pipeline as text (tool calls, context retrieval, generation), then streams the response back as SSE chunks. ElevenLabs begins speaking the chunks as they arrive, so TTS starts before the full response is generated.&lt;/p&gt;
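&lt;p&gt;A minimal sketch of the streaming side, shaping tokens into OpenAI-style &lt;code&gt;chat.completion.chunk&lt;/code&gt; SSE events (the function name is illustrative, not SlopWeaver's actual code):&lt;/p&gt;

```typescript
// Shape streamed tokens into OpenAI-style chat.completion.chunk SSE
// events, as a custom-LLM endpoint for ElevenLabs would emit them.
// (toSseEvent is an illustrative name, not SlopWeaver's actual code.)
function toSseEvent(token: string): string {
  const chunk = {
    object: "chat.completion.chunk",
    choices: [{ index: 0, delta: { content: token } }],
  };
  return "data: " + JSON.stringify(chunk) + "\n\n";
}

// In the HTTP handler, every token from the Claude pipeline is written
// as one event, so TTS can begin on the first delta:
//   for await (const token of claudeStream) res.write(toSseEvent(token));
//   res.write("data: [DONE]\n\n");
```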

&lt;p&gt;In practice, the voice conversation has the same tool access as text. You can ask it to search your messages, create tasks, summarize threads, pull context from connected platforms. Voice is an input modality, not a separate product.&lt;/p&gt;

&lt;p&gt;In this demo I used voice mode to work through some product issues: I asked the AI to explain a Sentry error on screen, told it to create a task for broken Gmail email rendering, then accepted a proposed fix from the tasks page. One task created by voice, one existing proposal accepted.&lt;/p&gt;

&lt;p&gt;Three problems I ran into that might save you time if you're building something similar:&lt;/p&gt;

&lt;p&gt;ElevenLabs uses Whisper for transcription, and it misheard domain-specific words constantly. "Slack" became "stack", "Jira" became "gyra", "SlopWeaver" became anything. I built a per-user vocabulary correction service that runs post-transcription. It's a simple find-and-replace pass but it made a big difference in downstream AI comprehension.&lt;/p&gt;
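&lt;p&gt;The correction pass can be sketched like this, assuming a per-user map of common mishearings (the map entries and function name are illustrative):&lt;/p&gt;

```typescript
// Post-transcription vocabulary correction: a per-user find-and-replace
// map of common mishearings (entries here are illustrative).
const corrections: { [heard: string]: string } = {
  stack: "Slack",
  gyra: "Jira",
  "slop weaver": "SlopWeaver",
};

function correctTranscript(text: string): string {
  let out = text;
  for (const [heard, fixed] of Object.entries(corrections)) {
    // Whole-word, case-insensitive replace. A naive map like this can
    // overcorrect ("stack trace"), which is why the real map is per-user.
    out = out.replace(new RegExp("\\b" + heard + "\\b", "gi"), fixed);
  }
  return out;
}
```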

&lt;p&gt;The AI's text responses contain markdown, code blocks, URLs, and embedded content. None of that works when spoken aloud. Added a sanitization layer between the chat pipeline output and the SSE stream that strips non-voice-safe content. The AI generates one response, and the rendering layer decides what's appropriate for voice vs text display.&lt;/p&gt;
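&lt;p&gt;A sanitizer like that can be sketched as a few regex rules (these rules are illustrative, not the full layer):&lt;/p&gt;

```typescript
// Strip markdown constructs that read badly when spoken aloud.
// These rules are illustrative, not the full sanitization layer.
function sanitizeForVoice(text: string): string {
  return text
    .replace(/^#{1,6}\s*/gm, "")              // drop heading markers
    .replace(/\*\*([^*]+)\*\*/g, "$1")        // unwrap bold
    .replace(/\[([^\]]+)\]\([^)]+\)/g, "$1")  // keep link text, drop the URL
    .replace(/https?:\/\/\S+/g, "a link")     // bare URLs read terribly
    .trim();
}
```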

&lt;p&gt;The round trip is: speech capture, ElevenLabs transcription, network to SlopWeaver API, Claude generation (sometimes with tool calls that hit external APIs), network back to ElevenLabs, text-to-speech, audio playback. Each step adds latency, and tool calls add more. Two things helped most: using ElevenLabs' lowest-latency TTS model (eleven_flash_v2_5) and streaming response chunks so TTS can start speaking before generation finishes. The target is under 800ms for the first spoken token on non-tool-call turns.&lt;/p&gt;

&lt;p&gt;Each voice conversation turn is a billable action. Preflight affordability check runs when the session starts, actual cost deduction happens after the webhook completes. Sessions are Redis-backed with a 30-minute TTL, which also prevents race conditions between overlapping turns.&lt;/p&gt;
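&lt;p&gt;The session gating can be sketched with an in-memory map standing in for Redis (all names here are illustrative; the real service keys sessions in Redis with the 30-minute TTL):&lt;/p&gt;

```typescript
// Per-turn billing and session gating, with an in-memory map standing
// in for Redis (the real service uses Redis keys with a 30-minute TTL;
// names like beginTurn are illustrative).
const SESSION_TTL_MS = 30 * 60 * 1000;

const sessions = new Map(); // sessionId -> { userId, expiresAt, turnInFlight }
const ledger: { [userId: string]: number } = {};

function startSession(sessionId: string, userId: string, balance: number) {
  // Preflight affordability check: refuse the session before any turn runs.
  if (0 >= balance) throw new Error("insufficient balance");
  sessions.set(sessionId, {
    userId,
    expiresAt: Date.now() + SESSION_TTL_MS,
    turnInFlight: false,
  });
}

function beginTurn(sessionId: string): boolean {
  const s = sessions.get(sessionId);
  // Missing or expired sessions are rejected (Redis TTL gives this for free).
  if (s === undefined || Date.now() > s.expiresAt) return false;
  // An in-flight flag serializes overlapping turns in the same session.
  if (s.turnInFlight) return false;
  s.turnInFlight = true;
  return true;
}

function completeTurn(sessionId: string, cost: number) {
  // Actual deduction happens only after the webhook completes.
  const s = sessions.get(sessionId);
  if (s === undefined) return;
  s.turnInFlight = false;
  ledger[s.userId] = (ledger[s.userId] ?? 0) + cost;
}
```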

&lt;p&gt;Stack: NestJS, React 19, Tauri v2 (desktop), Claude (Anthropic SDK with prompt caching), ElevenLabs Conversational AI (WebSocket + custom LLM webhook), Supabase + pgvector, BullMQ.&lt;/p&gt;

&lt;p&gt;Building in public. Previous demos showed the review queue and the cross-platform AI chat. This one adds voice as a third interaction mode.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>productivity</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Fan-out auditing: making AI coding agents actually read your entire codebase</title>
      <dc:creator>Lachie James</dc:creator>
      <pubDate>Wed, 25 Mar 2026 13:36:09 +0000</pubDate>
      <link>https://dev.to/lachiejames/fan-out-auditing-making-ai-coding-agents-actually-read-your-entire-codebase-31ba</link>
      <guid>https://dev.to/lachiejames/fan-out-auditing-making-ai-coding-agents-actually-read-your-entire-codebase-31ba</guid>
      <description>&lt;p&gt;AI coding agents can't fit a large codebase in context. When you ask one to audit 800 files, it reads some and skips the rest. I've tried Claude, ChatGPT, Deep Research for refactoring, type safety, architecture audits. Decent answers every time, but I could always find things they missed just by reading the code myself.&lt;/p&gt;

&lt;p&gt;Fan-out auditing uses parallelism to buy thoroughness.&lt;/p&gt;

&lt;h3&gt;The idea&lt;/h3&gt;

&lt;p&gt;Instead of one agent doing a shallow pass on 500 files, you launch 200 agents each doing a deep pass on 5-8 files. AI gets worse at larger batches. An agent reading 5 files will catch things an agent reading 500 won't.&lt;/p&gt;

&lt;h3&gt;How it works&lt;/h3&gt;

&lt;p&gt;The prompt is a Claude Code slash command (~300 lines of markdown). The orchestrator:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Runs one grep to find relevant files&lt;/li&gt;
&lt;li&gt;Groups files into slices of 5-8, keeping same-directory files together&lt;/li&gt;
&lt;li&gt;Shows the slice plan and waits for confirmation&lt;/li&gt;
&lt;li&gt;Launches agents in batches of 10, each writing findings to its own &lt;code&gt;.md&lt;/code&gt; file in the repo&lt;/li&gt;
&lt;li&gt;After all phase 1 agents complete, launches phase 2 agents that read ~12 phase 1 files each and identify cross-cutting patterns&lt;/li&gt;
&lt;li&gt;Writes a final synthesis from the phase 1 and phase 2 files&lt;/li&gt;
&lt;/ol&gt;
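&lt;p&gt;Step 2, the slicing, can be sketched as code (the grouping heuristic is assumed from the description; in practice this lives in the markdown prompt, not a script):&lt;/p&gt;

```typescript
// Group candidate files into slices, keeping same-directory files
// together. The default of 6 sits in the 5-8 range from the post;
// the real logic lives in the markdown prompt, not code.
function sliceFiles(paths: string[], size = 6): string[][] {
  const byDir: { [dir: string]: string[] } = {};
  for (const p of paths) {
    const cut = p.lastIndexOf("/");
    const dir = cut >= 0 ? p.slice(0, cut) : ".";
    (byDir[dir] ??= []).push(p);
  }
  const slices: string[][] = [];
  for (const files of Object.values(byDir)) {
    for (let i = 0; files.length > i; i += size) {
      slices.push(files.slice(i, i + size));
    }
  }
  return slices;
}
```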

&lt;h3&gt;Why small batches matter&lt;/h3&gt;

&lt;p&gt;This is the core insight. AI performance degrades with input size. Give an agent 5 files and it reads every line, considers each one against the criteria, and produces specific findings with line numbers. Give it 50 files and it starts pattern-matching on file names, skipping files that "look fine," and producing vague observations.&lt;/p&gt;

&lt;p&gt;The fan-out pattern forces thoroughness by keeping each agent's scope small enough that it has no excuse to skim.&lt;/p&gt;

&lt;h3&gt;Why every agent writes to a file&lt;/h3&gt;

&lt;p&gt;The first version of this used agents that returned findings in their response, and the orchestrator summarized them. That's lossy. An agent finds 8 specific issues with line numbers, the orchestrator compresses them to "several type safety issues found," and the detail is gone.&lt;/p&gt;

&lt;p&gt;When every agent writes to its own &lt;code&gt;.md&lt;/code&gt; file in the repo, nothing gets lost. The orchestrator synthesizes from files, not from compressed return values in context. You can also watch the files appear in real time in your editor, which is useful for long runs.&lt;/p&gt;

&lt;h3&gt;Phase 2: cross-cutting patterns&lt;/h3&gt;

&lt;p&gt;Individual agents can only see their slice. If 9 different slices all have the same issue, no single agent knows that. Phase 2 agents each read ~12 phase 1 output files and look for patterns that span multiple slices. This is where findings like "this same function is reimplemented in 8 modules" emerge.&lt;/p&gt;

&lt;p&gt;Phase 1 runs on Sonnet (file reading is straightforward). Phase 2 runs on Opus (reasoning across 12 reports to spot non-obvious patterns is harder).&lt;/p&gt;

&lt;h3&gt;What it's good for&lt;/h3&gt;

&lt;p&gt;I've used the same pattern for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Copy audits: checking user-facing text against a style guide or tropes list&lt;/li&gt;
&lt;li&gt;Refactoring: finding duplicated logic, consolidation opportunities, dead code&lt;/li&gt;
&lt;li&gt;Selling point discovery: reading every file to find features worth marketing&lt;/li&gt;
&lt;li&gt;Architecture audits: checking module boundaries, dependency violations, pattern compliance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You swap the reference document and the pre-filter grep. The fan-out mechanics stay the same.&lt;/p&gt;

&lt;h3&gt;What it doesn't fix&lt;/h3&gt;

&lt;p&gt;A human who knows the codebase inside out will still catch things this misses. The AI still can't reason about high-level architecture decisions or understand business context that isn't in the code. But the difference between "AI read some files and gave vague observations" and "AI read every file and gave findings with line numbers" is worth the 29 minutes.&lt;/p&gt;

&lt;h3&gt;The test run&lt;/h3&gt;

&lt;p&gt;I used it to check all user-facing text in my product (SlopWeaver, ~800 source files) against tropes.fyi (a catalog of AI writing tells).&lt;/p&gt;

&lt;p&gt;Results: 201 slices, 809 files inspected, 220 output files, 180+ findings.&lt;/p&gt;

&lt;p&gt;It scales to any repo size. A repo with 10,000 files would produce more slices and take longer, but the same prompt works.&lt;/p&gt;

&lt;p&gt;Stack: Claude Code, Claude Sonnet 4.6 (phase 1), Claude Opus 4.6 (phase 2).&lt;/p&gt;

&lt;p&gt;The prompt is open source: &lt;a href="https://github.com/lachiejames/fan-out-audit" rel="noopener noreferrer"&gt;github.com/lachiejames/fan-out-audit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One markdown file, drop it into &lt;code&gt;.claude/commands/&lt;/code&gt;. The repo includes the full output from this audit (all 220 files) so you can browse what it produces before running it.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>automation</category>
      <category>codereview</category>
    </item>
    <item>
      <title>I built an AI chat that searches your work tools and cites its sources</title>
      <dc:creator>Lachie James</dc:creator>
      <pubDate>Mon, 23 Mar 2026 04:24:17 +0000</pubDate>
      <link>https://dev.to/lachiejames/i-built-an-ai-chat-that-searches-your-work-tools-and-cites-its-sources-1gf2</link>
      <guid>https://dev.to/lachiejames/i-built-an-ai-chat-that-searches-your-work-tools-and-cites-its-sources-1gf2</guid>
      <description>&lt;p&gt;Every AI assistant I've tried needs you to manually provide context from your other tools.&lt;/p&gt;

&lt;p&gt;SlopWeaver's AI chat skips that. It's connected to your work tools (Gmail, Slack, Linear, Google Docs, and more) and searches across them when you ask a question. Every source is cited inline with platform-colored chips. Hover to preview, click to navigate to the original.&lt;/p&gt;

&lt;p&gt;This demo shows the flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open a security assessment report in the inbox&lt;/li&gt;
&lt;li&gt;Ask the AI "what's this about?" and get a quick summary&lt;/li&gt;
&lt;li&gt;Ask for a deep analysis: the AI triggers workspace search, knowledge lookup, and extended reasoning&lt;/li&gt;
&lt;li&gt;Get a structured report: findings with CVSS scores, per-finding remediation status from Linear tickets, a supply chain incident timeline from Slack, stakeholder responsibilities, and your personal action items with deadlines&lt;/li&gt;
&lt;li&gt;33 sources cited inline from Gmail, Slack, Linear, and knowledge sources&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The search layer is hybrid: keyword + semantic (Voyage AI embeddings, 1024 dimensions) + reranking (Voyage rerank-2.5). Entity resolution connects "Daniel Frost" across platforms into one identity. Claude generates the response with extended reasoning and numbered citations pointing back to specific source documents.&lt;/p&gt;
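&lt;p&gt;One common way to fuse the keyword and semantic result lists before reranking is reciprocal rank fusion; that choice is an assumption here, since the post only says the layers are combined:&lt;/p&gt;

```typescript
// Fuse keyword and semantic result lists into one ranking before the
// reranker runs. Reciprocal rank fusion is an assumption here; the post
// doesn't say how the two lists are merged.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores: { [id: string]: number } = {};
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      // Documents that appear high in several lists accumulate score.
      scores[id] = (scores[id] ?? 0) + 1 / (k + rank + 1);
    });
  }
  return Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
}
```

&lt;p&gt;The fused top results then go to the reranker for final ordering.&lt;/p&gt;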

&lt;p&gt;The citation UX was the hardest part to get right. Each citation chip is color-coded to its platform (Gmail orange, Slack purple, Linear blue). Hovering shows a preview card with the original subject, sender, timestamp, and excerpt. Clicking navigates to the source in the inbox or opens it externally.&lt;/p&gt;

&lt;p&gt;Stack: Tauri v2 (desktop), NestJS, React 19, Claude (Anthropic SDK with prompt caching), Supabase + pgvector, BullMQ, Voyage AI.&lt;/p&gt;

&lt;p&gt;Building in public. The previous demo showed the approval queue (AI drafts, you review before anything sends). This one shows the intelligence layer behind it. Together they tell the story: AI that can see across your tools AND still waits for you to act.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>rag</category>
      <category>showdev</category>
    </item>
    <item>
      <title>I built an AI inbox that can't send emails (and that's the whole point)</title>
      <dc:creator>Lachie James</dc:creator>
      <pubDate>Wed, 18 Mar 2026 05:40:50 +0000</pubDate>
      <link>https://dev.to/lachiejames/i-built-an-ai-inbox-that-cant-send-emails-and-thats-the-whole-point-3007</link>
      <guid>https://dev.to/lachiejames/i-built-an-ai-inbox-that-cant-send-emails-and-thats-the-whole-point-3007</guid>
      <description>&lt;p&gt;

  &lt;/p&gt;
&lt;div&gt;
    &lt;iframe src="https://www.youtube.com/embed/NeRBoiR_bxU"&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;




&lt;p&gt;Every AI productivity tool seems to promise the same thing: "AI handles your email/messages/tasks automatically!"&lt;/p&gt;

&lt;p&gt;And every time I try one, the first thing I do is turn off the auto-send because I don't trust it to email my coworkers unsupervised.&lt;/p&gt;

&lt;p&gt;So I built the opposite. SlopWeaver is a desktop AI work inbox where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI pulls context from your connected tools (Gmail, Slack, Linear, etc.)&lt;/li&gt;
&lt;li&gt;AI drafts replies and stages actions&lt;/li&gt;
&lt;li&gt;Everything lands in a review queue&lt;/li&gt;
&lt;li&gt;Nothing sends without you&lt;/li&gt;
&lt;/ul&gt;
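&lt;p&gt;The constraint can be sketched as a type: the drafting pipeline only ever produces pending actions, and the sole path to sending goes through an explicit approval (type and function names are illustrative):&lt;/p&gt;

```typescript
// The "nothing sends without you" constraint as a type: the AI pipeline
// only produces pending drafts, and the sole transition to "approved"
// is user-triggered. (Type and function names are illustrative.)
type DraftAction = {
  id: string;
  kind: "email_reply" | "slack_message" | "task";
  body: string;
  status: "pending_review" | "approved" | "rejected";
};

function draft(id: string, kind: DraftAction["kind"], body: string): DraftAction {
  // Everything the AI produces lands in the review queue as pending.
  return { id, kind, body, status: "pending_review" };
}

function approve(action: DraftAction): DraftAction {
  // Only this user-triggered call moves an action past review.
  return { ...action, status: "approved" };
}
```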

&lt;p&gt;The short demo shows this queue workflow in action: one message comes in, context gets pulled automatically, and a draft is ready.&lt;/p&gt;

&lt;p&gt;The "nothing auto-sends" constraint has been the best architectural decision in the project. It simplifies trust, UX, and error handling all at once.&lt;/p&gt;

&lt;p&gt;Stack: Tauri v2, NestJS, React 19, Claude (Anthropic SDK), Supabase + pgvector, BullMQ.&lt;/p&gt;

&lt;p&gt;Building in public. Would love feedback from anyone who's tried to solve the "too many tools, not enough context" problem.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
