DEV Community: Machina Tools

How I Built Transcriber - Local Whisper Voice Backend for Browser AI Tools

Machina Tools — Sun, 19 Jul 2026 11:16:14 +0000

Voice input in the browser is a study in inconsistency.

Chrome, Edge, and most Chromium-based browsers have the Web Speech API with continuous recognition that works well for dictation. Firefox ships a stub that throws a not-supported error at runtime. Safari's implementation works but requires a permission gesture every few sentences. And all of them send your audio to a cloud service.

When I built voice dictation for PromptBoard, I needed something that worked everywhere and ran locally. Transcriber is the answer: a Node.js server that runs Whisper in-process and exposes a single HTTP endpoint. Any browser tool that wants voice input sends audio there and gets text back.

The cascade

Every Machina tool that supports voice uses the same three-level cascade:

Level 1 — Web Speech API (Chromium)
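If window.SpeechRecognition (exposed as window.webkitSpeechRecognition in Chromium) is available, use it. Results stream in as you speak and there is no local server to run, although the browser may still send audio to a cloud recognizer. Happy path for Chrome users.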

Level 2 — Transcriber (port 4324)
For Firefox, MediaRecorder captures audio as a webm blob. On stop, it's POSTed to /transcribe. Transcriber converts to WAV with ffmpeg, runs Whisper, returns text. Round-trip: 1–3 seconds.

Level 3 — Manual fallback
If Transcriber isn't running and Web Speech isn't available, a dialog opens with the recorded audio and a textarea. Nothing is silently lost.
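
The cascade above boils down to a small decision function. Here is a minimal Python sketch of that ordering; the enum and function names are hypothetical, and the real tools run this logic as browser JavaScript rather than Python.

```python
from enum import Enum


class VoicePath(Enum):
    WEB_SPEECH = 1   # Level 1: browser-native recognition
    TRANSCRIBER = 2  # Level 2: local Whisper server on port 4324
    MANUAL = 3       # Level 3: dialog with the recorded audio and a textarea


def choose_voice_path(has_web_speech: bool, transcriber_ready: bool) -> VoicePath:
    """Pick the first available level, in the order the post describes."""
    if has_web_speech:
        return VoicePath.WEB_SPEECH
    if transcriber_ready:
        return VoicePath.TRANSCRIBER
    return VoicePath.MANUAL
```

In this sketch, transcriber_ready would come from a GET /health probe, so the manual dialog is only reached when both earlier levels are unavailable.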

The API

GET  /health      → { ok, ready, model, language }
POST /transcribe  → { ok, text }  (body: raw audio bytes)
POST /shutdown    → graceful stop

Small surface, single responsibility. Transcriber doesn't manage sessions, doesn't store audio, doesn't know which tool is calling it.
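
Because the surface is this small, a client fits in a few lines. The sketch below is a hypothetical Python client, not code from Transcriber itself: the host name, the Content-Type header, and the function names are assumptions, while the port, the path, and the { ok, text } response shape come from the listing above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:4324"  # port from the post; the host is an assumption


def build_transcribe_request(audio_bytes: bytes, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /transcribe request with the raw audio as the body."""
    return urllib.request.Request(
        f"{base_url}/transcribe",
        data=audio_bytes,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},  # assumed header
    )


def transcribe(audio_bytes: bytes, base_url: str = BASE_URL, timeout: float = 30.0) -> str:
    """Send audio to a running Transcriber and return the recognized text."""
    req = build_transcribe_request(audio_bytes, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    if not payload.get("ok"):
        raise RuntimeError("Transcriber reported a failure")
    return payload["text"]
```

Calling transcribe(open("clip.webm", "rb").read()) would need a running server; build_transcribe_request can be inspected without one.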

The model downloads on first run (~150MB for whisper-base) and caches locally. You can swap models via env var:

TRANSCRIBER_MODEL=Xenova/whisper-small  # better accuracy, ~460MB
TRANSCRIBER_MODEL=Xenova/whisper-tiny   # faster, ~75MB

→ Full post on machina.chat

Transcriber is part of Machina — a free, open-source suite of AI developer tools.

How I Built ContextForge - Giving Your AI a Complete Project Briefing Before Every Session

Machina Tools — Sun, 19 Jul 2026 11:15:48 +0000

Every AI debug session starts the same way.

You open the chat, and before you can describe the actual problem, you have to explain everything else: what the project is, what changed recently, what the logs are saying, what you already tried. Five minutes of context-setting before you even get to the question.

ContextForge solves this by automating the briefing. It pulls together everything the AI needs to know — git diff, server logs, BugCapture reports, config files — and assembles it into a single structured prompt. You open a session, hit Generate, and paste. The AI is immediately oriented.
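
The assembly step can be pictured as joining named sources into one document. The sketch below is only an illustration under assumptions: the function name, the section titles, and the Markdown layout are hypothetical, since the post does not show ContextForge's actual output format.

```python
def assemble_briefing(sections: dict[str, str]) -> str:
    """Join named context sources into one Markdown briefing, skipping empty ones."""
    parts = []
    for title, body in sections.items():
        body = body.strip()
        if body:
            parts.append(f"## {title}\n\n{body}")
    return "\n\n".join(parts)


briefing = assemble_briefing({
    "Git diff": "- old line\n+ new line",
    "Server logs": "",          # nothing captured, so this section is dropped
    "Config": "DEBUG=true",
})
```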

→ Full post on machina.chat

ContextForge is part of Machina — a free, open-source suite of AI developer tools.

Activation Functions: Why Non-Linearity Is Everything

Machina Tools — Wed, 01 Jul 2026 06:40:29 +0000

There's a proof worth knowing: if you stack linear transformations without any non-linearity between them, the entire network is equivalent to a single linear transformation. Ten layers, a hundred layers, a thousand - they all collapse to one matrix multiply. Activation functions are what prevent this collapse.

The linearity collapse, demonstrated

import numpy as np

W1 = np.random.randn(4, 4)
W2 = np.random.randn(4, 4)
W3 = np.random.randn(4, 4)

W_collapsed = W3 @ W2 @ W1
x = np.random.randn(4)

out_deep = W3 @ W2 @ W1 @ x
out_shallow = W_collapsed @ x

print(np.allclose(out_deep, out_shallow))  # True — three layers = one layer

The three layers have zero additional expressive power over one. Adding a non-linear function between each layer breaks this.
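
To see the collapse break, the same experiment can be rerun with a non-linearity between the layers. This is a small sketch using tanh as a stand-in activation and a fixed seed; neither choice comes from the original demo.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2, W3 = (rng.standard_normal((4, 4)) for _ in range(3))
x = rng.standard_normal(4)

# Same three weight matrices, but with tanh applied after the first two layers.
out_nonlinear = W3 @ np.tanh(W2 @ np.tanh(W1 @ x))
out_collapsed = (W3 @ W2 @ W1) @ x

# The single collapsed matrix no longer reproduces the network's output.
matches = np.allclose(out_nonlinear, out_collapsed)
print(matches)
```

With the activation in place, no single matrix reproduces the output for every x, which is where the extra expressive power comes from.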

Sigmoid: the original, and its problems

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

for x_val in [-5, -2, 0, 2, 5]:
    g = sigmoid_grad(x_val)
    print(f"x={x_val:3d}  gradient={g:.6f}")

x= -5  gradient=0.006648
x= -2  gradient=0.104994
x=  0  gradient=0.250000
x=  2  gradient=0.104994
x=  5  gradient=0.006648

At x=±5, the gradient is about 38× smaller than at x=0 (0.25 / 0.006648 ≈ 37.6). Backpropagation multiplies one such factor per layer, so in a 10-layer network the compound effect shrinks gradients toward zero: the vanishing gradient problem.
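
The compounding can be bounded with simple arithmetic: sigmoid's derivative never exceeds 0.25, so a chain of n sigmoids scales the gradient by at most 0.25 to the power n. The sketch below deliberately ignores the weight matrices, which also multiply into the real gradient, so it is an illustration of the activation's contribution only.

```python
MAX_SIGMOID_GRAD = 0.25  # sigmoid'(0), the largest value the derivative can take


def sigmoid_chain_bound(depth: int) -> float:
    """Upper bound on the product of `depth` sigmoid derivatives."""
    return MAX_SIGMOID_GRAD ** depth


for depth in (1, 5, 10):
    print(f"depth={depth:2d}  bound={sigmoid_chain_bound(depth):.2e}")
# depth= 1  bound=2.50e-01
# depth= 5  bound=9.77e-04
# depth=10  bound=9.54e-07
```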

ReLU: the surprisingly effective fix

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    return (x > 0).astype(float)

The gradient for positive inputs is exactly 1. Gradients don't shrink as they pass through ReLU on the positive side. Deep networks could finally be trained.

The cost: neurons whose inputs are consistently negative receive zero gradient — the "dying ReLU" problem. In practice this matters less than you'd think.

# Leaky ReLU: small gradient for negatives
def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

GELU: what GPT uses

GELU is a smooth approximation of ReLU:

GELU(x) ≈ 0.5 × x × (1 + tanh(√(2/π) × (x + 0.044715 × x³)))

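# Tanh approximation from the formula above
def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

for x_val in [-0.5, -0.2, 0.0, 0.2, 0.5]:
    r = max(0, x_val)
    g = gelu(np.array([x_val]))[0]
    print(f"x={x_val:4.1f}  ReLU={r:.4f}  GELU={g:.4f}")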

x=-0.5  ReLU=0.0000  GELU=-0.1543
x=-0.2  ReLU=0.0000  GELU=-0.0841
x= 0.0  ReLU=0.0000  GELU=0.0000
x= 0.2  ReLU=0.2000  GELU=0.1159
x= 0.5  ReLU=0.5000  GELU=0.3457

The smoothness makes optimization slightly easier. GPT-2 and BERT both use GELU.
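
The tanh formula is itself an approximation of the exact definition, GELU(x) = x · Φ(x), where Φ is the standard normal CDF. The sketch below compares the two in plain Python; the function names and the sampled range are illustrative choices rather than anything from a particular library.

```python
import math


def gelu_exact(x: float) -> float:
    """GELU(x) = x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """The tanh approximation quoted above."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Largest disagreement on a grid over [-5, 5] in steps of 0.1.
max_gap = max(abs(gelu_exact(v / 10) - gelu_tanh(v / 10)) for v in range(-50, 51))
print(f"largest gap on [-5, 5]: {max_gap:.1e}")
```

The gap stays well below 0.001 across this range, which is why the two forms are often treated as interchangeable.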

SwiGLU: what modern models use

SwiGLU is the activation used in LLaMA, Mistral, and most current large models:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.W = nn.Linear(d_model, d_ff, bias=False)
        self.V = nn.Linear(d_model, d_ff, bias=False)
        self.out = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x):
        gate = F.silu(self.W(x))   # SiLU = x * sigmoid(x)
        content = self.V(x)
        return self.out(gate * content)

One linear projection gates whether the other passes through — more expressive than a simple element-wise non-linearity.
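
One practical consequence of the gating design is that the block has three weight matrices instead of two. A common convention, not stated in this post, is to shrink d_ff to about two thirds of the usual 4 × d_model so the parameter count stays comparable to a standard feed-forward block. The arithmetic below is a sketch under that assumption, with d_model = 512 chosen only for illustration.

```python
def ffn_params(d_model: int, d_ff: int) -> int:
    """Weights in a standard two-matrix feed-forward block (no biases)."""
    return 2 * d_model * d_ff


def swiglu_params(d_model: int, d_ff: int) -> int:
    """Weights in the SwiGLU block above: W, V and out (no biases)."""
    return 3 * d_model * d_ff


d_model = 512
baseline = ffn_params(d_model, 4 * d_model)            # classic 4x expansion
matched = swiglu_params(d_model, (8 * d_model) // 3)   # 2/3 of 4x, rounded down
print(baseline, matched)  # 2097152 2096640
```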

Gradient flow comparison

def test_gradient_flow(activation_fn, depth=20, seed=0):
    torch.manual_seed(seed)
    layers = []
    for _ in range(depth):
        layers.extend([nn.Linear(64, 64), activation_fn()])
    model = nn.Sequential(*layers)
    x = torch.randn(16, 64, requires_grad=True)
    out = model(x).sum()
    out.backward()
    return x.grad.abs().mean().item()

activations = {"ReLU": nn.ReLU, "Sigmoid": nn.Sigmoid, "GELU": nn.GELU, "SiLU": nn.SiLU}
for name, act in activations.items():
    grad = test_gradient_flow(act, depth=20)
    print(f"{name:<10}: input gradient magnitude = {grad:.6f}")

ReLU      : input gradient magnitude = 0.003241
Sigmoid   : input gradient magnitude = 0.000001
GELU      : input gradient magnitude = 0.004817
SiLU      : input gradient magnitude = 0.004923

Sigmoid is thousands of times worse. ReLU, GELU, and SiLU are all in the same ballpark — the gap between them matters far less than the gap from sigmoid.

Summary

Function	Where used	Key property
Sigmoid	Old networks	Saturates; vanishing gradients
ReLU	CNNs, MLPs	Simple; gradient=1 for positives
GELU	GPT-2, BERT	Smooth; slight negative outputs
SiLU/Swish	Modern models	Smooth; slightly better performance
SwiGLU	LLaMA, Mistral	Expressive gating mechanism

The progression follows one thread: keep gradients alive through many layers, give the network enough expressive power, don't overcomplicate what works.

This is part of an ongoing series on AI internals. Full article with more context at machina.chat/blog.

Steering Vectors: Changing What an LLM Wants Without Touching Its Weights

Machina Tools — Sun, 28 Jun 2026 22:56:24 +0000

LLMs encode concepts as geometric directions in activation space. You can find those directions, add them at inference time, and shift model behavior - without touching a single weight.

This is called steering vectors, and it works.

The core idea

A language model's residual stream is a high-dimensional vector that accumulates information as it passes through layers. The linear representation hypothesis says that concepts like "pessimism," "formality," or "Python expertise" correspond to specific directions in this space.

If that's true, you should be able to:

1. Find the direction that encodes a concept (using contrastive examples)
2. Add a scaled version of that direction at a specific layer
3. Observe the model behaving more "conceptfully"

And you can.

Extracting a steering vector

The simplest method: take sentence pairs that differ only in the target concept, run them through the model, and average the difference in activations at a chosen layer.

import torch
from transformer_lens import HookedTransformer

def extract_steering_vector(model, positive_prompts, negative_prompts, layer=16):
    """Extract direction that points from negative to positive concept."""
    pos_acts, neg_acts = [], []

    with torch.no_grad():
        for prompt in positive_prompts:
            _, cache = model.run_with_cache(prompt)
            pos_acts.append(cache[f"blocks.{layer}.hook_resid_post"][0, -1])

        for prompt in negative_prompts:
            _, cache = model.run_with_cache(prompt)
            neg_acts.append(cache[f"blocks.{layer}.hook_resid_post"][0, -1])

    vector = torch.stack(pos_acts).mean(0) - torch.stack(neg_acts).mean(0)
    return vector / vector.norm()
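
To check that the mean-difference recipe recovers a direction at all, it helps to try it on synthetic data where the answer is known. The sketch below uses NumPy with made-up "activations": the dimensions, noise level, and shift size are arbitrary assumptions, and no language model is involved.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n = 64, 200

# Planted unit direction standing in for the "concept".
true_dir = rng.standard_normal(d_model)
true_dir /= np.linalg.norm(true_dir)

# Fake activations: isotropic noise shifted along +/- the planted direction.
pos_acts = rng.standard_normal((n, d_model)) + 2.0 * true_dir
neg_acts = rng.standard_normal((n, d_model)) - 2.0 * true_dir

# Same recipe as extract_steering_vector: mean difference, then normalize.
vector = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
vector /= np.linalg.norm(vector)

cosine = float(vector @ true_dir)
print(f"cosine with planted direction: {cosine:.3f}")
```

Averaging cancels most of the noise, so the recovered vector points almost exactly along the planted direction. Real activations are messier, which is why the choice of contrastive prompts matters.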

Applying it at inference

Once you have the vector, inject it via a hook:

def make_hook(steering_vector, alpha=20.0):
    def hook_fn(value, hook):
        return value + alpha * steering_vector
    return hook_fn

def generate_steered(model, prompt, steering_vector, layer=16, alpha=20.0):
    hook = make_hook(steering_vector, alpha)
    hook_name = f"blocks.{layer}.hook_resid_post"
    # Tokenize first so generate() returns token ids rather than a string
    input_tokens = model.to_tokens(prompt)
    with model.hooks(fwd_hooks=[(hook_name, hook)]):
        tokens = model.generate(input_tokens, max_new_tokens=100)
    return model.to_string(tokens[0])

What you can steer

The same technique works for a surprising range of properties:

Pessimism/optimism - shifts narrative tone measurably
Python enthusiasm - makes the model reach for Python examples
Formality - shifts register from casual to professional
Sycophancy - can be used to reduce agreement-seeking behavior

The key finding from Turner et al. (2023) and Zou et al. (2023): these vectors generalize. A pessimism vector extracted from weather sentences transfers to unrelated topics.

Checking if a concept is linearly encoded

Before steering, you can verify the concept is actually linearly represented using a probe:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def probe_concept(model, positive_prompts, negative_prompts, layer=16):
    X, y = [], []
    with torch.no_grad():
        for prompt in positive_prompts:
            _, cache = model.run_with_cache(prompt)
            X.append(cache[f"blocks.{layer}.hook_resid_post"][0, -1].cpu().numpy())
            y.append(1)
        for prompt in negative_prompts:
            _, cache = model.run_with_cache(prompt)
            X.append(cache[f"blocks.{layer}.hook_resid_post"][0, -1].cpu().numpy())
            y.append(0)

    # Score on held-out folds: training accuracy is near 1.0 for almost any
    # labels when the activation dimension exceeds the number of prompts.
    probe = LogisticRegression(max_iter=1000)
    return cross_val_score(probe, X, y, cv=5).mean()  # needs >= 5 prompts per class

High held-out probe accuracy (above roughly 0.9) suggests the concept is close to linearly separable in activation space, and steering should work well.

Full writeup with more experiments and analysis on the Machina blog: Steering Vectors: Changing What an LLM Wants Without Touching Its Weights

Originally published at machina.chat

How I Built PromptBoard — A Visual Canvas for Building AI Prompts

Machina Tools — Sun, 21 Jun 2026 02:14:00 +0000

There's a class of AI prompts that don't fit in a text box.

Not because the ideas are too long — you can always write more. The problem is that the structure of what you want to communicate is inherently visual. You're describing a flow. You're pointing at an image. You're listing constraints that apply to some parts of the context but not others. You're trying to give the AI a briefing, not a paragraph.

The text box forces everything into one dimension. And the AI, however capable, has to reconstruct the structure you had in your head from a flat string of text.

PromptBoard solves this by flipping the approach: you build the prompt visually first, then export it.

The problem with prompting complex tasks

Every developer who uses AI agents regularly hits a pattern like this:

You have a bug to fix. It's not a simple bug — it involves a flow you need to explain, a screenshot of the broken state, three or four constraints the fix has to respect, and a description of what the correct behavior should look like.

You start typing. You write the task description, then realize you need to explain the flow first. You paste in a screenshot and then write around it. You add the constraints at the end but they're not clearly linked to the specific parts they apply to. By the time you hit send, the prompt is a 400-word wall of text with an image in the middle.

The AI can often handle this. But you're asking it to do structural inference that you could have done once, clearly, in a canvas.

The deeper issue is that prompts have a natural graph structure: nodes (concepts, constraints, examples) with labeled relationships between them. A text box serializes that graph into a linear sequence and throws away the relationship labels.

The design: blocks, arrows, export

PromptBoard is built around three concepts:

Blocks are the nodes. There are three types:

Text — free-form content, the main carrier of context. Can have an optional label.
Image — drag-and-drop or paste a screenshot. Gets embedded as base64 in the export.
Flow — a process/decision/terminal node for describing logic visually.

Arrows connect blocks and carry a label. "This constraint applies to this flow step." "This screenshot is evidence for this bug description." The relationships are explicit, not inferred from reading order.

Export serializes the canvas back to text — structured text. Blocks are rendered in top-to-bottom, left-to-right order. Arrows become a ## Flow section listing every connection with its label. Images are embedded as base64. The output is a Markdown file any AI can parse immediately.
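
The export rule can be sketched in a few lines. This is a simplified Python illustration, not PromptBoard's code (which is browser JavaScript): the Block and Arrow classes and their field names are hypothetical, images are left out, and sorting on exact (y, x) coordinates stands in for whatever row grouping the real tool uses.

```python
from dataclasses import dataclass


@dataclass
class Block:
    id: str
    text: str
    x: float
    y: float


@dataclass
class Arrow:
    src: str
    dst: str
    label: str


def export_markdown(blocks: list[Block], arrows: list[Arrow]) -> str:
    """Serialize blocks in reading order, then list arrows under '## Flow'."""
    ordered = sorted(blocks, key=lambda b: (b.y, b.x))  # top-to-bottom, then left-to-right
    by_id = {b.id: b for b in blocks}
    lines = [b.text for b in ordered]
    if arrows:
        lines.append("## Flow")
        for a in arrows:
            lines.append(f"{by_id[a.src].text} → {by_id[a.dst].text} ({a.label})")
    return "\n\n".join(lines)


blocks = [
    Block("b", "Payment API responds?", x=300, y=100),
    Block("a", "Cart validates form fields", x=100, y=100),
]
arrows = [Arrow("a", "b", "calls POST /api/checkout")]
md = export_markdown(blocks, arrows)
```

The arrow line keeps the relationship label next to both endpoints, so the reader does not have to infer it from block order.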

Why a canvas, not a form

The first version of this tool was a form. Title field, description field, constraints field, image upload. Structured, explicit, readable.

It was unusable.

The problem with forms is that they impose a fixed schema. Your context doesn't always have a title and a description and three constraints. Sometimes it's just two things that are connected. Sometimes you have five images and no text yet.

A canvas has no schema. You start with an empty surface and put things where they make sense. The structure emerges from the layout, not from a pre-defined form. That's exactly how you think through a problem before you explain it — spatially, not linearly.

How voice dictation works

PromptBoard has voice dictation on every text block and arrow label. Two modes:

Chromium (Chrome, Edge): uses the Web Speech API with continuous: true. You click the mic button, talk, and transcribed text appends to the block in real time. There is no local server to run and nothing to configure; recognition is handled by the browser's built-in speech service, which may send audio to the vendor's cloud.

Firefox and others: MediaRecorder captures the audio, then sends it to a local Whisper server (Transcriber, port 4324) for transcription. If Transcriber isn't running, a dialog appears — you can type what you said, or replay the audio.

The asymmetry is intentional: Chromium's built-in speech recognition is good enough for note-taking velocity. Whisper is better for longer or more technical dictation.

The export format

# Fix the checkout form

**Goal**
Fix the checkout form — Cart component won't submit after the last refactor

**Constraints**
No new deps · TypeScript strict · keep under 50 lines

![cart-screenshot](data:image/png;base64,...)

**[▭ Process]** Cart validates form fields

**[◇ Decision]** Payment API responds?

**[○ Terminal]** Show success or error state

## Flow

Cart validates form fields → Payment API responds? (calls POST /api/checkout)
Payment API responds? → Show success or error state (on failure: surface error message)

When the AI reads this, it has: the task in plain language, constraints explicitly stated, the screenshot as direct visual evidence, and the flow as a labeled graph.

Technical architecture

PromptBoard is a single HTML file, around 1,100 lines. No build step, no npm install, no server. Open it in a browser and it works.

State: a single S object holds all blocks, arrows, history stack, and interaction state. Everything is JSON-serializable. Boards are saved to localStorage (up to 20 boards).

Undo/redo: snapshot-based history (JSON.stringify + JSON.parse of the state). Up to 60 snapshots. Ctrl+Z / Ctrl+Y work everywhere outside a text input.
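
Snapshot-based history is easy to reason about because every entry is a full copy of the state. The sketch below shows the idea in Python, with json.dumps and json.loads standing in for JSON.stringify and JSON.parse; the History class and its method names are hypothetical, and only the 60-snapshot cap is taken from the description above.

```python
import json


class History:
    """Snapshot-based undo/redo over any JSON-serializable state."""

    def __init__(self, limit: int = 60):
        self.limit = limit
        self.undo_stack: list[str] = []
        self.redo_stack: list[str] = []

    def record(self, state: dict) -> None:
        """Call before a mutation: save a deep copy of the current state."""
        self.undo_stack.append(json.dumps(state))
        if len(self.undo_stack) > self.limit:
            self.undo_stack.pop(0)   # drop the oldest snapshot
        self.redo_stack.clear()      # a new edit invalidates redo

    def undo(self, current: dict) -> dict:
        """Return the previous snapshot, or `current` if there is none."""
        if not self.undo_stack:
            return current
        self.redo_stack.append(json.dumps(current))
        return json.loads(self.undo_stack.pop())

    def redo(self, current: dict) -> dict:
        """Reapply the most recently undone snapshot, if any."""
        if not self.redo_stack:
            return current
        self.undo_stack.append(json.dumps(current))
        return json.loads(self.redo_stack.pop())
```

The trade-off is memory: each snapshot stores the whole board, including any base64 images, which is one reason to cap the stack.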

Arrows: rendered as SVG quadratic Bézier curves with a slight perpendicular offset to avoid overlapping block edges. Hit areas are 14px-wide transparent paths over 1.5px visible paths.
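
The curve geometry is a midpoint pushed sideways along the line's normal. The sketch below computes an SVG path string in Python for illustration; the function name and the 30px default offset are assumptions, since the post does not give the actual offset.

```python
import math


def arrow_path(x1: float, y1: float, x2: float, y2: float, offset: float = 30.0) -> str:
    """SVG path for a quadratic Bezier whose control point is pushed sideways."""
    mx, my = (x1 + x2) / 2, (y1 + y2) / 2   # midpoint of the straight line
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy) or 1.0       # avoid division by zero
    nx, ny = -dy / length, dx / length       # unit vector perpendicular to the line
    cx, cy = mx + nx * offset, my + ny * offset
    return f"M {x1:g} {y1:g} Q {cx:g} {cy:g} {x2:g} {y2:g}"


print(arrow_path(0, 0, 100, 0))  # M 0 0 Q 50 30 100 0
```

Drawing the same path twice, once thin and visible and once wide and transparent, gives the generous hit area described above.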

Canvas: 3000×2000px scrollable area. Blocks are position:absolute divs. Drag uses mousedown on the header + mousemove + mouseup on document.

Strengths

No installation. The tool lives in one file. Put it on a USB drive, serve it from any static host, or just keep it in your project folder and open it with a double-click.

Multimodal output. The base64 image embedding means the exported .md is self-contained — images travel with the text. Paste the entire thing into Claude or GPT-4o and the screenshots are right there.

Voice-first friendly. For developers who think faster than they type, or who are debugging a live environment and need both hands, voice dictation makes PromptBoard usable without touching a keyboard.

Composable with the rest of Machina. The .md export is the same format BugCapture produces. A natural workflow: BugCapture records the bug, ContextForge adds the git diff and logs, PromptBoard adds the visual structure and constraints.

Try it

PromptBoard is part of Machina — a free, open-source suite of AI developer tools.

git clone https://github.com/machina-tools/machina.git

Then open tools/promptboard/index.html in your browser. No server needed.

→ GitHub | machina.chat

How I Built LearnBoard — The UI That Makes Your AI Remember You

Machina Tools — Sun, 21 Jun 2026 02:13:01 +0000

There's a problem that every developer who works with AI agents eventually runs into: the AI doesn't remember you.

You spend twenty minutes at the start of every session re-explaining your stack, your preferences, the constraint you discovered last week, the mistake you almost made twice. You know you have to do this. The AI has no memory of your last conversation. Every session, it starts fresh.

This is the biggest hidden cost of working with AI agents. It's not the hallucinations or the wrong answers — those are visible failures you can debug. The bigger cost is the invisible overhead: the context-building you do every single time, the lessons that get re-learned, the preferences that get ignored, the mistakes that happen again because the AI didn't know they were mistakes.

LearnBoard is the tool I built to solve this.

The core idea: structured memory as a file

The insight that made this possible is simple: if you want an AI to remember something persistently, put it in a file it reads at session start.

This isn't a new idea. CLAUDE.md works this way. Many productivity workflows work this way. But the gap was always the management layer — there was no way to see what was in the memory, search it, or edit it without opening a raw text editor and hoping you understood the schema.

LearnBoard is the management interface for a structured memory file called LEARNING.md. Everything the AI has learned about you — your workflow preferences, patterns it has recognized, mistakes to avoid, successful approaches to revisit — lives in that file. LearnBoard makes that invisible layer visible, searchable, and editable in real time.

How it works

LearnBoard is a Node.js server (port 4331) that serves a web dashboard for your LEARNING.md file. The file uses a structured Markdown format with defined sections:

Lessons — explicit things the AI has learned ("always prefer local tools over cloud APIs")
Tools — the tools and versions in your environment
Suggestions — pending ideas from the AI that haven't been implemented yet
Stats — success rates, session counts, learning velocity

The server watches the file with chokidar and pushes updates to the UI over Server-Sent Events — so when you open a second terminal and the AI writes to the file, you see it appear in the dashboard in real time.
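
On the wire, a Server-Sent Events message is just text: an optional event: line, a data: line, and a blank line to terminate it. The Python sketch below shows that framing; the event name and payload fields are hypothetical, and LearnBoard's actual server is Node.js.

```python
import json
from typing import Optional


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Encode one SSE message: optional 'event:' line, one 'data:' line, blank line."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps emits a single line by default, so one data: field is enough.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


message = format_sse({"section": "lessons", "changed": True}, event="file-changed")
```

A file watcher callback would write a message like this to every open response stream, and the browser's EventSource would deliver it to the dashboard.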

LEARNING.md excerpt:

## Lessons Learned

| # | Category | Lesson | Confidence | Sessions |
|---|----------|--------|------------|---------|
| 15 | tooling | Always prefer local/free solutions — never propose paid APIs without exhausting local alternatives first | high | 12 |
| 18 | ux | User prefers autonomous tools that find context on their own — not manual forms to fill in | high | 8 |
| 23 | ops | Always restart via the bash script that rebuilds nvm — `nohup node` fails silently without nvm env | confirmed | 5 |

## Pending Suggestions

| # | Suggestion | Status | Votes |
|---|-----------|--------|-------|
| 4 | Add keyboard shortcut to export BugCapture without clicking | pending | +3 |
| 7 | Auto-detect project from git remote in ContextForge | in-review | +2 |

That file, prefixed to the AI's system prompt, means the agent starts every session already knowing your preferences, your environment, and what approaches have worked or failed before.
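
Since the sections are ordinary pipe tables, they are also easy to read programmatically. The sketch below parses one into dictionaries in Python; it is a simplified illustration with a hypothetical function name, uses shortened lesson text, and does not handle cells that contain a literal pipe character.

```python
def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Turn a pipe-delimited Markdown table into a list of row dicts keyed by header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
        if line.strip().startswith("|")
    ]
    header, body = rows[0], rows[2:]  # rows[1] is the |---|---| separator
    return [dict(zip(header, row)) for row in body]


table = """
| # | Category | Lesson | Confidence | Sessions |
|---|----------|--------|------------|---------|
| 15 | tooling | Always prefer local/free solutions | high | 12 |
| 23 | ops | Always restart via the bash script | confirmed | 5 |
"""
lessons = parse_markdown_table(table)
```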

The dashboard

The web UI has four main views:

Lessons table — all lessons with confidence score (low / medium / high / confirmed), category filter, free-text search, and inline editing. Click any cell to edit. New lessons append instantly. The AI can add lessons via a CLI flag; you see them appear in real time.

Tools inventory — your stack: language versions, frameworks, key dependencies. LearnBoard reads your package.json, .nvmrc, and SSH environment automatically to bootstrap this section.

Suggestions queue — pending ideas from the AI, with a +1/−1 voting system. Ideas that accumulate positive votes get promoted to the "accepted" column, which the AI treats as confirmed guidelines for future sessions.

Session stats — a lightweight histogram of session count, fix rate, and learning velocity over time. You can see that lesson #15 has appeared in 12 sessions and has a 94% success rate. The AI isn't just guessing — it has evidence.

The innovation: meta-AI

The thing that makes LearnBoard different from a personal wiki or a note-taking tool is that it's designed to be read by the AI, not by you.

The structured format is chosen specifically to be unambiguous to a language model. The confidence scores, session counts, and vote history are signals the AI uses to weight lessons against each other. When two lessons conflict, the one with more sessions and higher confidence wins.
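
In LearnBoard the model does this weighing itself while reading the file, but the rule can be made explicit for illustration. The Python sketch below encodes one possible reading: confidence level first, session count as the tie-breaker. That priority order, the function name, and the dictionary keys are all assumptions; only the four confidence levels come from the dashboard description.

```python
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2, "confirmed": 3}


def resolve_conflict(lesson_a: dict, lesson_b: dict) -> dict:
    """Return the lesson with higher confidence; break ties by session count."""
    def weight(lesson: dict) -> tuple[int, int]:
        return (CONFIDENCE_RANK[lesson["confidence"]], lesson["sessions"])
    return max(lesson_a, lesson_b, key=weight)


a = {"id": 15, "confidence": "high", "sessions": 12}
b = {"id": 23, "confidence": "confirmed", "sessions": 5}
winner = resolve_conflict(a, b)
```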

The AI can also write to the file. After a successful session, you can ask Claude or Copilot to "add a lesson to LEARNING.md about what we just discovered." It knows the schema, writes to the right section, and the dashboard updates immediately.

This is the learning loop: the AI learns from each session, stores the lesson, and is better informed for the next one. LearnBoard makes that loop visible and controllable.

Key strengths

Full local operation. No cloud, no sync, no account. LEARNING.md is a plain text file you can read, commit, backup, and share without any vendor dependency.

AI-agnostic. The file format works with Claude, Copilot, GPT-4, Gemini, or any agent that accepts system prompt context. You're not locked into a specific tool.

Human-readable fallback. When no dashboard is running, the memory layer is just a Markdown file. No black box.

Survivable architecture. When a new AI model comes out, you don't migrate data. The file stays the same. The new model reads the same lessons on day one.

The technical stack

Component	Technology
Server	Node.js ESM
File watching	`chokidar`
Markdown parsing	`marked`
Real-time updates	Server-Sent Events
Frontend	Vanilla JavaScript, no framework

The server starts in under a second and uses less than 50MB of RAM.

The real test

I've been running LearnBoard on every project for four months. My LEARNING.md file has grown to 34 lessons, 18 pending suggestions, and a tools inventory for 6 active projects.

The sessions where I don't preload the context are noticeably worse. The AI proposes solutions I've already ruled out, asks questions I've already answered, and sometimes makes the exact mistakes that are documented in the file.

The most concrete evidence: lesson #19 documents a deployment pattern specific to one client's server setup. I've referenced it in 7 sessions since adding it. Every time, the AI uses it without being told. That's 7 explanations I didn't have to give.

Try it

LearnBoard is part of Machina, an open-source suite of tools that close the gap between how you work and how your AI understands you.

git clone https://github.com/machina-tools/machina.git
cd machina
bash setup.sh
cd tools/learnboard && node server.mjs

Then open http://localhost:4331 in your browser.

How I Built BugCapture — From Screen Recording to AI-Ready Bug Report in One Click

Machina Tools — Sun, 21 Jun 2026 02:11:59 +0000

I was debugging a form alignment issue on a client's production server. Remote machine, no local environment. The kind of problem where you know exactly what you're seeing but translating it into words for your AI agent takes longer than finding the bug yourself.

"The second column in the input group is a few pixels wider than the first, but only when the browser is at an intermediate viewport width — somewhere between 768px and 900px — and only after a user has interacted with the first field. The offset appears to be about 12px..."

By the time you've written that, you've already lost the time you were trying to save. And the AI's first three responses are clarifying questions, because even that description is ambiguous.

This is the problem BugCapture solves: it turns a 47-second screen recording into a structured file your AI agent can act on immediately — with no text description from you, no manual screenshots, no copy-pasting error messages.

The insight: bugs have a natural format

Modern AI agents — Claude, Copilot, GPT-4o — are multimodal. They can look at screenshots. They can read transcripts. The question isn't whether they can understand a bug from visual evidence; they clearly can. The question is: what format packages that evidence in a way that maximizes AI understanding?

The answer, after a lot of iteration, is a Markdown file with:

A voice transcript from the developer reproducing the bug (what you're thinking while you click)
Sequential screenshots at regular intervals (what the screen looked like over time)
Optional SSH log capture (what the server was doing at the same time)

This combination gives the AI three independent channels of information about the same event. The transcript explains intent. The screenshots show the visual state. The logs show the runtime state. An AI reading all three can build a more accurate model of the bug than it could from any one source alone.

How it works

The workflow is exactly three steps:

1. Record — click Record in the BugCapture browser interface. The page requests screen and microphone access. You reproduce the bug while narrating what you see. The recording is captured as a MediaRecorder stream — audio and video in parallel, fully local.

2. Process — when you click Stop, the server runs two pipelines simultaneously:

Frame extraction: ffmpeg extracts one screenshot every 3 seconds (configurable), converts them to JPEG at 85% quality. Up to 20 frames per recording.
Transcription: @xenova/transformers runs the Whisper base.en model on the audio — fully offline, no API key, no data upload. On a modern laptop, a 47-second recording transcribes in about 8 seconds.

3. Export — you get a .md file: screenshots embedded as base64 + the full transcript, structured for AI consumption.
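
The frame-extraction half of step 2 maps onto a single ffmpeg invocation. The Python sketch below only builds the argument list; the function name and file names are hypothetical, and the JPEG quality setting is left out because ffmpeg's quality scale does not map directly onto a percentage. BugCapture's own invocation may differ.

```python
def frame_extraction_cmd(video_path: str, out_pattern: str,
                         interval_s: int = 3, max_frames: int = 20) -> list[str]:
    """Build an ffmpeg argument list: one frame every `interval_s` seconds, capped."""
    return [
        "ffmpeg",
        "-i", video_path,
        "-vf", f"fps=1/{interval_s}",   # sample one frame per interval
        "-frames:v", str(max_frames),   # stop after the cap
        out_pattern,                    # e.g. frame_%03d.jpg
    ]


cmd = frame_extraction_cmd("recording.webm", "frame_%03d.jpg")
```

Passing the list to subprocess.run(cmd, check=True) would run it, assuming ffmpeg is on the PATH. With the defaults, a 47-second recording yields roughly 16 frames, under the 20-frame cap.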

Drop that into Claude's context or Copilot Agent's workspace, and the AI has everything it needs. No text description from you. No manual screenshot upload.

LogLens: adding the server side

BugCapture has an optional LogLens mode: enable it before recording, and the server opens an SSH connection to your remote machine and tails the configured log files in parallel with the screen capture. When you export, the .md includes a timestamped log capture alongside the visual evidence.

The real test

The form alignment bug I mentioned: I recorded 47 seconds of screen and voice, exported the .md, and dropped it into Copilot Agent.

Copilot identified a conflicting width rule in a child theme stylesheet that was being applied conditionally after the first user interaction triggered a re-render. The fix was three lines of CSS. Total time from "I see the bug" to "fix deployed": under two minutes.

Key strengths

Completely offline. The Whisper model runs locally via ONNX. No transcription API, no upload, no account.

AI-agnostic output. The .md file works with any AI that accepts text: Claude, Copilot, GPT-4, Gemini, local models via Ollama.

Zero configuration for basic use. Install Node.js, clone the repo, node server.mjs. No API keys required.

Technical stack

Component	Technology
Screen + audio capture	Web MediaRecorder API
Frame extraction	ffmpeg
Transcription	`@xenova/transformers` + Whisper ONNX
SSH log capture	`ssh2`
Output format	Markdown with base64 JPEG
Server	Node.js ESM, no framework

Try it

BugCapture is part of Machina, an open-source suite of tools that close the gap between "I see the bug" and "AI fixes the bug."

git clone https://github.com/machina-tools/machina.git
cd machina
bash setup.sh
cd tools/bugcapture && node server.mjs

Then open tools/bugcapture/index.html in your browser.

I Built Open-Source AI Dev Tools to Close the Gap Between Seeing a Bug and Fixing It

Machina Tools — Wed, 10 Jun 2026 12:45:26 +0000

I've been using AI assistants like GitHub Copilot and Claude for debugging for over a year now. And I kept running into the same frustrating moment: I'd see a bug on my screen, but by the time I described it to the AI, I'd lose half the context — the exact error, what I clicked, what the logs said.

So I built Machina — a suite of open-source tools designed to close that gap.

The Core Problem

When you're debugging with an AI assistant, you need to transfer context: what you see, what happened, what the environment looks like. Doing this manually is slow and lossy. You forget things. You mis-describe things. The AI works with incomplete information.

Machina automates that context transfer.

The Tools

BugCapture

Records your screen + audio, transcribes your voice with Whisper, extracts screenshots at regular intervals, and generates a ready-to-paste .md file with everything an AI needs to understand your bug. No more copy-pasting error messages or describing what you see.

ContextForge

Before starting an AI debugging session, run ContextForge. It pulls your recent git diff, SSH logs, and any BugCapture output into a single structured briefing. Your AI starts the session already knowing what changed and what broke.

LearnBoard

A UI for LEARNING.md — a persistent memory file for your AI. Instead of re-explaining your codebase conventions every session, LearnBoard lets you manage what the AI should always remember. With stats on how often each lesson is used.

PromptBoard

A drag-and-drop canvas for building structured prompts. Combine context blocks, templates, and voice input (via browser or local Whisper) into prompts that consistently get better AI responses.

Why Open Source

I wanted these tools to be something the community can extend and adapt. Every tool is self-contained: typically a server.mjs, an index.html, and a package.json. Run bash setup.sh and you're done.

The repo is at github.com/machina-tools/machina.

Launch Day

Today is launch day on Product Hunt! If these tools sound useful to you, I'd love your support:

👉 Vote on Product Hunt

Happy to answer any questions in the comments — about the tools, the architecture, or the debugging workflow that inspired this.

Built with Node.js, Whisper (OpenAI), and a lot of late-night debugging sessions.